Apache Spark – IT GIZZ


Introduction to Big Data and Spark

Introduction to Big Data
Challenges with Big Data
Batch vs. Real-Time Big Data Analytics
Batch Analytics – Hadoop Ecosystem Overview
Real-Time Analytics Options: Streaming Data – Storm
In Memory Data – Spark
What is Spark?
Modes of Spark
Spark Installation Demo
Overview of Spark on a cluster
Spark Standalone Cluster

Spark Baby Steps

Invoking Spark Shell
Loading a File in Shell
Performing Some Basic Operations on Files in Spark Shell
Building and Running a Spark Project with sbt
Caching Overview, Distributed Persistence
Spark Streaming Overview
Example: Streaming Word Count

Playing with RDDs

RDDs
Transformations in RDD
Actions in RDD
Loading Data in RDD
Saving Data through RDD
Scala and Hadoop Integration Hands-On

Shark – When Spark Meets Hive (Spark SQL)

Why Shark?
Installing Shark
Running Shark
Loading of Data
Hive Queries through Spark
Testing Tips in Scala
Performance Tuning Tips in Spark
Shared Variables: Broadcast Variables
Shared Variables: Accumulators

Spark Streaming

Spark Streaming Architecture
First Spark Streaming Program
Transformations in Spark Streaming
Fault tolerance in Spark Streaming
Checkpointing
Parallelism level

Spark MLlib

Classification Algorithm
Clustering Algorithm
Sequence Mining Algorithm
Collaborative Filtering

Spark GraphX

Graph analysis with Spark
GraphX for graphs
Graph-parallel computation
Installation of Spark and Scala
Discussion of real time use cases using Spark
Mini project implementation in Spark

Copyright © 2024 ITGIZZ. Powered by VSHAWS.
