
Apache Spark & Scala Training

Overview

Apache Spark is a unified framework for big data analytics that gives data scientists, analysts and developers a single integrated API for tasks that previously required separate tools. It supports a wide range of popular languages, including Python, R, SQL, Java and Scala. The main aim of this Apache Spark training is to give data scientists, data analysts and software developers hands-on experience in building real-time data-stream analysis and large-scale machine learning solutions.

 

What You Will Learn in This Course

  • Hands-on knowledge of exploring, running and deploying Apache Spark
  • Access to a wide variety of source code examples covering Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib
  • Create hands-on Spark environments for experimenting with course examples
  • Participate in course discussion boards with instructor and other students
  • Know when and how Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib may be an appropriate solution

 

Who can learn Apache Spark and Scala

There is huge demand for Apache Spark and Scala professionals in the IT industry. Big data professionals, analytics professionals, research professionals, IT developers and testers, data scientists, and BI and reporting professionals can all take this course.

 

COURSE SYLLABUS

 

Introduction to Spark

  • Overview of Big Data and Spark
  • Spark History
  • Spark Architecture
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-Time Analytics
  • Applications of Stream Processing
  • Advantages of Combining Spark and Hadoop
  • Introduction to the Spark Ecosystem
  • Spark Installation
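
To make the installation step concrete, here is a minimal sketch (assuming Spark is installed locally and available to an SBT project or spark-shell) that starts a SparkSession in local mode and runs a trivial job to verify the setup. The object and application names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object SparkInstallCheck {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM using all available cores
    val spark = SparkSession.builder()
      .appName("spark-install-check")   // illustrative application name
      .master("local[*]")
      .getOrCreate()

    // A trivial job to confirm the installation works end to end
    val evens = spark.sparkContext.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"Counted $evens even numbers")

    spark.stop()
  }
}
```

The same statements can also be pasted into spark-shell, where the session is already available as `spark` and the context as `sc`.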

 

Introduction to Programming in Scala

  • Features of Scala
  • Basic data types and literals
  • Operators and methods used in Scala
  • Scala foundations and core concepts
  • Setup Spark and Scala on Ubuntu and Windows OS
  • Install IDEs for Scala
  • Run Scala code on the Scala shell
  • Understanding data types in Scala
  • Implementing lazy values
  • Control structures
  • Looping structures
  • Functions
  • Procedures
  • Collections
  • Arrays and array buffers
  • Maps, tuples and lists
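
As a quick taste of the topics above, the following small, self-contained Scala sketch touches data types, lazy values, control and looping structures, functions, procedures and the common collection types; all names and values are made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object ScalaBasics extends App {
  // Basic data types and literals
  val count: Int = 42
  val greeting: String = "hello"

  // A lazy value is evaluated on first use, not at the point of definition
  lazy val doubled: Int = { println("computing..."); count * 2 }

  // Control and looping structures
  val label = if (count > 10) "big" else "small"
  for (i <- 1 to 3) println(s"iteration $i")

  // A function returns a value; a procedure returns Unit
  def square(x: Int): Int = x * x
  def log(msg: String): Unit = println(msg)

  // Collections: Array, ArrayBuffer, Map, tuple and List
  val arr = Array(1, 2, 3)
  val buf = ArrayBuffer("a", "b"); buf += "c"
  val capitals = Map("India" -> "New Delhi", "France" -> "Paris")
  val pair = ("spark", 3)
  val squares = List(1, 2, 3).map(square)

  log(s"$label $greeting: ${doubled + squares.sum + arr.length + buf.size + capitals.size} ${pair._1}")
}
```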

 

OOPS and Functional Programming in Scala

  • Implementing Classes
  • Getters and Setters
  • Properties with only Getters
  • Object and Object-Private Fields
  • Implementing Nested Classes
  • Abstract Classes
  • Primary and Auxiliary Constructors
  • Singletons
  • Companion Objects
  • Extending a Class
  • Understanding Packages
  • Overriding Methods
  • Type Checking
  • Casting
  • Traits as Interfaces
  • Layered Traits
  • Functional Programming
  • Higher-Order Functions
  • Anonymous Functions
  • Closures and Currying
  • Performing File Processing
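
The sketch below pulls several of these ideas together in one place: a class with primary and auxiliary constructors, a getter and a setter-like method, a companion object acting as a factory, a trait used as an interface, method overriding, and a few functional constructs (a higher-order curried function, an anonymous function closing over a value). Class and value names are invented for illustration.

```scala
object OopFpDemo extends App {
  // Primary constructor (owner, balance); balance is a private mutable field
  class Account(val owner: String, private var balance: Double) {
    def this(owner: String) = this(owner, 0.0)              // auxiliary constructor
    def currentBalance: Double = balance                    // property with only a getter
    def deposit(amount: Double): Unit = balance += amount   // setter-like method
  }

  // Companion object acting as a factory
  object Account {
    def apply(owner: String): Account = new Account(owner)
  }

  // A trait used as an interface, mixed into a subclass
  trait Audited {
    def audit(msg: String): Unit = println(s"AUDIT: $msg")
  }

  // Extending a class and overriding a method
  class SavingsAccount(owner: String) extends Account(owner) with Audited {
    override def deposit(amount: Double): Unit = {
      super.deposit(amount)
      audit(s"deposited $amount")
    }
  }

  // Functional programming: a closure and a higher-order curried function
  val rate = 0.05
  val withInterest: Double => Double = bal => bal * (1 + rate)   // anonymous function closing over rate
  def applyTo(acc: Account)(f: Double => Double): Double = f(acc.currentBalance)

  val acc = new SavingsAccount("asha")
  acc.deposit(100)
  println(applyTo(acc)(withInterest))      // 105.0
  println(Account("ravi").currentBalance)  // factory via the companion object
}
```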

 

Foundation to Spark

  • Spark Shell and PySpark
  • Creating the Spark Context
  • Invoking Spark Shell
  • Loading a file in Shell
  • Basic operations on Shell
  • Spark Java projects
  • Spark Context and Spark Properties
  • Overview of SBT
  • Building a Spark project with SBT
  • Running Spark project with SBT
  • Local mode
  • Spark mode
  • Caching overview
  • Persistence in Spark
  • HDFS data from Spark
  • Implementing Server Log Analysis using Spark
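
As an illustration of how these pieces fit together, here is a minimal server-log analysis sketch. The input file access.log is hypothetical; an hdfs:// URI would be read the same way. It runs in local mode, persists an intermediate RDD that is reused by two actions, and prints simple counts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object ServerLogAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("server-log-analysis")
      .master("local[*]")          // local mode; a cluster master URL would be used instead on a cluster
      .getOrCreate()
    val sc = spark.sparkContext    // the SparkContext, as used from the shell

    // Load a plain-text log file; an hdfs://... URI is loaded the same way
    val logs = sc.textFile("access.log")   // hypothetical input file

    // Cache/persist because this RDD is reused by two different actions
    val errors = logs.filter(_.contains(" 500 ")).persist(StorageLevel.MEMORY_ONLY)

    println(s"total lines: ${logs.count()}")
    println(s"500 errors : ${errors.count()}")
    errors.take(5).foreach(println)

    spark.stop()
  }
}
```

Built with SBT, such a project would typically declare a dependency along the lines of `libraryDependencies += "org.apache.spark" %% "spark-sql" % "<spark-version>"` and be launched with `sbt run` or packaged for spark-submit.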

 

Working with RDD

  • Understanding RDD
  • How to create RDDs
  • RDD operations and methods
  • Transformations in RDD
  • Actions in RDD
  • Loading data into RDD
  • Saving data through RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration-HDFS
  • Scala RDD, Paired RDD, Double RDD & General RDD Functions
  • Implementing HadoopRDD, Filtered RDD, Joined RDD
  • Transformations, Actions and Shared Variables
  • Spark Operations on YARN
  • Sequence File Processing
  • Partitioner and its role in Performance improvement
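
A compact word-count sketch illustrates most of the RDD concepts in this module: creating an RDD, lazy transformations versus actions, key-value pair RDDs, a MapReduce-style reduceByKey, and an explicit partitioner. The input data and the commented output path are made up for illustration.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PairRddWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pair-rdd-wordcount").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Creating an RDD from a collection; sc.textFile("hdfs://...") would load data from HDFS instead
    val lines = sc.parallelize(Seq("spark makes rdds", "rdds power spark"))

    // Transformations are lazy; nothing executes until an action is called
    val counts = lines
      .flatMap(_.split("\\s+"))             // transformation
      .map(word => (word, 1))               // key-value pair RDD
      .partitionBy(new HashPartitioner(4))  // partitioner controls how pairs are shuffled
      .reduceByKey(_ + _)                   // MapReduce-style aggregation

    // Actions trigger execution
    counts.collect().foreach { case (w, n) => println(s"$w -> $n") }
    // counts.saveAsTextFile("word-counts")  // saving data through an RDD (hypothetical output path)

    spark.stop()
  }
}
```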

 

Spark Streaming & Spark SQL

  • Introduction to Spark Streaming
  • Introduction to Spark SQL
  • Querying Files as Tables
  • Text file Format
  • JSON file Format
  • Parquet file Format
  • Hive and Spark SQL Architecture
  • Integrating Spark & Apache Hive
  • Spark SQL performance optimization
  • Implementing Data visualization in Spark
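
To ground the Spark SQL portion of this module, here is a small sketch, assuming a hypothetical people.json file with one JSON record per line, that queries a file as a table and round-trips the result through the Parquet format.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlFormats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-sql-formats").master("local[*]").getOrCreate()

    // Hypothetical input: one JSON record per line, e.g. {"name":"asha","age":31}
    val people = spark.read.json("people.json")

    // Querying a file as a table
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    // Writing and reading back the Parquet format
    adults.write.mode("overwrite").parquet("adults.parquet")
    spark.read.parquet("adults.parquet").show()

    spark.stop()
  }
}
```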

 

Spark ML Programming

  • Explain the use cases and techniques of Machine Learning (ML)
  • Describe the key concepts of Spark ML
  • Explain the concepts of an ML Dataset, an ML algorithm, and model selection via cross-validation
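
This module can be previewed with a short sketch of model selection via cross-validation: a Pipeline (feature assembly followed by logistic regression) is tuned over a tiny, made-up dataset with a small regularization grid and 3-fold cross-validation.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object MlCrossValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ml-cv").master("local[*]").getOrCreate()
    import spark.implicits._

    // A tiny, made-up dataset: two numeric features and a binary label
    val data = Seq(
      (1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (1.5, 2.5, 0.0), (0.5, 1.5, 0.0), (2.5, 0.5, 0.0), (1.2, 1.8, 0.0),
      (8.0, 9.0, 1.0), (9.0, 8.0, 1.0), (8.5, 9.5, 1.0), (9.5, 8.5, 1.0), (7.5, 9.2, 1.0), (8.2, 7.8, 1.0)
    ).toDF("f1", "f2", "label")

    // An ML Pipeline: assemble raw columns into a feature vector, then fit logistic regression
    val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
    val lr = new LogisticRegression()
    val pipeline = new Pipeline().setStages(Array(assembler, lr))

    // Model selection via cross-validation over a small regularization grid
    val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.01, 0.1)).build()
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)

    val model = cv.fit(data)
    model.transform(data).select("f1", "f2", "label", "prediction").show()

    spark.stop()
  }
}
```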

 

Spark GraphX Programming

  • Explain the key concepts of Spark GraphX programming
  • Limitations of the Graph Parallel system
  • Describe the operations on a graph
  • Graph system optimizations
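
A brief GraphX sketch, using a made-up three-vertex follower graph, shows the basic operations mentioned above (vertex and edge counts, in-degrees) and one built-in algorithm (PageRank).

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("graphx-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Made-up graph: vertices carry a name, edges carry a relationship label
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 1L, "follows")))
    val graph    = Graph(vertices, edges)

    // Basic operations on a graph
    println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
    graph.inDegrees.collect().foreach { case (id, deg) => println(s"vertex $id has in-degree $deg") }

    // A built-in graph algorithm: PageRank with convergence tolerance 0.001
    graph.pageRank(0.001).vertices.collect().foreach { case (id, rank) => println(f"vertex $id rank $rank%.3f") }

    spark.stop()
  }
}
```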

 

Course Features

  • Max students: 1000
  • Duration: 10 weeks
  • Skill level: All
  • Language: English
  • Re-take course: N/A
