BI & Data Warehousing Training

Apache Spark & Scala


Apache Spark & Scala Training


Spark is a unified framework for big data analytics that exposes a single integrated API, allowing developers, data scientists and analysts to perform their different tasks on the same platform. It supports a wide range of popular languages, including Python, R, SQL, Java and Scala. The main aim of this Apache Spark training is to give data scientists, data analysts and software developers hands-on experience building real-time data stream analysis and large-scale machine learning solutions.


What You Will Learn in This Course

  • Hands-on knowledge exploring, running and deploying Apache Spark
  • Access to a wide variety of source-code examples covering Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib
  • Create hands-on Spark environments for experimenting with course examples
  • Participate in course discussion boards with instructor and other students
  • Know when and how Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib may be an appropriate solution


Who can learn Apache Spark and Scala

There is huge demand for Apache Spark and Scala professionals in the IT industry. This course suits big data professionals, analytics professionals, research professionals, IT developers and testers, data scientists, and BI and reporting professionals.




Introduction to Spark

  • Overview of Big Data and Spark
  • Spark History
  • Spark Architecture
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-time Analytics
  • Applications of Stream Processing
  • Advantages of Spark over Hadoop MapReduce
  • Introduction to the Spark Ecosystem
  • Spark Installation


Introduction to Programming in Scala

  • Features of Scala
  • Basic Data Types and Literals
  • Operators and Methods Used in Scala
  • Scala Concepts and Foundations
  • Setup Spark and Scala on Ubuntu and Windows OS
  • Installing IDEs for Scala
  • Running Scala Code in the Scala Shell
  • Understanding Data Types in Scala
  • Implementing Lazy Values
  • Control Structures
  • Looping Structures
  • Functions
  • Procedures
  • Collections
  • Arrays and Array Buffers
  • Maps, Tuples and Lists
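Several of the topics above (lazy values, functions and procedures, arrays and array buffers, maps, tuples and lists) can be sketched in a few lines of plain Scala. The object and method names below are illustrative, not from the course materials.

```scala
object ScalaBasics {
  // Lazy value: the right-hand side runs on first access, not at definition time
  lazy val answer: Int = { println("computing..."); 21 * 2 }

  // A function returns a value; a procedure returns Unit
  def square(x: Int): Int = x * x
  def greet(name: String): Unit = println(s"Hello, $name")

  def main(args: Array[String]): Unit = {
    // Arrays are fixed-size; ArrayBuffers grow
    val arr = Array(1, 2, 3)
    val buf = scala.collection.mutable.ArrayBuffer(1, 2)
    buf += 3

    // Maps, tuples and lists
    val counts: Map[String, Int] = Map("spark" -> 1, "scala" -> 2)
    val pair: (String, Int) = ("spark", 1)
    val squares: List[Int] = List(1, 2, 3).map(square)

    println(answer)    // "computing..." prints only now, then 42
    println(squares)   // List(1, 4, 9)
  }
}
```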


OOPS and Functional Programming in Scala

  • Implementing Classes
  • Getters and Setters
  • Properties with only Getters
  • Object & Object Private Fields
  • Implementing Nested Classes
  • Abstract Classes
  • Constructor
  • Auxiliary and Primary Constructor
  • Singletons
  • Companion Objects
  • Extending a Class
  • Understanding Packages
  • Overriding Methods
  • Type Checking
  • Casting
  • Traits as Interfaces
  • Layered Traits
  • Functional Programming
  • Higher Order Functions
  • Anonymous Functions
  • Closures and Currying
  • Performing File Processing
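The OOP and functional topics above fit in one short sketch: an abstract class with a primary constructor, a subclass that overrides a method, a trait used as an interface, a companion object, a higher-order function, currying, and a closure. All names here are illustrative examples, not from the course materials.

```scala
// Abstract class with a primary constructor parameter
abstract class Shape(val name: String) {
  def area: Double
}

// Trait used as an interface, mixed in as a layer
trait Describable {
  def describe: String = "a shape"
}

// Extending a class, mixing in a trait, and overriding a method
class Circle(radius: Double) extends Shape("circle") with Describable {
  override def area: Double = math.Pi * radius * radius
}

// Companion object acting as a factory (a singleton)
object Circle {
  def apply(radius: Double): Circle = new Circle(radius)
}

object FpDemo {
  // Higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Currying: a function applied one argument list at a time
  def add(a: Int)(b: Int): Int = a + b

  // Closure: `scale` captures the free variable `factor`
  val factor = 10
  val scale: Int => Int = _ * factor
}
```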


Foundation to Spark

  • Spark Shell and PySpark
  • Creating the Spark Context
  • Invoking Spark Shell
  • Loading a file in Shell
  • Basic operations on Shell
  • Spark Java projects
  • Spark Context and Spark Properties
  • Overview of SBT
  • Building a Spark project with SBT
  • Running Spark project with SBT
  • Local mode
  • Spark mode
  • Caching overview
  • Persistence in Spark
  • HDFS data from Spark
  • Implementing Server Log Analysis using Spark
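For the SBT topics above, a minimal `build.sbt` for a Spark project might look like the following. The project name and the Spark and Scala version numbers are illustrative assumptions; pick versions that match your cluster.

```scala
// build.sbt -- minimal SBT definition for a Spark project (versions are assumptions)
name := "spark-course-examples"

version := "0.1.0"

scalaVersion := "2.12.18"

// "provided" scope assumes spark-submit supplies the Spark jars at runtime
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.5.1" % "provided"
)
```

With this in place, `sbt package` builds the project jar, which can then be run locally or submitted to a cluster with `spark-submit`.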


Working with RDD

  • Understanding RDD
  • How to create RDDs
  • RDD operations and methods
  • Transformations in RDD
  • Actions in RDD
  • Loading data into RDD
  • Saving data through RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration-HDFS
  • Scala RDD, Paired RDD, Double RDD & General RDD Functions
  • Implementing HadoopRDD, Filtered RDD, Joined RDD
  • Transformations, Actions and Shared Variables
  • Spark Operations on YARN
  • Sequence File Processing
  • Partitioner and its role in Performance improvement
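The key-value pair pattern above (map each record to a pair, then reduce by key) can be illustrated with plain Scala collections; Spark's pair-RDD `reduceByKey` applies the same shape of computation, but distributed across partitions. This snippet is a local analogy for illustration, not the Spark API itself.

```scala
object WordCountLocal {
  // Mirrors the classic pair-RDD word count:
  //   textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  // but runs on ordinary Scala collections.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))        // tokenize each line into words
      .filter(_.nonEmpty)
      .map(word => (word, 1))          // build key-value pairs
      .groupBy(_._1)                   // group by key (the shuffle step in Spark)
      .map { case (w, ps) => (w, ps.map(_._2).sum) } // reduce per key
}
```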


Spark Streaming & Spark SQL

  • Introduction to Spark Streaming
  • Introduction to Spark SQL
  • Querying Files as Tables
  • Text file Format
  • JSON file Format
  • Parquet file Format
  • Hive and Spark SQL Architecture
  • Integrating Spark & Apache Hive
  • Spark SQL performance optimization
  • Implementing Data visualization in Spark


Spark ML Programming

  • Explain the use cases and techniques of Machine Learning (ML)
  • Describe the key concepts of Spark ML
  • Explain the concepts of an ML Dataset, an ML algorithm, and model selection via cross-validation


Spark GraphX Programming

  • Explain the key concepts of Spark GraphX programming
  • Limitations of the Graph Parallel system
  • Describe the operations with a graph
  • Graph system optimizations


Course Features

  • Students: 0
  • Max Students: 1000
  • Duration: 10 weeks
  • Skill level: All
  • Language: English
  • Re-take course: N/A

