BI & Datawarehousing Training

Bigdata Hadoop





  1. Big Data is a collection of large, complex structured and unstructured data sets that cannot be processed using traditional computing techniques. Big Data is not merely data; it has become a complete subject that involves various tools, techniques and frameworks. It covers the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
  2. Social Media Data: Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
  3. Stock Exchange Data: Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on the shares of different companies.
  4. Power Grid Data: Power grid data holds information about the power consumed by a particular node with respect to a base station.
  5. Search Engine Data: Search engines retrieve lots of data from different databases.
  7. Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. In his report Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50 businesses to understand how they used big data. He found they got value in the following ways:
  8. Cost reduction: Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data – plus they can identify more efficient ways of doing business.
  9. Faster, better decision making: With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned.
  10. New products and services: With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers’ needs.
  12. Hadoop is an open source framework from Apache that is used to store, process and analyze very large volumes of data. Hadoop is written in Java and is not an OLAP (online analytical processing) system; it is used for batch/offline processing. It is being used by Facebook, Yahoo, Google, Twitter, LinkedIn and many more. Moreover, it can be scaled out simply by adding nodes to the cluster.
  14. Accelerated Career Growth and Opportunities
  15. According to a Forbes report on the state of Big Data analytics, about 90% of multinational businesses and organizations have pledged medium to large investments in Big Data analytics and technologies. A majority of global organizations have reported a significant impact on revenue growth and business development after adopting Hadoop technologies. A technology with such advantages and growth gives professionals room to grow as well.
  16. Big Data professionals who migrate to Hadoop from other technologies can also benefit from accelerated career growth.
  17. Hadoop skills boost salary packages!
  18. According to Indeed, the average salary paid to a Hadoop Developer is around $102,000. This is among the best salaries paid to professionals across the world. Because Hadoop keeps attracting more global organizations, the prospects for Hadoop professionals to earn a good salary are positive. In another survey by Dice, Big Data professionals were earning $89,450 on average, which is much higher than in the preceding year.
  19. Flooding Job Opportunities!
  20. The growth in job opportunities and demand for Hadoop-skilled professionals shows no sign of stopping. According to a report from Forbes, demand for big data professionals increased by nearly 90% in 2014, and there is a significant probability of a further leap. The majority of career experts and analysts suggest that the job market for Big Data professionals is not a short-lived phenomenon but a stable market that is here to stay.
  21. Top Companies Around the World Using Hadoop Technology
  22. Leading companies around the world such as Dell, IBM, AWS (Amazon Web Services), Hortonworks, MapR Technologies, DataStax, Cloudera, SUPERMICR, Datameer, Hadapt, Zettaset, Pentaho, Karmasphere and many others have implemented Hadoop technologies. The number keeps growing because of Hadoop’s reputation for flexibility and cost-effectiveness.





  1. Java
    • Overview of Java
    • Classes and Objects
    • Inheritance, Aggregation, Polymorphism
    • Command line argument
    • Abstract class and Interfaces
    • String Handling
    • Exception Handling, Multithreading
    • Serialization and Advanced Topics
    • Collection Framework, GUI, JDBC
  2. Linux
    • Unix History & Overview
    • Command line file-system browsing
    • Bash/Korn Shell
    • Users, Groups and Permissions
    • VI Editor
    • Introduction to Process
    • Basic Networking
    • Shell Scripting live scenarios
  3. SQL
    • Introduction to SQL, Data Definition Language (DDL)
    • Data Manipulation Language(DML)
    • Operator and Sub Query
    • Various Clauses, SQL Key Words
    • Joins, Stored Procedures, Constraints, Triggers
    • Cursors, Loops, IF/ELSE, TRY/CATCH, Index
    • Data Manipulation Language (Advanced)
    • Constraints, Triggers
    • Views, Index Advanced


  1. Introduction to BigData

Synopsis – This chapter discusses general topics related to Big Data. To start with, Big Data analysis is revolutionizing almost every field. Every company has Big Data of its own, and the scientific use of that data helps businesses reformulate their marketing strategies and remodel their products according to customer interest.


  • What is Big Data?
  • Where is it produced?
  • Rise of Big Data
  • Compare Hadoop vs traditional systems
  • Limitations and Solutions of existing Data Analytics Architecture
  • Attributes of Big Data
  • Types of data
  • Other technologies vs Big Data


  1. Big Data and Hadoop Ecosystem

Synopsis – In this module, you will understand Big Data, the limitations of the existing solutions for the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, Anatomy of File Write and Read, and Rack Awareness.



  • Limitations and Solutions of existing Data Analytics Architecture
  • Hadoop
  • Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
  • Anatomy of File Write and Read, Rack Awareness
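
As a rough illustration of the HDFS storage topic above, the sketch below estimates how a file is split into blocks and how much raw storage its replicas occupy. It is plain Python rather than Hadoop code, the function name `hdfs_footprint` is hypothetical, and it assumes the common Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; a real cluster may be configured differently.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: common HDFS default block size in Hadoop 2.x
REPLICATION = 3      # assumption: common HDFS default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (block count, size of the last block in MB, raw MB stored cluster-wide)."""
    if file_size_mb <= 0:
        return 0, 0, 0
    # A file is cut into fixed-size blocks; only the last one may be smaller.
    blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb
    # Every block is stored `replication` times, so raw usage scales with it.
    raw_mb = file_size_mb * replication
    return blocks, last_block_mb, raw_mb


print(hdfs_footprint(500))  # (4, 116, 1500)
```

Under these assumptions, a 500 MB file occupies three full 128 MB blocks plus one 116 MB block, and the three replicas together take about 1500 MB of raw disk space.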


  1. Hadoop Installation and Configuration


Synopsis – In this module, you will learn how to install and configure Hadoop.


  • Installation of Hadoop
  • Hadoop terminal Commands
  • Hadoop Configuration Files
  • Hadoop Jars Import
  • Work with Cloudera version
  • VM player Installation


  1. Hadoop Architecture and HDFS

Synopsis – In this module, you will learn the Hadoop Cluster Architecture, important configuration files in a Hadoop Cluster, and Data Loading Techniques.


  • Hadoop 2.x Cluster Architecture – Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Password-Less SSH
  • MapReduce Job Execution
  • Data Loading Techniques: Hadoop Copy Commands


  1. Hadoop MapReduce Framework Basics

Synopsis – In this module, you will understand the Hadoop MapReduce framework and the working of MapReduce on data stored in HDFS. You will also learn about YARN concepts.



  • MapReduce Use Cases
  • Traditional way Vs MapReduce way
  • Why MapReduce
  • Hadoop 2.x MapReduce Architecture
  • JobTracker and TaskTracker
  • Anatomy of MapReduce Program
  • Hadoop 2.x MapReduce Components
  • YARN MR Application Execution Flow
  • YARN Workflow
  • Demo on MapReduce
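
The map, shuffle and reduce steps covered in this module can be sketched with the classic word count example. The code below is a single-process illustration in plain Python, not Hadoop's Java API, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the framework performs the shuffle and runs many mappers and reducers in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, similar to sorted reducer input.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


result = word_count(["big data big cluster", "data node"])
print(result)  # {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```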


  1. MapReduce Programs in Java

Synopsis – In this module, you will understand concepts like Input Splits in MapReduce, Combiner & Partitioner, and demos on MapReduce using different data sets.


  • Basic MapReduce API Concepts
  • Writing MapReduce Driver
  • Mappers in Java
  • Reducers in Java
  • Speeding up Hadoop Development by Using Eclipse
  • Input Splits
  • Relation between Input Splits and HDFS Blocks
  • MapReduce Job Submission Flow
  • Demo of Input Splits
  • MapReduce: Combiner & Partitioner
  • Unit Testing MapReduce Programs
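
To make the Combiner and Partitioner topics above more concrete, here is a small plain-Python sketch. It is not Hadoop's Java API: the function names `combine` and `partition` are hypothetical, and CRC32 is used only as a stand-in for a stable key hash. The idea it illustrates is that a combiner pre-aggregates one mapper's output locally, and a partitioner decides which reducer receives each key.

```python
import zlib
from collections import Counter


def combine(mapper_output):
    """Combiner: sum one mapper's (word, count) pairs locally to reduce shuffle traffic."""
    totals = Counter()
    for word, count in mapper_output:
        totals[word] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: map a key to a reducer index using a stable hash modulo the reducer count.

    CRC32 is an assumption made for this sketch so results are repeatable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


mapper_output = [("big", 1), ("data", 1), ("big", 1)]
combined = combine(mapper_output)
print(combined)  # [('big', 2), ('data', 1)]

# The same key always lands on the same reducer, so all its counts meet in one place.
for word, count in combined:
    print(word, count, "-> reducer", partition(word, 3))
```

Because the combiner runs before the shuffle, three mapper records shrink to two here; with real data the saving in network traffic can be much larger.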


  1. Advanced MapReduce

Synopsis – In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format and Sequence Input Format, and how to deal with complex MapReduce programs.


  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format
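
The Reduce Join topic listed above can be illustrated with a minimal plain-Python sketch. It is not Hadoop code, and the function name `reduce_side_join`, the source tags "C" and "O", and the sample records are all hypothetical. The sketch shows the general idea of a reduce-side inner join: mappers tag each record with its source and emit it under the join key, and the reducer pairs up records from both sources that share a key.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Inner-join (customer_id, name) with (customer_id, item) on customer_id."""
    # Map phase: tag every record with its source and emit it under the join key.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", item)) for cid, item in orders]

    # Shuffle phase: bring all records with the same key together.
    grouped = defaultdict(list)
    for key, value in tagged:
        grouped[key].append(value)

    # Reduce phase: for each key, combine records from the two sources.
    joined = []
    for key in sorted(grouped):
        names = [value for tag, value in grouped[key] if tag == "C"]
        items = [value for tag, value in grouped[key] if tag == "O"]
        joined.extend((key, name, item) for name in names for item in items)
    return joined


customers = [(1, "Asha"), (2, "Ben")]
orders = [(1, "disk"), (1, "rack"), (3, "cable")]
joined = reduce_side_join(customers, orders)
print(joined)  # [(1, 'Asha', 'disk'), (1, 'Asha', 'rack')]
```

Customer 2 has no orders and order key 3 has no customer, so neither appears in the result, which is what an inner join is expected to produce.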


  1. Hive and HiveQL

Synopsis – This module will help you understand Hive concepts, loading and querying data in Hive, and Hive UDFs.


  • Hive Background
  • Hive Use Case
  • About Hive
  • Hive Vs Pig
  • Hive Architecture and Components
  • Metastore in Hive
  • Limitations of Hive
  • Comparison with Traditional Database
  • Difference between Hive and RDBMS
  • Hive DDL – Create/Show/Drop Tables
  • Internal and External Tables
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data
  • Managing Outputs
  • Hive Script
  • Hive UDF


  1. Pig

Synopsis – In this module, you will learn about Pig, the types of use cases where Pig fits, the tight coupling between Pig and MapReduce, and Pig Latin scripting.


  • What is Pig?
  • MapReduce Vs Pig
  • Pig Architecture & Data types
  • Pig Use Cases
  • Programming Structure in Pig
  • Shell and Utility components
  • Pig Latin Relational Operators
  • Pig Running Modes
  • Pig components
  • Pig Execution
  • Pig Latin Program
  • Data Models in Pig
  • Pig Data Types
  • Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators
  • Pig Jars Import
  • Limitations of Pig
  • Pig UDF


  1. Advanced Hive, HBase & NoSQL Databases

Synopsis – In this module, you will understand advanced Hive concepts such as UDFs and dynamic partitioning. You will also acquire in-depth knowledge of HBase, HBase Architecture and its components.



  • What is HBase?
  • HBase Architecture
  • HBase Components
  • Storage Model of HBase
  • HBase vs RDBMS
  • Introduction to MongoDB
  • CRUD
  • Advantages of MongoDB over RDBMS
  • HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
  • Hive: Thrift Server, User Defined Functions
  • HBase: Introduction to NoSQL Databases and HBase, HBase v/s RDBMS, HBase Components, HBase Architecture, HBase Cluster Deployment


  1. Oozie and ZooKeeper

Synopsis – This module covers advanced HBase concepts, with demos on bulk loading and filters. You will also learn what ZooKeeper is, how it helps in monitoring a cluster, and why HBase uses ZooKeeper.


  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • ZooKeeper Data Model
  • ZooKeeper Service
  • ZooKeeper
  • Demos on Bulk Loading
  • Getting and Inserting Data
  • Filters in HBase
  • Flume and Sqoop Demo
  • Oozie
  • Oozie Components
  • Oozie Workflow
  • Scheduling with Oozie
  • Demo on Oozie Workflow
  • Oozie Co-ordinator
  • Oozie Commands
  • Oozie Web Console
  • Quiz


Course Features

  • Students: 0
  • Max Students: 1000
  • Duration: 10 weeks
  • Skill level: All
  • Language: English
  • Re-take course: N/A

