Graduate Certification Course on Big Data, Hadoop and Spark

A training course specifically designed to help you start a career in Big Data, Hadoop & Spark!


An exhaustive course designed to transform you into a Big Data specialist!

Our Big Data Training Course is designed to provide you with hands-on training on sophisticated systems and tools used by Big Data Architects. During this course, you will learn all essential Big Data skills including MapReduce, HDFS, Hive, HBase, YARN, Pig, Sqoop, Oozie, Flume, Zookeeper, Spark and Kafka.

Unlike other training programs, our Big Data Training Course is carefully curated and led by some of the biggest industry experts working for Amazon, TCS, Neustar, Anthem and Social Code. Our training program offers an exhaustive and comprehensive curriculum that you cannot expect from short-term certifications offered elsewhere.

Why Learn Big Data and Hadoop?

Organizations all over the world are realizing the benefits offered by Big Data and Hadoop. Many large companies are recruiting Big Data engineers in huge numbers. Forbes, McKinsey, IDC and Gartner all predict that this trend will continue over the next few years. Already, there is a shortage of high-quality Big Data engineers, and there will be a huge demand for them in the near future.

Also, Big Data and Hadoop engineers are amongst the highest paid IT professionals in the world today. The average salary for a Big Data and Hadoop engineer is $135,000 and it is predicted that salaries will rise. The market for Big Data and Hadoop engineers is huge and this offers a great opportunity to engineering students and young professionals. By getting certified in Big Data and Hadoop, you’ll be on your way to building a successful career in Big Data.

Who Should Go for this Course?

Our Big Data Training Course is specifically designed for engineering students and early professionals who want to build a career in Big Data and Hadoop.

While our course is only meant for early professionals and engineering students, we do have plans to open new modules for other professional groups in the future.

About the Training

Our Big Data Training Course is designed to help engineering students and early professionals build a career in Big Data and Hadoop. Our hands-on training includes essential Big Data concepts including MapReduce, HDFS, Hive, HBase, YARN, Pig, Sqoop, Oozie, Flume, Zookeeper, Spark and Kafka. You will learn how to use these concepts in real-life industry-based use cases in our training course. Additionally, you will learn the Python programming language and the Tableau visualization tool.

Our training course is a stepping stone to starting a career in Big Data. You’ll be exposed to theory, practical sessions, assignments, use cases and plenty of interaction with peers and industry experts. After completing our course, you will be certified as a Big Data and Hadoop engineer.

Training Objectives

By completing our Big Data Training Course, you will learn the essential skills and become a Big Data specialist. Our training course will help you:

  • Understand the core concepts of MapReduce and HDFS framework
  • Learn various data loading techniques by using Flume and Sqoop
  • Grasp the concepts of Hadoop 2.x Architecture
  • Write complex programs in MapReduce
  • Perform data analytics using Hive and Pig
  • Schedule tasks using Oozie
  • Implement best practices for Big Data and Hadoop
  • Visualize data using Tableau
  • Work on live projects
  • Become certified as a Big Data and Hadoop Developer

What are the System Requirements for this Course?

To complete our course successfully, you’ll need a computer that satisfies the minimum system requirements. This includes:

  • A Windows (XP SP3 or higher) or Mac (OSX 10.6 or higher) computer
  • An Intel i3 processor or above
  • RAM of 4GB or more
  • Internet connectivity
  • Headset, speakers and microphone

How Will I Execute the Practical?

To execute the practical, we’ll help you set up a virtual machine with local access on your computer. We’ll provide you with all the instructions and a detailed guide for setting up the environment to execute the practical. If you have any doubts regarding the practical, you can always get in touch with our support team. Our expert support team will address your doubts and queries promptly.


Chapter 1 - Big Data Overview
Big Data Overview - Part 1
Big Data Overview - Part 2
Big Data Overview - Part 3
Chapter 2 - Hadoop Overview
Introduction to Hadoop - Part 1
Introduction to Hadoop - Part 2
Introduction to Hadoop - Part 3
LAB Session
Cloudera Virtual Machine Installation
Chapter 3 - Hadoop Distributed File System (HDFS)
HDFS Introduction
HDFS Architecture
HDFS Replication and Re-replication Process
HDFS Write and Read Process
HDFS HA, Failover, Fencing and File Permission
Chapter 4 - Yet Another Resource Negotiator (YARN)
Challenges with Hadoop 1.0 - An introduction to YARN
YARN Architecture
Chapter 5 - MapReduce Programming Framework
Introduction to MapReduce Programming - Part 1
Introduction to MapReduce Programming - Part 2
Introduction to MapReduce Programming - Part 3
LAB Session
MapReduce - Driver Class, Mapper Class
MapReduce - Writable and WritableComparable
MapReduce - Reducer, Combiner etc
MapReduce - Word Count Example
Use Case
MapReduce - Hot and Cold Day - Problem Statement
MapReduce - Hot and Cold Day - Solution
MapReduce - Log File Analysis - Problem Statement
MapReduce - Log File Analysis - Solution
MapReduce - YouTube Data Analysis - Problem Statement
MapReduce - YouTube Data Analysis - Solution
MapReduce - Patent Data Analysis - Problem Statement
MapReduce - Patent Data Analysis - Solution
TOP-N Words
Calculate Maximum Temperature
Chapter 6 - HIVE
HIVE Introduction
HIVE Architecture and Components
HIVE Data Types
HIVE Data Model
LAB Session
Hive - Hive Commands
Hive - Create, Drop and Alter Hive Database
Hive - Create, Describe and Show Hive Table
Hive - Hive Managed Table
Hive - Hive Managed Table and LOCATION Keyword
Hive - Hive External Table and LOCATION Keyword
Hive - Hive Partitioned
Hive - Hive DROP and ALTER Tables
Hive - Hive DML
Hive - Hive Queries
Hive - Hive Join
Hive - Hive Built In Operators and Functions
Use Case
Indian Railways Train Time Table Analysis - Problem Statement
Indian Railways Train Time Table Analysis - Solution
Airplane Crash History Data Analysis - Problem Statement
Airplane Crash History Data Analysis - Solution
Active Satellites Around Earth Data Analysis - Problem Statement
Active Satellites Around Earth Data Analysis - Solution
Hive Assignment
Chapter 7 - Apache PIG
PIG Introduction
PIG Architecture
PIG Data Model
LAB Session
PIG - Grunt Shell
PIG - Data Model Examples
PIG - Data Types and Operators
PIG - Load and Store Operator
PIG - GROUP Operator
PIG - JOIN Operator
PIG - Built In Functions and UDF
Chapter 8 - Apache HBase
HBase Introduction
HBase Data Model
HBase Architecture
HBase ACID properties
LAB Session
HBase - General Commands
HBase - Data Definition Language
HBase - Data Manipulation Language
HBase - Tools and Utilities - ImportTsv
HBase - Tools and Utilities - all others
HBase - MapReduce Implementation
Chapter 9 - Apache SQOOP
Sqoop Introduction
Sqoop Architecture
LAB Session
Cloudera MySQL Db and Sqoop Commands
Sqoop Import Tool and Examples
Sqoop Import Tool and Examples
Sqoop Import Tool and Examples
Sqoop Import into Hive Tables
Sqoop Imports - Use of Delimiters
Incremental Import
Sqoop saved Jobs
Sqoop Merge Tool
Sqoop import-all Tool and Examples
Sqoop export Tool and Examples
Sqoop export Tool and Examples
Miscellaneous Topics in Sqoop
Chapter 10 - Apache FLUME
Flume Introduction
Flume Architecture
Flume Data Flow
LAB Session
Flume Agent Set up
Flume - Generate events and log them to HDFS
Flume - Simulate Web Log
Flume - Sequence Generator
Use Case
Flume - Use Case - Fetching Twitter Data
Chapter 11 - Apache OOZIE
Oozie Introduction
Oozie Workflow Engine
Oozie Coordinator and Bundle
LAB Session
Create Oozie Workflow job
Create Oozie Coordinator Application
Chapter 12 - Apache Zookeeper
Zookeeper Introduction
Chapter 13 - Python
Python Object and Data Structure Basics
Python Comparison Operators
Python Statements
Methods and Functions
Object Oriented Programming
Modules and Packages
Errors and Exception Handling
Built-in Functions
Python Decorators
Python Generators
Overview of Collections and Types
Manipulating collections using Map Reduce APIs
Pandas – Series and Data Frames
LAB Session
Python LAB - All theories will be covered in LAB
Chapter 14 - Spark
Spark Architecture and Execution Modes
RDD, DAG and Lazy Evaluation
Basic Transformations and Actions
Advanced Transformations
Execution Life Cycle
Accumulators and Broadcast Variables
LAB Session
Creating Data Frames and Pre Defined Functions
Data Frame Operations – Basic Transformations
Data Frame Operations
Spark SQL – Basic Transformations
Different file formats – text, JSON, ORC, Parquet, Avro, etc.
Reading text data with custom delimiters
Compression concepts and algorithms
Chapter 15 - Kafka
Overview of Kafka
Spark Streaming – Legacy
Structured Streaming
Integrating Kafka with Structured Streaming
LAB Session
Structured Streaming
Integrating Kafka with Structured Streaming
Chapter 16 - Tableau
Introduction of Tableau
Tableau Server
Tableau Online
Tableau Desktop
LAB Session
Connecting to Data
Visual Analytics
Dashboards and Stories

Prosenjit De, Praveenkumar Devarajan, Harish Kumar P V, Soumen Saha

Co-Authors of BigDataHorizon’s Big Data and Hadoop certification course

Praveenkumar Devarajan
Data Architect at Anthem, Orange County, California

Prosenjit De
Co-Founder of BigDataHorizon

Harish Kumar P V
Data Engineer at a leading MNC in the US

Soumen Saha
Big Data Architect, Designer

What is included

Unlike most other short-term Big Data certification programs, our training course is unique. Our comprehensive course covers all the necessary elements to transform you from a college student or a young professional to a certified Big Data expert.

Here are some of the salient features of our exhaustive Big Data Training Course:

  • Practical Assignments after each topic
  • Access to Course Discussion Forum
  • High-quality prerecorded course videos for most of the topics
  • Virtual Machine Installation Support
  • Support from Experts
  • Live instructor-led sessions
  • Topic Quizzes
  • Real-life Case Studies
  • One Practice Test
  • Final Assessment
  • Certification


There are no specific prerequisites to be eligible for our Big Data Training Course. As an early professional or an engineering student, you can go ahead and enroll for our training course.

That being said, knowledge of SQL, Java or object-oriented programming will be beneficial in understanding our course, but it is not mandatory. However, we do encourage applicants to familiarize themselves with working on Linux/Unix platforms.

We will be providing “Java for Hadoop” and “Linux Basics” as FREE self-paced courses as part of the Big Data, Hadoop and Spark certification course.

Case Studies

Towards the completion of our Big Data Training Course, you will work on a live case study using core concepts learnt during the training. Our industry-specific case studies span various industries including Aviation, Finance, Retail, Social Media and Tourism, among others. You can choose any industry-specific case study to work on.

These are only basic examples of the real-life case studies you can choose to work on. More specifics and details regarding case studies will be revealed to you during your training.


Your doubts and queries regarding our Big Data Training Course – answered!

1. Who will train me? How do you select faculties?

You will be trained by leading industry experts who are highly qualified and certified as instructors. We select faculties with extreme care so that you get the best possible training experience.

All of our instructors are industry practitioners with 10-15 years of relevant experience. Our instructors are subject matter experts in Big Data and Hadoop working at innovative tech firms like Amazon, TCS, Neustar, Social Code and Anthem. We also specially train our instructors so that they can offer you an enriching learning experience.

2. What are the different training models you offer?

We offer two different training models – instructor-led training and self-paced training. You can choose whichever type of training model you’re comfortable with.

Our instructor-led training model includes optional prerecorded video content for most of the topics for you to go through. You’ll also have access to 24 to 26 live training sessions (2 hours each) with our faculties to discuss concepts, clarify doubts and get your questions answered.

Our self-paced training model allows you to complete the training course at your own speed. We’ll provide you with prerecorded video content, and you can clarify doubts and discuss concepts in our online forum. Additionally, we will arrange 6 to 8 doubt-clearing sessions (1 hour each), details of which will be communicated once you register for the course.

We offer only the instructor-led training model for the "Graduate Certification Course on Big Data, Hadoop and Spark" course.

3. What if I miss a class or training session?

When you choose our Big Data Training Course, you need not worry about missing a class or training session. We’ll provide you with prerecorded video content which you can access at any time. And, if you miss any instructor session, you can always attend the missed session with any other live batch.

4. How can I enroll for your training?

You can enroll for our Big Data Training Course right here on our website.

5. How do I pay for the training course?

We accept a variety of payment options, including online money transfer and credit and debit cards (Visa, Mastercard). You can choose any of these payment options as per your convenience.

6. Do you offer group discounts?

Yes, we do! To find out more about what group discounts we offer on our Big Data Training Course, please get in touch with us.

7. How do I learn more about your training course?

If you’d like to learn more about our Big Data Training Course before enrollment, you can always get in touch with our support team. We’ll connect you to customer service representatives who will offer you all the information you need about our training course.

8. How do I get assistance on the training?

We’re committed to making our training a helpful and enriching learning experience. If you choose our instructor-led model, our expert instructors will offer you all the assistance you need on the training. Alternatively, if you choose our self-paced model, you can always use our online forum to interact with peers for any assistance.

9. What if I have any more queries?

If we haven’t answered all of your questions or if you have any specific query, please feel free to get in touch with us using our “Contact Us” page. We’ll do everything we can to resolve your queries.