Delving Deep into Hadoop – Course Contents
Introduction to Hadoop and Architecture
Hadoop 1.0 Architecture
- Introduction to Hadoop & Big Data
- Hadoop Evolution
- Hadoop Architecture
- Networking Concepts
- Use cases: where Hadoop fits in
Hadoop 2.0 Architecture
- Limitations of Hadoop 1.0 Architecture
- Features of Hadoop 2.0 Architecture
- HDFS Federation
- High Availability of the NameNode
- YARN – Yet Another Resource Negotiator
- Developing Applications on YARN
- Non-MR applications on top of YARN
Quiz on Architecture Concepts
Cluster Installation
Hadoop Cluster Installation
- Types of Hadoop Cluster
- Installing a Pseudo-Distributed Mode Cluster
- Walkthrough of built-in scripts, directories, configuration files and port numbers
- Discussion of real-world cluster sizing
Detailed documentation on Installation Procedure
Distributed File System - HDFS
HDFS Commands
- Introduction to HDFS Commands
- Discussion on scenarios where specific commands are applicable
- Introduction to Advanced HDFS Commands, including fine-tuning of the cluster
Detailed documentation on all the HDFS Commands
Custom Script building using HDFS & Unix commands
Quiz on HDFS Commands
Map Reduce - MR
Map Reduce using Java
- Introduction to Map Reduce Architecture
- Detailed discussion on different phases of MR
  - Mapper
  - Reducer
  - Splitting
  - Sorting
  - Shuffling
  - Combiner
  - Partitioning
- Developing Map Reduce Applications from scratch using different use cases
- Discussion of the differences between the Old MR API & the New MR API
- Introduction to different file formats and their internal features (SequenceFile, Binary, etc.)
- Analytics using MR to derive a Banking Solution
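
The MR phases listed in this module (splitting, mapping, combining, partitioning, shuffling, sorting and reducing) can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not Hadoop's implementation: the function names, the toy partitioner and the word-count use case are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def split_input(lines, split_size):
    """Splitting: cut the input into fixed-size chunks, one per map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def mapper(line):
    """Mapper: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combiner: pre-aggregate counts locally, before anything is shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Partitioning: pick the reducer for a key (toy hash, illustrative only)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reducer: sum all counts that arrived for one key."""
    return key, sum(values)


def run_job(lines, split_size=2, num_reducers=2):
    partitions = defaultdict(list)
    for split in split_input(lines, split_size):
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            # Shuffling: route each pair to the reducer that owns its key.
            partitions[partitioner(key, num_reducers)].append((key, value))

    results = {}
    for part in partitions.values():
        part.sort(key=itemgetter(0))  # Sorting: group equal keys together.
        for key, group in groupby(part, key=itemgetter(0)):
            word, total = reducer(key, (value for _, value in group))
            results[word] = total
    return results


if __name__ == "__main__":
    sample = ["big data big cluster", "data node", "big data"]
    print(run_job(sample))
```

In a real job each phase runs on different nodes and the shuffle moves data across the network; here everything happens in one process so the order of the phases is easy to follow.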
Case Study on Map Reduce (Customer Sentiment Analyser)
Map Reduce using Python – Streaming
- Developing Map Reduce Applications using Python
- Discussion of different features available in Streaming
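
Hadoop Streaming runs any executable as a mapper or reducer: records arrive on standard input, and tab-separated key/value lines are written to standard output, with the reducer's input already sorted by key. The sketch below shows that contract in Python. The `year,temperature` record layout, the function names and the `map`/`reduce` command-line argument are hypothetical choices for illustration, not a format used by the course.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: turn 'year,temperature' records into 'year<TAB>temperature'."""
    for line in lines:
        parts = [p.strip() for p in line.strip().split(",")]
        if len(parts) != 2:
            continue  # skip malformed records
        year, temp = parts
        try:
            float(temp)
        except ValueError:
            continue  # skip records whose temperature is not numeric
        yield f"{year}\t{temp}"


def reduce_lines(lines):
    """Reducer: input is sorted by key; emit the maximum temperature per year."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        hottest = max(float(temp) for _, temp in group)
        yield f"{year}\t{hottest}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "script.py map" or "script.py reduce".
    handler = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in handler(sys.stdin):
        print(out)
```

Because both stages only read and write text streams, the pair can be checked locally by piping a file through the map stage, `sort`, and the reduce stage before submitting it with the Streaming jar's `-mapper` and `-reducer` options.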
Case Study on Map Reduce Streaming (Analytics on Temperature Datasets)
Quiz on Map Reduce
Hadoop Eco System Components
Hive (Data Warehouse on top of HDFS)
- Introduction to Hive Architecture
- Configuring the Hive Metastore in different ways
- Basic Queries in Hive (DDL, DML)
- Advanced features of Hive
- Partitioning
- Bucketing
- Sampling
- Multi Table Load Queries
- Serialization & Deserialization (SerDe)
- Dealing with different formats of data (Flat file, JSON, CSV, etc.)
- Query optimization in Hive
- Developing User Defined Functions (UDFs) in Java & Python
Case Study (Analytics on Telecom Datasets)
Quiz on Hive
PIG (Data Flow Language)
- Introduction to Pig Latin
- Basic Commands in Pig
- Explanation of advanced features of Pig with real-world scenarios
- Different ways of using PigStorage
- Dealing with Unstructured data
- Developing Regular Expressions
- Developing User Defined Functions (UDFs) in Java & Python
Case Study (Analytics on Books Datasets)
Quiz on Pig
SQOOP (Import – Export utility)
- Introduction to Sqoop
- Basic Sqoop Commands
- Advanced Import Features
- Advanced Export Features
- Upsert Calls
- EVAL
- Compressed Formats
Case Study (Analytics on Telecom Datasets)
Quiz on Sqoop
HBASE (Versioned Database)
- Introduction to HBASE & NoSQL
- Basic differences between Row-Oriented and Column-Oriented storage
- Basic HBASE Commands
- Advanced HBASE Features
- Versions
- Compression Techniques
- Bloom Filters
- Sequential Scans
- Bulk Loads to HBASE
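
Of the advanced features listed above, Bloom filters are the easiest to show in isolation: a compact bit array that answers "definitely not present" or "possibly present", which lets a store skip files that cannot contain a requested row. The following is a minimal sketch of the data structure only, not HBase's own implementation; the class name, sizes and hashing scheme are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    row_keys = BloomFilter()
    row_keys.add("row-001")
    print(row_keys.might_contain("row-001"))
```

A key that was added always reports as possibly present, so a negative answer is safe to act on; the cost is an occasional false positive, which shrinks as the bit array grows.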
Case Study on HBASE
Quiz on HBASE
Flume
- Flume Architecture
- Configuring Flume Components
- Building Flume Config files for different scenarios
  - Basic Config File building
  - Config file for connecting to different File Servers
  - Config file for connecting to Web Servers
Quiz on Flume
Spark
- Introduction to Spark and In-memory applications
- Understanding RDD (Resilient Distributed Dataset)
- Spark Context and Spark SQL Context
- Introduction to MLlib and Spark Streaming
Quiz on Spark
Kafka
- Introduction to Kafka architecture
- Single and Multi-Broker configuration
- Java Sample Producer
- Integration of Kafka with Hadoop (Flume)
Quiz on Kafka
Finally, this series of practical sessions ends with a quiz on the entire course.