Big Data – Hadoop (Development)
1: Introduction to Big Data and Hadoop
What is Big Data? Dimensions of Big Data – the 6 V’s. Why do we care about Big Data? Challenges of Existing Systems. What is Hadoop? Benefits of Hadoop. History of Hadoop. Characteristics of Hadoop. Hadoop Customers and their use cases. Popular Hadoop Vendors and Distributions. Hadoop Certification details. Q/A.
2: Architecture of Hadoop 1.x and Hadoop 2.x
What is Master-Slave Architecture? Hadoop 1.0 and 2.0 Architecture. Concepts of Blocks, Replication and Rack Awareness. File read and write anatomy. Coherency model. Q/A and Exercises.
3: Installation & Configuration of Hadoop
Installation options and Pre-Requisites. Cloudera's Distribution including Apache Hadoop (CDH) installation. Hadoop Installation Modes. Hadoop Configuration Files. Basic Linux and Hadoop Commands – Demo. Q/A.
4: MapReduce
What is MapReduce? Why MapReduce? Benefits of using MapReduce. Word Count example using MapReduce and Eclipse. More hands-on exercises using real-world datasets such as ‘Number of sub-patents’, ‘Calculate max Temperature’, ‘Find Hot and cold days’, ‘Word size and word count’, ‘Health Care Datasets’. Real-world case studies. Assignments. Q/A and Quiz.
5: Advanced MapReduce
Partitioners and Combiners. Map-side and Reduce-side Joins. Hands-on. Assignments. Q/A and Quiz.
6: Pig Programming
Lesson Objectives. About Pig. The Grunt Shell. Understanding Pig. Pig Latin Relation Names. Pig Latin Field Names. Pig Data Types. Pig Complex Types. Defining a Schema. The GROUP Operator. GROUP ALL. Relations without a Schema. The FOREACH GENERATE Operator. Specifying Ranges in FOREACH. Field Names in a FOREACH. FOREACH with Groups. The FILTER Operator. The LIMIT Operator. Assignments. Q/A and Quiz.
7: Advanced Pig Programming
The ORDER BY Operator. The CASE Operator. Parameter Substitution. The DISTINCT Operator. Using PARALLEL. The FLATTEN Operator. Splitting a Dataset. Nested FOREACH. About Joins. Performing an Inner Join. Performing an Outer Join. Replicated Joins. The COGROUP Operator. The ILLUSTRATE Operator. Pig User Defined Functions. A UDF Example. Invoking a UDF. Tips for Optimizing Pig Scripts. Assignments. Q/A and Quiz.
8: Hive Programming
Hive Architecture. Submitting Hive Queries. Defining a Hive Managed Table. Defining an External Table. Loading Data into a Hive Table. Performing Queries. Understanding how Hive table data is stored in HDFS. Hive Partitions. Hive Buckets. Sorting Data. Using DISTRIBUTE BY. Storing Results to a File. Specifying MapReduce Properties. Analyzing Big Data with Hive. Hive Join Strategies. Shuffle Joins. Map Joins. Sort Merge Bucket (SMB) Joins. Invoking a Hive UDF. Assignments. Q/A and Quiz.
9: Advanced Hive Programming
Performing a Multi Table/File Insert. Understanding MapReduce in Hive (how Hive queries are executed as MapReduce jobs). Understanding Views. The TRANSFORM Clause. The OVER Clause. Using Windows. Hive Analytics Functions. Hive File Formats. Hive RC/ORC Files. Computing Table Statistics & Explain Plan. Compression Techniques. Vectorization. Understanding Hive on Tez. Hive Optimization Tips. Hive Query Tuning. Even Resource Distribution. Assigning Reducers. Handling Data Skew. Assignments. Q/A and Quiz.
10: NoSQL and HBase
What is a NoSQL Database? Different types of NoSQL Databases. What is HBase? Why HBase? Who uses HBase? Real-world use cases of HBase. HBase Architecture. Setting up and Starting up HBase. HBase Hands-on: Loading, Querying, Filtering and Analyzing data. Sample HBase POC. Q/A and Quiz.
11: ZooKeeper & Oozie
What is ZooKeeper? The why, where, and who of ZooKeeper. How to set up and use ZooKeeper. HBase and ZooKeeper. Use Cases and Hands-on. Hadoop Security System Concepts. What is Oozie? The why, where, and who of Oozie. How to set up and use Oozie. Use Cases and Hands-on. Q/A and Quiz.
12: Flume & Sqoop
What is Flume? What is Sqoop? Hands-on. Q/A and Quiz.
13: Advanced Concepts – Introduction to Other Popular Tools and Hadoop Platforms
Setting up Security on Hadoop – Kerberos. Setting up Rack Topology in a Hadoop Cluster. Connecting Hadoop with MongoDB. Introduction to real-time streaming frameworks – Storm, StreamSets, Apache NiFi. Installing and exploring Hortonworks, with a walk-through of its product features and offerings. Installing Hadoop components individually on a host machine and deploying a Hadoop cluster, without using the Sandbox.