Understanding Big Data and Hadoop
- Introduction to big data, limitations of existing solutions
- Hadoop architecture, Hadoop components and ecosystem
- Data loading & reading from HDFS
- Replication rules, rack awareness theory
- Hadoop cluster administrator: roles and responsibilities
Hadoop Architecture and Cluster Setup
- Hadoop server roles and their usage
- Hadoop installation and initial configuration
- Deploying Hadoop in a pseudo-distributed mode
- Deploying a multi-node Hadoop cluster
- Installing Hadoop Clients
- Understanding how HDFS works and resolving simulated problems.
Hadoop Cluster Administration & Understanding MapReduce
- Understanding the Secondary NameNode
- Working with a distributed Hadoop cluster
- Commissioning and decommissioning nodes
- Understanding MapReduce
- Understanding schedulers and enabling them.
Backup, Recovery and Maintenance
- Common admin commands such as the balancer
- Trash, importing a checkpoint
- Distcp, data backup and recovery
- Enabling trash, setting namespace (name count) and space quotas, manual failover and metadata recovery.
Hadoop Cluster: Planning and Management
- Planning the Hadoop cluster
- Cluster sizing, hardware
- Network and software considerations
- Popular Hadoop distributions, workload and usage patterns.
Hadoop 2.0 and Its Features
- Limitations of Hadoop 1.x
- Features of Hadoop 2.0
- YARN framework, MRv2
- Hadoop high availability and federation
- YARN ecosystem and Hadoop 2.0 cluster setup.
Setting up Hadoop 2.x with High Availability and Upgrading Hadoop
- Configuring Hadoop 2 with high availability
- Upgrading to Hadoop 2
- Working with Sqoop
- Understanding Oozie
- Working with Hive
- Working with HBase.
Understanding Cloudera Manager and Cluster Setup, Overview of Kerberos
- Hive administration, HBase architecture
- HBase setup, Hadoop/Hive/HBase performance optimization
- Cloudera Manager and cluster setup
- Pig setup and working with the Grunt shell
- Why Kerberos and how it helps.