Class Details

Price: $2,995

4-Day Course Includes:

  • Class exercises in addition to training instruction
  • Courseware books, notepads, pens, highlighters and other materials
  • Free subscription to Cloudera's practice exam questions
  • Full breakfast with a variety of bagels, fruits, yogurt, doughnuts and juice
  • Tea, coffee, and soda available all day
  • Freshly baked cookies every afternoon (only at participating locations)

For group training options, please call us at (240) 667-7757 or email 

Course Outline

Reasoning behind Apache Hadoop

  • Common Issues Arising with Traditional Large-Scale Systems
  • New Approach Requirements

Concepts of Hadoop

  • HDFS - Hadoop Distributed File System
  • MapReduce
  • Hadoop Clusters
  • Components within the Hadoop Ecosystem

MapReduce Programs

  • Overview of the MapReduce Flow
  • Sample Program Examination
  • Concepts Behind Basic MapReduce API
  • Driver Code
  • Mapper, Reducer and the Streaming API
  • Rapid Development with Eclipse
  • The New MapReduce API

Hadoop Workflow Integration

  • Relational Database Management Systems
  • Storage Systems
  • Data Importation with Sqoop
  • Real-Time Data Importation with Flume
  • HDFS Access through FuseDFS and Hoop

Hadoop API

  • ToolRunner
  • MRUnit Testing
  • Intermediate Data Reduction through Combiners
  • Map/Reduce Setup and Teardown – Configuration and Close Methodology
  • Better Load Balancing with Partitioners
  • Direct Access to HDFS
  • Distributed Cache

MapReduce Algorithms

  • Sorting, Searching and Indexing
  • Mahout Machine Learning
  • Term and Inverse Document Frequency
  • Word Co-Occurrence

Pig and Hive

  • Overview of Pig
  • Overview of Hive

Techniques for Practical Development

  • MapReduce Code Debugging
  • LocalJobRunner Mode - Easier Debugging
  • Job Information Retrieval through Counters
  • Logging
  • Splittable File Formats
  • Reducers - Deciding the Optimal Number
  • MapReduce Jobs - Map-Only

Advanced MapReduce Programming

  • Customizing Writables and WritableComparables
  • SequenceFiles and Avro Files to Save Binary Data
  • InputFormats and OutputFormats Creation

MapReduce for Joining Data Sets

  • Map-Side Joins
  • Secondary Sort
  • Reduce-Side Joins

Hadoop for Manipulating Graphs

  • Graph Techniques
  • Graph Representation through Hadoop
  • Sample Algorithm Implementation

Oozie Workflow Creation

  • Reason for Oozie
  • Format for Workflow Definition


  • Apache Hadoop Overview and Purpose
  • Hadoop Concepts
  • MapReduce Programs
  • Hadoop Workflow Integration
  • Hadoop API
  • MapReduce Algorithms
  • Pig and Hive
  • Advanced MapReduce Programming
  • MapReduce and Joining Data Sets
  • Hadoop and Manipulating Graphs
  • Creating Oozie Workflow 

Class Exam



  • Exam Code: CCD-410
  • Number of Questions: 60
  • Duration: 90 minutes
  • Passing Score: 67%
  • Test Delivery: Pearson VUE
  • Language: English, Japanese



  • Infrastructure (25%)
  • Data Management (30%)
  • Job Mechanics (25%)
  • Querying (20%)

Phoenix TS is an authorized testing center for Pearson VUE and Prometric exams. Register for exams by contacting us or by visiting the Pearson VUE and Prometric websites.