Class Details

Price: $3,295
4-Day Course Includes:
  • Class exercises in addition to training instruction
  • Courseware books, notepads, pens, highlighters and other materials
  • Free subscription to Cloudera's practice exam questions
  • Full breakfast with a variety of bagels, fruits, yogurt, doughnuts and juice
  • Tea, coffee, and soda available all day
  • Freshly baked cookies every afternoon (participating locations only)

For group training options, please call us at (240) 667-7757 or email 

Course Outline

Exploring the Application Architecture

  • Development Environment
  • Selecting and Collecting Input Data
  • Data Processing and Analysis Tools
  • How to Best Display Results

Applying Data Sets

  • Managing Metadata
  • Apache Avro
  • What are Avro Schemas?
  • Avro Schema Evolution
  • File Format
  • Evaluating Performance

Kite SDK Data Module

  • Kite SDK
  • Data Module Fundamentals
  • Constructing Data Sets with Kite SDK
  • Load, Access, and Delete Data Sets

Apache Sqoop to Import Relational Data

  • Apache Sqoop
  • Importing
  • How to Limit Results
  • Sqoop Performance Considerations
  • Using Sqoop 2

Apache Flume for Data Capture

  • What is Apache Flume?
  • Flume Architecture
  • Flume Sources, Sinks and Configuration
  • How to Log Application Events to Hadoop

Creating Customized Flume Components

  • Flume Data Flow
  • Extension Points
  • Flume Sources
  • Constructing Flume Pollable Sources
  • Constructing Flume Event-Driven Sources
  • What are Flume Interceptors?
  • Header-Modifying Flume Interceptors
  • Filtering Flume Interceptors
  • Avro Objects and Custom Flume Interceptors

Apache Oozie for Workflow Management

  • Why Manage Workflows?
  • Apache Oozie
  • Oozie Workflows
  • Validate, Package and Deploy
  • Run and Track Workflows from the Command Line Interface (CLI)
  • What is the Hue User Interface (UI)?

Apache Crunch to Process Data Pipelines

  • Apache Crunch
  • Crunch Pipeline
  • Crunch and Java MapReduce
  • Crunch Projects
  • Reading/Writing Data with Crunch
  • What is the Data Collection API?
  • Understanding and Applying Functions
  • Crunch API and Utility Classes

Apache Hive and Tables

  • Apache Hive
  • How to Access Hive
  • Fundamental Query Syntax
  • Constructing and Filling Tables
  • Understanding How Data is Read in Hive
  • RegexSerDe

User-Defined Functions

  • Implementing Functions
  • Custom Library Deployment
  • How to Register User-Defined Functions

Impala to Execute Interactive Queries

  • What is Impala?
  • Hive and Impala
  • Running Queries
  • Supporting User-Defined Functions
  • Managing Data and Metadata

Cloudera Search

  • Cloudera Search Capabilities
  • Understanding the Search Architecture
  • What are the Supported Document Formats?

Cloudera Search for Indexing Data

  • Managing Collections and Schemas
  • What are Morphlines?
  • Indexing Data in Batch Mode
  • Indexing Data in Near Real Time

Methods for Presenting Data

  • Solr Query Syntax
  • Using Hue to Develop a Search UI
  • JDBC to Access Impala
  • Using Impala and Search for Custom Web Applications


  • Kite SDK to Develop Data Sets
  • Creating Custom Flume Components to Ingest Data
  • Oozie to Manage Workflows
  • Crunch for Data Analysis
  • How to Write User-Defined Functions for Hive and Impala
  • Morphlines for Data Transformation
  • Cloudera Search

Class Exam

  • Exam Code: CCD-410
  • Number of Questions: 60 questions
  • Duration: 90 minutes
  • Passing Score: 67%
  • Test Delivery: Pearson VUE
  • Language: English, Japanese

  • Infrastructure (25%)
  • Data Management (30%)
  • Job Mechanics (25%)
  • Querying (20%)

Phoenix TS is an authorized testing center for Pearson VUE and Prometric exams. Register for exams by contacting us or visiting the Pearson VUE and Prometric websites.