Benefits
Individuals
- Performance-Based
Employers want to hire candidates with proven skills. The CCP program lets you demonstrate your skills in a rigorous, hands-on environment.
- Skills, not Products
Cloudera’s ecosystem is defined by choice, and so are our exams. CCP exams test your skills and give you the freedom to use any tool on the cluster. You are given a customer problem, a large data set, a cluster, and a time limit. You choose the tools, languages, and approach. (See below for cluster configuration.)
- Promote and Verify
As a CCP, you've proven you possess skills where they matter most. To help you promote your achievement, Cloudera provides the following for all current CCP credential holders:
- A unique profile link on certification.cloudera.com, also integrated with LinkedIn, to promote your skills and achievements to your employer or potential employers (example of a current CCP profile)
- The CCP logo for business cards, résumés, and online profiles
- Current
The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to stay current through mandatory re-testing every three years in order to maintain CCP status and privileges.
Companies
- Performance-Based
Cloudera’s hands-on exams require candidates to prove their skills on a live cluster, with real data, at scale. This means the CCP professionals you hire or manage have skills where they matter.
- Verified
The CCP program provides a way to find, validate, and build a team of qualified technical professionals.
- Current
The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to stay current with mandatory re-testing every three years.
CCP: Data Engineer Exam (DE575) Details
Exam Question Format
You are given five to eight customer problems, each with a unique, large data set, a 7-node high-performance CDH5 cluster, and four hours. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below) -- you get to pick the tool(s) that are right for the job. You must possess enough industry knowledge to analyze the problem and arrive at an optimal approach in the time allowed. You need to know what you should do, and then do it on a live cluster under rigorous conditions, including a time limit, while being watched by a proctor.
Audience and Prerequisites
Candidates for CCP: Data Engineer should have in-depth experience developing data engineering solutions and a high level of mastery of the skills below. There are no other prerequisites.
Required Skills
Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:
- Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.
- Ingest real-time and near-real-time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.
- Load data into and out of HDFS using the Hadoop File System (FS) commands.
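On the exam cluster, changing a delimiter or file format during ingest would typically be done with a tool such as Sqoop or Flume rather than hand-written code, but the core idea can be sketched with stdlib Python. The pipe-delimited records below are hypothetical sample data:

```python
import csv
import io

# Hypothetical pipe-delimited records as they might arrive from an
# external system (in-memory here so the sketch is self-contained).
pipe_data = io.StringIO(
    "1904287|Christopher Rodriguez|Jan 11, 2003\n"
    "96391595|Thomas Stewart|6/17/1969\n"
)

# "Change the delimiter during ingest": read pipe-delimited rows and
# rewrite them tab-delimited before they land in the target store.
out = io.StringIO()
reader = csv.reader(pipe_data, delimiter="|")
writer = csv.writer(out, delimiter="\t")
for row in reader:
    writer.writerow(row)

print(out.getvalue().splitlines()[0])
# 1904287	Christopher Rodriguez	Jan 11, 2003
```

With Sqoop, the equivalent delimiter change is expressed as import options instead of code, which is usually the better choice under exam time pressure.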
Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format, and write them into HDFS or Hive/HCatalog. This includes the following skills:
- Convert data from one file format to another
- Write your data with compression
- Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)
- Change the data format of values in a data set
- Purge bad records from a data set, e.g., null values
- Deduplicate and merge data
- Denormalize data from multiple disparate data sets
- Evolve an Avro or Parquet schema
- Partition an existing data set according to one or more partition keys
- Tune data for optimal query performance
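Two of the skills above, deduplication and partitioning, can be illustrated with a small stdlib-Python sketch. The records and the partition key (state) are hypothetical; on the cluster the same ideas appear as `DISTINCT`/merge logic and Hive's `PARTITIONED BY`:

```python
from collections import defaultdict

# Hypothetical records: (customer_id, name, state).
records = [
    (1904287, "Christopher Rodriguez", "CA"),
    (96391595, "Thomas Stewart", "TX"),
    (1904287, "Christopher Rodriguez", "CA"),  # duplicate row
]

# Deduplicate on customer_id: keep the first row seen for each id.
deduped = {}
for cid, name, state in records:
    deduped.setdefault(cid, (cid, name, state))

# Partition the deduplicated rows by the partition key (state),
# analogous to how a partitioned Hive table groups files by key.
partitions = defaultdict(list)
for row in deduped.values():
    partitions[row[2]].append(row)

print(sorted(partitions))  # ['CA', 'TX']
```

Partitioning this way pays off at query time: a query filtered on the partition key only has to touch the matching partition's data.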
Data Analysis
Filter, sort, join, aggregate, and/or transform one or more data sets in a given format stored in HDFS to produce a specified result. All of these tasks may include reading from Parquet, Avro, JSON, delimited text, and natural language text. The queries will involve complex data types (e.g., array, map, struct), external libraries, partitioned data, and compressed data, and will require the use of metadata from Hive/HCatalog.
- Write a query to aggregate multiple rows of data
- Write a query to calculate aggregate statistics (e.g., average or sum)
- Write a query to filter data
- Write a query that produces ranked or sorted data
- Write a query that joins multiple data sets
- Read and/or create a Hive or an HCatalog table from existing data in HDFS
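The query skills above are SQL skills, and HiveQL and Impala SQL share the same basic syntax. As a self-contained illustration, the sketch below uses Python's stdlib sqlite3 with hypothetical tables to combine a join, an aggregate, and a sort in one query:

```python
import sqlite3

# In-memory database with two hypothetical tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
""")

# Join the two data sets, aggregate rows per customer, and rank the
# result by total spend -- three of the required skills in one query.
rows = con.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

print(rows)  # [('Ada', 40.0), ('Grace', 5.0)]
```

On the exam cluster the same statement (modulo dialect details such as complex types and partition pruning) would run against Hive or Impala tables instead of SQLite.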
Workflow
The ability to create and execute various jobs and actions that move data towards greater value and use in a system. This includes the following skills:
- Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
- Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
- Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies
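On the exam cluster, workflows like these are typically expressed with Oozie. The sketch below shows roughly what a linear two-step workflow definition might look like; the workflow name, action names, table, paths, and script name are hypothetical, and the exact schema versions depend on the Oozie release in use:

```xml
<workflow-app name="cleanse-customers" xmlns="uri:oozie:workflow:0.4">
    <start to="ingest"/>

    <!-- Step 1: a Sqoop action pulls data in from an external RDBMS -->
    <action name="ingest">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --table customers --target-dir /user/cert/raw</command>
        </sqoop>
        <ok to="transform"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: a Hive action transforms the staged data -->
    <action name="transform">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>transform.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A branching workflow adds fork/join nodes around parallel actions, and scheduled or data-dependent execution is layered on top with an Oozie coordinator.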
Exam delivery and cluster information
CCP: Data Engineer Exam (DE575) is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.
CCP: Data Engineer Exam (DE575) is a hands-on, practical exam using Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (see a full list). In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Documentation available online during the exam
Cloudera Product Documentation
Hadoop - Apache Hadoop 2.5.0-cdh5.3.2
Cloudera Impala Guide
Apache Hive
Sqoop Documentation (v1.4.5-cdh5.3.2)
Spark Overview - Spark 1.2.1 Documentation
Apache Crunch - Apache Crunch
Apache Pig
Kite: A Data API for Hadoop
Apache Avro 1.7.7 Documentation
Apache Parquet
Cloudera HUE
Apache Oozie
Apache Sqoop documentation
Apache Flume 1.5.0 documentation
DataFu 1.1.0
JDK 7 API Docs
Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google and other search functionality, are disabled. You may not use notes or other exam aids.
Sample Exam Question
LoudAcre Mobile is a mobile phone service provider that is moving a portion of their customer analytics workload to Hadoop. Before they can use their customer data, they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different input methods wrote date fields in different formats. Your task is to standardize these date fields into a consistent format.
Data Description
The Hive metastore contains a database named problem1 that contains a table named customer. The customer table contains 90 million (90,000,000) customer records, each with a birthday field.
Sample Data (the birthday field is the third column)
| 1904287  | Christopher Rodriguez | Jan 11, 2003 |
| 96391595 | Thomas Stewart        | 6/17/1969    |
| 2236067  | John Nelson           | 08/22/54     |
Output Requirements
- Create a new table named solution in the problem1 database of the Hive metastore
- Your solution table must have its data stored in the HDFS directory /user/cert/problem1/solution
- Your solution table must have exactly the same columns as the customer table, in the same order, and must keep the existing file format
- For every row in the solution table, replace the contents of the birthday field with a date string in “MM/DD/YY” format, where:
  - MM is the zero-padded month (01-12)
  - DD is the zero-padded day (01-31)
  - YY is the zero-padded 2-digit year (00-99)
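The heart of this problem is parsing the inconsistent date strings. A minimal stdlib-Python sketch of that piece is below; the three input formats are assumptions based only on the sample rows shown, and a real solution would need to cover every format present in the 90-million-row table (and would run the conversion at scale, e.g. as a Hive UDF or Spark job, not one record at a time):

```python
from datetime import datetime

# Candidate input formats inferred from the sample data. Order matters:
# the strict 2-digit-year pattern is tried before the 4-digit one.
# Note: %y applies Python's century pivot (e.g. "54" -> 2054), which
# does not affect the 2-digit "MM/DD/YY" output required here.
INPUT_FORMATS = ["%b %d, %Y", "%m/%d/%y", "%m/%d/%Y"]

def standardize_birthday(raw):
    """Return the birthday as a zero-padded MM/DD/YY string."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%d/%y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

print(standardize_birthday("Jan 11, 2003"))  # 01/11/03
print(standardize_birthday("6/17/1969"))     # 06/17/69
print(standardize_birthday("08/22/54"))      # 08/22/54
```

The remaining requirements (creating the solution table at the given HDFS path, preserving columns and file format) are table-DDL work in Hive/Impala around this conversion.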
End of Sample Problem