NPU-Learning notes: CCP : DE575 - Cloudera Certification : Data Engineer info.

Benefits

Individuals

Performance-Based
Employers want to hire candidates with proven skills. The CCP program lets you demonstrate your skills in a rigorous hands-on environment.
Skills not Products
Cloudera’s ecosystem is defined by choice and so are our exams. CCP exams test your skills and give you the freedom to use any tool on the cluster. You are given a customer problem, a large data set, a cluster, and a time limit. You choose the tools, languages, and approach. (see below for cluster configuration)
Promote and Verify
As a CCP, you've proven you possess skills where it matters most. To help you promote your achievement, Cloudera provides the following for all current CCP credential holders:

A unique profile link on certification.cloudera.com, also integrated with LinkedIn, to promote your skills and achievements to your current or potential employers. (Example of a current CCP profile)
CCP logo for business cards, résumés, and online profiles

Current
The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to stay current through mandatory re-testing every three years in order to maintain current CCP status and privileges.

Companies

Performance-Based
Cloudera’s hands-on exams require candidates to prove their skills on a live cluster, with real data, at scale. This means the CCP professionals you hire or manage have skills where it matters.
Verified
The CCP program provides a way to find, validate, and build a team of qualified technical professionals.
Current
The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to stay current through mandatory re-testing every three years.

CCP: Data Engineer Exam (DE575) Details

Exam Question Format

You are given five to eight customer problems, each with a unique, large data set, a 7-node high-performance CDH5 cluster, and four hours. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below) -- you get to pick the tool(s) that are right for the job. You must possess enough industry knowledge to analyze the problem and arrive at an optimal approach given the time allowed. You need to know what you should do and then do it on a live cluster under rigorous conditions, including a time limit and supervision by a proctor.

Audience and Prerequisites

Candidates for CCP: Data Engineer should have in-depth experience developing data engineering solutions and a high level of mastery of the skills below. There are no other prerequisites.

Required Skills

Data Ingest

The skills to transfer data between external systems and your cluster. This includes the following:

Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.
Ingest real-time and near-real-time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.
Load data into and out of HDFS using the Hadoop File System (FS) commands.
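
As a rough illustration of the ingest skills above, the sketch below assembles (but does not run) a Sqoop 1 import command and an HDFS put command as argument lists. The JDBC URL, table name, and paths in the usage lines are hypothetical placeholders and not part of the exam material. The flags used (--connect, --table, --target-dir, --where, --fields-terminated-by, --as-avrodatafile) are standard Sqoop 1 import options.

```python
import shlex


def sqoop_import_command(jdbc_url, table, target_dir, where=None,
                         delimiter="\t", as_avro=False):
    """Build (not run) a Sqoop 1 import command as an argument list."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir]
    if where:
        # Import only the subset of rows matching this SQL condition.
        args += ["--where", where]
    if as_avro:
        # Change the file format on ingest: write Avro data files.
        args.append("--as-avrodatafile")
    else:
        # Change the field delimiter of the imported text files.
        args += ["--fields-terminated-by", delimiter]
    return args


def hdfs_put_command(local_path, hdfs_path):
    """Build (not run) a Hadoop FS command that copies a local file to HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def as_shell(args):
    """Render an argument list as a safely quoted shell command line."""
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical values, for illustration only.
example = sqoop_import_command(
    "jdbc:mysql://db.example.com/sales", "customers",
    "/user/cert/customers", where="state = 'CA'", delimiter=",")
print(as_shell(example))
print(as_shell(hdfs_put_command("data.csv", "/user/cert/raw/")))
```

Building the command as a list keeps quoting explicit. On a real cluster the list could be passed to subprocess, but the credentials and connection details would depend on the environment.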

Transform, Stage, Store

Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog. This includes the following skills:

Convert data from one file format to another
Write your data with compression
Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)
Change the data format of values in a data set
Purge bad records from a data set, e.g., null values
Deduplicate and merge data
Denormalize data from multiple disparate data sets
Evolve an Avro or Parquet schema
Partition an existing data set according to one or more partition keys
Tune data for optimal query performance
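
Three of the skills listed above (purging bad records, deduplication, and partitioning by a key) can be sketched in plain Python. This is a minimal sketch of the logic only. On the exam cluster the same steps would more likely be written in Hive, Pig, or Spark. The function names, the tuple layout of the rows, and the rule that the literal string "NULL" counts as missing are assumptions made for the example.

```python
from collections import defaultdict


def clean_records(rows, key_index=0):
    """Drop rows with missing fields, then keep the first row seen per key."""
    seen = set()
    cleaned = []
    for row in rows:
        # Assumption: None, empty strings, and the literal "NULL" are bad values.
        if any(v is None or str(v).strip() in ("", "NULL") for v in row):
            continue
        key = row[key_index]
        if key in seen:
            # A row with this key was already kept; skip the duplicate.
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def partition_by(rows, key_index):
    """Group rows into a dict of lists keyed by the chosen partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index]].append(row)
    return dict(parts)


# Hypothetical rows: (id, name, state)
rows = [
    ("1", "Ann", "NY"),
    ("2", None, "CA"),   # purged: missing name
    ("1", "Ann", "NY"),  # purged: duplicate id
    ("3", "Bob", ""),    # purged: empty state
    ("4", "Cy", "TX"),
]
cleaned = clean_records(rows)
print(cleaned)
print(partition_by(cleaned, key_index=2))
```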

Data Analysis

Filter, sort, join, aggregate, and/or transform one or more data sets in a given format stored in HDFS to produce a specified result. All of these tasks may include reading from Parquet, Avro, JSON, delimited text, and natural language text. The queries will include complex data types (e.g., array, map, struct), the implementation of external libraries, partitioned data, compressed data, and require the use of metadata from Hive/HCatalog.

Write a query to aggregate multiple rows of data
Write a query to calculate aggregate statistics (e.g., average or sum)
Write a query to filter data
Write a query that produces ranked or sorted data
Write a query that joins multiple data sets
Read and/or create a Hive or an HCatalog table from existing data in HDFS
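
The query skills above (filter, aggregate, join, rank) follow the same pattern whichever tool runs them. The sketch below shows that pattern in plain Python on small in-memory data. The data set, the column layout, and the function names are hypothetical. An exam solution would normally express this as a Hive or Impala query or a Spark job.

```python
from collections import defaultdict


def total_by_customer(orders, min_amount=0.0):
    """Filter orders by amount, then sum the amount per customer id."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        if amount >= min_amount:  # filter step
            totals[customer_id] += amount  # aggregate step
    return dict(totals)


def ranked_report(customers, orders, min_amount=0.0):
    """Join totals to customer names and sort by total, largest first."""
    totals = total_by_customer(orders, min_amount)
    # Inner join: ids with no matching customer record are dropped.
    joined = [(customers[cid], total)
              for cid, total in totals.items() if cid in customers]
    return sorted(joined, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: customer id -> name, and (customer id, amount) orders.
customers = {1: "Ann", 2: "Bob"}
orders = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 99.0), (2, 0.5)]
print(ranked_report(customers, orders, min_amount=1.0))
```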

Workflow

The ability to create and execute various jobs and actions that move data towards greater value and use in a system. This includes the following skills:

Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies

Exam delivery and cluster information

CCP: Data Engineer Exam (DE575) is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.

CCP: Data Engineer Exam (DE575) is a hands-on, practical exam using Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

Documentation Available online during the exam

Cloudera Product Documentation
Hadoop - Apache Hadoop 2.5.0-cdh5.3.2
Cloudera Impala Guide
Apache Hive
Sqoop Documentation (v1.4.5-cdh5.3.2)
Spark Overview - Spark 1.2.1 Documentation
Apache Crunch - Apache Crunch
Apache Pig
Kite: A Data API for Hadoop
Apache Avro 1.7.7 Documentation
Apache Parquet
Cloudera HUE
Apache Oozie
Apache Sqoop documentation
Apache Flume 1.5.0 documentation
DataFu 1.1.0
JDK 7 API Docs

Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google and other search functionality, are disabled. You may not use notes or other exam aids.

Sample Exam Question

LoudAcre Mobile is a mobile phone service provider that is moving a portion of their customer analytics workload to Hadoop. Before they can use their customer data, they want you to clean it and make it consistent.

Errors were found while looking at the customer records. Unfortunately, different input methods wrote date fields in different formats. Your task is to standardize these date fields into a consistent format.

Data Description

The Hive metastore contains a database named problem1 that contains a table named customer. The customer table contains 90 million customer records (90,000,000), each with a birthday field.

Sample Data (birthday is the third field)

1904287	Christopher Rodriguez	Jan 11, 2003
96391595	Thomas Stewart	6/17/1969
2236067	John Nelson	08/22/54

Output Requirements

Create a new table named solution in the problem1 database of the Hive metastore
Your solution table must have its data stored in the HDFS directory /user/cert/problem1/solution
Your solution table must have exactly the same columns as the customer table in the same order, as well as keeping the existing file format
For every row in the solution table, replace the contents of the birthday field with a date string in “MM/DD/YY” format.

MM is the zero-padded month (01-12),
DD is the zero-padded day (01-31),
YY is the zero-padded 2-digit year (00-99)

End of Sample Problem
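
The date standardization asked for in the sample problem can be sketched as follows. This is a minimal sketch of the parsing logic only, not a complete exam solution: it covers just the three layouts visible in the sample rows, assumes the fields are tab-separated, and leaves out creating the solution table in Hive. The function names are made up for the example. Month-name parsing with %b assumes the default C/English locale.

```python
from datetime import datetime

# Input layouts seen in the three sample rows; real data may contain others.
KNOWN_FORMATS = ("%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def normalize_birthday(raw):
    """Return raw as MM/DD/YY, or None if it matches no known layout."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next layout
        return parsed.strftime("%m/%d/%y")
    return None


def normalize_line(line):
    """Rewrite the third tab-separated field (birthday) of one record."""
    customer_id, name, birthday = line.rstrip("\n").split("\t")
    fixed = normalize_birthday(birthday)
    # Keep the original value when it cannot be parsed, so no row is lost.
    if fixed is None:
        fixed = birthday
    return "\t".join((customer_id, name, fixed))


for sample in ("Jan 11, 2003", "6/17/1969", "08/22/54"):
    print(sample, "->", normalize_birthday(sample))
# Jan 11, 2003 -> 01/11/03
# 6/17/1969 -> 06/17/69
# 08/22/54 -> 08/22/54
```

Because the output keeps only a two-digit year, the century that strptime guesses for inputs like 08/22/54 does not affect the result. A function like this could be applied row by row from a streaming script or a Spark job, depending on which tool is chosen for the problem.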

Certification FAQ

Verify a Certification


Thursday, November 5, 2015
