Thursday, November 5, 2015

CCP : DE575 - Cloudera Certification : Data Engineer info.

Benefits
Individuals
  • Performance-Based
    Employers want to hire candidates with proven skills. The CCP program lets you demonstrate your skills in a rigorous hands-on environment.
  • Skills not Products
    Cloudera’s ecosystem is defined by choice and so are our exams. CCP exams test your skills and give you the freedom to use any tool on the cluster. You are given a customer problem, a large data set, a cluster, and a time limit. You choose the tools, languages, and approach. (see below for cluster configuration)
  • Promote and Verify
    As a CCP, you've proven you possess skills where it matters most. To help you promote your achievement, Cloudera provides the following for all current CCP credential holders:
    • A unique profile link on certification.cloudera.com, also integrated with LinkedIn, that you can use to promote your skills and achievements to your current or potential employers. (Example of a current CCP profile)
    • CCP logo for business cards, résumés, and online profiles
  • Current
    The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to re-test every three years in order to maintain current CCP status and privileges.
Companies
  • Performance-Based
    Cloudera’s hands-on exams require candidates to prove their skills on a live cluster, with real data, at scale. This means the CCP professionals you hire or manage have skills where it matters.
  • Verified
    The CCP program provides a way to find, validate, and build a team of qualified technical professionals.
  • Current
    The big data space is rapidly evolving. CCP exams are constantly updated to reflect the skills and tools relevant for today and beyond. And because change is the only constant in open-source environments, Cloudera requires all CCP credential holders to re-test every three years.
CCP: Data Engineer Exam (DE575) Details
Exam Question Format
You are given five to eight customer problems, each with a unique, large data set, a 7-node high-performance CDH5 cluster, and four hours. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below) -- you get to pick the tool(s) that are right for the job. You must possess enough industry knowledge to analyze the problem and arrive at an optimal approach given the time allowed. You need to know what you should do and then do it on a live cluster under rigorous conditions, including a time limit and observation by a proctor.
Audience and Prerequisites
Candidates for CCP: Data Engineer should have in-depth experience developing data engineering solutions and a high level of mastery of the skills below. There are no other prerequisites.
Required Skills
Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:
  • Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.
  • Load data into and out of HDFS using the Hadoop File System (FS) commands.
Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog. This includes the following skills:
  • Convert data from one file format to another
  • Write your data with compression
  • Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)
  • Change the data format of values in a data set
  • Purge bad records from a data set, e.g., null values
  • Deduplicate and merge data
  • Denormalize data from multiple disparate data sets
  • Evolve an Avro or Parquet schema
  • Partition an existing data set according to one or more partition keys
  • Tune data for optimal query performance
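
As a rough illustration of two of the skills above (purging bad records and deduplicating), here is a minimal Python sketch. It is not taken from the exam and does not use any cluster tool; the field names and the choice of "id" as the deduplication key are assumptions made for the example. On a real cluster the same logic would typically be expressed in Hive, Pig, Spark, or a similar tool.

```python
def purge_bad_records(records):
    """Yield only the records in which no field value is None."""
    for rec in records:
        if all(value is not None for value in rec.values()):
            yield rec


def deduplicate(records, key_fields):
    """Yield the first record seen for each distinct key; later repeats are dropped."""
    seen = set()
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            yield rec


# Hypothetical sample rows: one has a null name, one is an exact duplicate.
sample = [
    {"id": "1", "name": "Ann", "city": "Austin"},
    {"id": "2", "name": None, "city": "Boston"},
    {"id": "1", "name": "Ann", "city": "Austin"},
    {"id": "3", "name": "Cy", "city": "Denver"},
]

cleaned = list(deduplicate(purge_bad_records(sample), key_fields=("id",)))
```

Purging before deduplicating means a record with a null value never becomes the surviving copy for its key.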
Data Analysis
Filter, sort, join, aggregate, and/or transform one or more data sets in a given format stored in HDFS to produce a specified result. All of these tasks may include reading from Parquet, Avro, JSON, delimited text, and natural language text. The queries will include complex data types (e.g., array, map, struct), the implementation of external libraries, partitioned data, compressed data, and require the use of metadata from Hive/HCatalog.
  • Write a query to aggregate multiple rows of data
  • Write a query to calculate aggregate statistics (e.g., average or sum)
  • Write a query to filter data
  • Write a query that produces ranked or sorted data
  • Write a query that joins multiple data sets
  • Read and/or create a Hive or an HCatalog table from existing data in HDFS
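
The query skills listed above (filter, join, aggregate, rank) can be sketched in plain Python to show how they combine. This is only an illustration under assumed data: the customers and orders tables, their column names, and the min_amount threshold are all hypothetical, and on the exam cluster this would more likely be a single Hive or Impala query.

```python
from collections import defaultdict

# Hypothetical data sets standing in for two tables stored in HDFS.
customers = [
    {"customer_id": 1, "state": "CA"},
    {"customer_id": 2, "state": "NY"},
    {"customer_id": 3, "state": "CA"},
]
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 45.0},
    {"customer_id": 3, "amount": 5.0},
]


def total_by_state(customers, orders, min_amount=0.0):
    """Filter orders, join them to customers, sum per state, rank by total."""
    state_of = {c["customer_id"]: c["state"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        if order["amount"] < min_amount:  # filter step
            continue
        state = state_of.get(order["customer_id"])
        if state is None:  # inner join: skip orders with no matching customer
            continue
        totals[state] += order["amount"]  # aggregate step
    # rank step: largest total first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = total_by_state(customers, orders, min_amount=10.0)
```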
Workflow
The ability to create and execute various jobs and actions that move data towards greater value and use in a system. This includes the following skills:
  • Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
  • Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
  • Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies
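
To make the idea of a branching workflow concrete, the sketch below models one as a dependency graph and runs its actions in a valid order using Python's standard graphlib module (Python 3.9 or later). It is a conceptual illustration only, not an Oozie workflow definition; the action names and the no-op callables are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
# "clean" and "profile" branch from "ingest" and rejoin at "publish".
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "profile": {"ingest"},
    "publish": {"clean", "profile"},
}


def run_workflow(graph, actions):
    """Run every action after all of its dependencies; return the order used."""
    executed = []
    for name in TopologicalSorter(graph).static_order():
        actions[name]()
        executed.append(name)
    return executed


# Placeholder actions that do nothing; real ones would launch jobs.
actions = {name: (lambda: None) for name in workflow}
order = run_workflow(workflow, actions)
```

The two branch actions may run in either order relative to each other; only the dependency constraints are guaranteed.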
Exam delivery and cluster information
CCP: Data Engineer Exam (DE575) is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.
CCP: Data Engineer Exam (DE575) is a hands-on, practical exam using Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

Documentation Available online during the exam
Sample Exam Question
LoudAcre Mobile is a mobile phone service provider that is moving a portion of their customer analytics workload to Hadoop. Before they can use their customer data, they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different input methods wrote date fields in different formats.  Your task is to standardize these date fields into a consistent format.
Data Description
The Hive metastore contains a database named problem1 that contains a table named customer. The customer table contains 90 million customer records (90,000,000), each with a birthday field.
Sample Data (birthday is the last field in each record)
1904287     Christopher Rodriguez     Jan 11, 2003
96391595    Thomas Stewart            6/17/1969
2236067     John Nelson               08/22/54
Output Requirements
  • Create a new table named solution in the problem1 database of the Hive metastore
  • Your solution table must have its data stored in the HDFS directory /user/cert/problem1/solution
  • Your solution table must have exactly the same columns as the customer table, in the same order, and must keep the existing file format
  • For every row in the solution table, replace the contents of the birthday field with a date string in “MM/DD/YY” format.
    • MM is the zero-padded month (01-12),
    • DD is the zero-padded day (01-31),
    • YY is the zero-padded 2-digit year (00-99)
End of Sample Problem
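
The sample problem does not prescribe a tool, so the following is only a sketch of the date normalization logic in Python, not a complete solution. It assumes the only input formats are the three that appear in the sample data and that month abbreviations are in English (the %b directive depends on the locale); a real data set may contain other formats, which this sketch reports by raising an error. The function name normalize_birthday is hypothetical.

```python
from datetime import datetime

# Assumed input formats, taken from the three sample records only.
INPUT_FORMATS = (
    "%b %d, %Y",  # e.g. Jan 11, 2003
    "%m/%d/%Y",   # e.g. 6/17/1969
    "%m/%d/%y",   # e.g. 08/22/54
)


def normalize_birthday(raw):
    """Return the date in MM/DD/YY form, or raise ValueError if unrecognized."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        # Format by hand so month, day, and 2-digit year are all zero-padded.
        return f"{parsed.month:02d}/{parsed.day:02d}/{parsed.year % 100:02d}"
    raise ValueError(f"unrecognized date format: {raw!r}")


normalized = [normalize_birthday(s) for s in ("Jan 11, 2003", "6/17/1969", "08/22/54")]
```

Because the output keeps only the last two digits of the year, the century that the parser assigns to a two-digit input year does not affect the result.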

