Thursday, November 5, 2015

CS570 - Big Data Processing - MapReduce Programming Notes

Objectives
  • To learn how to use the Apache Hadoop framework
  • To understand the MapReduce pattern
  • To analyze big data applications and apply the MapReduce pattern
  • To learn HBase and NoSQL databases
  • To learn the Apache Mahout machine learning library
Schedule
Week#
Topic
01
Introduction - extra
02~04
02
Amazon AWS - extra
02
Cloudera Hadoop - extra
  • Objectives:
(1) Overview of this course
  • Instructive Coverage & Activities:
  • Assignments:
(1) Drawing tables to show MapReduce algorithms
05
  • Objectives:
(1) Overview of design patterns
  • Instructive Coverage & Activities:
  • Assignments:
(1) Reading course material
Quiz
06
  • Objectives:
  • Instructive Coverage & Activities:
(1) Summarization Patterns (2) Numerical Summarizations (3) Pattern Description 1 (4) Intent 1 (5) Motivation 1 (6) Applicability 1 (7) Structure 1 (8) Consequences 1 (9) Known uses 1 (10) Resemblances 1 (11) Performance analysis 1 (12) Numerical Summarization Examples (13) Minimum, maximum, and count example (14) MinMaxCountTuple code (15) Mapper code 1 (16) Reducer code 1 (17) Combiner optimization 1 (18) Data flow diagram (19) Average example (20) Mapper code 2 (21) Reducer code 2 (22) Combiner optimization 2 (23) Data flow diagram 2 (24) Median and standard deviation (25) Mapper code 3 (26) Reducer code 3 (27) Combiner optimization 3 (28) Memory-conscious median and standard deviation (29) Mapper code 4 (30) Reducer code 4 (31) Combiner optimization 4 (32) Data flow diagram 4 (33) Inverted Index Summarizations (34) Pattern Description 5 (35) Intent 5 (36) Motivation 5 (37) Applicability 5 (38) Structure 5 (39) Consequences 5 (40) Performance analysis 5 (41) Inverted Index Example (42) Wikipedia reference inverted index (43) Mapper code 6 (44) Reducer code 6 (45) Combiner optimization 6 (46) Counting with Counters (47) Pattern Description 6 (48) Intent 6 (49) Motivation 6 (50) Applicability 6 (51) Structure 6 (52) Consequences 6 (53) Known uses 6 (54) Performance analysis 6 (55) Counting with Counters Example (56) Number of users per state (57) Mapper code 7 (58) Driver code 7
  • Assignments:
(1) Inverted Index
Quiz
07
  • Objectives:
  • Instructive Coverage & Activities:
  • Assignments:
(1) Distinct Design Pattern
Quiz
07
  • Objectives:
  • Instructive Coverage & Activities:
  • Assignments:
(1) Exercise
None
08~09
  • Objectives:
  • Instructive Coverage & Activities:
(1) Structured to Hierarchical (2) Pattern Description 1 (3) Intent 1 (4) Motivation 1 (5) Applicability 1 (6) Structure 1 (7) Consequences 1 (8) Known uses 1 (9) Resemblances 1 (10) Performance analysis 1 (11) Structured to Hierarchical Examples (12) Post comment building on StackOverflow (13) Mapper code 2 (14) Reducer code 2 (15) Question/answer building on StackOverflow (16) Mapper code 3 (17) Reducer code 3 (18) Partitioning (19) Pattern Description 4 (20) Intent 4 (21) Motivation 4 (22) Applicability 4 (23) Structure 4 (24) Consequences 4 (25) Known uses 4 (26) Performance analysis 4 (27) Partitioning Examples (28) Partitioning users by last access date (29) Mapper code 5 (30) Partitioner code 5 (31) Reducer code 5 (32) Binning (33) Pattern Description (34) Intent 6 (35) Motivation 6 (36) Structure 6 (37) Consequences 6 (38) Resemblances 6 (39) Performance analysis 6 (40) Binning Examples (41) Binning by Hadoop-related tags (42) Mapper code 7 (43) Total Order Sorting (44) Pattern Description 8 (45) Intent 8 (46) Motivation 8 (47) Applicability 8 (48) Structure 8 (49) Consequences 8 (50) Resemblances 8 (51) Performance analysis 8 (52) Total Order Sorting Examples (53) Sort users by last visit (54) Driver code (55) Analyze mapper code (56) Order mapper code (57) Shuffling (58) Pattern Description (59) Intent 9 (60) Motivation 9 (61) Structure 9 (62) Consequences 9 (63) Resemblances 9 (64) Performance analysis 9 (65) Shuffle Examples (66) Anonymizing StackOverflow comments (67) Mapper code 10
  • Assignments:
(1) Exercise
Quiz
10~11
  • Objectives:
  • Instructive Coverage & Activities:
  • Assignments:
(1) Exercise
Quiz
12
  • Objectives:
  • Instructive Coverage & Activities:
  • Assignments:
(1) Create random data on the fly
Quiz
13
  • Objectives:
  • Instructive Coverage & Activities:
(1) Customizing Input and Output in Hadoop (2) InputFormat (3) RecordReader (4) OutputFormat (5) RecordWriter (6) Generating Data (7) Pattern Description (8) Intent 1 (9) Motivation 1 (10) Structure 1 (11) Consequences 1 (12) Resemblances 1 (13) Performance analysis 1 (14) Generating Data Examples (15) Generating random StackOverflow comments (16) Driver code 2 (17) InputSplit code (18) InputFormat code (19) RecordReader code (20) External Source Output (21) Pattern Description (22) Intent 3 (23) Motivation 3 (24) Structure 3 (25) Performance analysis 3 (26) External Source Output Example (27) Writing to Redis instances (28) OutputFormat code (29) RecordWriter code (30) Mapper code (31) Driver code (32) External Source Input (33) Pattern Description (34) Intent 4 (35) Motivation 4 (36) Structure 4 (37) Consequences 4 (38) Performance analysis 4 (39) External Source Input Example (40) Reading from Redis instances (41) InputSplit code 5 (42) InputFormat code 5 (43) RecordReader code 5 (44) Driver code 5 (45) Partition Pruning (46) Pattern Description 6 (47) Intent 6 (48) Motivation 6 (49) Structure 6 (50) Consequences 6 (51) Resemblances 6 (52) Performance analysis 6 (53) Partition Pruning Examples (54) Partitioning by last access date to Redis instances (55) Custom WritableComparable code (56) OutputFormat code 7 (57) RecordWriter code 7 (58) Mapper code 7 (59) Driver code 7 (60) Querying for user reputation by last access date (61) InputSplit code 8 (62) InputFormat code 8 (63) RecordReader code 8 (64) Driver code 8
  • Assignments:
(1) Exercise
Quiz
14
  • Objectives:
  • Instructive Coverage & Activities:
  • Assignments:
(1) None
Quiz
15
  • Objectives:
(1) Final review (2) Final exam
  • Assignments:
None
  • Quiz/Exam:
(1) Final Exam
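The week 06 assignment is an inverted index. As a hedged sketch only, the plain-Java code below simulates the map, shuffle, and reduce steps in memory without Hadoop: the mapper emits (term, document ID) pairs and the reducer collects the distinct document IDs per term. The class `InvertedIndexSketch`, its methods, and the sample documents are hypothetical; a real job would extend Hadoop's `Mapper` and `Reducer` classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java simulation of the inverted index pattern (no Hadoop dependency).
final class InvertedIndexSketch {

    // Map: emit one (term, documentId) pair per token in the document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                out.add(Map.entry(term, docId));
            }
        }
        return out;
    }

    // Shuffle + reduce: group pairs by term; a sorted set drops duplicate document IDs.
    static Map<String, SortedSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            index.computeIfAbsent(p.getKey(), k -> new TreeSet<>()).add(p.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(map("d1", "Hadoop stores data"));
        pairs.addAll(map("d2", "HBase stores rows"));
        System.out.println(reduce(pairs).get("stores")); // [d1, d2]
    }
}
```

In a real cluster the grouping inside `reduce` is done by the framework's shuffle, so each reducer call would only see one term and its list of IDs.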

Week 2  VM system setup & Word Count

Your homework answer must include a MapReduce program.
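Word Count is the canonical first MapReduce program. To illustrate only the data flow (map, then group by key, then reduce), here is a hedged plain-Java simulation that runs without Hadoop; `WordCountSketch` and its methods are hypothetical names, and tokenizing on non-word characters is an assumption, not a requirement of the homework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of a Word Count MapReduce job (no Hadoop dependency).
final class WordCountSketch {

    // Map: emit a (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce: sum all counts emitted for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Shuffle (group by key, sorted like Hadoop's sort phase), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, values) -> counts.put(word, reduce(values)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick brown fox", "The lazy dog")));
        // {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

Because `reduce` is a plain sum, the same function could also serve as a combiner on each mapper's output.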


Note: You can use any of the following VMs to do the homework
http://npu85.npu.edu/~henry/npu/classes/cloud_computing/cloudera_hadoop/slide/cloudera_vm.html

============== course material ================
http://npu85.npu.edu/~henry/npu/classes/index_course.html

Week 3  MapReduce for Average
Q4 ==> MapReduce for Average 
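A common pitfall with averages is that a combiner cannot simply average partial averages. A hedged sketch of one standard fix, assumed here, is to emit (sum, count) pairs so that partial results merge exactly and divide only at the end. The plain-Java simulation below (hypothetical names, "key,value" input records) runs two "mappers" with combiners and one "reducer".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of a combiner-safe average job (no Hadoop dependency).
final class AverageSketch {

    // Partial aggregate emitted by the mapper and merged by the combiner/reducer.
    record SumCount(double sum, long count) {
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return sum / count;
        }
    }

    // Map: parse a "key,value" record into (key, (value, 1)).
    static Map.Entry<String, SumCount> map(String record) {
        String[] parts = record.split(",");
        return Map.entry(parts[0].trim(), new SumCount(Double.parseDouble(parts[1].trim()), 1));
    }

    // Combine or reduce: merging is associative and commutative, so one function serves both.
    static Map<String, SumCount> combine(List<Map.Entry<String, SumCount>> pairs) {
        Map<String, SumCount> acc = new TreeMap<>();
        for (Map.Entry<String, SumCount> p : pairs) {
            acc.merge(p.getKey(), p.getValue(), SumCount::merge);
        }
        return acc;
    }

    // Two input splits, each mapped and combined locally, then one reducer computes averages.
    static Map<String, Double> run(List<String> split1, List<String> split2) {
        List<Map.Entry<String, SumCount>> shuffled = new ArrayList<>();
        for (List<String> split : List.of(split1, split2)) {
            List<Map.Entry<String, SumCount>> mapped = new ArrayList<>();
            for (String r : split) {
                mapped.add(map(r));
            }
            shuffled.addAll(combine(mapped).entrySet());
        }
        Map<String, Double> averages = new TreeMap<>();
        combine(shuffled).forEach((k, v) -> averages.put(k, v.average()));
        return averages;
    }

    public static void main(String[] args) {
        // Averaging the split averages (1.5 and 6.0) would wrongly give 3.75.
        System.out.println(run(List.of("a,1", "a,2"), List.of("a,6"))); // {a=3.0}
    }
}
```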

Week 4  Pi
Q1 ==> Pi 
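Estimating Pi maps naturally onto MapReduce: each map task samples random points in the unit square and counts how many fall inside the quarter circle, and the reducer sums the counts; the hit ratio approximates pi/4. The sketch below is a hedged plain-Java simulation using ordinary pseudo-random sampling (Hadoop's bundled `pi` example uses a quasi-Monte Carlo variant instead); `PiSketch` and its parameters are hypothetical.

```java
import java.util.Random;

// Hypothetical plain-Java sketch of Monte Carlo Pi estimation split into map tasks.
final class PiSketch {

    // Map: one task draws `samples` points from its own seeded generator and reports hits.
    static long mapTask(long seed, long samples) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // Reduce: sum hits across tasks; hits / total approximates the area ratio pi / 4.
    static double estimate(int tasks, long samplesPerTask) {
        long inside = 0;
        for (int t = 0; t < tasks; t++) {
            inside += mapTask(t, samplesPerTask);
        }
        return 4.0 * inside / ((double) tasks * samplesPerTask);
    }

    public static void main(String[] args) {
        System.out.println(estimate(10, 100_000)); // close to 3.14; exact digits depend on the seeds
    }
}
```

Giving each task a distinct seed keeps the tasks' samples independent, which is what lets their counts be summed.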

Week 5  XML Parsing
Q2 ==> XML parsing + MapReduce 
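One common setup for XML + MapReduce, assumed here, is an input file with one self-contained XML element per line (as in StackOverflow data dumps), so each map call can parse its line independently. The hedged sketch below parses such a `<row .../>` line into an attribute map with the JDK's DOM parser and counts rows per `UserId`; the attribute names, `XmlRowSketch`, and its methods are illustrative, not taken from the course.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical sketch: parse one-element-per-line XML in the map step, then count per user.
final class XmlRowSketch {

    // Parse a single XML element (e.g. <row Id="1" UserId="42" />) into attribute -> value.
    static Map<String, String> parseRow(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> attrs = new HashMap<>();
            NamedNodeMap nodes = root.getAttributes();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                attrs.put(n.getNodeName(), n.getNodeValue());
            }
            return attrs;
        } catch (Exception e) {
            // Malformed line: return nothing so the caller skips it.
            return Map.of();
        }
    }

    // Map emits (UserId, 1) for rows that have one; reduce sums per user.
    static Map<String, Integer> countByUser(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String user = parseRow(line).get("UserId");
            if (user != null) {
                counts.merge(user, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByUser(List.of(
                "<row Id=\"1\" UserId=\"42\" />",
                "<row Id=\"2\" UserId=\"42\" />",
                "<row Id=\"3\" UserId=\"7\" />"))); // {42=2, 7=1}
    }
}
```

Note that the JDK parser's default error handler may print a message to standard error for malformed lines before the exception is caught.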

Week 6  Summarization Pattern
Q10 ==> MinMax + Counter 
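Q10 combines a min/max/count numerical summarization with a counter. The hedged plain-Java sketch below keeps a combinable (min, max, count) tuple per key, in the spirit of the MinMaxCountTuple covered in week 06, and uses an ordinary map as a stand-in for Hadoop's job counters to tally malformed records. The "key,value" input format, the counter name `MALFORMED_RECORDS`, and all class names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of a min/max/count summarization plus a counter.
final class MinMaxCountSketch {

    // Combinable partial result: merging two tuples is associative and commutative.
    record MinMaxCount(long min, long max, long count) {
        MinMaxCount merge(MinMaxCount o) {
            return new MinMaxCount(Math.min(min, o.min), Math.max(max, o.max), count + o.count);
        }
    }

    // Returns the parsed value, or null when the field is not a valid long.
    static Long parseOrNull(String s) {
        try {
            return Long.parseLong(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Map: parse "key,value" records; malformed ones only increment a counter.
    // Reduce: merge tuples per key (the same merge could run as a combiner).
    // `counters` stands in for Hadoop's job counters.
    static Map<String, MinMaxCount> run(List<String> records, Map<String, Long> counters) {
        Map<String, MinMaxCount> result = new TreeMap<>();
        for (String r : records) {
            String[] parts = r.split(",");
            Long value = parts.length == 2 ? parseOrNull(parts[1]) : null;
            if (value == null) {
                counters.merge("MALFORMED_RECORDS", 1L, Long::sum);
                continue;
            }
            result.merge(parts[0].trim(), new MinMaxCount(value, value, 1), MinMaxCount::merge);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new TreeMap<>();
        System.out.println(run(List.of("a,5", "a,2", "a,9", "b,4", "bad"), counters));
        System.out.println(counters); // {MALFORMED_RECORDS=1}
    }
}
```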

Week 8  Min Word Length
Q37 ==> Minimum Word Length 
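Q37 asks for the minimum word length. Because taking a minimum is associative and commutative, each mapper can pre-reduce its own split (a combiner) and the reducer only compares the per-split minima. The hedged sketch below simulates this in plain Java; `MinWordLengthSketch`, its methods, and splitting words on non-word characters are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical plain-Java sketch: minimum word length with a combiner-style local minimum.
final class MinWordLengthSketch {

    // Map + combine: one split reports the shortest token length it saw (empty if none).
    static OptionalInt mapAndCombine(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                .filter(w -> !w.isEmpty())
                .mapToInt(String::length)
                .min();
    }

    // Reduce: minimum over the per-split minima, ignoring splits that had no words.
    static OptionalInt reduce(List<OptionalInt> partials) {
        return partials.stream()
                .filter(OptionalInt::isPresent)
                .mapToInt(OptionalInt::getAsInt)
                .min();
    }

    public static void main(String[] args) {
        OptionalInt a = mapAndCombine(List.of("MapReduce is fun"));
        OptionalInt b = mapAndCombine(List.of("big data at scale"));
        System.out.println(reduce(List.of(a, b)).getAsInt()); // 2
    }
}
```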




