Friday, May 6, 2016

How Agile Teams Work (Agile work methodology)

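Objectives
According to statistics, roughly 55% of the requirements produced during the initial planning of a software development project will change. For many software vendors, "plan in detail, get it right the first time" is impossible, because at the outset customers can only describe the broad direction of their needs; detailed requirements emerge gradually only after they see a working product. If a team insists on "writing the essay before doing the work", then even when the customer has signed the requirements document, the result of the team's hard work may still not be a product the customer is happy with.

Since the 1990s, agile methods, which advocate making good use of requirement changes, have not only become the recognized best development practice in the IT industry but are also the designated approach for government IT projects in the UK, the US, and Australia. Although there are many schools of agile, they share a common core: form a dedicated cross-functional team, work in short development cycles, deliver incremental results early and continuously, verify technical feasibility, confirm user needs, and adjust the upcoming requirements accordingly.

This course brings in a master-level Agile Coach who uses case stories to explain the essence of agile methods, draws on hands-on coaching experience to share how to build an agile development team from scratch at the company or project level, and, from a practical angle, guides you step by step through how an agile team works.
After the course you will have gained the following knowledge and skills:
  1. Quickly grasp the principles and practices of agile methods
  2. Learn to view requirements from the user's perspective and the project from a big-picture perspective
  3. Pick up agile techniques you can use at work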
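Audience
  1. Members of startup teams
  2. Members of teams planning to transition to agile methods
  3. Members of teams planning to use agile methods for part of a large project
  4. Members of teams that already use agile methods but have hit a bottleneck
Prerequisites
Prior experience taking part in project execution or product development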
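Content
Day 1: The four elements of agile project management
  1. Don't make the team unhappy: unleash the power of agile development
  2. Don't make the customer unhappy: make good use of the value created by change
  3. Don't make the boss unhappy: the art of defining requirements by subtraction
  4. Don't make everyone unhappy: keep track of the project's real progress
Day 2: Your turn to be the agile master
  1. No long-term commitment needed: making friends with Scrum the easy way
  2. I can design too: a user-centric planning mindset
  3. Data-driven development: define your own meaningful performance metrics
  4. Win-win contracts: approaches to agile outsourced development
Day 3: Seven steps to a team that is good at change
  1. Set the vision: give the project a clear future
  2. Plan the product: capture the needs of key users
  3. Iteration planning: focus capacity on the most important work
  4. Stand-up meetings: build a self-managing team
  5. End-of-iteration demo: move the project forward with real deliverables
  6. Retrospective: keep improving the team's agility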

Friday, December 4, 2015

Monday, November 30, 2015

How to calculate MIPS?

https://cseweb.ucsd.edu/classes/sp97/cse141/hw1a.html

Question 1.
Suppose that when Program A is run, the user CPU time is 3 seconds, the elapsed wallclock time is 4 seconds, and the system performance is 10 MFLOP/sec. 

Assume that there are no other processes taking any significant amount of time, and the computer is either doing calculations in the CPU, or doing I/O, but it can't do both at the same time. 

We now replace the processor with one that runs six times faster, but doesn't affect the I/O speed. What will the user CPU time, the wallclock time, and the MFLOP/sec performance be now?

CPU performance_B / CPU performance_A = CPU time_A / CPU time_B
6 = 3 / CPU time_B
User CPU Time = 0.5 seconds

Since the I/O time is unaffected by the performance increase, it still takes 1 second to do I/O (4 seconds of wallclock time minus 3 seconds of CPU time). Therefore it takes 1 + 0.5 = 1.5 seconds to run Program A on the faster CPU.
Wallclock Time = 1.5 seconds

System performance in MFLOP/sec =
Number of floating-point operations / (Wallclock time * 10^6)

Old system performance: 10 = #FLOP / (4 * 10^6)

#FLOP = 40 * 10^6
New system performance = (40 * 10^6) / (1.5 * 10^6)
MFLOP/sec = 26.667
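
The arithmetic in Question 1 can be checked with a few lines of Java. This is only a sketch: the class and method names (`SpeedupExample`, `newWallclock`, `mflops`) are made up for illustration, and it assumes, as the question does, that CPU work and I/O never overlap.

```java
// Sketch of the Question 1 arithmetic. All names here are illustrative.
class SpeedupExample {

    /** Wall-clock time after speeding up only the CPU portion of the run. */
    static double newWallclock(double cpuSeconds, double wallSeconds, double cpuSpeedup) {
        double ioSeconds = wallSeconds - cpuSeconds; // I/O time is unchanged
        return cpuSeconds / cpuSpeedup + ioSeconds;
    }

    /** MFLOP/sec for a fixed number of floating-point operations. */
    static double mflops(double flopCount, double wallSeconds) {
        return flopCount / (wallSeconds * 1e6);
    }

    public static void main(String[] args) {
        double cpu = 3.0, wall = 4.0, oldMflops = 10.0;
        // The amount of floating-point work does not change with the processor.
        double flopCount = oldMflops * 1e6 * wall;
        double newWall = newWallclock(cpu, wall, 6.0);
        System.out.printf("new CPU time  = %.3f s%n", cpu / 6.0);
        System.out.printf("new wallclock = %.3f s%n", newWall);
        System.out.printf("new MFLOP/sec = %.3f%n", mflops(flopCount, newWall));
    }
}
```

Note that a 6x faster CPU yields well under a 3x improvement in MFLOP/sec, because the one second of I/O is not sped up at all.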

Question 2. 
You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.
Instruction Type          Frequency   Cycles
Loads & Stores            30%         6 cycles
Arithmetic Instructions   50%         4 cycles
All Others                20%         3 cycles

Calculate the CPI for Benchmark B.

If we say that there are 100 instructions, then:
30 of them will be loads and stores.
50 of them will be arithmetic instructions.
20 of them will be all others.
(30 * 6) + (50 * 4) + (20 * 3) = 440 cycles/100 instructions
Therefore, there are 4.4 Cycles per instruction.
The CPU execution time on the benchmark is exactly 11 seconds. What is the "native MIPS" processor speed for the benchmark in millions of instructions per second?

The formula for calculating MIPS is: MIPS = Clock rate / (CPI * 10^6)

The clock rate is 200 MHz, so:
MIPS = (200 * 10^6) / (4.4 * 10^6) = 45.454545
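
The CPI and native MIPS calculations above can be sketched in Java as a weighted average followed by one division. The class and method names (`CpiMips`, `cpi`, `nativeMips`) are hypothetical and chosen only for this example.

```java
// Sketch of the CPI and native MIPS formulas. All names here are illustrative.
class CpiMips {

    /** Weighted average cycles per instruction for a given instruction mix. */
    static double cpi(double[] counts, double[] cyclesPerInstruction) {
        double totalCycles = 0.0;
        double totalInstructions = 0.0;
        for (int i = 0; i < counts.length; i++) {
            totalCycles += counts[i] * cyclesPerInstruction[i];
            totalInstructions += counts[i];
        }
        return totalCycles / totalInstructions;
    }

    /** Native MIPS = clock rate / (CPI * 10^6). */
    static double nativeMips(double clockHz, double cpi) {
        return clockHz / (cpi * 1e6);
    }

    public static void main(String[] args) {
        // Loads & stores, arithmetic, all others: per 100 instructions.
        double[] counts = {30, 50, 20};
        double[] cycles = {6, 4, 3};
        double cpi = cpi(counts, cycles);
        System.out.printf("CPI  = %.2f%n", cpi);
        System.out.printf("MIPS = %.3f%n", nativeMips(200e6, cpi));
    }
}
```

Only the ratio between the counts matters, so passing the frequencies 0.3, 0.5, and 0.2 instead of 30, 50, and 20 gives the same CPI.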
The hardware expert says that if you double the number of registers, the cycle time must be increased by 20%. What would the new clock speed be (in MHz)?
Clock rate = 1 / Cycle time
Cycle time = 1 / Clock rate
Cycle time = 1 / (200 * 10^6) = 5 * 10^-9 seconds
The cycle time is then increased by 20%:
(5 * 10^-9) * 1.2 = 6 * 10^-9 seconds
The new clock rate is thus:
1 / (6 * 10^-9) = 166.667 * 10^6 Hz, or 166.667 MHz

> The compiler expert says that if you double the number of registers, then the compiler will generate code that requires only half the number of Loads & Stores. What would the new CPI be on the benchmark?

There were 100 instructions in part b, so we will reduce the number of loads and stores by half, and this will reduce the total number of instructions. So the new instruction mix will be:
15 Loads and Stores
50 Arithmetic Instructions
20 All Others

The total number of instructions is now 85, so the answer is:
((15 * 6) + (50 * 4) + (20 * 3)) / 85 = 350 cycles/ 85 instructions = 4.12 CPI

> How many CPU seconds will the benchmark take if we double the number of registers (taking into account both changes described above)?
CPU seconds = (Number of instructions * Cycles per instruction) / Clock rate

First, we need to calculate how many instructions the new benchmark executes, that is, the one with half the number of loads and stores.

To do this, we first work out how many instructions the original benchmark executes in its 11 seconds.

Since we know the MIPS (millions of instructions per second) for the original benchmark, we say: (45.45 * 10^6) * 11 = 500 * 10^6 instructions in 11 seconds

Now we need to figure out how many of those are loads and stores:
(500 * 10^6) * 0.3 = 150 * 10^6 are load and store instructions, because the table says that 30% of all instructions are loads and stores.

Now we cut this number in half, because the new benchmark has half the number of loads and stores, leaving only 75 * 10^6 loads and stores. This also means that there are now fewer total instructions: 425 * 10^6. The 20% longer cycle time is already reflected in the new clock rate of 166.667 MHz.

The final solution is:
((425 * 10^6) * 4.12) / (166.667 * 10^6) = 10.506 seconds, or 10.5 seconds when the unrounded CPI of 350/85 is used
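
The whole register-doubling trade-off can be recomputed in one pass. The sketch below is an illustration with made-up names (`RegisterTradeoff`, `cpuSeconds`); it carries the unrounded CPI of 350/85 through the calculation, which evaluates to about 10.5 seconds.

```java
// Sketch of the "double the registers" calculation. All names are illustrative.
class RegisterTradeoff {

    /** CPU seconds = (instructions * cycles per instruction) / clock rate. */
    static double cpuSeconds(double instructions, double cpi, double clockHz) {
        return instructions * cpi / clockHz;
    }

    public static void main(String[] args) {
        double oldClockHz = 200e6, oldCpi = 4.4, oldSeconds = 11.0;

        // Instructions executed by the original benchmark (500 million).
        double oldInstructions = oldSeconds * oldClockHz / oldCpi;

        // Cycle time grows by 20%, so the clock rate shrinks by a factor of 1.2.
        double newClockHz = oldClockHz / 1.2;

        // Half the loads and stores; the other classes are unchanged.
        double loadsStores = 0.30 * oldInstructions / 2.0;
        double arithmetic = 0.50 * oldInstructions;
        double others = 0.20 * oldInstructions;
        double newInstructions = loadsStores + arithmetic + others;
        double newCpi = (loadsStores * 6 + arithmetic * 4 + others * 3) / newInstructions;

        System.out.printf("new clock   = %.3f MHz%n", newClockHz / 1e6);
        System.out.printf("new CPI     = %.4f%n", newCpi);
        System.out.printf("CPU seconds = %.3f%n",
                cpuSeconds(newInstructions, newCpi, newClockHz));
    }
}
```

Under these numbers the slower clock almost cancels the saving from executing fewer instructions: the run drops from 11 seconds to roughly 10.5.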
 

How to use Cloudera Manager to set up a new cluster. (According to Sean from Cloudera on 2015/07/17)


Deploying nodes from quickstart VM?
https://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/Deploying-nodes-from-quickstart-VM/td-p/29715


If you want to have a virtual cluster, I would strongly recommend just
starting with vanilla Linux VMs, downloading and running the Cloudera
Manager installer on one of them, and building a new cluster. You may be
surprised at how easy it is, once you have the VMs networked together
properly. You would have to reset so much on the QuickStart VM to get it to
incorporate a copy of itself as another node in the cluster - it would
actually be harder than starting from scratch. The QuickStart VM is
designed to "just work" as robustly as possible regardless of how the
virtual network is set up, and that requires that it make some assumptions
that it is just a single node. So be aware that you're going to run into
some issues if you try this, and we do not try to cater to this use case.

Specifically, you're going to run into a lot of networking issues. The VM
has the hostname quickstart.cloudera 'baked' into it. To add another node,
you would need another hostname, and that's going to require changing so
many config files and resetting so many services that you would basically
be starting from scratch anyway. You would also need to be careful with IP
addresses. If another network device is not available early enough in the
boot, the VM will use 127.0.0.1 - which works fine as a single-node, but
that's not how you want machines to refer to themselves in a distributed
system, because as soon as it's resolved elsewhere it's wrong. So you'd
need to make sure the VM had an externally routable IP (e.g. use bridged
networking, or a similar option) and was rebooted (in my experience, you
have to reboot twice after making the change) in order to have the correct
networking device be available early enough in the boot process. Not to
mention, this is all in theory - I don't know that anyone has successfully
done this. Again - it's so much easier to just install using Cloudera
Manager on top of some new Linux VMs.
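
The loopback problem described above is easy to check from Java before adding a node to a cluster. The following is a minimal sketch, not part of Cloudera Manager or the QuickStart VM; the class and method names (`ClusterAddressCheck`, `usableAcrossNodes`) are hypothetical, and it relies only on the standard `java.net.InetAddress` API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative helper, not part of Cloudera Manager or the QuickStart VM.
class ClusterAddressCheck {

    /** True if the address is one that other nodes could use to reach this host. */
    static boolean usableAcrossNodes(InetAddress addr) {
        return !(addr.isLoopbackAddress()       // 127.0.0.0/8 or ::1
                || addr.isAnyLocalAddress()     // 0.0.0.0 or ::
                || addr.isLinkLocalAddress());  // 169.254.0.0/16 or fe80::/10
    }

    /** Convenience overload for an IP literal such as "192.168.1.10". */
    static boolean usableAcrossNodes(String ipLiteral) {
        try {
            return usableAcrossNodes(InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // How this machine refers to itself, which is what a cluster would see.
        InetAddress self = InetAddress.getLocalHost();
        System.out.println(self.getHostName() + " resolves to " + self.getHostAddress());
        System.out.println(usableAcrossNodes(self)
                ? "address looks reachable from other nodes"
                : "loopback or local-only address: fix networking before adding nodes");
    }
}
```

A host whose own name resolves to 127.0.0.1 fails this check, which is the situation the post warns about when the network device comes up too late in the boot.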

Sunday, November 15, 2015

Dancing with Lambdas in Java SE 8 -- a course worth NT$50,000... time to get serious about self-studying for the Java 8 OCPJP exam!

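Java is an object-oriented programming language. Object-oriented practice has successfully brought the logic of program development closer to the way the human mind reasons, so that the gaps between the stages of a software project, from requirements, analysis, and architecture to design and implementation, can be kept to a minimum, which in turn lowers the chance of project failure.
Object orientation models real-world situations, achieving a system's functions through the cooperation of objects with different responsibilities. In some cases, however, thinking in objects makes development somewhat tedious and cumbersome: for example, event handlers (Event Handler) for GUI programs, comparators (Comparator) for sorting logic, and Runnable objects for constructing threads. If all of these are written the orthodox way, the most compact form available is probably a nested class (Nested Class).
For example, suppose there is a Product class whose members are id (int), name (String), stock (int), and unitPrice (double). Product itself does not implement Comparable. Given a group of Product objects stored in an ArrayList, we want to sort them by Product's name field without writing a separate Comparator class. One way to do it is as follows: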
    ArrayList<Product> products = new ArrayList<>();
    products.add(new Product(5, "Tiger", 500, 380000.6));
    products.add(new Product(1, "Elephant", 200, 290000.5));
    products.add(new Product(6, "Cat", 700, 80000.2));
    products.add(new Product(4, "Impala", 600, 120000.3));
    products.add(new Product(2, "Lion", 100, 450000.9));
    products.add(new Product(3, "Dog", 300, 90000.7));

    // Sort by name using an anonymous Comparator
    Collections.sort(products, new Comparator<Product>() {
        @Override
        public int compare(Product o1, Product o2) {
            return o1.getName().compareTo(o2.getName());
        }
    });

    for (Product p : products) {
        System.out.println(p);
    }
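The nested-class style does make the code somewhat more complex and noticeably less readable. Java SE 8 introduced Lambda Expression syntax, which lets the code be simplified further as follows: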
    ArrayList<Product> products = new ArrayList<>();
    products.add(new Product(5, "Tiger", 500, 380000.6));
    products.add(new Product(1, "Elephant", 200, 290000.5));
    products.add(new Product(6, "Cat", 700, 80000.2));
    products.add(new Product(4, "Impala", 600, 120000.3));
    products.add(new Product(2, "Lion", 100, 450000.9));
    products.add(new Product(3, "Dog", 300, 90000.7));

    // Sort by name using a lambda in place of the anonymous Comparator
    Collections.sort(products,
            (Product o1, Product o2) -> o1.getName().compareTo(o2.getName()));
    products.stream().forEach((p) -> { System.out.println(p); });
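Isn't that much more elegant? Not only the comparator but even the for-each loop has become more concise and intuitive! With Lambda Expressions, Java syntax becomes more compact and easier to pick up, and readability and maintainability improve considerably as well.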
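
The post never shows the Product class itself, so the following self-contained sketch fills it in under assumptions: the fields and constructor order are inferred from the `new Product(...)` calls, and `getName()`, `toString()`, `SortProducts`, `sample()`, and `sortedNames()` are hypothetical names added only so the example compiles and runs. It also uses `Comparator.comparing` with a method reference, a Java 8 alternative to the explicit lambda, rather than the exact form used in the post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Assumed shape of Product; the original post does not include this class.
class Product {
    private final int id;
    private final String name;
    private final int stock;
    private final double unitPrice;

    Product(int id, String name, int stock, double unitPrice) {
        this.id = id;
        this.name = name;
        this.stock = stock;
        this.unitPrice = unitPrice;
    }

    String getName() {
        return name;
    }

    @Override
    public String toString() {
        return String.format("Product[%d, %s, %d, %.1f]", id, name, stock, unitPrice);
    }
}

class SortProducts {

    /** The six sample products from the post, in their original order. */
    static List<Product> sample() {
        List<Product> products = new ArrayList<>();
        products.add(new Product(5, "Tiger", 500, 380000.6));
        products.add(new Product(1, "Elephant", 200, 290000.5));
        products.add(new Product(6, "Cat", 700, 80000.2));
        products.add(new Product(4, "Impala", 600, 120000.3));
        products.add(new Product(2, "Lion", 100, 450000.9));
        products.add(new Product(3, "Dog", 300, 90000.7));
        return products;
    }

    /** Names in ascending order; the caller's list is left untouched. */
    static List<String> sortedNames(List<Product> products) {
        List<Product> copy = new ArrayList<>(products);
        copy.sort(Comparator.comparing(Product::getName));
        return copy.stream().map(Product::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> products = sample();
        products.sort(Comparator.comparing(Product::getName));
        products.forEach(System.out::println);
    }
}
```

`Comparator.comparing(Product::getName)` builds the same comparator as the lambda `(o1, o2) -> o1.getName().compareTo(o2.getName())`, and `products.forEach(...)` skips the intermediate `stream()` call.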

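Beginner
---------
Course objectives
This course helps you understand the key features of Java technology and leads you into the world of Java. The content progresses step by step, focusing on basic Java syntax and fundamental object-oriented concepts such as encapsulation, inheritance, and polymorphism, so that beginners can learn the Java language in a relaxed, low-pressure way. The course uses NetBeans to develop Java applications, helping you become more familiar with Java syntax and object-oriented concepts.
Intended audience
  1. Programmers, web designers, and anyone who wants to become familiar with Java development
  2. Those who want to develop Android apps
  3. Tailored especially for people with no programming experience, introducing Java's powerful development capabilities in an accessible way
Prerequisites
  1. Able to use the Windows operating system
  2. Basic logical thinking
Course content
  1. Introduction to the Java platform and development environment
  2. Creating a Java main class (Main Class)
  3. Introducing variables
  4. Working with string data
  5. Working with numeric data
  6. Working with multiple-item data
  7. Describing objects (Object) and classes (Class)
  8. Manipulating and formatting data
  9. Creating and using methods
  10. Encapsulation
  11. Decision statements
  12. Using the NetBeans Debugger
  13. Arrays and loops
  14. Inheritance
  15. Polymorphism
  16. Introducing Lambda Expressions
  17. Exception handling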
       
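Advanced
---------
Course objectives
This course teaches you to write Java programs with the latest Java SE 8. It covers object-oriented concepts along with advanced Java syntax and design ideas, such as Lambda Expressions, design patterns, connecting to databases with JDBC, multithreading, and the Fork-Join framework, and shows developers who already know a programming language how to build applications with the Java language and environment.
Intended audience
  1. Designed for developers who are already familiar with programming and want to learn how to develop Java applications
  2. Those who want to develop Android apps
  3. Developers who want to understand the Java programming language
Prerequisites
  1. Familiar with programming in some language
  2. Able to use the Windows operating system
  3. Basic Java syntax
  4. Basic database concepts
  5. Basic SQL syntax
Course content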
  1. Introduction to the Java platform and development environment
  2. Review of Java syntax and classes
  3. Encapsulation and inheritance (ok, 2015/Aug)
  4. Overriding Method  (ok, 2015/Aug)
  5. Polymorphism (ok, 2015/Aug)
  6. static (ok, 2015/Aug)
  7. Singleton Design Pattern  (NEED TO STUDY MORE)
  8. Abstract classes, interfaces, and nested classes  (ok, 2015/Oct)
  9. Lambda Expressions  (ok, 2015/Oct)
  10. Collections and Generics (NEED TO STUDY MORE)
  11. Builder Design Pattern (NEED TO STUDY MORE)
  12. Traversing collections with Lambda Expressions  (ok, 2015/Nov)
  13. Filtering collections with Lambda Expressions  (ok, 2015/Nov)
  14. Built-in Functional Interfaces for lambdas  (ok, 2015/Nov)
  15. Manipulating data with lambdas (ok, 2015/Nov)
  16. Exceptions and Assertions (ok, 2015/Oct)
  17. Java Date/Time API (NEED TO STUDY MORE)
  18. I/O fundamentals and NIO.2 (Ok for file reader/buffered reader ... etc, need to browse detailed exam subjects)
  19. Concurrency and the Fork-Join framework (NEED TO STUDY MORE ABOUT CONCEPT)
  20. Parallel Streams (NEED TO STUDY MORE ABOUT CONCEPT BEFORE PROGRAMMING)
  21. JDBC (NEED TO STUDY MORE, especially for stored procedures)
  22. Localization (NEED TO STUDY MORE ABOUT CONCEPT)

Thursday, November 5, 2015

CCP: DS700/DS701/DS702 Data Scientist Exams

Required Exams
  • DS700 – Descriptive and Inferential Statistics on Big Data
  • DS701 – Advanced Analytical Techniques on Big Data
  • DS702 - Machine Learning at Scale
Each exam may be taken in any order. All three exams must be passed within 365 days of each other. Candidates who fail an exam must wait a period of thirty calendar days, beginning the day after the failed attempt, before they may retake the same exam. Candidates must pay for each exam attempt.

Each passed exam is verifiable in your exam transcript and history.
Exam Format
Each exam is a single challenge scenario. You are provided access to the scenario, the data sets, and the cluster. You are given eight (8) hours to complete the challenge. See below for more information on the cluster.


Required Skills
Common Skills (all exams)
  • Extract relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of “noise”
  • Extract features from data stored in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data
DS700 - Descriptive and Inferential Statistics on Big Data
  • Use statistical tests to determine confidence for a hypothesis
  • Calculate common summary statistics, such as mean, variance, and counts
  • Fit a distribution to a dataset and use that distribution to predict event likelihoods
  • Perform complex statistical calculations on a large dataset
DS701 - Advanced Analytical Techniques on Big Data
  • Build a model that contains relevant features from a large dataset
  • Define relevant data groupings, including number, size, and characteristics
  • Assign data records from a large dataset into a defined set of data groupings
  • Evaluate goodness of fit for a given set of data groupings and a dataset
  • Apply advanced analytical techniques, such as network graph analysis or outlier detection
DS702 - Machine Learning at Scale
  • Build a model that contains relevant features from a large dataset
  • Predict labels for an unlabeled dataset using a labeled dataset for reference
  • Select a classification algorithm that is appropriate for the given dataset
  • Tune algorithm metaparameters to maximize algorithm performance
  • Use validation techniques to determine the successfulness of a given algorithm for the given dataset


Exam Delivery and Cluster Information
All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime. See the FAQ for more information and system requirements.
Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition the cluster also comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb, SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER), Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib, ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water. The cluster is open and candidates are allowed to install any tool they wish during the exam window.
Currently, the cluster is open to the internet and there are no restrictions on tools you can install or websites or resources you may use.
CCP:DS Solution Kit