https://cseweb.ucsd.edu/classes/sp97/cse141/hw1a.html
Question 1.
Suppose that when Program A is run, the user CPU time is 3 seconds, the elapsed wallclock time is 4 seconds, and the system performance is 10 MFLOP/sec. Assume that there are no other processes taking any significant amount of time, and the computer is either doing calculations in the CPU or doing I/O, but it can't do both at the same time. We now replace the processor with one that runs six times faster but doesn't affect the I/O speed. What will the user CPU time, the wallclock time, and the MFLOP/sec performance be now?
$$\frac{\text{CPU performance}_B}{\text{CPU performance}_A} = \frac{\text{CPU time}_A}{\text{CPU time}_B}$$
$$6 = \frac{3}{\text{CPU time}_B}$$
User CPU time = 0.5 seconds
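Before the upgrade, the CPU accounted for 3 of the 4 wallclock seconds, so I/O took 4 - 3 = 1 second. Since the I/O time is unaffected by the faster processor, it still takes 1 second to do I/O. Therefore it takes 1 + 0.5 = 1.5 seconds to run Program A on the faster CPU.
Wallclock time = 1.5 seconds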
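System performance in MFLOP/sec:
$$\text{MFLOP/sec} = \frac{\text{Number of floating-point operations}}{\text{Wallclock time} \times 10^6}$$
Old system performance:
$$10 = \frac{\#\text{FLOP}}{4 \times 10^6} \quad\Rightarrow\quad \#\text{FLOP} = 40 \times 10^6$$
New system performance:
$$\frac{40 \times 10^6}{1.5 \times 10^6} \approx 26.667 \text{ MFLOP/sec}$$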
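The MFLOP/sec calculation can be sketched in Python as follows. The helper name `mflops` is hypothetical; the key point it shows is that the FLOP count is fixed by the program, so only the wallclock time in the denominator changes.

```python
def mflops(flop_count, wall_time_s):
    """MFLOP/sec = floating-point operations / (wallclock seconds * 10**6)."""
    return flop_count / (wall_time_s * 1e6)

# Recover the FLOP count from the old run: 10 MFLOP/sec over 4 seconds.
flop_count = 10 * 1e6 * 4            # 40 * 10**6 operations

print(round(mflops(flop_count, 1.5), 3))  # 26.667
```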
Question 2.
You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.
| Instruction Type | Frequency | Cycles |
|---|---|---|
| Loads & Stores | 30% | 6 cycles |
| Arithmetic Instructions | 50% | 4 cycles |
| All Others | 20% | 3 cycles |
Calculate the CPI for Benchmark B.
If we assume a sample of 100 instructions, then:
30 of them will be loads and stores.
50 of them will be arithmetic instructions.
20 of them will be all others.
$$(30 \times 6) + (50 \times 4) + (20 \times 3) = 440 \text{ cycles per } 100 \text{ instructions}$$
Therefore, the CPI is 4.4 cycles per instruction.
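Equivalently, CPI is the frequency-weighted average of the per-class cycle counts, so the "100 instructions" sample is not strictly needed. A small Python sketch, where the helper `weighted_cpi` and the dictionary keys are hypothetical names chosen for illustration:

```python
def weighted_cpi(mix):
    """CPI = sum(frequency * cycles) over instruction classes.

    `mix` maps a class name to (frequency, cycles); frequencies sum to 1.
    """
    return sum(freq * cycles for freq, cycles in mix.values())

# Benchmark B, taken from the table above.
benchmark_b = {
    "loads_stores": (0.30, 6),
    "arithmetic":   (0.50, 4),
    "other":        (0.20, 3),
}

print(round(weighted_cpi(benchmark_b), 2))  # 4.4
```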
The CPU execution time on the benchmark is exactly 11 seconds. What is the "native MIPS" processor speed for the benchmark in millions of instructions per second?
The formula for calculating MIPS is:
$$\text{MIPS} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
The clock rate is 200 MHz, so:
$$\text{MIPS} = \frac{200 \times 10^6}{4.4 \times 10^6} \approx 45.4545$$
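The same formula as a one-line Python function (the name `native_mips` is hypothetical). Clock rate is passed in hertz, so dividing by `CPI * 1e6` yields millions of instructions per second.

```python
def native_mips(clock_hz, cpi):
    """Native MIPS = clock rate / (CPI * 10**6)."""
    return clock_hz / (cpi * 1e6)

print(round(native_mips(200e6, 4.4), 4))  # 45.4545
```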
The hardware expert says that if you double the number of registers, the cycle time must be increased by 20%. What would the new clock speed be (in MHz)?
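Clock rate = 1/Cycle time, so Cycle time = 1/Clock rate:
$$\text{Cycle time} = \frac{1}{200 \times 10^6} = 5 \times 10^{-9} \text{ s}$$
The cycle time is then increased by 20%:
$$(5 \times 10^{-9}) \times 1.2 = 6 \times 10^{-9} \text{ s}$$
The new clock rate is thus:
$$\frac{1}{6 \times 10^{-9}} \approx 166.667 \times 10^6 \text{ Hz} = 166.667 \text{ MHz}$$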
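A short Python sketch of the cycle-time adjustment, using a hypothetical helper `clock_after_cycle_stretch`. It makes explicit that a 20% longer cycle gives a clock rate of 1/1.2 of the original, not a 20% lower one.

```python
def clock_after_cycle_stretch(clock_hz, cycle_increase):
    """New clock rate (Hz) when the cycle time grows by the fraction `cycle_increase`."""
    cycle_time = 1.0 / clock_hz                         # seconds per cycle
    new_cycle_time = cycle_time * (1 + cycle_increase)  # stretched cycle
    return 1.0 / new_cycle_time

new_clock = clock_after_cycle_stretch(200e6, 0.20)
print(round(new_clock / 1e6, 3))  # 166.667 (MHz)
```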
The compiler expert says that if you double the number of registers, then the compiler will generate code that requires only half the number of Loads & Stores. What would the new CPI be on the benchmark?
We assumed 100 instructions when computing the original CPI, so we reduce the number of loads and stores by half, and this also reduces the total number of instructions. The new instruction mix will be:
15 Loads and Stores
50 Arithmetic Instructions
20 All Others
The total number of instructions is now 85, so the answer is:
$$\frac{(15 \times 6) + (50 \times 4) + (20 \times 3)}{85} = \frac{350 \text{ cycles}}{85 \text{ instructions}} \approx 4.12 \text{ CPI}$$
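The recomputation can be sketched in Python with per-100-instruction counts. The helper `cpi_from_counts` and the dictionary keys are hypothetical; the point it illustrates is that halving one class changes both the cycle total and the instruction total used as the denominator.

```python
# Per-100-instruction counts and cycle costs for Benchmark B.
counts = {"loads_stores": 30, "arithmetic": 50, "other": 20}
cycles = {"loads_stores": 6, "arithmetic": 4, "other": 3}

def cpi_from_counts(counts, cycles):
    """Return (CPI, total_cycles) for the given instruction counts."""
    total_cycles = sum(counts[k] * cycles[k] for k in counts)
    return total_cycles / sum(counts.values()), total_cycles

# The compiler change halves loads & stores; other classes are unchanged.
new_counts = dict(counts, loads_stores=counts["loads_stores"] // 2)
new_cpi, new_cycles = cpi_from_counts(new_counts, cycles)

print(new_counts["loads_stores"], sum(new_counts.values()), new_cycles, round(new_cpi, 2))
# 15 85 350 4.12
```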
How many CPU seconds will the benchmark take if we double the number of registers (taking into account both changes described above)?
$$\text{CPU seconds} = \frac{\text{Number of instructions} \times \text{Clocks per instruction}}{\text{Clock rate}}$$
First, we need the number of instructions in the new benchmark, the one with half the number of loads and stores. To get it, we start from the number of instructions the original benchmark executes in its 11 seconds.

Since we know the MIPS (millions of instructions per second) of the original benchmark, we say:
$$(45.4545 \times 10^6) \times 11 \approx 500 \times 10^6 \text{ instructions in 11 seconds}$$
Now we need to figure out how many of those are loads and stores. The table says that 30% of all instructions are loads and stores, so:
$$(500 \times 10^6) \times 0.3 = 150 \times 10^6 \text{ load and store instructions}$$
Now we cut this number in half, because the new code has half as many loads and stores, leaving only 75 × 10^6 loads and stores. (The 20% longer cycle time does not change this count; it enters through the new clock rate.) This also means that there are now fewer instructions in total: 500 × 10^6 - 75 × 10^6 = 425 × 10^6 instructions.
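The instruction-count bookkeeping above, sketched in Python. The constant names are hypothetical; their values come from the problem statement and the earlier parts of Question 2.

```python
CLOCK_HZ = 200e6      # original clock rate
CPI = 4.4             # original CPI on Benchmark B
CPU_SECONDS = 11      # measured CPU time on the original benchmark

mips = CLOCK_HZ / (CPI * 1e6)                                # about 45.45
original_instructions = mips * 1e6 * CPU_SECONDS             # about 500 million
loads_stores = 0.30 * original_instructions                  # about 150 million
new_instructions = original_instructions - loads_stores / 2  # about 425 million

print(round(original_instructions / 1e6), round(new_instructions / 1e6))  # 500 425
```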
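The final solution is:
$$\frac{(425 \times 10^6) \times 4.12}{166.667 \times 10^6} \approx 10.51 \text{ seconds}$$
Using the exact CPI of 350/85 instead of the rounded 4.12, the benchmark needs 425 × 10^6 × 350/85 = 1750 × 10^6 cycles, which at 6 ns per cycle is exactly 10.5 seconds.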
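To avoid rounding drift in the last step, here is a sketch of the CPU-time formula evaluated with Python's `fractions.Fraction`, assuming the exact values CPI = 350/85 and a 6 ns cycle (clock rate 10^9/6 Hz). The function name `cpu_seconds` is hypothetical.

```python
from fractions import Fraction

def cpu_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_hz

instructions = 425 * 10**6          # new benchmark instruction count
cpi = Fraction(350, 85)             # exact new CPI (about 4.12)
clock_hz = Fraction(10**9, 6)       # 1 / (6 ns), about 166.667 MHz

t = cpu_seconds(instructions, cpi, clock_hz)
print(t, float(t))  # 21/2 10.5
```

With exact arithmetic the result is 10.5 seconds; plugging in the rounded CPI of 4.12 instead gives about 10.506 seconds. Either way, the new design is slightly faster than the original 11 seconds.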