CSC 8400-002, Examination E1, February 24, 1999

CSC 8400-002

Examination E1

October 12, 2000

PROBLEM 1 [15%] Assume a certain modification in the machine speeds up the execution of integer ALU instructions by a factor of 2, slows down the execution of floating point instructions by a factor of 3, and does not affect the execution time of any other instructions. What would be the speedup from this modification if the machine spends 50% of the time executing integer ALU instructions and 10% of the time executing floating point instructions?

PROBLEM 2 [15%] Assume your workload consists of three programs: P1, P2 and P3. Every day you run P1 100 times, P2 80 times and P1 20 times. You have to decide which of the two machines - machine A or machine B - to purchase. Here are the results of running the three programs on the two machines:

P1 took 2 seconds on A and 10 seconds on B;

P2 took 10 seconds on A and 1 second on B;

P3 took 100 seconds on A and 80 seconds on B.

What are the weighted execution times (weighted arithmetic means) for your workload on A and B? Which of the machines is faster for your workload and how many times faster?

PROBLEM 3 [6%] How many explicit operands (including source and destination operands) would the ADD instruction have in the assembly languages for the following architectures:

Accumulator

Stack

PROBLEM 4 [6%] If we compare memory/memory and register/register GPR architectures,

Which of them would generally have higher instruction counts?

Which of them would have longer average instruction execution time?

PROBLEM 5 [14%] We are comparing two pipelined machines: M1 and M2. The pipe stage for both machines is 1 clock cycle, and its length is 15 ns for M1 and 10 ns for M2.

M1 has no stalls, while M2 has (only) control hazard stalls. In particular, in M2 every taken branch causes a 1-cycle stall, and every branch that is not taken causes a 3-cycle stall.

We also know that 20% of all instructions are branches and 70% of the branches are taken.

a) What are the CPIs for the two machines?

Which of the two machines is faster and how many times faster?

PROBLEM 6 [14%] Consider the following DLX' code:

SW R1, 100(R2)

LW R2, 100(R1)

LW R1, 300(R0)

ADD R3, R3, R1

SUB R5, R1, R3

SW R6, 200(R5)

ADD R6, R5, R6

How many clock cycles would this code take in the pipeline of Figure 3.4 if the pipeline uses data forwarding whenever possible?

How many clock cycles would this code take if no forwarding is used?

PROBLEM 7 [15%] Explain what the WAW data hazard is and why it never happens in our DLX' pipeline of Figure 3.4

PROBLEM 8 [15%] Consider the following DLX' code:

ADDI R1, R0, #20

ADD R2, R1, R0

BNEZ R1, Target

ADDI R1, R1, #5

ADD R2, R1, R1

LW R1, 0(R1)

Target: ADD R1, R2,R1

What will be the final value of R1 if the machine on which this code runs does not use the delayed branch scheme?
What would be the final value of R1 if the machine uses the delayed branch scheme with a branch delay of length 1?
What would be the final value of R3 if the machine uses the delayed branch scheme with a branch delay of length 2?