CSC 8400-002

Examination E1

February 23, 2000

PROBLEM 1 [12%]

In this problem, assume that we are considering enhancing a machine by adding a vector mode to it (you don't need to know anything about how vectors work to answer this question). When a computation is run in vector mode it is 10 times faster than the normal mode of execution. We call the percentage of time that could be spent using vector mode the *percentage of vectorization*.

- If the percentage of vectorization is 50%, what is the speedup from the enhancement?
- What percentage of vectorization is needed to achieve a speedup of 5?
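The two questions above are applications of Amdahl's Law. A minimal sketch for checking the arithmetic follows, assuming that the percentage of vectorization is measured as a fraction of the original (unenhanced) execution time. The function names `speedup` and `fraction_needed` are illustrative and not part of the problem statement.

```python
def speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup by Amdahl's Law.

    fraction_enhanced: fraction of the original execution time that can
                       use the enhancement (assumed interpretation).
    speedup_enhanced:  how much faster the enhanced mode runs.
    """
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def fraction_needed(target_speedup: float, speedup_enhanced: float) -> float:
    """Invert Amdahl's Law: fraction required to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / speedup_enhanced)


# Values taken from the problem statement: vector mode is 10 times faster.
print(speedup(0.50, 10.0))
print(fraction_needed(5.0, 10.0))
```

Note that `fraction_needed` is only meaningful when the target speedup is below the speedup of the enhanced mode itself.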

PROBLEM 2 [12%]

Assume your workload consists of two programs: P1 and P2. P1 is run 19 times more frequently than P2. You have to decide which of two machines, machine A or machine B, to purchase. Here are the results of running the two programs on the two machines:

P1 took 2 seconds on A and 25 seconds on B;

P2 took 3 seconds on A and 10 seconds on B.

What are the weighted execution times (weighted arithmetic means) for your workload on A and B? Which of the machines is faster for your workload and how many times faster?
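A weighted arithmetic mean can be sketched as below. The weights are an assumption: the frequency ratio is read here as 19:1 in favor of P1 (weights 0.95 and 0.05 after normalization). The helper name `weighted_time` is illustrative.

```python
def weighted_time(times, weights):
    """Weighted arithmetic mean of execution times.

    times:   execution time of each program on one machine
    weights: relative frequency of each program (need not sum to 1)
    """
    total = sum(weights)
    return sum(t * w for t, w in zip(times, weights)) / total


# Assumption: P1 is run 19 times for every single run of P2.
weights = (19, 1)

time_a = weighted_time((2, 3), weights)    # P1, P2 times on machine A
time_b = weighted_time((25, 10), weights)  # P1, P2 times on machine B

print(time_a, time_b, time_b / time_a)
```

The ratio of the two weighted times says how many times faster one machine is than the other for this workload.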

PROBLEM 3 [6%]

Explain the difference between the Big Endian and Little Endian conventions.
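As a small illustration of the two byte orders, the following sketch lays out one 32-bit value both ways. The value `0x12345678` is an arbitrary example and not part of the problem.

```python
value = 0x12345678  # arbitrary 32-bit example value

# Big Endian: the most significant byte is stored at the lowest address.
big = value.to_bytes(4, byteorder="big")

# Little Endian: the least significant byte is stored at the lowest address.
little = value.to_bytes(4, byteorder="little")

# Print the bytes in increasing address order.
print(" ".join(f"{b:02x}" for b in big))     # 12 34 56 78
print(" ".join(f"{b:02x}" for b in little))  # 78 56 34 12
```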

PROBLEM 4 [10%]

Translate the following high-level language code fragment into assembly language for a) a stack architecture and b) an accumulator architecture:

A=B-C;

C=A+B;
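One way to check a stack-architecture translation is to run it on a toy interpreter. The sketch below assumes a hypothetical instruction set with `PUSH x`, `POP x`, `ADD`, and `SUB`, where `SUB` computes (second-from-top) minus (top). These mnemonics and the sample values for B and C are assumptions for illustration, and the accumulator case is not covered.

```python
def run_stack_program(program, memory):
    """Execute a list of (opcode, operand...) tuples on a toy stack machine."""
    stack = []
    for op, *operand in program:
        if op == "PUSH":            # push the value of a memory variable
            stack.append(memory[operand[0]])
        elif op == "POP":           # pop the top of stack into a variable
            memory[operand[0]] = stack.pop()
        elif op in ("ADD", "SUB"):  # operate on the two topmost entries
            top = stack.pop()
            second = stack.pop()
            stack.append(second + top if op == "ADD" else second - top)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# One possible translation of A = B - C; C = A + B;
program = [
    ("PUSH", "B"), ("PUSH", "C"), ("SUB",), ("POP", "A"),
    ("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C"),
]

# Sample initial values (hypothetical): B = 7, C = 3.
result = run_stack_program(program, {"A": 0, "B": 7, "C": 3})
print(result)  # {'A': 4, 'B': 7, 'C': 11}
```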

PROBLEM 5 [20%]

Consider the following DLX' code:

SW R1, 100(R2)

LW R2, 100(R1)

LW R2, 300(R2)

ADD R3, R3, R2

SUB R5, R2, R3

SW R6, 200(R5)

- How many clock cycles would this code take in the pipeline of Figure 3.4 if the pipeline uses data forwarding whenever possible?
- How many clock cycles would this code take if no forwarding is used?

PROBLEM 6 [20%]

Assume the following instruction frequencies for your benchmark DLX' program:

ADD - 10%

ADDI - 5%

SUB - 5%

LW - 30%

SW - 10%

BEQZ - 20%

BNEZ - 20%

- What is the CPI for this program on the non-pipelined machine of Figure 3.1?
- Assume we pipeline the machine of Figure 3.1 in such a way that:

  - There are 5 pipe stages
  - Every pipe stage is 1 cycle
  - The clock cycle of the pipelined machine is 1.1 times the clock cycle of the non-pipelined machine
  - Every branch instruction causes a 1-cycle stall
  - 50% of the load instructions also cause a 1-cycle stall
  - There are no other stalls.

- What is the CPI for the pipelined machine?
- What is the speedup from pipelining (from pipelining the machine of Figure 3.1)?
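The CPI and speedup calculations for this problem can be sketched as follows. The instruction frequencies and stall rules come from the problem statement. The per-instruction cycle counts for the non-pipelined machine must be read from Figure 3.1, which is not reproduced here, so the `placeholder_cycles` values below are hypothetical placeholders and not the figure's numbers. The function names are illustrative.

```python
# Instruction frequencies from the problem statement.
freq = {
    "ADD": 0.10, "ADDI": 0.05, "SUB": 0.05,
    "LW": 0.30, "SW": 0.10, "BEQZ": 0.20, "BNEZ": 0.20,
}


def unpipelined_cpi(freq, cycles):
    """Frequency-weighted average of cycles per instruction."""
    return sum(freq[op] * cycles[op] for op in freq)


def pipelined_cpi(freq):
    """Ideal CPI of 1 plus average stall cycles per instruction."""
    branch_stalls = (freq["BEQZ"] + freq["BNEZ"]) * 1  # every branch stalls 1 cycle
    load_stalls = 0.5 * freq["LW"] * 1                 # half of the loads stall 1 cycle
    return 1.0 + branch_stalls + load_stalls


def pipelining_speedup(cpi_unpiped, cpi_piped, clock_ratio=1.1):
    """Ratio of execution times; clock_ratio = pipelined / non-pipelined cycle time."""
    return cpi_unpiped / (cpi_piped * clock_ratio)


# HYPOTHETICAL cycle counts; substitute the values given in Figure 3.1.
placeholder_cycles = {op: 5 for op in freq}

cpi_old = unpipelined_cpi(freq, placeholder_cycles)
cpi_new = pipelined_cpi(freq)
print(cpi_old, cpi_new, pipelining_speedup(cpi_old, cpi_new))
```

The instruction count is the same on both machines, so it cancels out of the speedup ratio and only CPI and clock cycle time remain.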

PROBLEM 7 [20%]

Consider the following DLX' code:

ADDI R1, R0, #20

SW R1, 20(R0)

LW R2, 0(R1)

BNEZ R2, Target

ADDI R1, R1, #5

ADD R2, R3, R3

LW R3, 0(R3)

Target: ADD R3, R2, R1

- What will be the final value of R3 if the machine on which this code runs does not use the delayed branch scheme? If you think this value is impossible to determine, explain why.
- What would be the final value of R3 if the machine uses the delayed branch scheme with a branch delay of length 1? If you think this value is impossible to determine, explain why.
- What would be the final value of R3 if the machine uses the delayed branch scheme with a branch delay of length 2? If you think this value is impossible to determine, explain why.