CSC 8400-001

Examination E1

October 24, 2000

PROBLEM 1 [20%] Assume that we make an enhancement to a computer that improves a certain mode M of execution.

a) Suppose M is used 10% of the time. What is the limit for the overall speedup that can be achieved by making M faster (and faster)?

b) Suppose, again, that M is used 10% of the time, and the enhancement improves M by a factor of 20. What is the speedup we have obtained from the enhancement?

c) Assume the enhancement improves M by a factor of 10 and M is used 5% of the time, measured as a percentage of the execution time when the enhanced mode is in use. What percentage of the original execution time has been converted to fast mode?

 

 

PROBLEM 2 [20%] Consider the following fragment of C code:

For (i=1; i<=10; i++)

{A[i] = B[i] + 5 + C;}

Assume that A and B are arrays of 32-bit integers, and C and i are 32-bit integers. A, B and C are in memory at the addresses 100, 5000 and 8000, respectively (the address of an array is the address of the first element of it).

a) Write the code for DLX'. Your code should be as short as possible.

b) What is the size of your code in bytes?

c) What is the instruction count for your code?

d) How many memory-data references will be executed?

e) What will be the total memory traffic, i.e. how many (instruction and/or data) bytes will be transferred to and/or from memory while the code runs?

 

PROBLEM 3 [20%] Assume for our benchmark for DLX' we have the following instruction frequencies:

We also know the following:

  1. What would be the CPI for the non-pipelined machine of Figure 3.1?
  2. What would be the CPI for the pipelined machine of Figure 3.4, assuming that the latter uses forwarding whenever possible, and does not use any branch penalty reduction schemes?
  3. What would be the CPI for the pipelined machine of Figure 3.4, assuming that the latter uses forwarding whenever possible, and also uses the "predict not taken" scheme?

 

PROBLEM 4 [20%] List as many different ways/methods as you remember for eliminating or reducing stalls caused by:

  1. Structural hazards
  2. Data hazards
  3. Control hazards.

 

 

PROBLEM 5[20%] The following three codes (where "code fragment 1", "code fragment 2" and "code fragment 3" stand for some instruction sequences that do not contain control instructions) have been written for a machine that does not use delayed branch. Translate them into equivalent codes for a machine that uses delayed branch with the delay length of 1. You are not allowed to use "nop", so the delay slot should be filled with something useful, applying one of the three methods: "from before", "from target" or "from fall through". In each case only one of these three methods is (safely) applicable, and you should indicate which of them you applied.

a)

[code fragment 1]

Instruction A: ADD R1, R2, R2

Instruction B: BNEZ R1, Instruction D

Instruction C: ADD R3, R3, R3

[code fragment 2]

Instruction D: ADD R3, R1, R1

Instruction E: SW R3, 100(R0)

[code fragment 3]

 

b)

[code fragment 1]

Instruction A: ADD R1, R2, R2

Instruction B: BNEZ R1, Instruction D

Instruction C: ADD R3, R1, R1

[code fragment 2]

Instruction D: ADD R3, R3, R3

Instruction E: SW R3, 100(R0)

[code fragment 3]

c)

[code fragment 1]

Instruction A: ADD R1, R2, R2

Instruction B: BNEZ R2, Instruction D

Instruction C: ADD R3, R3, R3

[code fragment 2]

Instruction D: ADD R3, R3, R3

Instruction E: SW R3, 100(R0)

[code fragment 3]