Lab 6 - Benchmarks & Performance Tuning
Introduction
- This lab is designed to increase your understanding of benchmark design, low-level performance measurement, and experimental research in computer architecture.
- Starting from an existing kernel benchmark that performs matrix multiplication, your task is to create three additional versions of the matrix multiplication function and improve its performance as much as you can. Then, using the experimental results you get from running your code, you will prepare a brief (approx. 2 pages or more) write-up of your results.
- This lab is to be completed individually, not in teams.
Lab Steps
1. Log in to helix or felix (not tanner) and change the current directory to csc8400.
2. Copy the two lab files (mm.c and Makefile) from the directory /mnt/a/mdamian/systems/profile into your current directory by typing /mnt/a/mdamian/systems/installprof
A directory named profile, containing the two lab files, will be created in your current directory.
3. Use the Makefile to compile four versions of the mm.c program by typing make
4. Run each version and compare the performance results by typing make run
5. Create three additional versions of the matrix multiplication function with the goal of improving performance over the original version, and repeat the experiments to gather performance data.
6. Calculate the speedup (original execution time divided by improved execution time) for each of your three new versions as compared to the original. Do this for each of the four compilation methods (no optimization, -O, -O2 and -O3).
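As an illustration of steps 5 and 6, the sketch below shows one well-known way to produce a new version of a matrix multiplication function (reordering the loops) together with the speedup calculation. It is only a sketch under stated assumptions: the function names mm_ijk, mm_ikj and speedup are hypothetical, and it assumes square n x n matrices of doubles stored in row-major order, which may not match the data layout or types used in the mm.c provided for this lab.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: classic i-j-k loop order.
   Assumes n x n row-major matrices of doubles; the lab's mm.c may differ. */
void mm_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j]; /* b is read down a column */
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical variant: i-k-j loop order. It computes the same products,
   but the innermost loop reads b and writes c along rows. */
void mm_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

/* Speedup as defined in step 6: original time divided by improved time. */
double speedup(double original_seconds, double improved_seconds)
{
    assert(improved_seconds > 0.0);
    return original_seconds / improved_seconds;
}
```

Whether a reordering like this actually helps, and by how much, depends on the machine, the matrix size and the compiler flags, which is exactly what the experiments are meant to find out. Whatever transformation you try, check that each new version produces the same result matrix as the original before timing it.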
What to hand in
On the due date, please include all of the following in a single MS Word document, submitted via email:
- Report - a brief (approx. 2 pages) report of your findings, including:
o your name and the due date
o a description of the approach you used in your experiments
o the name of the machine (or machines) you performed your experiments on
o a table of results: version of "mm", compiler flags used (or not), average execution time, speedup vs. original
o a graph of the results that clearly shows the differences in speedup
o any assumptions you made, problems you encountered, and unexpected observations you made
- Source Code - your complete source code and one or more examples of test runs of your program
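For the table of results, the average execution time has to come from several repeated runs of each version. The sketch below shows one possible way to collect and summarize such measurements. The helper names time_call, average and format_row are hypothetical and are not part of the provided mm.c or Makefile; the timing uses the standard C clock() function, which reports CPU time and is only one of several reasonable choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* CPU seconds consumed by a single call of fn(arg), measured with clock(). */
double time_call(void (*fn)(void *), void *arg)
{
    clock_t start = clock();
    fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Arithmetic mean of count timing samples (count must be at least 1). */
double average(const double *samples, size_t count)
{
    assert(count > 0);
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += samples[i];
    return total / (double)count;
}

/* Formats one row of the results table: version, compiler flags,
   average execution time in seconds, and speedup vs. the original.
   Returns the number of characters snprintf would write. */
int format_row(char *buf, size_t size, const char *version,
               const char *flags, double avg_seconds, double speedup)
{
    return snprintf(buf, size, "%-10s %-6s %10.3f %8.2f",
                    version, flags, avg_seconds, speedup);
}
```

A typical use would be to call time_call several times per version and compiler flag, store the results in an array, pass the array to average, and then print one format_row line per table entry.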
Grading criteria
- Only completed write-ups and correct code, including all of the items listed in "What to hand in", will be eligible for any points. Incomplete write-ups will not earn any points.
- If your best new version of "mm" achieves a speedup of 1.25 or more over the original, you can earn up to 85 points.
- If your best new version of "mm" achieves a speedup of 1.33 or more over the original, you can earn up to 90 points.
- If your best new version of "mm" achieves a speedup of 1.50 or more over the original, you can earn up to 95 points.
- If your best new version of "mm" achieves a speedup of 2.50 or more over the original, you can earn up to 100 points.
- Up to 15 bonus points will be awarded (if needed; the maximum score for the assignment is 100) for an insightful explanation of the effect of the compiler optimization flags on speedup, including both expected and unexpected observations. Of particular value is any description of the effect of the optimization on the generated assembly language code.
Notes
- This workshop is more about your approach than about arriving at an absolutely precise answer. In fact, depending on the approach you choose and the assumptions you make, your results could be quite different from those of others in the class. That is perfectly okay!
- Note that the code you submit must be your own. Please limit discussions of this workshop with fellow students to general concepts, approaches and techniques, rather than the specific details of a solution.