Performance Tuning

Lab 6 - Benchmarks & Performance Tuning

Introduction

This lab is designed to increase your understanding of benchmark design, low-level performance measurement, and experimental research in computer architecture.
Starting from an existing kernel benchmark that performs matrix multiplication, your task is to create three additional versions of the matrix multiplication function and improve its performance as much as you can. Then, using the experimental results you get from running your code, you will prepare a brief (approx. 2 pages) write-up of your results.
This lab is to be completed individually, not in teams.

Lab Steps

1. Log in to helix or felix (not tanner) and change the current directory to csc8400.

2. Copy two files (mm.c & Makefile) from the directory /mnt/a/mdamian/systems/profile into your current directory, by typing in /mnt/a/mdamian/systems/installprof

A directory profile containing the two lab files will be created in your current directory.

3. Use the Makefile to compile four versions of the mm.c program, by typing in make

4. Run each version and compare the performance results, by typing in make run

5. Create three additional versions of the matrix multiplication function with the goal of improving performance over the original version, and repeat the experiments to gather performance data.

6. Calculate speedup (original execution time divided by improved execution time) for each of your three new versions as compared to the original. Do this for each of the four compilation methods (no optimization, -O, -O2 and -O3).

What to hand in

On the due date, please include all of the following in a single MS Word document, submitted via email:

Report - a brief (approx. 2 pages) report of your findings, including:

o your name and due date

o description of the approach you used in your experiments

o name of machine (or machines) you performed your experiments on

o table of results: version of “mm”, compiler flags used (or not), average execution time, speedup vs. original

o graph of the results that clearly shows the differences in speedup

o any assumptions you made, problems you encountered, and unexpected observations you made

Source Code - your complete source code and one or more examples of test runs of your program

Grading criteria

Only completed write-ups and correct code, including all of the items listed in “What to hand in”, will be eligible for any points. Incomplete write-ups will not earn any points.
If your best new version of “mm” achieves a speedup of 1.25 or more over the original, you can earn up to 85 points.
If your best new version of “mm” achieves a speedup of 1.33 or more over the original, you can earn up to 90 points.
If your best new version of “mm” achieves a speedup of 1.50 or more over the original, you can earn up to 95 points.
If your best new version of “mm” achieves a speedup of 2.50 or more over the original, you can earn up to 100 points.
Up to 15 bonus points will be awarded (if needed; the maximum score for the assignment is 100) for an insightful explanation of the effect of the compiler optimization flags on speedup, including both expected and unexpected observations. Of particular value is any description of the effect of the optimization on the generated assembly language code.

Notes

This workshop is more about approach than about arriving at an absolutely precise answer. In fact, depending on the approach you choose and the assumptions you make, your results could be quite different from others in the class. That is perfectly okay!
Note that the code you submit must be your own. Please limit discussions of this workshop with fellow students to general concepts, approaches and techniques, rather than to the specific details of a solution.