To examine our original question, the transfer rate of files over the Internet, we decided to collect the following data:
It was reasonably clear that many factors would influence the transfer rate of files, but we decided that these were the ones that we could reasonably control or measure.
So that we could record replications of data for hours, locations, and files, we selected four sites and four files of similar size at each location. We selected graphics images by M. C. Escher because they were readily available at many locations. We decided to use the following locations:
We settled on these locations after a few preliminary trials at several locations. These locations all had a fairly large selection of M. C. Escher graphics. Our trial downloads from locations in Europe indicated that transfer times would be prohibitive. Illinois was chosen for its computer resources, Pittsburgh because it was fairly close, Utah as a distant site in the U.S., and the National Gallery as a site that we suspected lacked large computing facilities.
Files were downloaded at four different times of the day: 8 A.M., 12 Noon, 4 P.M., and 8 P.M. These times were selected primarily because of the workshop schedule of the team members. It became evident using experimental data analysis that download times collected at many other times of day would be needed if we were to do any reasonable regression. This would not be possible under the time constraints of this project.
At various times, one, two, three, or four people downloaded files from the same site. Four or five files at each location were used for all trials. The download time was then measured by selecting the file and counting the number of seconds it took for the computer to complete the transfer. Not all sites contained the same files, but files of similar size were used.
We made up several copies of a form which listed the site location, name of the files, time of day, file size, download time, and number of team members accessing that site. Each member of the team then collected data from several sites at various times over the next few days. To measure download time, we simply clicked on the file name or its image and recorded the number of seconds until the transfer was complete. The data were then loaded into a Minitab worksheet for manipulation and analysis.
It was assumed early on that the rate of transfer would be a more consistent measure than total transfer time. We created a new variable called Kb/sec. by dividing the file size measured in kilobytes by the number of seconds to download.
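As a minimal sketch of this calculation (written in Python rather than in the Minitab worksheet the team used), the rate is simply file size divided by elapsed seconds. The function name and the sample values are hypothetical and are not taken from the project data.

```python
def transfer_rate_kb_per_sec(file_size_kb, download_seconds):
    """Return the transfer rate in kilobytes per second."""
    if download_seconds <= 0:
        raise ValueError("download time must be positive")
    return file_size_kb / download_seconds


# Hypothetical record: a 120 Kb file that took 40 seconds to download.
rate = transfer_rate_kb_per_sec(file_size_kb=120.0, download_seconds=40.0)
print(rate)  # 3.0
```

Dividing by the download time puts files of different sizes on a common scale, which is why the rate was expected to be a more consistent measure than the raw time.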
A first step in analysing the data was to group the data by site and do side-by-side box plots of transfer rates using all sites at the four times of day.
A second set of box plots would examine the same information by site.
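The numbers behind side-by-side box plots can be sketched as a five-number summary (minimum, lower quartile, median, upper quartile, maximum) for each group. The sketch below is in Python, not Minitab, and the records are invented for illustration; they are not the project's measurements. The quartile method used here is one common convention and may differ slightly from the one Minitab applies.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (group label, Kb/sec) records, NOT the project's data.
records = [
    ("8 A.M.", 4.0), ("8 A.M.", 5.0), ("8 A.M.", 6.0), ("8 A.M.", 7.0), ("8 A.M.", 8.0),
    ("4 P.M.", 1.0), ("4 P.M.", 2.0), ("4 P.M.", 3.0), ("4 P.M.", 4.0), ("4 P.M.", 5.0),
]


def five_number_summaries(rows):
    """Group rates by label and return (min, Q1, median, Q3, max) per group."""
    groups = defaultdict(list)
    for label, rate in rows:
        groups[label].append(rate)

    summaries = {}
    for label, rates in groups.items():
        # quantiles() needs at least two observations per group.
        q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
        summaries[label] = (min(rates), q1, q2, q3, max(rates))
    return summaries


summaries = five_number_summaries(records)
for label, summary in summaries.items():
    print(label, summary)
```

Using the site name instead of the time of day as the group label gives the summaries for the second set of plots.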
We had expected that the transfer time of files might be exponentially related to file size. The analysis did not bear this out. A simple linear regression model of download time versus file size for the University of Illinois data did show a significant relationship between these variables.
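For readers unfamiliar with the technique, a simple linear regression of download time on file size can be sketched with ordinary least squares as follows. This is an illustration in Python, not the Minitab analysis the team ran; the function name and the data points are hypothetical, and the sketch only fits the line without testing its significance.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical file sizes (Kb) and download times (seconds), NOT project data.
sizes_kb = [50.0, 100.0, 150.0, 200.0]
times_sec = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(sizes_kb, times_sec)
print(f"time = {intercept:.2f} + {slope:.2f} * size")
```

In this made-up example the slope is the extra seconds per additional kilobyte, and the intercept is the fixed overhead of starting a transfer.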
Another factor that we assumed would have a measurable effect was the number of team members accessing the same site. This had only a negligible effect.
In our data, the major factor in determining transfer time and transfer rate appears to be time of day. Transfer rates are reasonably high early in the morning, drop off drastically as the day progresses, bottom out in the late afternoon, and increase again toward evening. Since our data collection was restricted to only four times during the day due to the workshop schedule, data for intermediate times are missing, so we cannot verify these conclusions with any certainty. Collection of data at other times would be a direction for further study.