In this experiment, we measured the overhead, in CPU cycles, of the timing measurement itself. This measurement overhead must be subtracted from the results of the subsequent experiments.
Methodology:
We used the RDTSC and RDTSCP instructions together with CPUID, which serializes the instruction pipeline. RDTSC returns the current count of CPU clock cycles. Taking the difference in clock cycles between two successive reads of the counter and dividing it by the CPU clock frequency gives the absolute elapsed time. We ran this operation for 50,000 iterations and recorded the minimum, maximum, average, and steady point (mode). We consider the steady point the best way to summarize the measurement because it excludes the outliers and transient variations that occur in the system. An alternate method is to discard the 5 smallest and 5 largest values and compute the average of the remaining values.
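A minimal sketch of this measurement harness follows. It assumes x86-64 with GCC-style inline asm; the 50,000-iteration count comes from the text, while the helper names (rdtsc_start, rdtsc_end) and the 1024-cycle cap on the mode-counting table are our own illustrative choices, not part of the original code.

#include <stdint.h>
#include <stdio.h>

#define ITERS 50000

/* CPUID serializes the pipeline, then RDTSC reads the cycle counter. */
static inline uint64_t rdtsc_start(void)
{
    unsigned lo, hi;
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi) : : "%rbx", "%rcx");
    return ((uint64_t)hi << 32) | lo;
}

/* RDTSCP waits for earlier instructions to retire; the trailing CPUID
   keeps later instructions from being reordered above the read. */
static inline uint64_t rdtsc_end(void)
{
    unsigned lo, hi;
    __asm__ __volatile__("rdtscp"
                         : "=a"(lo), "=d"(hi) : : "%rcx");
    __asm__ __volatile__("cpuid" : : : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    static uint64_t sample[ITERS];

    /* Nothing between the two reads: the elapsed value is the
       measurement overhead itself. */
    for (int i = 0; i < ITERS; i++) {
        uint64_t start = rdtsc_start();
        uint64_t end   = rdtsc_end();
        sample[i] = end - start;
    }

    uint64_t min = sample[0], max = sample[0], sum = 0;
    for (int i = 0; i < ITERS; i++) {
        if (sample[i] < min) min = sample[i];
        if (sample[i] > max) max = sample[i];
        sum += sample[i];
    }

    /* Steady point (mode): the elapsed value seen most often,
       assuming the overhead stays below 1024 cycles. */
    static unsigned count[1024];
    uint64_t mode = 0;
    for (int i = 0; i < ITERS; i++)
        if (sample[i] < 1024 && ++count[sample[i]] > count[mode])
            mode = sample[i];

    printf("min=%llu max=%llu avg=%.2f mode=%llu cycles\n",
           (unsigned long long)min, (unsigned long long)max,
           (double)sum / ITERS, (unsigned long long)mode);
    return 0;
}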
To measure the loop overhead, we enclosed a for (j = 0; j < 1; j++) loop inside an averaging loop of 50,000 iterations and summarized the results.
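A sketch of that loop-overhead measurement is shown below. It assumes the rdtsc_start/rdtsc_end helpers, the sample[] array, and the ITERS constant from the previous sketch; the volatile qualifier on j is our addition to keep the compiler from optimizing the empty loop away.

    /* Same harness as above; only the measured region changes. */
    volatile int j;
    for (int i = 0; i < ITERS; i++) {
        uint64_t start = rdtsc_start();
        for (j = 0; j < 1; j++)
            ;                        /* empty single-iteration loop: pure loop overhead */
        uint64_t end = rdtsc_end();
        sample[i] = end - start;     /* reduced to min/max/average/mode as before */
    }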
Predictions:
Reading the time involves (1) allocating a register, (2) copying the clock-cycle count into the register, (3) storing it in a variable, and (4) the jumps to and from the routine; this sequence is repeated for both the start and end readings, so the instruction count doubles. Assuming 1 clock cycle per instruction, our base hardware estimate is 5 clock cycles per reading, so measuring the start and end time will take 10 clock cycles.
The software estimate additionally accounts for validating the asm routine, assembling the instructions, and performing protection and register checks. We computed that these bring the total to 6 clock cycles per reading, so reading the start and end time will take 12 clock cycles.