Notes on DBMS Internals
Neil Conway email@example.com November 10, 2003
These notes were originally written for my own use while taking CISC-432, a course in DBMS design and implementation at Queen’s University. The textbook used by that class is Database Management Systems, 3rd Edition by Raghu Ramakrishnan and Johannes Gehkre; some of the material below may be speciﬁc to that text. This document is provided in the hope that it is useful, but I can’t provide any assurance that any information it contains is in any way accurate or complete. Corrections or additions are welcome. Distribution Terms: This document is released into the public domain.
• A DBMS frequently needs to sort data
…show more content…
That means that we create N sorted runs of B pages each. B – In the merge passes, rather than merging two runs at a time, we can merge B − 1 runs simultaneously: by reading the ﬁrst page of each sorted run into a buﬀer and then picking the smallest of these values, writing it to the output buﬀer, and then repeating the process. This means that merging takes logB−1 N passes. B – Since there is 1 additional initial pass and each pass requires 2N I/Os, the total cost of the merge sort is: 2N ∗ (1 + logB−1 N ) B • Variations On Merge Sort: replacement sort: If the initial pass can produce longer runs, this may be good, even if it results in using a sorting algorithm that is not eﬃcient (i.e. we are willing to trade increased CPU time for fewer I/Os). A “replacement sort” achieves this: slower sort, longer initial runs (twice the length of the runs produced by a traditional sorting algorithm, on average). double buﬀering: Since the dominant factor that determines the performance of an external sort is the number of disk I/Os that need to be performed, we can trade some buﬀer space to improve performance by prefetching data into some buﬀers before it is actually required (and predicting what data is needed next is usually pretty easy). This means that the main sort process doesn’t need to block waiting for I/O. The trade oﬀ is that there are fewer