Dbms Essay

12972 Words Jul 8th, 2011 52 Pages
Notes on DBMS Internals
Neil Conway neilc@samurai.com November 10, 2003

These notes were originally written for my own use while taking CISC-432, a course in DBMS design and implementation at Queen’s University. The textbook used by that class is Database Management Systems, 3rd Edition by Raghu Ramakrishnan and Johannes Gehkre; some of the material below may be specific to that text. This document is provided in the hope that it is useful, but I can’t provide any assurance that any information it contains is in any way accurate or complete. Corrections or additions are welcome. Distribution Terms: This document is released into the public domain.

Query Evaluation
External Sorting
• A DBMS frequently needs to sort data
…show more content…
That means that we create N sorted runs of B pages each. B – In the merge passes, rather than merging two runs at a time, we can merge B − 1 runs simultaneously: by reading the first page of each sorted run into a buffer and then picking the smallest of these values, writing it to the output buffer, and then repeating the process. This means that merging takes logB−1 N passes. B – Since there is 1 additional initial pass and each pass requires 2N I/Os, the total cost of the merge sort is: 2N ∗ (1 + logB−1 N ) B • Variations On Merge Sort: replacement sort: If the initial pass can produce longer runs, this may be good, even if it results in using a sorting algorithm that is not efficient (i.e. we are willing to trade increased CPU time for fewer I/Os). A “replacement sort” achieves this: slower sort, longer initial runs (twice the length of the runs produced by a traditional sorting algorithm, on average). double buffering: Since the dominant factor that determines the performance of an external sort is the number of disk I/Os that need to be performed, we can trade some buffer space to improve performance by prefetching data into some buffers before it is actually required (and predicting what data is needed next is usually pretty easy). This means that the main sort process doesn’t need to block waiting for I/O. The trade off is that there are fewer

Related Documents