Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
22 Cards in this Set
- Front
- Back
Block
|
The amount of data read or written in one I/O operation.
|
|
Page
|
Same as a block
|
|
Blocking factor
|
The number of physical records per block.
|
|
Inputs of physical design phase
|
Logical (implementation) model
Documentation/Definitions DBMS Characteristics Response time requirements Security, Backup, Recovery, Retention, Integrity requirements |
|
Outputs of physical design phase
|
Produces a Description of the Implementation of the Database on Secondary Storage.
Describes the storage structures and access methods used to achieve efficient access to the data. |
|
Primary & Secondary index differences
|
Primary: An ordered, fixed-length file that stores the primary key and a pointer.
Each record in the index file corresponds to each block in the ordered data file. Sparse / nondense index Secondary: A secondary index is an ordered file that stores the nonordering field of a data file and a pointer. If the indexing field is a secondary or candidate key, then the index is dense. If the indexing field is a nonkey field, then the index is sparse. Introduces the idea of adding a level of indirection |
|
Clustered vs. Unclustered indexes
|
If order of data records is the same as, or `close to’, order of data entries, then called clustered index.
A file can be clustered on at most one search key. Cost of retrieving data records through index varies greatly based on whether index is clustered or not! |
|
Describe hash based indexes
|
Hash function calculates the address of the page on which the record is stored.
The field that’s used in the hash function is called the hash field. Called a hash key if the hash field is also a key field. Good for equality searches Not good for range searches |
|
Describe B-tree Indexes
|
If the root is not a leaf node, it must have at least two children.
For a tree of order n, each node (other than root/leaf nodes) must have between n/2 and n pointers and children. … (other rules) Tree must always be balanced Every path from the root to a leaf must have same length. This is what matters. This means that it always takes about the same time to access any record. Leaf pages contain data entries, and are chained (prev & next) Non-leaf pages contain index entries and direct searches: |
|
Tradeoffs in deciding which indexes to create
|
Indexes can make queries go faster, updates slower but require disk space, too.
|
|
Index selection guidelines
|
Attributes in WHERE clause are candidates for index keys.
Exact match condition suggests hash index. Range query suggests tree index. Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates. Multi-attribute search keys should be considered when a WHERE clause contains several conditions. Order of attributes is important for range queries. Such indexes can sometimes enable index-only strategies for important queries. For index-only strategies, clustering is not important! Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering. |
|
Rules of thumb for indexes
|
Index primary key fields
DBMS may do this automatically Composite keys require composite indexes Index foreign key fields Index other fields frequently used in: WHERE clauses GROUP BY clauses ORDER BY clauses Consider composite indexes when fields are used together in conditions |
|
Advantages/Disadvantages of Clustered Indexes
|
Advantages:
Efficient storage; faster searches than heap Faster inserts/deletes than sorted file Disadvantages: Only one clustering factor per file |
|
Advantages/Disadvantages of Unclustered tree indexes
|
Advantages:
Fast searches, fast inserts/deletes Disadvantages: Slow scans and range searches with many matches |
|
Advantages/Disadvantages of Hashed indexes
|
Advantages:
Same as unclustered, but faster on equality searches Disadvantages: Does not support range searches |
|
Four steps of transaction analysis
|
1. Create xact/table cross-reference and/or usage map.
2. Determine which tables are most frequently accessed by xacts 3. Analyze usage data for selected xacts that involve the most frequently-used tables. 4. Design to improve performance of key transactions (indexes) |
|
Which transactions should be analyzed?
|
May not be able to analyze all xacts
Focus on “important” transactions 80/20 rule 80% of data access comes from 20% of xacts Also consider high-priority xacts |
|
What information is contained in a cross reference table
|
Transactions and relations and the operations performed for each transaction
|
|
What information should you consider when performing a detailed analysis of a transaction
|
Tables and attributes used
Type of access (select, update) Avoid indexes for update xacts Attributes used in conditional statements Range vs. equality (cluster vs. hash index) Attributes used in joins Attributes used in order/group by Expected frequency of query Performance goals: High freq/priority queries are priority candidates for indexes. |
|
What is denormalization, why would it be considered
|
Introduce additional redundancy
Improves performance May lead to anomolies |
|
What is a view? Why are views used?
|
View
Virtual table based on base tables in the DB Only the definition really exists in the DB Provide a limited “view” of the DB “Pre-do” joins Limit tables Limit columns Way to implement logical independence Used for security |
|
Name and briefly describe four design goals for distributed DBMS'
|
Local transparency
users don’t need to know where data is stored Replication transparency user can treat data as if it is stored at only one site, even if it is replicated at several sites Failure transparency a transaction is committed or not committed commit protocol Concurrency transparency it appears to the user as their transaction is the only activity on the system |