Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
83 Cards in this Set
- Front
- Back
static hashing |
each bucket is one page number of buckets are fixed if bucket is filled, add overflow page |
|
dynamic hashing |
split bucket when full, rehash single split buckets entries between old and new -use directory of pointers may need to double directory if no bucket chains available |
|
dynamic hashing: find the bucket for k |
take the last global depth # bits of h(k) |
|
dynamic hashing: find the value in the bucket |
take the last local depth # bits of h(k) |
|
when do you increase the local depth of a bucket? |
when you split a bucket because the bucket is full, the new global depth = old global depth + 1 |
|
when do you increase the global depth of the directory? |
when you double the directory of pointers |
|
renaming can be done in two ways: |
by position by name |
|
set difference (R-S) |
all tuples in R that are not also in S |
|
cross product (RxS) |
in Result for all r in R, s in S |
|
set division (R/S) |
= { x | for all y in S, (x, y) is in R } "for all" queries |
|
Natural Join (R |X| S) |
– R and S must have at least one field with same domain |
|
cross product: what do you do if there are overlapping fields? |
Rename overlapping fields (can’t have duplicate field names in the result rela;on) |
|
conditional join A |X | {aid > bid} B |
– Only results in A X B with A.aid > B.bid are in result |
|
equi join A |X | {aid = bid} B |
– Only results in AXB with A.aid == B.bid are in result – The join fields (aid, bid) only appear once in result schema |
|
natural join A | X | B |
– Equi-Join on all matching fields of A and B |
|
Aggregation |
Partitioning a relation |
|
Correlated query |
A query that is dependent on the row in another query |
|
when to use HAVING |
for a group qualification |
|
queries with "some" |
WHERE EXISTS |
|
Find names for all critics that have rated some movie |
SELECT C.cname FROM Critics C WHERE EXISTS (SELECT * FROM Ratings R WHERE C.cid = R.cid) |
|
Name of Critics who have reviewed every movies |
SELECT C.cname FROM Critics C WHERE NOT EXISTS( (SELECT M.mid FROM Movies M) EXCEPT (SELECT R.mid FROM Reviewers R WHERE R.cid = C.cid) |
|
selectivity |
Pages retrieved from an operation |
|
Access path |
Method for obtaining tuples (records) |
|
Reduction factor |
number of tuples in result |
|
index nested loop join |
For each r in R: use index to find matching tuples s in S and add (s,r) to result |
|
sort merge join |
sort R and sort S on the join attributes merge R & S applying join condition: r first record of R, s first record of S while still r and s values to consider: if (r > s): s++ else if (r < s): r++ else: add (r,s) to result loop over duplicates advance r and s |
|
cost of index nested loop join |
Scan R + (for each r in R) (hash look-up to find matching S RIDs + file reads to get matching S records) |
|
cost of page nested loop join |
M + M*N |
|
cost of sort merge join |
sort cost: M(log M) + N (log N) merge cost: best case: M+N worst case: M*N |
|
range partitioning sort |
given R partitioned at one or more nodes, but not partitioned on sort column(s) 1) in parallel, each node partitions its Ri on the sort attribute(s) 2) each node, in parallel, does a local external Sort of its partition |
|
parallel hash join |
1) partitions get distributed to different nodes 2) do second parallel join phase at each node |
|
fastest sorting alg for sorting data that all fit in memory |
Quick Sort & Merge Sort about the same |
|
sort largest sized array completely in memory |
Quick Sort and Selection Sort |
|
Why is sorting important for DBMS? |
Data can be larger than the size of memory |
|
Sorting is used in which queries |
Queries with DISTINCT (eliminate duplicates) GROUP BY queries Bulk loading a B+ Tree faster join alg bulk loading B+ Tree |
|
cost of external merge sort |
(1+ Log B-1(ceiling (N/B) ))*2N |
|
good idea to use a B+ Tree index to sort? (ie get RIDs in order by traversing leaf pages then using RIDs to fetch records from file) |
if B+ tree is clustered? YES if B+ tree is not clustered? NO |
|
access methods |
file scan binary search (if file sorted) index on selection fields + retrieve tuple |
|
projection operator |
1) scan to remove unwanted attributes 2) remove duplicates: SELECT DISTINCT |
|
remove duplicates by sorting (projection) |
remove unwanted attributes on pass 0 remove duplicates in merge passes |
|
remove duplicates by hash (projection) |
create hash partitions (h1) eliminate duplicates (hash using h2, only insert value in bucket if no duplicate value already there, append new hash table to the result) |
|
nested loop join |
for each tuple r in R for each tuple s in S if r.a = s.a: append (r,s) to reuslt |
|
cost of nested loop join (not page) |
M + (M*f)*N f is tuples/page |
|
block nested loop join |
Use B-2 pages for R, 1 page for S, 1 page for output: for each block of B-2 pages of R: for each page of S: for all in-memory r tuples of R, s of S if r.a = s.a append (r,a) to result |
|
cost of block nested loop join |
M + ceiling[M/(B-2)]*N |
|
joins that use partitioning |
only compare r and s tuples in the same partition |
|
lower bound for all join methods |
O(M+N) |
|
block nlj with p processors is how many times faster? |
p times faster! |
|
partitioned nested loop join |
each outer tuple must be compared with each inner tuple that it might join with partitioning phase on join column local nested loop join on partitions |
|
given R partitioned at one or more nodes, but not partitioned on the sort column |
in parallel, each node partitions its Ri on the sort attribute (keep tuples in my range, send tuples in other ranges to other nodes) each node, in parallel, does a local external sort of its partition |
|
parallel hash join |
first phase partitions distributed to different nodes second parallel join phase at each node |
|
what alg is almost always best for equi-join? |
parallel hash join |
|
properties of transactions |
atomic consistent isolation durability |
|
concurrent execution |
T1 and T2’s execu3on can overlap, be interleaved |
|
parallel execution |
T1 and T2 are concurrent and they are run on a parallel machine |
|
serial schedule |
any schedule where each transaction executes in whole before starting the next transaction |
|
serializable schedule |
any concurrent schedule that is equivalent to some serial schedule |
|
2 Phase Locking |
1. Acquire Locks on Objects 2. Release Locks on Objects (Hold locks until XACT completes) |
|
how to solve deadlocks |
acquire in order |
|
when do we lock? |
on a call to access data in the buffer manager |
|
distributed system benefits |
sharing resources (clients can share same file) availability (if service replicated, if one goes down, still availability at another) scalability efficiency |
|
distributed system drawbacks |
more complex keep consistency of replicated states locating resources node crashes security |
|
two solutions to the fact that all distributed servers need to decide to either commit or abort the distributed XACT |
designated coordinator no designated coordinator |
|
pro/con of designated coordinator |
+ easier, fewer messages - single botteneck and point of failure |
|
pro/con of no designated coordinator |
- more complicated protocol, more messages + distribute load, no single point of failure |
|
2 phase commit (phase 1) |
phase 1: prepare to commit coordinator sends prepare message to all Si site Si recieves message, makes decision then sends ready/abort back to coordinator |
|
2 phase commit (phase 2) |
phase 2: decide to commit or abort coordinator receives all votes(if all ready, commit, else abort)sends decision to all sites site performs the local action |
|
relational algebra: when to use division |
"for all" queries |
|
when you want all of the results in a union |
UNION ALL |
|
how to remove duplicates from a SELECT |
you can sort (remove unwanted attributes on pass 0, remove duplicates on merge passes) you can hash into partitions |
|
parallel tradeoff |
amount of duplicated state vs communication overhead |
|
atomic |
all or nothing affect on DB |
|
consistency |
moves DB from one consistent state to another |
|
isolation |
XACT appears as if no other simultaneous XACT running on DB |
|
durability |
committed XACTS are permanent |
|
what does concurrency of transactions improve |
throughput, average response time, time to completion |
|
crash recovery |
replay log redo lost actions of committed XACTs undo actions of incomplete or aborted XACTs |
|
optimistic concurrency control |
no locking, keep XACT state to detect conflicts at commit time, -if no conflicts, then commit XACT -else, abort XACT |
|
pro/cons of strict 2 phase locking |
avoids cascading aborts, but restricts concurrency |
|
multiple server options |
same service replicated over multiple servers service is partitioned over multiple servers |
|
distributed DB act as one? |
run software that implements an abstraction of a single system |
|
multi server DBMS |
multiple servers cooperatively run a query |
|
issues with multi server DBMS |
distributed algorithms coordinating XACTs lock management keeping replicas consistent fault tolerance |