• Shuffle
    Toggle On
    Toggle Off
  • Alphabetize
    Toggle On
    Toggle Off
  • Front First
    Toggle On
    Toggle Off
  • Both Sides
    Toggle On
    Toggle Off
  • Read
    Toggle On
    Toggle Off
Reading...
Front

Card Range To Study

through

image

Play button

image

Play button

image

Progress

1/83

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

83 Cards in this Set

  • Front
  • Back

static hashing

each bucket is one page


number of buckets are fixed


if bucket is filled, add overflow page

dynamic hashing

split bucket when full, rehash single split buckets entries between old and new


-use directory of pointers


may need to double directory if no bucket chains available

dynamic hashing:


find the bucket for k

take the last global depth # bits of h(k)

dynamic hashing:


find the value in the bucket

take the last local depth # bits of h(k)

when do you increase the local depth of a bucket?

when you split a bucket because the bucket is full, the new global depth = old global depth + 1

when do you increase the global depth of the directory?

when you double the directory of pointers

renaming can be done in two ways:

by position


by name

set difference (R-S)

all tuples in R that are not also in S

cross product (RxS)

in Result for all r in R, s in S

set division (R/S)

= { x | for all y in S, (x, y) is in R }


"for all" queries

Natural Join (R |X| S)

– R and S must have at least one field with same domain

cross product: what do you do if there are overlapping fields?

Rename overlapping fields (can’t have duplicate field names in the result rela;on)

conditional join


A |X | {aid > bid} B

– Only results in A X B with A.aid > B.bid are in result

equi join


A |X | {aid = bid} B

– Only results in AXB with A.aid == B.bid are in result


– The join fields (aid, bid) only appear once in result schema

natural join


A | X | B

– Equi-Join on all matching fields of A and B

Aggregation

Partitioning a relation

Correlated query

A query that is dependent on the row in another query

when to use HAVING

for a group qualification

queries with "some"

WHERE EXISTS

Find names for all critics that have rated some movie

SELECT C.cname


FROM Critics C


WHERE EXISTS


(SELECT *


FROM Ratings R


WHERE C.cid = R.cid)

Name of Critics who have reviewed every movies

SELECT C.cname


FROM Critics C


WHERE NOT EXISTS(


(SELECT M.mid


FROM Movies M)


EXCEPT


(SELECT R.mid


FROM Reviewers R


WHERE R.cid = C.cid)

selectivity

Pages retrieved from an operation

Access path

Method for obtaining tuples (records)

Reduction factor

number of tuples in result

index nested loop join

For each r in R:


use index to find matching tuples s in S and


add (s,r) to result

sort merge join

sort R and sort S on the join attributes


merge R & S applying join condition:


r first record of R, s first record of S


while still r and s values to consider:


if (r > s):


s++


else if (r < s):


r++


else:


add (r,s) to result loop over duplicates


advance r and s

cost of index nested loop join

Scan R + (for each r in R) (hash look-up to find matching S RIDs + file reads to get matching S records)

cost of page nested loop join

M + M*N

cost of sort merge join

sort cost: M(log M) + N (log N)


merge cost:


best case: M+N


worst case: M*N

range partitioning sort

given R partitioned at one or more nodes, but not partitioned on sort column(s)


1) in parallel, each node partitions its Ri on the sort attribute(s)


2) each node, in parallel, does a local external Sort of its partition

parallel hash join

1) partitions get distributed to different nodes


2) do second parallel join phase at each node



fastest sorting alg for sorting data that all fit in memory

Quick Sort & Merge Sort about the same



sort largest sized array completely in memory

Quick Sort and Selection Sort

Why is sorting important for DBMS?

Data can be larger than the size of memory

Sorting is used in which queries

Queries with DISTINCT (eliminate duplicates)


GROUP BY queries


Bulk loading a B+ Tree


faster join alg


bulk loading B+ Tree

cost of external merge sort

(1+ Log B-1(ceiling (N/B) ))*2N

good idea to use a B+ Tree index to sort?


(ie get RIDs in order by traversing leaf pages then using RIDs to fetch records from file)

if B+ tree is clustered? YES


if B+ tree is not clustered? NO

access methods

file scan


binary search (if file sorted)


index on selection fields + retrieve tuple

projection operator

1) scan to remove unwanted attributes


2) remove duplicates: SELECT DISTINCT

remove duplicates by sorting (projection)

remove unwanted attributes on pass 0


remove duplicates in merge passes

remove duplicates by hash (projection)

create hash partitions (h1)


eliminate duplicates (hash using h2, only insert value in bucket if no duplicate value already there, append new hash table to the result)



nested loop join

for each tuple r in R


for each tuple s in S


if r.a = s.a:


append (r,s) to reuslt

cost of nested loop join (not page)

M + (M*f)*N




f is tuples/page

block nested loop join

Use B-2 pages for R, 1 page for S, 1 page for output:




for each block of B-2 pages of R:


for each page of S:


for all in-memory r tuples of R, s of S


if r.a = s.a append (r,a) to result

cost of block nested loop join

M + ceiling[M/(B-2)]*N

joins that use partitioning

only compare r and s tuples in the same partition

lower bound for all join methods

O(M+N)

block nlj with p processors is how many times faster?

p times faster!

partitioned nested loop join

each outer tuple must be compared with each inner tuple that it might join with


partitioning phase on join column


local nested loop join on partitions

given R partitioned at one or more nodes, but not partitioned on the sort column

in parallel, each node partitions its Ri on the sort attribute (keep tuples in my range, send tuples in other ranges to other nodes)


each node, in parallel, does a local external sort of its partition

parallel hash join

first phase partitions distributed to different nodes


second parallel join phase at each node

what alg is almost always best for equi-join?

parallel hash join

properties of transactions

atomic


consistent


isolation


durability

concurrent execution

T1 and T2’s execu3on can overlap, be interleaved

parallel execution

T1 and T2 are concurrent and they are run on a parallel machine

serial schedule

any schedule where each transaction executes in whole before starting the next transaction

serializable schedule

any concurrent schedule that is equivalent to some serial schedule

2 Phase Locking

1. Acquire Locks on Objects


2. Release Locks on Objects




(Hold locks until XACT completes)

how to solve deadlocks

acquire in order

when do we lock?

on a call to access data in the buffer manager

distributed system benefits

sharing resources (clients can share same file)


availability (if service replicated, if one goes down, still availability at another)


scalability


efficiency

distributed system drawbacks

more complex


keep consistency of replicated states


locating resources


node crashes


security

two solutions to the fact that all distributed servers need to decide to either commit or abort the distributed XACT

designated coordinator


no designated coordinator

pro/con of designated coordinator

+ easier, fewer messages


- single botteneck and point of failure

pro/con of no designated coordinator

- more complicated protocol, more messages


+ distribute load, no single point of failure

2 phase commit (phase 1)

phase 1: prepare to commit


coordinator sends prepare message to all Si


site Si recieves message, makes decision then sends ready/abort back to coordinator





2 phase commit (phase 2)

phase 2: decide to commit or abort coordinator receives all votes(if all ready, commit, else abort)sends decision to all sites site performs the local action

relational algebra: when to use division

"for all" queries

when you want all of the results in a union

UNION ALL

how to remove duplicates from a SELECT

you can sort


(remove unwanted attributes on pass 0,


remove duplicates on merge passes)


you can hash into partitions

parallel tradeoff

amount of duplicated state


vs communication overhead

atomic

all or nothing affect on DB

consistency

moves DB from one consistent state to another

isolation

XACT appears as if no other simultaneous XACT running on DB

durability

committed XACTS are permanent

what does concurrency of transactions improve

throughput, average response time, time to completion

crash recovery

replay log


redo lost actions of committed XACTs


undo actions of incomplete or aborted XACTs

optimistic concurrency control

no locking, keep XACT state to detect conflicts


at commit time,


-if no conflicts, then commit XACT


-else, abort XACT

pro/cons of strict 2 phase locking

avoids cascading aborts, but restricts concurrency

multiple server options

same service replicated over multiple servers


service is partitioned over multiple servers

distributed DB act as one?

run software that implements an abstraction of a single system

multi server DBMS

multiple servers cooperatively run a query

issues with multi server DBMS

distributed algorithms


coordinating XACTs


lock management


keeping replicas consistent


fault tolerance