Set the Language

We weren't able to detect the audio language on your flashcards. Please select the correct language below.

Front

Back

Related Flashcards

Flashcards
»
Databases FINAL

Databases Final

by allisonryder, Dec. 2016

Favorite

Add to folder

Flag

Shuffle
Toggle On

Toggle Off
Alphabetize
Toggle On

Toggle Off
Front First
Toggle On

Toggle Off
Both Sides
Toggle On

Toggle Off
Read
Toggle On

Toggle Off

Reading...

Front

Card Range To Study

through

Play button

Progress

1/83

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

83 Cards in this Set

Front
Back

	static hashing	each bucket is one page number of buckets are fixed if bucket is filled, add overflow page
	dynamic hashing	split bucket when full, rehash single split buckets entries between old and new -use directory of pointers may need to double directory if no bucket chains available
	dynamic hashing: find the bucket for k	take the last global depth # bits of h(k)
	dynamic hashing: find the value in the bucket	take the last local depth # bits of h(k)
	when do you increase the local depth of a bucket?	when you split a bucket because the bucket is full, the new global depth = old global depth + 1
	when do you increase the global depth of the directory?	when you double the directory of pointers
	renaming can be done in two ways:	by position by name
	set difference (R-S)	all tuples in R that are not also in S
	cross product (RxS)	in Result for all r in R, s in S
	set division (R/S)	= { x \| for all y in S, (x, y) is in R } "for all" queries
	Natural Join (R \|X\| S)	– R and S must have at least one field with same domain
	cross product: what do you do if there are overlapping fields?	Rename overlapping fields (can’t have duplicate field names in the result rela;on)
	conditional join A \|X \| {aid > bid} B	– Only results in A X B with A.aid > B.bid are in result
	equi join A \|X \| {aid = bid} B	– Only results in AXB with A.aid == B.bid are in result – The join fields (aid, bid) only appear once in result schema
	natural join A \| X \| B	– Equi-Join on all matching fields of A and B
	Aggregation	Partitioning a relation
	Correlated query	A query that is dependent on the row in another query
	when to use HAVING	for a group qualification
	queries with "some"	WHERE EXISTS
	Find names for all critics that have rated some movie	SELECT C.cname FROM Critics C WHERE EXISTS (SELECT * FROM Ratings R WHERE C.cid = R.cid)
	Name of Critics who have reviewed every movies	SELECT C.cname FROM Critics C WHERE NOT EXISTS( (SELECT M.mid FROM Movies M) EXCEPT (SELECT R.mid FROM Reviewers R WHERE R.cid = C.cid)
	selectivity	Pages retrieved from an operation
	Access path	Method for obtaining tuples (records)
	Reduction factor	number of tuples in result
	index nested loop join	For each r in R: use index to find matching tuples s in S and add (s,r) to result
	sort merge join	sort R and sort S on the join attributes merge R & S applying join condition: r first record of R, s first record of S while still r and s values to consider: if (r > s): s++ else if (r < s): r++ else: add (r,s) to result loop over duplicates advance r and s
	cost of index nested loop join	Scan R + (for each r in R) (hash look-up to find matching S RIDs + file reads to get matching S records)
	cost of page nested loop join	M + M*N
	cost of sort merge join	sort cost: M(log M) + N (log N) merge cost: best case: M+N worst case: M*N
	range partitioning sort	given R partitioned at one or more nodes, but not partitioned on sort column(s) 1) in parallel, each node partitions its Ri on the sort attribute(s) 2) each node, in parallel, does a local external Sort of its partition
	parallel hash join	1) partitions get distributed to different nodes 2) do second parallel join phase at each node
	fastest sorting alg for sorting data that all fit in memory	Quick Sort & Merge Sort about the same
	sort largest sized array completely in memory	Quick Sort and Selection Sort
	Why is sorting important for DBMS?	Data can be larger than the size of memory
	Sorting is used in which queries	Queries with DISTINCT (eliminate duplicates) GROUP BY queries Bulk loading a B+ Tree faster join alg bulk loading B+ Tree
	cost of external merge sort	(1+ Log B-1(ceiling (N/B) ))*2N
	good idea to use a B+ Tree index to sort? (ie get RIDs in order by traversing leaf pages then using RIDs to fetch records from file)	if B+ tree is clustered? YES if B+ tree is not clustered? NO
	access methods	file scan binary search (if file sorted) index on selection fields + retrieve tuple
	projection operator	1) scan to remove unwanted attributes 2) remove duplicates: SELECT DISTINCT
	remove duplicates by sorting (projection)	remove unwanted attributes on pass 0 remove duplicates in merge passes
	remove duplicates by hash (projection)	create hash partitions (h1) eliminate duplicates (hash using h2, only insert value in bucket if no duplicate value already there, append new hash table to the result)
	nested loop join	for each tuple r in R for each tuple s in S if r.a = s.a: append (r,s) to reuslt
	cost of nested loop join (not page)	M + (Mf)N f is tuples/page
	block nested loop join	Use B-2 pages for R, 1 page for S, 1 page for output: for each block of B-2 pages of R: for each page of S: for all in-memory r tuples of R, s of S if r.a = s.a append (r,a) to result
	cost of block nested loop join	M + ceiling[M/(B-2)]*N
	joins that use partitioning	only compare r and s tuples in the same partition
	lower bound for all join methods	O(M+N)
	block nlj with p processors is how many times faster?	p times faster!
	partitioned nested loop join	each outer tuple must be compared with each inner tuple that it might join with partitioning phase on join column local nested loop join on partitions
	given R partitioned at one or more nodes, but not partitioned on the sort column	in parallel, each node partitions its Ri on the sort attribute (keep tuples in my range, send tuples in other ranges to other nodes) each node, in parallel, does a local external sort of its partition
	parallel hash join	first phase partitions distributed to different nodes second parallel join phase at each node
	what alg is almost always best for equi-join?	parallel hash join
	properties of transactions	atomic consistent isolation durability
	concurrent execution	T1 and T2’s execu3on can overlap, be interleaved
	parallel execution	T1 and T2 are concurrent and they are run on a parallel machine
	serial schedule	any schedule where each transaction executes in whole before starting the next transaction
	serializable schedule	any concurrent schedule that is equivalent to some serial schedule
	2 Phase Locking	1. Acquire Locks on Objects 2. Release Locks on Objects (Hold locks until XACT completes)
	how to solve deadlocks	acquire in order
	when do we lock?	on a call to access data in the buffer manager
	distributed system benefits	sharing resources (clients can share same file) availability (if service replicated, if one goes down, still availability at another) scalability efficiency
	distributed system drawbacks	more complex keep consistency of replicated states locating resources node crashes security
	two solutions to the fact that all distributed servers need to decide to either commit or abort the distributed XACT	designated coordinator no designated coordinator
	pro/con of designated coordinator	+ easier, fewer messages - single botteneck and point of failure
	pro/con of no designated coordinator	- more complicated protocol, more messages + distribute load, no single point of failure
	2 phase commit (phase 1)	phase 1: prepare to commit coordinator sends prepare message to all Si site Si recieves message, makes decision then sends ready/abort back to coordinator
	2 phase commit (phase 2)	phase 2: decide to commit or abort coordinator receives all votes(if all ready, commit, else abort)sends decision to all sites site performs the local action
	relational algebra: when to use division	"for all" queries
	when you want all of the results in a union	UNION ALL
	how to remove duplicates from a SELECT	you can sort (remove unwanted attributes on pass 0, remove duplicates on merge passes) you can hash into partitions
	parallel tradeoff	amount of duplicated state vs communication overhead
	atomic	all or nothing affect on DB
	consistency	moves DB from one consistent state to another
	isolation	XACT appears as if no other simultaneous XACT running on DB
	durability	committed XACTS are permanent
	what does concurrency of transactions improve	throughput, average response time, time to completion
	crash recovery	replay log redo lost actions of committed XACTs undo actions of incomplete or aborted XACTs
	optimistic concurrency control	no locking, keep XACT state to detect conflicts at commit time, -if no conflicts, then commit XACT -else, abort XACT
	pro/cons of strict 2 phase locking	avoids cascading aborts, but restricts concurrency
	multiple server options	same service replicated over multiple servers service is partitioned over multiple servers
	distributed DB act as one?	run software that implements an abstraction of a single system
	multi server DBMS	multiple servers cooperatively run a query
	issues with multi server DBMS	distributed algorithms coordinating XACTs lock management keeping replicas consistent fault tolerance

Share This Flashcard Set

Set the Language

Related Flashcards

Databases Final

Add to Folders

Upgrade to Cram Premium

Card Range To Study

83 Cards in this Set