
What is an algorithm?

An algorithm is a method or a process followed to solve a problem

What properties must an algorithm have?

-correctness


-composed of concrete unambiguous steps


-the number of steps must be finite


-must terminate

What is a data structure?

A data structure is a particular way of storing and organising data in a computer so that it can be used efficiently

Random Access Machine

1) Memory consists of an infinite array


2) Instructions executed sequentially one at a time


3) All instructions take unit time. Running time is the number of instructions executed

Arrays

A sequence of elements A₁, A₂, ..., Aₙ is called an array


They're in consecutive memory cells, but we (usually) don't care where exactly.


All array elements are of the same type, e.g., integer

Linked lists

- a list is made up of nodes


- each node stores an element (a piece of data) plus a pointer or "link" to another node


- the first node is called the head


- the last node, called the tail, points to null


- the nodes may be scattered all over the memory
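
A minimal Python sketch of these ideas (the class and method names are my own illustrative choices, not from the source):

```python
class Node:
    def __init__(self, element):
        self.element = element  # the piece of data stored in this node
        self.next = None        # link to the next node; None marks the tail

class SinglyLinkedList:
    def __init__(self):
        self.head = None        # first node of the list

    def prepend(self, element):
        # The new node becomes the head; its link points at the old head.
        node = Node(element)
        node.next = self.head
        self.head = node

    def traverse(self):
        # Follow the links from the head until we fall off the tail.
        current = self.head
        while current is not None:
            yield current.element
            current = current.next
```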

Doubly linked lists

A node in a doubly linked list stores two references:


- a next link, which points to the next node in the list


- a prev link, which points to the previous node in the list


To simplify, we add two dummy or sentinel nodes at the ends of the doubly linked list:


- the header has a valid next reference but a null prev reference


- the trailer has a valid prev reference but a null next reference


A doubly linked list needs to store references to the two sentinel nodes and a size counter keeping track of the number of nodes in the list (not counting sentinels)


An empty list would have the two sentinel nodes pointing to each other
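
The sentinel arrangement can be sketched as follows (a minimal Python illustration; the names are assumptions):

```python
class DNode:
    def __init__(self, element=None):
        self.element = element
        self.prev = None  # link to the previous node
        self.next = None  # link to the next node

class DoublyLinkedList:
    def __init__(self):
        # An empty list: the two sentinels point to each other.
        self.header = DNode()   # valid next reference, null prev
        self.trailer = DNode()  # valid prev reference, null next
        self.header.next = self.trailer
        self.trailer.prev = self.header
        self.size = 0           # node count, not counting sentinels

    def insert_between(self, element, pred, succ):
        # Splice a new node between two existing (possibly sentinel) nodes.
        node = DNode(element)
        node.prev, node.next = pred, succ
        pred.next = node
        succ.prev = node
        self.size += 1
```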

Circularly linked lists

- A circularly linked list has the same kind of nodes as a singly linked list. That is, each node has a next pointer and a reference to an element


- The circularly linked list has no beginning or end. You can think of it as a singly linked list, where the last node's next pointer, instead of being NULL, points back to the first node.


- Instead of references to the head and the tail, we mark a node of the circularly linked list as the cursor. The cursor is the starting node when we traverse the list.

Stacks

- A stack is a collection of objects that are inserted and removed according to the last-in-first-out principle


- Objects can be inserted into a stack at any time, but only the most recently inserted object (the last) can be removed at any time

Stack methods

A stack supports the following methods:


- push(e) : insert element e at the top of the stack


- pop: remove and return the top element of the stack; an error occurs if the stack is empty


And possibly also:


- size : return the number of elements in the stack


- isEmpty : return a Boolean indicating if the stack is empty


- top : return the top element in the stack, without removing it; an error occurs if the stack is empty

Stacks : implementation using arrays

- the array based stack implementation is time efficient. The time taken by all methods does not depend on the size of the stack.


- However, the fixed size N of the array can be a serious limitation:


1) If the size of the stack is much less than the size of the array, we waste memory


2) If the size of the stack exceeds the size of the array, the implementation will generate an exception


- the array-based implementation of the stack has fixed capacity
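
A sketch of this fixed-capacity, array-based stack in Python (a fixed-length list stands in for the array; the exception messages are placeholders for FullStackException-style errors):

```python
class ArrayStack:
    def __init__(self, capacity):
        self.data = [None] * capacity  # fixed-size array of capacity N
        self.t = -1                    # index of the top element; -1 means empty

    def push(self, e):
        if self.t + 1 == len(self.data):
            raise Exception("stack is full")  # the fixed-capacity limitation
        self.t += 1
        self.data[self.t] = e

    def pop(self):
        if self.is_empty():
            raise Exception("stack is empty")
        e = self.data[self.t]
        self.data[self.t] = None
        self.t -= 1
        return e

    def top(self):
        if self.is_empty():
            raise Exception("stack is empty")
        return self.data[self.t]

    def size(self):
        return self.t + 1

    def is_empty(self):
        return self.t == -1
```

Every method performs a constant number of operations, independent of the stack's current size.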

Queues

- A queue is a collection of objects that are inserted and removed according to the first-in-first-out (FIFO) principle


- Element access and deletion are restricted to the first element in the sequence, which is called the front of the queue


- Element insertion is restricted to the end of the sequence, which is called the rear of the queue

Queues: Methods

A queue supports the following methods:


- enqueue(e) : insert element e at the rear of the queue


- dequeue : remove and return from the queue the element at the front; an error occurs if the queue is empty


And possibly also:


- size : Return the number of elements in the queue


- isEmpty : Return a Boolean indicating if the queue is empty


- front : Return the front element of the queue, without removing it; an error occurs if the queue is empty

Queues : Implementation using arrays

- We use two variables f and r, which have the following meaning:


1) f is an index to the cell of Q storing the front of the queue, unless the queue is empty, in which case f = r


2) r is an index to the next available array cell in Q, that is, the cell after the rear of Q, if Q is not empty.


- initially we assign f = r = 0, indicating that the queue is empty


- After each enqueue operation we increment r. After each dequeue operation we increment f


- r is incremented after each enqueue operation and never decremented. After N enqueue operations we would get an array-out-of-bounds error


- To avoid this problem, we let r and f wrap around the end of Q, by using modulo N arithmetic on them.


- If the size of the queue is N, then f = r and the isEmpty method returns true, even though the queue is actually full, not empty.


- We avoid this problem by keeping the maximum number of elements that can be stored in the queue to N − 1. See the FullQueueException in the enqueue algorithm


- the array-based implementation of the queue is time efficient. All methods run in constant time


- Similarly to the array based implementation of the stack, the capacity of array based implementation of the queue is fixed
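
A sketch of the circular, array-based queue with the f and r indices and the modulo-N wrap-around (Python; the plain exception stands in for FullQueueException):

```python
class ArrayQueue:
    def __init__(self, N):
        self.Q = [None] * N  # array of capacity N; at most N-1 elements stored
        self.f = 0           # index of the front element
        self.r = 0           # index of the next available cell (after the rear)

    def is_empty(self):
        return self.f == self.r

    def size(self):
        return (self.r - self.f) % len(self.Q)

    def enqueue(self, e):
        if self.size() == len(self.Q) - 1:
            raise Exception("queue is full")   # cf. FullQueueException
        self.Q[self.r] = e
        self.r = (self.r + 1) % len(self.Q)    # wrap around via modulo N

    def dequeue(self):
        if self.is_empty():
            raise Exception("queue is empty")
        e = self.Q[self.f]
        self.Q[self.f] = None
        self.f = (self.f + 1) % len(self.Q)    # wrap around via modulo N
        return e
```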

Hash Tables

- A hash table consists of a bucket array and hash function


- A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a bucket storing a collection of key-value pairs.


- The size N of the array is called the capacity of the hash table


- A hash function h is a function mapping each key k to an integer in the range [0, N-1], where N is the capacity of the hash table


- The main idea is to use h(k) as an index into the bucket array A. That is, we store the key-value pair (k, v) in the bucket A[h(k)]


- if there are two keys with the same hash value h(k), then two different entries will be mapped to the same bucket of A. In this case, we say that a collision has occurred

Hash functions

Ideal goal : scramble the keys uniformly. More practically:


- hash functions should be efficiently computable


- each table position equally likely for each key


Some compression functions:


- division: take integer mod N


- multiply-add-divide (MAD): y maps to (ay + b) mod N, where a and b are nonnegative integers


There are several ways to deal with collisions


Whichever method we choose to deal with them, a large number of collisions reduces the performance of the hash table


A good hash function minimises the collisions as much as possible.

Separate chaining

Each bucket A[i] stores a list holding the entries (k,v) such that h(k) = i


Separate chaining performance:


- cost is proportional to length of list


- average length = N/M (here N is the number of entries, M is the size of the array)


- Worst case : all keys hash to the same list
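
A minimal separate-chaining table in Python (each bucket is a plain list; Python's built-in hash with division compression stands in for h):

```python
class ChainHashTable:
    def __init__(self, N=11):
        self.buckets = [[] for _ in range(N)]  # bucket array of capacity N

    def _h(self, k):
        return hash(k) % len(self.buckets)     # division compression

    def put(self, k, v):
        bucket = self.buckets[self._h(k)]
        for i, (key, _) in enumerate(bucket):
            if key == k:           # key already present: replace its value
                bucket[i] = (k, v)
                return
        bucket.append((k, v))      # new key (possibly a collision): extend chain

    def get(self, k):
        bucket = self.buckets[self._h(k)]      # cost proportional to chain length
        for key, value in bucket:
            if key == k:
                return value
        raise KeyError(k)
```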

Linear probing

- In linear probing, if we try to insert an entry (k,v) into a bucket A[i] that is already occupied, where i = h(k), then we try next at A[(i+1) mod N]


- If A[(i + 1) mod N] is also occupied, then we try A[(i + 2) mod N], and so on, until we find an empty bucket that can accept the new entry


- The name linear probing comes from the fact that each access to a cell of the bucket array can be viewed as a probe, and the probes examine consecutive cells


Linear probing : Costs

- Insert and search cost depend on length of cluster


- average length of a cluster is N/M


- Worst case : all keys hash to same cluster
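
A sketch of insertion and search under linear probing (Python; it assumes the table always keeps at least one empty cell, so every probe sequence terminates):

```python
def lp_insert(A, k, v, h):
    """Insert (k, v) into bucket array A using linear probing."""
    N = len(A)
    i = h(k) % N
    while A[i] is not None and A[i][0] != k:
        i = (i + 1) % N          # bucket occupied: probe the next cell
    A[i] = (k, v)                # empty bucket found (or key overwritten)

def lp_search(A, k, h):
    N = len(A)
    i = h(k) % N
    while A[i] is not None:      # an empty cell ends the probe sequence
        if A[i][0] == k:
            return A[i][1]
        i = (i + 1) % N
    raise KeyError(k)
```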

Quadratic probing

- Quadratic probing iteratively tries the buckets A[(i + f(j)) mod N] for j = 0, 1, 2, ..., where f(j) = j^2, until finding an empty bucket


- Quadratic probing avoids clustering patterns that occur with linear probing. However, it creates its own kind of clustering, called secondary clustering.


- The quadratic probing strategy may not find an empty bucket in A even if one exists.

Double hashing

- In this approach, we choose a secondary hash function, h', and if h maps some key k to a bucket A[i], with i = h(k), that is already occupied, then we iteratively try the buckets A[(i + f(j)) mod N], for j = 0, 1, 2, ..., where f(j) = j · h'(k)


- A common choice of secondary hash function is h'(k) = q - (k mod q), for some prime number q < N


- The secondary hash function should never evaluate to zero

Open addressing schemes

- linear probing


- Quadratic probing


- Double hashing


Store at most one entry in each bucket

Comparison

- the open addressing schemes are more memory efficient compared to separate chaining


- regarding running times, in experimental and theoretical analyses, the separate chaining method is either competitive or faster than the other methods


- if memory space is not a major issue, the collision-handling method of choice is separate chaining.

Deletions

- Deletions from the hash table must not hinder future searches. If a bucket is simply left empty this will hinder future probes.


- but the bucket should not be left unusable


- to solve these problems we use tombstones : a marker that is left in a bucket after a deletion


- if a tombstone is encountered when searching along a probe sequence to find a key, we know to continue.


- If a tombstone is encountered during insertion, then we should continue probing (to avoid creating duplicates), but then the new record can be placed in the bucket where the tombstone was found


- the use of tombstones lengthens the average probe sequence distance


- two possible remedies:


1) local reorganization : after deleting a key, continue to follow the probe sequence of that key and move records into the vacated bucket (this will not work for all collision resolution policies)


2) Periodically rehash the table by reinserting all records into a new hash table. If you keep a record of which keys are accessed most often, these can be placed where they will be found most easily

Recursive Algorithms

- A recursive algorithm must have a base case


- A recursive Algorithm must change its state and move toward the base case


- A recursive algorithm must call itself, recursively
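
All three requirements are visible in even the smallest example, such as this illustrative factorial:

```python
def factorial(n):
    if n == 0:                    # base case
        return 1
    return n * factorial(n - 1)   # calls itself; n - 1 moves toward the base case
```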

A recursive technique : Backtracking

General idea : build up the solution one step at a time, backtracking when unable to continue
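
One way to make this concrete: generating all permutations of a sequence of distinct elements, extending a partial solution one element at a time and undoing the step when a branch is exhausted (an illustrative sketch, not from the source):

```python
def permutations(elements, partial=None):
    if partial is None:
        partial = []
    if len(partial) == len(elements):  # a complete solution: report it
        yield list(partial)
        return
    for e in elements:
        if e not in partial:           # assumes distinct elements
            partial.append(e)          # build up the solution one step
            yield from permutations(elements, partial)
            partial.pop()              # backtrack: undo the step
```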

Cool log trick

log_a(x) = log_b(x) / log_b(a)
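
For instance, on a calculator that only offers natural logarithms:

log_2(100) = ln(100) / ln(2) ≈ 4.605 / 0.693 ≈ 6.64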

Time complexity

The time complexity of an algorithm can be expressed in terms of the number of basic operations used by the algorithm when the input has a particular size

Examples of basic operations

- additions


- multiplications


- comparisons of two numbers


- swaps


- assignments of values to variables

Space complexity

The space complexity of an algorithm is expressed in terms of the memory required by the algorithm for an input of a particular size

Worst-case time complexity

The worst-case time complexity of an algorithm can be expressed in terms of the largest number of basic operations used by the algorithm when the input has a particular size.


Worst-case analysis tells us how many operations an algorithm requires to guarantee that it will produce a solution.


It is difficult to compute the exact number of operations.


We are more interested in upper bounds for worst-case analysis.


These bounds should give us the possibility to estimate growth of the number of operations when the input size increases.


It is important to estimate the number of operations when that input size is large.

Average-case analysis

Here we are interested in the average number of operations over all inputs of a given size. It is difficult to compute the exact number of operations.

Big-O notation

Let f(x) and g(x) be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is O(g(x)) if there are constants C and k such that


|f(x)| <= C · |g(x)|


whenever x >= k.


The definition says that after a certain point, namely after k, the absolute value of f(x) is bounded from above by C times the absolute value of g(x).


The constants C and k in the definition are called witnesses to the relationship f(x) is O(g(x))


If there is one pair of witnesses to the relationship f(x) is O(g(x)), then there are infinitely many pairs of witnesses to that relationship

Theorem (the sum rule)

If f1(x) is O(g1(x)) and f2(x) is O(g2(x)), then f1(x) + f2(x) is O(max{|g1(x)|, |g2(x)|}).


Proof


Let k = max{k1, k2} and C = c1 + c2.


Then for x > k we have |f1(x) + f2(x)| <= |f1(x)| + |f2(x)| <= c1 · |g1(x)| + c2 · |g2(x)| <= C · max{|g1(x)|, |g2(x)|}

Theorem (the product rule)

If f1(x) is O(g1(x)) and f2(x) is O(g2(x)), then f1(x) · f2(x) is O(g1(x) · g2(x)).


Proof


Let k = max{k1, k2} and C = c1 · c2.


Then for x > k we have |f1(x) · f2(x)| = |f1(x)| · |f2(x)| <= c1 · |g1(x)| · c2 · |g2(x)| = C · |g1(x) · g2(x)|. This proves the theorem.

Big-Omega notation

We say that f(x) is Ω(g(x)) if there are positive constants C and k such that


|f(x)| >= C · |g(x)| whenever x > k. Note that this implies that f(x) is Ω(g(x)) if and only if g(x) is O(f(x))

Theta

We say that f(x) is Θ(g(x)) if f(x) is O(g(x)) and f(x) is Ω(g(x))


This is equivalent to saying that f(x) is O(g(x)) and g(x) is O(f(x))

Little-o

We say that f(x) is o(g(x)) when


lim_{x→∞} f(x)/g(x) = 0


This clearly shows that f(x) is o(g(x)) implies f(x) is O(g(x))

Sublinear

A function f(x) is called sublinear if f(x) is o(x),


i.e., if lim_{x→∞} f(x)/x = 0

Little omega

ω is to o what Ω is to O; that is, f(x) is ω(g(x)) when lim_{x→∞} f(x)/g(x) = ∞

Insertion Sort

We know:


- when j has a certain value, it inserts the j-th element into the already sorted sequence


- can be proved correct using invariant "after j-th iteration first j+1 elements are in order"


- running time between n − 1 and n(n − 1)/2; worst case O(n^2)
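
A sketch in Python, matching the invariant above (0-indexed, so after iteration j the first j + 1 elements are in order):

```python
def insertion_sort(A):
    for j in range(1, len(A)):
        key = A[j]                    # insert the j-th element...
        i = j - 1
        while i >= 0 and A[i] > key:  # ...into the already sorted prefix
            A[i + 1] = A[i]           # shift larger elements one slot right
            i -= 1
        A[i + 1] = key
```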

Selection Sort

Invariant: after the i-th iteration, positions 1, ..., i contain the i smallest elements overall, in order


Not necessarily the first i elements

Bubble sort

The i-th iteration of the outer loop "bubbles" the i-th largest element to where it belongs

Merge sort

- if given a sequence of no more than one element then we're done


- otherwise (length > 1), split sequence into two shorter sequences of equal length, sort them recursively, and merge the two resulting sequences


Mergesort always requires approximately n log(n) steps to sort n numbers
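
A sketch of that recursion (Python; it returns a new list rather than sorting in place):

```python
def merge_sort(A):
    if len(A) <= 1:                # no more than one element: done
        return A
    mid = len(A) // 2              # split into two halves...
    left = merge_sort(A[:mid])     # ...sort them recursively...
    right = merge_sort(A[mid:])
    merged, i, j = [], 0, 0        # ...and merge the two sorted results
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```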

Quicksort

At the beginning of each recursive call, quicksort picks one element from the current sequence, the pivot.


Partitioning will be done w.r.t the pivot:


- each element smaller than the pivot goes into the left part


- each element bigger than the pivot goes into the right part


- parts may have very different sizes
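
A compact (not in-place) sketch of the pivot-and-partition idea; real implementations usually partition the array in place:

```python
def quicksort(A):
    if len(A) <= 1:
        return A
    pivot = A[0]                              # pick a pivot element
    low  = [x for x in A[1:] if x < pivot]    # smaller than the pivot: left part
    high = [x for x in A[1:] if x >= pivot]   # bigger (or equal): right part
    return quicksort(low) + [pivot] + quicksort(high)
```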


Ω(n log n) Theorem

For any comparison-based sorting algorithm A and any n ∈ ℕ large enough there exists an input of length n that requires A to perform Ω(n log n) comparisons.

Radix Sort

• have as many buckets as you’ve got different digits, that is, for base-10 you’ll have 10 of them


• repeatedly bucket-sort by given digit


• number of rounds will depend on values (the longer the base-10 representations, the more rounds), but number of buckets only depends on number of different digits


Running time: Θ(d · n), where d is the number of rounds
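
A least-significant-digit radix sort sketch for nonnegative integers (Python; base 10 gives the 10 buckets described above):

```python
def radix_sort(A, base=10):
    if not A:
        return A
    rounds, m = 1, max(A)
    while m >= base:            # number of rounds = number of digits of the maximum
        m //= base
        rounds += 1
    for d in range(rounds):
        buckets = [[] for _ in range(base)]           # one bucket per digit value
        for x in A:
            buckets[(x // base**d) % base].append(x)  # bucket-sort by d-th digit
        A = [x for bucket in buckets for x in bucket] # stable concatenation
    return A
```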

Binary Search

1 Peek right into the middle of the given array, at position p = ⌈n/2⌉


2 If A[p] = x then we’re lucky & done, and can return p


3 Otherwise, if x is greater than A[p], we may focus our search on stuff to the right of A[p] and may completely ignore anything to its left


4 Likewise, if x is smaller than A[p], we may focus our search on stuff to the left of A[p] and may completely ignore anything to its right. This is called binary search because each (unsuccessful) step at least halves the search space


For binary search we have the recurrence T(n) = T(n/2) + O(1), which solves to T(n) = O(log n)
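
An iterative sketch of those steps (Python, on a sorted array A):

```python
def binary_search(A, x):
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        p = (lo + hi) // 2    # peek into the middle of the current range
        if A[p] == x:
            return p          # lucky & done
        elif x > A[p]:
            lo = p + 1        # ignore everything to the left of A[p]
        else:
            hi = p - 1        # ignore everything to the right of A[p]
    return -1                 # x does not occur in A
```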

Quick Select

• recursive


• Partition() function for selecting pivot and partitioning into LOW and HIGH (those smaller/bigger than pivot)


• not two recursive calls as in sorting, but now only one: dive into the part (LOW or HIGH) where we know the i-th smallest element is to be found. We know that after partitioning, the pivot is in its correct position w.r.t. overall sortedness, so we can simply compare the pivot's position after partitioning with the sought index i


→ T(n) = T(n−1) + O(n), which means T(n) = Θ(n^2) (worst case)
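
A sketch of the single-recursion idea (Python; list comprehensions stand in for an in-place Partition() function):

```python
def quickselect(A, i):
    """Return the i-th smallest element of A (1-indexed); assumes 1 <= i <= len(A)."""
    pivot = A[0]
    low  = [x for x in A[1:] if x < pivot]    # LOW: smaller than pivot
    high = [x for x in A[1:] if x >= pivot]   # HIGH: bigger than (or equal to) pivot
    k = len(low) + 1                          # the pivot is the k-th smallest
    if i == k:
        return pivot
    elif i < k:
        return quickselect(low, i)            # dive into LOW only
    else:
        return quickselect(high, i - k)       # dive into HIGH only
```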

Median-of-Medians

0 If length(A) ≤ 5 then sort and return i-th smallest


1 Divide n elements into ⌊n/5⌋ groups of 5 elements each, plus at most one group containing the remaining n mod 5 < 5 elements


2 Find median of each of the ⌈n/5⌉ groups by sorting each one, and then picking median from sorted group elements


3 Call Select recursively on set of ⌈n/5⌉ medians found above, giving the median-of-medians, x


4 Partition entire input around x. Let k be # of elements on low side plus one (simply count after partitioning):


• x is k-th smallest element, and


• there are n−k elements on high side of partition


5 If i = k, return x. Otherwise use Select recursively to find the i-th smallest element of the low side if i < k, or the (i − k)-th smallest on the high side if i > k

Binary search tree

A binary search tree (BST) is a tree in which no node has more than two children (not necessarily exactly two). You must build and maintain the tree such that it’s true for every node v of the tree that:


• all elements in its left sub-tree are “smaller” than v


• all elements in its right sub-tree are “bigger” than v
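
A minimal BST with insertion and lookup maintaining that ordering property (Python; duplicate elements are simply ignored in this sketch):

```python
class BSTNode:
    def __init__(self, element):
        self.element = element
        self.left = None    # sub-tree of "smaller" elements
        self.right = None   # sub-tree of "bigger" elements

def bst_insert(root, element):
    if root is None:
        return BSTNode(element)
    if element < root.element:
        root.left = bst_insert(root.left, element)
    elif element > root.element:
        root.right = bst_insert(root.right, element)
    return root

def bst_lookup(root, element):
    if root is None:
        return None         # "not found"
    if element == root.element:
        return root
    if element < root.element:
        return bst_lookup(root.left, element)
    return bst_lookup(root.right, element)
```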

Tree traversals

• in-order (for binary trees): recurse into left sub-tree, print payload, recurse into right sub-tree


• pre-order: print payload, recurse into sub-trees


• post-order: recurse into sub-trees, print payload
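
The three orders, sketched against the BSTNode class above; on a BST, the in-order traversal prints the elements in sorted order:

```python
def in_order(v):
    if v is not None:
        in_order(v.left)     # recurse into left sub-tree
        print(v.element)     # print payload
        in_order(v.right)    # recurse into right sub-tree

def pre_order(v):
    if v is not None:
        print(v.element)     # payload first
        pre_order(v.left)
        pre_order(v.right)

def post_order(v):
    if v is not None:
        post_order(v.left)   # sub-trees first
        post_order(v.right)
        print(v.element)
```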

Deleting from a BST

We first do a lookup on the element that we wish to remove. If that returns “not found” then we’re done (element not in tree). Otherwise, three cases to be considered:


• If node is a leaf (no child) then simply remove it


• If node has one child (left or right) then remove and replace it with that one child – lift (the one) sub-tree up.


• If node has two children then replace its element with the smallest element in its right sub-tree (its in-order successor), and delete that successor node instead; the successor has at most one child, so one of the two cases above applies

Red-black trees

The idea is simple: a red-black tree is an ordinary BST with one extra bit of storage per node: its colour, red or black. Constraining how nodes can be coloured results in (approximate) balancedness of the tree

A BST is a red-black tree if:

1 every node is either red or black


2 the root is black


3 every leaf (Null) is black


4 red nodes have black children


5 for all nodes, all paths from node to descendant leaves contain same # black nodes


(A red-black tree with n internal nodes has height at most 2 log(n + 1))

Red-Black Tree worst case complexities

1 space: O(n)


2 lookup: O(log n)


3 insert: O(log n)


4 delete: O(log n)

Heaps

Heaps are trees as well, but typically assumed to be stored in a flat array


• each tree node corresponds to an element of the array


• the tree is complete except perhaps the lowest level, filled left-to-right


• heap property: for all nodes v in the tree, v.parent.data >= v.data (assuming we have a node class like for BSTs)


• This is a max-heap (for min-heaps, v.parent.data <= v.data)


1 the root is in A[1]


2 parent(i) = A[i/2] – integer division, rounds down


3 left(i) = A[2i]


4 right(i) = A[2i+1]

Heapify

1 starting at the root, identify largest of current node v and its children


2 suppose largest element is in w


3 if w ≠ v


· swap A[w] and A[v]


· recurse into w (it now contains what v contained previously)
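
A sketch of those steps for a max-heap stored 1-indexed (Python; cell A[0] is left unused so that the parent/child index arithmetic above works):

```python
def heapify(A, v, n):
    """Restore the heap property below index v; n is the heap size."""
    left, right = 2 * v, 2 * v + 1
    w = v                             # identify largest of v and its children
    if left <= n and A[left] > A[w]:
        w = left
    if right <= n and A[right] > A[w]:
        w = right
    if w != v:
        A[w], A[v] = A[v], A[w]       # swap A[w] and A[v]
        heapify(A, w, n)              # recurse into w
```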

Running time for all algorithms studied

SelectionSort O(n^2)


InsertionSort O(n^2)


BubbleSort O(n^2)


MergeSort O(n log n)


QuickSort O(n^2)


BucketSort O(n + K)


RadixSort O(n log K)


HeapSort O(n log n)

Basic properties of MSTs (minimum spanning trees):

· have |V| − 1 edges;


· have no cycles;


· might not be unique.

Kruskal’s Algorithm

1 Sort the edges by weight.


2 Let A = ∅.


3 Consider edges in increasing order of weight. For each edge e, add e to A unless this would create a cycle.


(Running time is O(E log V).)
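
A sketch of Kruskal's algorithm in Python, with a bare-bones union-find for the cycle test (no path compression, for brevity):

```python
def kruskal(vertices, edges):
    """edges is a list of (weight, u, v) triples; returns the MST edge set A."""
    parent = {v: v for v in vertices}

    def find(v):                      # representative of v's component
        while parent[v] != v:
            v = parent[v]
        return v

    A = []
    for w, u, v in sorted(edges):     # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle is created
            A.append((u, v, w))
            parent[ru] = rv           # merge the two components
    return A
```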

Prim’s Algorithm

1 Let U = {u} where u is some vertex chosen arbitrarily.


2 Let A = ∅.


3 Until U contains all vertices: find the least-weight edge e that joins a vertex v in U to a vertex w not in U, and add e to A and w to U.


(Running time is O(V log V + E).)
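
A sketch of Prim's algorithm using a binary heap for the least-weight-edge step (Python; adj maps each vertex to a list of (weight, neighbour) pairs). Note that a binary heap gives O(E log V); the O(V log V + E) bound quoted above requires a Fibonacci heap:

```python
import heapq

def prim(adj, u):
    U = {u}                                # vertices reached so far
    A = []                                 # edges of the MST
    frontier = [(w, u, v) for w, v in adj[u]]
    heapq.heapify(frontier)
    while frontier and len(U) < len(adj):
        w, x, y = heapq.heappop(frontier)  # least-weight edge leaving U
        if y in U:
            continue                       # both endpoints already in U: skip
        U.add(y)
        A.append((x, y, w))
        for w2, z in adj[y]:
            if z not in U:
                heapq.heappush(frontier, (w2, y, z))
    return A
```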

String Matching Problem

· Given a (long) string T (over some (finite) alphabet Σ),


· and a (short) pattern P over the same alphabet (or subset),


· find one or all occurrences of P in T, if any

Shifts

We say s is


· a valid shift if P occurs with shift s in T,


· and an invalid shift otherwise
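
These definitions translate directly into a naive matcher that tests every possible shift (an O(n · m) sketch, not one of the faster string-matching algorithms):

```python
def valid_shifts(T, P):
    """Return all valid shifts s such that P occurs with shift s in T."""
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s + m] == P]
```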

Robin Hood Hashing

This is a variation of linear probing. If, during probing with a new key, an existing key is found that is “closer to home” than the new key, then the existing key is displaced and replaced by the new key.

Robin Hood hashing theorems

(Assuming truly random hash functions) the expected worst-case cost of a lookup in a hash table with Robin Hood hashing is Ω(log n).


(Assuming truly random hash functions) the variance of the expected number of probes required in Robin Hood hashing is O(log log n).

Cuckoo Hashing

· With cuckoo hashing there are two tables, T1 and T2, and two corresponding hash functions h1 and h2.


· Each key k is stored in either T1[h1(k)] or T2[h2(k)].


· Lookup or deletion of a key can be done in time O(1), since only two locations need be checked.


· A key k is stored in T1[h1(k)] if that is empty... else it is stored in T2[h2(k)] if that is empty


· If both T1[h1(k)] and T2[h2(k)] are occupied, remove the key k’ that is in T1[h1(k)] and put k in T1[h1(k)],


· and move k’ to T2[h2(k’)]. If T2[h2(k’)] is occupied by another key k’’,


· then move that to T1[h1(k’’)] and so on... each time a key is displaced, move it to the other table.
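
A sketch of that displacement loop (Python; the bound on displacements is my own safeguard, since in practice an insertion that loops too long triggers a rehash with new hash functions):

```python
def cuckoo_insert(T1, T2, h1, h2, k, max_displacements=32):
    tables, hashes = (T1, T2), (h1, h2)
    side = 0                       # start with table T1
    for _ in range(max_displacements):
        T, h = tables[side], hashes[side]
        i = h(k) % len(T)
        if T[i] is None:
            T[i] = k               # empty cell found: done
            return True
        T[i], k = k, T[i]          # kick out the occupant and re-home it...
        side = 1 - side            # ...in the other table
    return False                   # too many displacements: a rehash is needed
```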

The Cuckoo Graph

· The cuckoo graph is a bipartite graph derived from a cuckoo hash table.


· The vertices are the buckets, and for each key k, there is an edge from T1[h1(k)] to T2[h2(k)] (so we can think of keys as being the edges of the graph).


· An insertion into the table can be described by a path through the cuckoo graph.

Cuckoo hashing fails

If any connected component of the cuckoo graph has more than one cycle.