• Shuffle
    Toggle On
    Toggle Off
  • Alphabetize
    Toggle On
    Toggle Off
  • Front First
    Toggle On
    Toggle Off
  • Both Sides
    Toggle On
    Toggle Off
  • Read
    Toggle On
    Toggle Off
Reading...
Front

Card Range To Study

through

image

Play button

image

Play button

image

Progress

1/25

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

25 Cards in this Set

  • Front
  • Back

Hash Table

- A data structure used to implement an associative array, a structure that can map keys to values


- Uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found


- O(1)

Hash Function

- Function which, when applied to the key, produces an integer which can be used as an address in a hash table (index into an array of buckets)

Hash Index

- Organizes the search keys with their associated pointers into a hash file


- Consists of a collection of buckets organized in an array


- Through linked lists, multiple items can be associated with one index because of this Hash indices are often called buckets

Operations for Creating a Hash Table

1. Creating an array: The actual table where the data is stored


2. Creating the Hash Function

Uniform Hash

When the indices produced by the hash function (into an array) are equally likely to be generated

Problems with Hash Function

- Ideally, the hash function will assign each key to a unique bucket (index), but this ideal situation is rarely achievable in practice


- Most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur

Complexity of Hashing

- In many situations, hash tables turn out to be more efficientthan search trees or any other table lookup structure


- For this reason, they are widely used in many kinds ofcomputer software, particularly for associative arrays,database indexing, caches, and sets

Procedure for Hash Index

- Puts a pointer to a record in the bucket based on the hash of that particular record’s key


- A hash function “h” maps a search-key value to an address of a bucket


- Commonly used hash function: hash value mod n_B where n_B is the number of buckets

Procedure for Hash Function

1. Creating the original hash code for the hash key


2. Transferring this number to a smaller number to be used as index for the key

Other Applications of Hash Functions

- Used to build caches for large data sets stored in slow media such as a slow Hard Drive. Frequently accessed data is saved in cache and can be accessed in a very short time


- Computer operating systems use hash functions for fast access of key words and commands

Good Hash Function

- Create hash codes that permits the widest distribution of keys to various index values of the hash table.


- It should uniformly distribute the keys

Perfect Hash Function

- Create such a unique code for each key in the data that the probability of two keys having the same code is zero.


- Such a hash function is ideal but rarely does it exist

Collision

- When 2 keys hash to the same location in the hash table


- Note: a good hash function distributes the elements uniformly over the hash table, this keeps the number of collisions within manageable limits

Chaining

- Collision Resolution Method


- When there is a collision (key hashed to an occupied slot), the new key is stored in an overflow list.


- Each incoming duplicate entry is linked at the end of this list chained to the index


- Solution: make the size of the hash table (m) larger than the number of key value pairs (33%)

Linear Probing

- Collision Resolution Method


- Will look onto the subsequent hash elements until the first free space is found


- The size of the hash table (m) must be bigger than the number of elements so that some empty slots areavailable

Hashing with Databases

- Oracle does not have hash indices built in


- MySQL does

Indexing

- Something to facilitate search


- Used in Databases


- Purpose: improve the speed of data retrieval

Two Main types of Indexing

- Ordered Index


- Hash Index

Ordered Index: Primary Index

- If the file containing the records is sequentially ordered, the index whose search key specifies the sequential order of the file

Ordered Index: Secondary Index

- Indices whose search key specifies an order different from the sequential order of the file


- Improve the performance of queries on nonprimary keys


- Impose serious overhead on database modification: whenever a file is updated, every index must be updated

Dense Index

- An index record appears for every search key value in file


- This record contains search key value and a pointer to the actual record

Sparse Index

- Created only for some of the records


- To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for


- We start at that record pointed to by the index record, and proceed along the pointers in the file (that is, sequentially) until11 we find the desired record

Dense Indicies

- Don’t give us much of a bonus on an ordered table –a primary index that is dense on an ordered table is redundant!


- A primary index on an ordered table is always sparse


- Otherwise (unordered) use dense or secondary indices


- Dense indices are faster in general

How much indexing to do?

- Read-heavy DBs – can index a lot (if got the memory to do it)


- Write-heavy DBs – index sparingly! (take a balanced approach)


- Write-ONLY DBs – one or no index (e.g. log table)

Two-Level Sparse Index

- If index is too large to be kept in main memory, one search results in several disk reads


- If there are no overflow blocks in the index, we can use binary search


- If index has overflow blocks, then sequential search typically used


- Solution: Construct a sparse index on the index