lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

hashing.md (1823B)


      1 +++
      2 title = "Hashing"
      3 template = "page-math.html"
      4 +++
      5 
      6 # Hashing
      7 
      8 - hash tables are good for dictionaries
      9 - keys starting at 0
     10 - direct address table:
     11 
     12     $\text{keys} \in {0,…,m-1}$
     13 
     14     $T[k] = \begin{cases} \text{nil} & \text{if no item with key k}\\\\
     15     \text{pointer to element with key k} & \text{otherwise}\end{cases}$
     16 
     17     - insert: set value at key
     18     - delete: set value to nil
     19     - drawbacks: lots of memory, keys must be int
     20     - identity function key —> index
     21 - for other purposes, change the mapping function
     22 - simple example:
     23     - keys are first names, value phone number
     24     - hash function is name length mod 5
     25 - choosing a hash function
     26     - division: k ➝ (k mod m)
     27         - easy but not good for all values
     28         - so better to take prime not close to power of 2
     29     - multiplication
     30 - problem of collision: different keys hashed to same slot
     31     - for p items, hash table size m, so mp possibilities for hash function
     32     - $\frac{m!}{(m-p)!}$  possibilities for hashing without collision
     33 - solving collisions
     34     - chaining: put items that hash to same value in a linked list
     35         - insertion in O(1)
     36         - deletion in O(1) for doubly-linked (you wouldn’t use singly linked)
     37         - searching in O(n) with n dictionary size, average in ϴ(1+α)
     38             - α is load factor
     39             - for n keys, m slots,
     40 
     41                 $\alpha = \frac{n}{m}$
     42 
     43     - open addressing
     44         - make probe sequence: h: U × { 0…m-1} ➝ {0…m-1}
     45         - linear probing
     46             - for next probe, next address mod m
     47             - h(k,i) = h’(k) + i mod m
     48             - keep increasing i until empty place found
     49             - but you get clustering, and removal hurts on a physical level
     50         - double hashing, use two functions when the first function gives a collision