lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

data-structures-and-algorithms-used-in-network-designs.md (2241B)


      1 +++
      2 title = 'Data structures and algorithms used in network designs'
      3 +++
      4 # Data structures and algorithms used in network designs
      5 ## How to quickly decide if element exists in a large set
      6 Used in tasks involving membership detection:
      7 - access control list (decides if IP address is in blocklist)
      8 - IP multicast (decides if router port should replicate packet)
      9 - load balancer (decides if source IP has been assigned to server)
     10 
     11 Trivial solutions:
     12 - unordered list, linear search. O(n) time.
     13 - ordered list, binary search. O(log n) time.
     14 
     15 Hashing:
     16 - map data to fixed-size values with function
     17 - hash table indexed by hash values, 1-to-1 mapping
     18 - constant O(1) time, just computing a function
     19 - problems: hash collision -- multiple data entries mapped to same hash value
     20 - choosing good hash functions: deterministic (always same value for same key), efficiently computable, minimize collisions
     21   - modulo hashing: $h(x) = x % table_size$, with $table_size$ a prime number
     22     - with non-uniformly distributed data, data sharing factors with $table_size$ will be hashed to same values → use prime numbers, which have least factors
     23     - other hash functions, e.g. MD5, SHA256
     24 - handling hash collisions:
     25   - data structure that resolves hash collisions
     26     - separate chaining: create list of keys that map to same hash value
     27     - open addressing: shift values following a given strategy when a collision happens. when trying to find(x), hash and keep looking for x using the strategy until an empty slot is found.
     28     - cuckoo hashing: use two hash functions to generate two possible slots for each key, use two hash tables
     29     - bloom filter
     30 
     31 ## How to efficiently count occurrences for large set of elements
     32 Counting bloom filter:
     33 - increment counters corresponding to hash values on insert
     34 - on read, look up counters corresponding to hash values with minimum count
     35 
     36 Count-min sketch:
     37 - three hash functions, each mapped to array of counters (hash tables)
     38 
     39 ## How to quickly find match on prefix of a string
     40 Trie: a binary tree where nodes have a part of the prefix
     41 
     42 PATRICIA trie: radix trie with radix 2
     43 
     44 Direct trie: combine segments in entries
     45 
     46 TCAM: hardware implementation
     47 - supports matching on set of records in constant time