data-structures-and-algorithms-used-in-network-designs.md (2241B)
1 +++ 2 title = 'Data structures and algorithms used in network designs' 3 +++ 4 # Data structures and algorithms used in network designs 5 ## How to quickly decide if element exists in a large set 6 Used in tasks involving membership detection: 7 - access control list (decides if IP address is in blocklist) 8 - IP multicast (decides if router port should replicate packet) 9 - load balancer (decides if source IP has been assigned to server) 10 11 Trivial solutions: 12 - unordered list, linear search. O(n) time. 13 - ordered list, binary search. O(log n) time. 14 15 Hashing: 16 - map data to fixed-size values with function 17 - hash table indexed by hash values, 1-to-1 mapping 18 - constant O(1) time, just computing a function 19 - problems: hash collision -- multiple data entries mapped to same hash value 20 - choosing good hash functions: deterministic (always same value for same key), efficiently computable, minimize collisions 21 - modulo hashing: $h(x) = x % table_size$, with $table_size$ a prime number 22 - with non-uniformly distributed data, data sharing factors with $table_size$ will be hashed to same values → use prime numbers, which have least factors 23 - other hash functions, e.g. MD5, SHA256 24 - handling hash collisions: 25 - data structure that resolves hash collisions 26 - separate chaining: create list of keys that map to same hash value 27 - open addressing: shift values following a given strategy when a collision happens. when trying to find(x), hash and keep looking for x using the strategy until an empty slot is found. 28 - cuckoo hashing: use two hash functions to generate two possible slots for each key, use two hash tables 29 - bloom filter 30 31 ## How to efficiently count occurrences for large set of elements 32 Counting bloom filter: 33 - increment counters corresponding to hash values on insert 34 - on read, look up counters corresponding to hash values with minimum count 35 36 Count-min sketch: 37 - three hash functions, each mapped to array of counters (hash tables) 38 39 ## How to quickly find match on prefix of a string 40 Trie: a binary tree where nodes have a part of the prefix 41 42 PATRICIA trie: radix trie with radix 2 43 44 Direct trie: combine segments in entries 45 46 TCAM: hardware implementation 47 - supports matching on set of records in constant time