+++
title = 'Reliability & Performance'
+++
# Reliability & Performance
## How to ensure reliability
what are the threats?

- disk failures: bad blocks, whole-disk errors
- power failures: (meta)data inconsistently written to disk
- software bugs: bad (meta)data written to disk
- user errors: `rm *.o` vs `rm * .o`; `dd if=/dev/zero of=zeros bs=1M # fill disk quota`

backups: incremental vs full, online vs offline, physical vs logical (on the filesystem level), compressed vs uncompressed, local vs remote

RAID: redundant array of independent (originally inexpensive) disks

- virtualise addressing on top of multiple disks (expose them as a single address space)
- the RAID controller translates block addresses to physical disks, much like an MMU translates addresses in memory
- options:
    - mirroring (RAID 1) -- no real slowdown or advantage for writing, but reads can be served in parallel from two different disks.
    - striping (RAID 0) -- scatter blocks across disks. no reliability benefit, but very good performance.
    - hybrid -- stripe data across most of the disks and store parity bits on the last one, so a lost block can be reconstructed (toy parity sketch at the bottom of this page).
    - ![](a7298bb639635540af0873ab67b18f2c.png)
    - [Wikipedia page](https://en.wikipedia.org/wiki/Nested_RAID_levels)

fsck (File System Consistency Check)

- you need invariants to check, so exploit the redundancy already present in filesystem metadata (e.g. every block should be either on the free list or in exactly one file -- see the consistency sketch at the bottom of this page).
- ![](9f1775d17b641473033931c7009a2fa0.png)

## Improve filesystem performance:
minimize disk access:

- caching: buffer cache, inode cache (literally a cache of inodes kept in memory), direntry cache (for e.g. path name lookups)
- buffer cache:
    - keep blocks in a recently-used list: the end is the most recently used, the front the least recently used.
    - periodically evict from the front; a hash table maps block numbers to list entries, so a lookup doesn't have to scan the whole list (LRU sketch at the bottom of this page).
    - write-through caching (a write updates the cache and is immediately persisted to disk) vs. periodic syncing (dirty blocks in the buffer cache are written back periodically, typically by a daemon)
    - ![](b46b5bb2c4b52cf4be0ce975af65fb60.png)
- block read-ahead (anticipate sequential access patterns)

minimize seek time (keep related data in the same region of the disk, more or less):

- try to allocate files contiguously
- spread i-nodes over the disk, so they sit close to the data they describe
- store small file data 'inline' in the i-node (alongside the metadata)
- defragment the disk

## Different file system options:
log-structured filesystems:

- optimise for frequent small writes
- collect pending writes in a log segment, flush it to disk sequentially (segment sketch at the bottom of this page)
- a segment can contain anything (inodes, dir entries, blocks, whatever) and can be e.g. 1 MB in size
- relies on an inode index to find inodes in the log efficiently
- garbage collection reclaims stale log entries

journaling filesystems:

- use 'logs' for crash recovery
- first write the operations of a transaction to the log, e.g. for deleting a file:
    - remove the file from its directory
    - release its i-node to the pool of free i-nodes
    - return all its disk blocks to the pool of free disk blocks
- after a crash, replay the operations from the log (replay sketch at the bottom of this page)
- requires individual operations to be *idempotent*
- should support multiple, arbitrary crashes
- journaling is widely used in modern filesystems

virtual filesystems (VFS):

- ![](bf1de095db2a876b4fc51249dbeff88f.png)
- ![](c306c2cc8e85ffa6074cac359f53d93a.png)
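## Sketches of the mechanisms above
a few small Python sketches of the ideas on this page. every name, structure and size in them is invented for illustration -- they are sketches, not real implementations.

first, the parity idea behind the hybrid RAID option: data blocks in a stripe sit on different disks and one extra block holds their XOR, so any single missing block can be recovered from the rest.

```python
# Toy RAID-style parity (not tied to any real RAID implementation):
# one parity block per stripe = XOR of the data blocks in that stripe.
from functools import reduce

def parity_block(blocks: list[bytes]) -> bytes:
    """XOR the corresponding bytes of all blocks in one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(survivors: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing block: XOR of the parity and all survivors."""
    return parity_block(survivors + [parity])

if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]       # one stripe, three data disks
    p = parity_block(stripe)                   # stored on the parity disk
    lost = stripe[1]                           # pretend disk 1 died
    recovered = reconstruct([stripe[0], stripe[2]], p)
    assert recovered == lost
    print("recovered block:", recovered)
```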
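a sketch of one fsck-style invariant check, assuming a made-up layout where the filesystem is just a free list plus per-file block lists: every block must appear exactly once, either on the free list or in exactly one file.

```python
# Toy consistency check: count how often each block is referenced and
# report blocks that are missing or referenced more than once.
from collections import Counter

def check_block_invariant(n_blocks: int,
                          free_list: list[int],
                          files: dict[str, list[int]]) -> list[str]:
    """Return human-readable inconsistencies (empty list = consistent)."""
    seen = Counter(free_list)
    for blocks in files.values():
        seen.update(blocks)
    problems = []
    for block in range(n_blocks):
        if seen[block] == 0:
            problems.append(f"block {block} is missing (neither free nor used)")
        elif seen[block] > 1:
            problems.append(f"block {block} is referenced {seen[block]} times")
    return problems

print(check_block_invariant(
    n_blocks=6,
    free_list=[0, 5],
    files={"a.txt": [1, 2], "b.txt": [2, 3]},   # block 2 used twice, block 4 lost
))
```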
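a minimal sketch of the buffer-cache bullets: an LRU-ordered structure with hash-table lookup, a dirty set, and a `sync()` standing in for the periodic write-back daemon. here `disk` is just a dictionary standing in for the real device.

```python
# Toy buffer cache: OrderedDict gives both the hash-table lookup and the
# LRU ordering (front = least recently used, end = most recently used).
from collections import OrderedDict

class BufferCache:
    def __init__(self, capacity: int, disk: dict[int, bytes]):
        self.capacity = capacity
        self.disk = disk                        # stand-in for the real device
        self.blocks = OrderedDict()             # block number -> data, LRU order
        self.dirty = set()                      # block numbers not yet on disk

    def read(self, blockno: int) -> bytes:
        if blockno in self.blocks:
            self.blocks.move_to_end(blockno)    # mark as most recently used
            return self.blocks[blockno]
        data = self.disk.get(blockno, bytes(4096))  # miss: fetch from "disk"
        self._insert(blockno, data)
        return data

    def write(self, blockno: int, data: bytes) -> None:
        self._insert(blockno, data)
        self.dirty.add(blockno)                 # delayed write, not write-through

    def sync(self) -> None:
        """What the periodic flush daemon would do: persist all dirty blocks."""
        for blockno in list(self.dirty):
            self.disk[blockno] = self.blocks[blockno]
        self.dirty.clear()

    def _insert(self, blockno: int, data: bytes) -> None:
        self.blocks[blockno] = data
        self.blocks.move_to_end(blockno)
        while len(self.blocks) > self.capacity:
            victim, victim_data = self.blocks.popitem(last=False)  # evict LRU
            if victim in self.dirty:            # write back before dropping it
                self.disk[victim] = victim_data
                self.dirty.discard(victim)

if __name__ == "__main__":
    disk = {1: b"hello", 2: b"world"}
    cache = BufferCache(capacity=2, disk=disk)
    print(cache.read(1))            # miss: fetched from 'disk'
    cache.write(3, b"dirty data")   # cached, not yet on disk
    cache.sync()                    # now it is
    print(disk[3])
```

as written this models delayed writes; a write-through variant would store to `disk` immediately inside `write()` instead of waiting for `sync()` or eviction.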
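a sketch of the log-structured idea, with invented names and the 1 MB figure from the bullet above: small writes accumulate in an in-memory segment and are appended to the log in one sequential write, while an inode map records where the newest copy of each inode's data lives.

```python
# Toy log-structured layout: append-only log + in-memory segment buffer
# + inode map (the "inode index" that locates inodes in the log).
SEGMENT_SIZE = 1 << 20           # flush once roughly 1 MB has accumulated

class LogFS:
    def __init__(self):
        self.log = bytearray()   # the on-disk log (append-only)
        self.pending = []        # (inode number, data) waiting in the segment
        self.pending_bytes = 0
        self.inode_map = {}      # inode number -> offset of the latest copy

    def write(self, inode: int, data: bytes) -> None:
        self.pending.append((inode, data))
        self.pending_bytes += len(data)
        if self.pending_bytes >= SEGMENT_SIZE:
            self.flush_segment()

    def flush_segment(self) -> None:
        """One large sequential write instead of many scattered small ones."""
        for inode, data in self.pending:
            self.inode_map[inode] = len(self.log)   # newest copy wins
            self.log += data
        self.pending.clear()
        self.pending_bytes = 0

fs = LogFS()
fs.write(7, b"x" * 600_000)
fs.write(8, b"y" * 600_000)      # crosses 1 MB: both land in the log in one flush
print(fs.inode_map)              # {7: 0, 8: 600000}
```

garbage collection (not sketched) would copy still-live data out of old segments so their space can be reused.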
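finally, a toy journal for the 'delete a file' transaction listed above. each record states the final value to install rather than an increment, so replaying the log any number of times after a crash leaves the same state (idempotence). the state layout and record names are invented.

```python
# Toy write-ahead journal: write all records of the transaction first,
# then apply them; after a crash, just apply them again.
state = {
    "dir_entries": {"report.txt": 7},    # name -> inode number
    "inode_free":  {7: False},           # inode number -> is it free?
    "block_free":  {100: False, 101: False},
}

def journal_delete(name: str, inode: int, blocks: list[int]) -> list[tuple]:
    """Build the full transaction (this is what gets written to the log)."""
    return ([("remove_dirent", name)]
            + [("set_inode_free", inode, True)]
            + [("set_block_free", b, True) for b in blocks])

def replay(journal: list[tuple]) -> None:
    """Apply journal records; safe to run again after any crash."""
    for record in journal:
        if record[0] == "remove_dirent":
            state["dir_entries"].pop(record[1], None)   # already gone is fine
        elif record[0] == "set_inode_free":
            state["inode_free"][record[1]] = record[2]
        elif record[0] == "set_block_free":
            state["block_free"][record[1]] = record[2]

log = journal_delete("report.txt", 7, [100, 101])
replay(log)      # normal execution
replay(log)      # simulated crash + recovery: same end state
assert state["inode_free"][7] and all(state["block_free"].values())
```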