lectures.alex.balgavy.eu

Lecture notes from university.

+++
title = 'Reliability & Performance'
+++
# Reliability & Performance
## How to ensure reliability
what are the threats?

- disk failures: bad blocks, whole-disk errors
- power failures: (meta)data inconsistently written to disk
- software bugs: bad (meta)data written to disk
- user errors: `rm *.o` vs `rm * .o` (the stray space makes it delete everything); `dd if=/dev/zero of=zeros bs=1M` (fills the disk quota)

backups: incremental vs full, online vs offline, physical vs logical (on filesystem level), compressed vs uncompressed, local vs remote

RAID: redundant array of independent (originally inexpensive) disks

- virtualise addressing on top of multiple disks (present them as a single address space)
- the RAID controller acts like the MMU does for memory: it translates logical addresses into physical locations
- options:
    - mirroring (RAID 1) -- no real slowdown or advantage for writing, but reads can be served in parallel from two different disks.
    - striping (RAID 0) -- scatter blocks across disks. no reliability benefits, but very good performance.
    - hybrid -- stripe data across the first few disks and store parity bits on the last disk (a dedicated parity disk, as in RAID 4).
- ![](a7298bb639635540af0873ab67b18f2c.png)
- [Wikipedia page](https://en.wikipedia.org/wiki/Nested_RAID_levels)
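
The striping and parity options above can be sketched in a few lines. This is only an illustration, assuming round-robin striping and byte-wise XOR parity; the function names (`stripe_location`, `parity`, `rebuild`) are made up for this example and do not come from any real RAID implementation.

```python
from functools import reduce


def stripe_location(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    return block % n_disks, block // n_disks


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return parity(surviving + [parity_block])


# One stripe spread over three data disks, parity kept on a fourth disk.
data = [b"disk", b"fail", b"test"]
p = parity(data)

# Pretend the second disk died: XOR of everything that is left restores it.
recovered = rebuild([data[0], data[2]], p)
```

XOR parity can only repair one lost block per stripe, which is why a second disk failure before the rebuild finishes loses data.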
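
fsck (File System Consistency Check)

- it needs invariants to check against, so it exploits redundancy that already exists in the filesystem (e.g. every block should be in exactly one file or in the free list, never both and never neither).
- ![](9f1775d17b641473033931c7009a2fa0.png)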
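
As a rough sketch of one such invariant check, the code below counts how often each block appears in files and in the free list. The invariant chosen (each block is in exactly one file or on the free list) and the name `check_blocks` are assumptions for illustration; a real fsck checks many more things.

```python
from collections import Counter


def check_blocks(n_blocks, files, free_list):
    """Return {block number: problem} for every block that breaks the invariant.

    files maps a file name to the list of block numbers it uses.
    """
    in_use = Counter(b for blocks in files.values() for b in blocks)
    free = Counter(free_list)
    problems = {}
    for b in range(n_blocks):
        if in_use[b] == 0 and free[b] == 0:
            problems[b] = "missing"                  # neither used nor free
        elif in_use[b] > 1:
            problems[b] = "shared by files"          # used more than once
        elif free[b] > 1:
            problems[b] = "duplicated in free list"
        elif in_use[b] and free[b]:
            problems[b] = "both in use and free"
    return problems


# Hypothetical, deliberately inconsistent filesystem with 6 blocks.
files = {"a": [0, 1], "b": [1, 2]}
free_list = [2, 3, 3]
report = check_blocks(6, files, free_list)
```

Only block 0 is consistent here; the report flags the other five blocks.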

## Improve filesystem performance
minimize disk access:

- caching: buffer cache, inode cache (literally a cache of inodes kept in memory), direntry cache (e.g. for path name lookups)
    - buffer cache:
        - keep blocks in a least-recently-used (LRU) queue: the end is most recently used, the front is least recently used.
        - periodically evict from the front. a hash table points at the queue entries, so a lookup doesn't have to walk the whole list.
        - write-through caching (a write to a block updates the cache and is immediately persisted to disk) vs. periodic syncing (dirty blocks in the buffer cache are written back periodically, typically by a daemon)
        - ![](b46b5bb2c4b52cf4be0ce975af65fb60.png)
- block read ahead (anticipate access patterns)
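
A minimal sketch of the buffer cache described above. Assumptions: the "disk" is just a dict, eviction happens when the cache is full rather than periodically, and the class and method names are invented for this example. `OrderedDict` plays both roles from the notes: it is a hash table and it keeps the LRU order.

```python
from collections import OrderedDict


class BufferCache:
    def __init__(self, disk, capacity, write_through=False):
        self.disk = disk                  # block number -> data (stand-in for a real disk)
        self.capacity = capacity
        self.write_through = write_through
        self.blocks = OrderedDict()       # front = least recently used, end = most recent
        self.dirty = set()                # blocks changed in cache but not yet on disk

    def _touch(self, n, data):
        self.blocks[n] = data
        self.blocks.move_to_end(n)        # mark as most recently used
        while len(self.blocks) > self.capacity:
            victim, vdata = self.blocks.popitem(last=False)   # evict from the front
            if victim in self.dirty:      # never drop unsaved data
                self.disk[victim] = vdata
                self.dirty.discard(victim)

    def read(self, n):
        data = self.blocks[n] if n in self.blocks else self.disk[n]
        self._touch(n, data)
        return data

    def write(self, n, data):
        self._touch(n, data)
        if self.write_through:
            self.disk[n] = data           # persist immediately
        else:
            self.dirty.add(n)             # persist later, in sync()

    def sync(self):
        """What a periodic sync daemon would do: write back all dirty blocks."""
        for n in self.dirty:
            self.disk[n] = self.blocks[n]
        self.dirty.clear()


disk = {0: "a", 1: "b", 2: "c"}
cache = BufferCache(disk, capacity=2)
cache.read(0)
cache.read(1)
cache.write(0, "A")                       # cached only; disk still holds "a"
on_disk_before_sync = disk[0]
cache.read(2)                             # cache full: evicts block 1, the LRU entry
cache.sync()                              # now disk[0] == "A"
```

With `write_through=True` the `sync()` step becomes unnecessary, at the cost of one disk write per cached write.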

minimize seek time (keep accesses in roughly the same region of the disk):

- try to allocate files contiguously
- spread i-nodes over the disk (so an i-node sits close to its data blocks)
- store small file data 'inline' in the i-node (kind of as metadata)
- defragment the disk

## Different file system options
log-structured filesystems:

- optimise for frequent small writes
- collect pending writes in a log segment, then flush the segment to disk sequentially
- a segment can contain anything (inodes, dir entries, blocks, whatever) and can be e.g. 1 MB in size
- relies on an inode index (inode map) to find inodes in the log efficiently
- garbage collection reclaims stale log entries
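
A toy model of the log-structured idea above, under heavy assumptions: the log is a Python list, each record is an `(inode number, data)` pair standing in for whatever a segment would really hold, and `LogFS` and its methods are hypothetical names.

```python
class LogFS:
    def __init__(self, segment_size=4):
        self.log = []                     # the "disk": append-only list of records
        self.segment = []                 # pending writes, not yet flushed
        self.inode_map = {}               # inode number -> position of newest record
        self.segment_size = segment_size

    def write(self, inode, data):
        self.segment.append((inode, data))
        if len(self.segment) >= self.segment_size:
            self.flush()

    def flush(self):
        """Append the whole segment to the log in one sequential write."""
        for inode, data in self.segment:
            self.inode_map[inode] = len(self.log)   # older records become stale
            self.log.append((inode, data))
        self.segment.clear()

    def read(self, inode):
        for ino, data in reversed(self.segment):    # newest pending write wins
            if ino == inode:
                return data
        return self.log[self.inode_map[inode]][1]

    def garbage_collect(self):
        """Drop records the inode map no longer points to; return how many."""
        live = [rec for pos, rec in enumerate(self.log)
                if self.inode_map[rec[0]] == pos]
        reclaimed = len(self.log) - len(live)
        self.log = live
        self.inode_map = {ino: pos for pos, (ino, _) in enumerate(live)}
        return reclaimed


fs = LogFS(segment_size=2)
fs.write(1, "a")
fs.write(2, "b")          # segment full: flushed to the log
fs.write(1, "a2")         # overwrites inode 1, the old record is now stale
fs.flush()
reclaimed = fs.garbage_collect()
```

Overwrites never modify old records in place; they only append, which is why stale entries pile up and need collecting.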

journaling filesystems:

- use 'logs' for crash recovery
- first write the operations of a transaction to the log, e.g. for deleting a file:
    - remove file from its dir
    - release inode to pool of free inodes
    - return all disk blocks to pool of free disk blocks
- after a crash, replay the operations from the log
- requires each logged operation to be *idempotent* (replaying it more than once has the same effect as doing it once)
- should support multiple, arbitrary crashes (including a crash during recovery)
- journaling is widely used in modern filesystems
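
The file-deletion transaction above can be sketched as follows. Assumptions: the filesystem state is a plain dict, the journal is a list, the crash is simulated with a hypothetical `crash_after` parameter, and a real journal would also write commit records, which this sketch leaves out.

```python
def apply(state, op):
    """Apply one logged operation. Each branch is idempotent on purpose."""
    kind, arg = op
    if kind == "unlink":
        state["dir"].pop(arg, None)           # removing twice is harmless
    elif kind == "free_inode":
        state["free_inodes"].add(arg)         # sets ignore duplicates
    elif kind == "free_blocks":
        state["free_blocks"].update(arg)


def delete_file(state, journal, name, crash_after=None):
    inode = state["dir"][name]
    ops = [("unlink", name),
           ("free_inode", inode),
           ("free_blocks", tuple(state["inodes"][inode]))]
    journal.extend(ops)                       # 1. log the whole transaction first
    for i, op in enumerate(ops):              # 2. then carry it out
        if crash_after is not None and i >= crash_after:
            return                            # simulated crash mid-transaction
        apply(state, op)
    journal.clear()                           # 3. done: the log entries can go


def recover(state, journal):
    """After a crash: replay everything that is still in the journal."""
    for op in journal:
        apply(state, op)
    journal.clear()


state = {"dir": {"f": 7}, "inodes": {7: [3, 4]},
         "free_inodes": set(), "free_blocks": set()}
journal = []
delete_file(state, journal, "f", crash_after=1)   # crashes after the unlink
recover(state, list(journal))                     # recovery that "crashes" before clearing the log
recover(state, journal)                           # second replay yields the same state
```

Replaying twice is safe only because every operation is idempotent; an operation like "decrement the free-block count" would not be.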

virtual filesystems (VFS):

- ![](bf1de095db2a876b4fc51249dbeff88f.png)
- ![](c306c2cc8e85ffa6074cac359f53d93a.png)