multiprocessing.md (4770B)
1 +++ 2 title = 'Multiprocessing' 3 +++ 4 # Multiprocessing 5 ## Process creation 6 ### Fork 7 fork: creates new child process by duplicating calling process 8 9 Simple part: 10 - duplicate task: copy most info, with exceptions (e.g. PID) 11 - allocate and initialize new kernel stack: setup `thread_info`, copy trap frame and update 12 - and copy some other stuff 13 14 `copy_mm`: 15 - duplicate descriptor of address space of parent (`mm` descriptor): copy basic info, initialize empty address space (new page directory) 16 - duplicate address space: copy VMA info, page tables, and fix up page table entries 17 - `MAP_SHARED`: share page frames 18 - `MAP_PRIVATE (R)`: share page frames 19 - `MAP_PRIVATE (R/W)`: copy-on-write page frames 20 - shared memory: permissions of VMA are identical to permissions of PTE 21 - COW: permissions of VMA are RW, permissions of PTE are read-only 22 - duplicate page frme and remap new private copy into faulting PTE on demand 23 24 COW: copy on write, share the same resource until one copy is written to, at which point you duplicate the resource. 25 - `MAP_PRIVATE` file pages - dedupes binary pages 26 - `MAP_PRIVATE` anonymous forked pages - dedupes pages in process hierarchy 27 - `MAP_PRIVATE` anonymous forked pages - dedupes zero pages for unrelated processes 28 29 ### Exec 30 Executes program pointed by filename. 31 32 implementation: 33 - input and permission checking 34 - load binary headers in memory 35 - find binary format 36 - flush old resources: 37 - reinit `task_struct` and empty `mm` 38 - flush VMAs, page tables, page frames 39 - load binary (binary-format specific) 40 - parse headers and sections 41 - create corresponding VMAs (data, text, stack, etc.) 42 - update `%rsp` and `%rip` in trap frame 43 - `%rsp` is top of user stack 44 - `%rip`: program entry point for statically linked, dynamic linker's entry point for dynamically linked 45 - page tables initially empty 46 - even binary files are demand paged 47 - PF handler maps them from cache using COW-based strategy 48 49 ## Scheduling 50 easy version: 51 - hardware raises periodic timer interrupts 52 - timer interrupt handler invokes simple scheduler 53 - simple scheduler 54 - FIFO scheduling queue 55 - enqueue interrupted process at tail 56 - dequeue process at heat and run on CPU 57 58 Time management: 59 - hardware offers clock and timer circuits 60 - clock circuits: expose counters incd at given frequency, can be used for time measurements and tracking time of day 61 - timer circuits: issue periodic interrupts at given frequency, can be used for scheduling and to implement timers 62 63 Linux clock sources: 64 - tsc: timestamp counter 65 - hpet: high-precision event timer 66 - acpi_pm: ACPI power management timer 67 - real time clock (RTC): persistent (battery-powered), low-precision (seconds) 68 69 Linux clock event devices: 70 - abstraction of timer circuit 71 - clock event device programmed to issue interrupts at `CONFIG_HZ` frequency 72 - both user and kernel preemption possible 73 74 Scheduling per-task building blocks: 75 - state: running, runnable, sleeping 76 - quantum or time slice: max number of jiffies a task can run on CPU 77 - priority 78 - scheduling policy: fair, real-time, etc. 79 80 Linux O(1) scheduler: 81 - preemptive round-robin priority scheduler 82 - preemptive: uses timer interrupts to preempt execution of user processes and maybe reschedule before a process is done (when it runs out of its time-slice/quantum) 83 - round-robin: goes around all processes 84 - priority: some processes are more important 85 - maintains N run queues (1 per priority level) 86 - strategy: 87 - find highest priority queue with runnable task 88 - find first task on that queue and dequeue it 89 - calculate its time slice size based on priority 90 - run it 91 - when time's up, enqueue 92 - repeat 93 - improving fairness: 94 - priorities adjusted based on sleep time 95 - bonuses for IO vs CPU bound processes 96 - all operations are O(1) 97 98 Linux CFS scheduler: 99 - CFS: tasks get a completely fair CPU share 100 - with N tasks: 101 - record how much CPU time each task has 102 - schedule task with biggest delta to `tot_CPU_time/N` 103 - always schedule the task that has spent least CPU time 104 - virtualise time to deal with priorities 105 - increase virtual runtime faster for lower-priority tasks 106 - no heuristics to distinguish tasks 107 - uses red-black tree instead of run queues 108 109 ## IPC 110 communications, sync, signals 111 112 Shared mem: 113 - System V: get/create shmem segment by key, attach/detach 114 115 semaphores: 116 - counter shared across processes 117 - atomic wait (decrement), release (increment) if val >= -sem\_op 118 119 message queue: 120 - mailbox with senders and receivers 121 - get/create queue by key & permission 122 - block if queue full (on send) or empty (on receive) 123 124 POSIX: uses names instead of keys, thread safety, ref counting, shmem is file oriented