lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

datacenter-transport.md (1844B)


      1 +++
      2 title = 'Datacenter transport'
      3 +++
      4 # Datacenter transport
      5 In datacenters, the network has very high bandwidth and very low latency.
      6 
      7 In TCP:
      8 - reliable, in-order delivery using sequence numbers and acknowledgements (acks)
      9 - don't overrun the receiver (receive window `rwnd`) or the network (congestion window `cwnd`)
     10   - the amount of unacknowledged data in flight is capped at `min(rwnd, cwnd)`
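The window rule above can be sketched in a few lines. This is a minimal illustration, not a TCP implementation: the function name `sendable_bytes` and the `in_flight` parameter (bytes sent but not yet acked) are assumptions introduced here.

```python
def sendable_bytes(rwnd: int, cwnd: int, in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now."""
    # The effective window is the smaller of the two limits.
    window = min(rwnd, cwnd)
    # Data already in flight counts against the window; never go negative.
    return max(0, window - in_flight)


# Receiver allows 65535 bytes, network allows 14600, 10000 are unacked.
print(sendable_bytes(rwnd=65_535, cwnd=14_600, in_flight=10_000))  # 4600
```

Here the congestion window is the binding limit, so the receiver's larger window does not help.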
     11 
     12 TCP incast problem: a client-facing query might have to collect data from many servers at once → the simultaneous responses overrun the capacity of the shared commodity switches, causing packet drops
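A rough back-of-the-envelope sketch of incast, under a worst-case assumption that all responses arrive at the switch output port in one burst before the port can drain anything. The function name and the packet counts are hypothetical, chosen only for illustration.

```python
def incast_drops(n_servers: int, pkts_per_response: int, buffer_pkts: int) -> int:
    """Packets dropped when all responses hit one output queue at once.

    Assumption: the burst arrives faster than the port drains, so only
    `buffer_pkts` packets can be absorbed.
    """
    arriving = n_servers * pkts_per_response
    return max(0, arriving - buffer_pkts)


# A few servers fit into the buffer, many servers do not.
print(incast_drops(n_servers=4, pkts_per_response=8, buffer_pkts=100))   # 0
print(incast_drops(n_servers=40, pkts_per_response=8, buffer_pkts=100))  # 220
```

The same per-server response size becomes a problem purely because of the fan-in.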
     13 
     14 Ethernet flow control:
     15 - an overwhelmed Ethernet receiver can send a "PAUSE" frame to the sender, upon which the sender stops transmitting for a specified amount of time
     16 - designed for end-host overruns, not for switches
     17 - blocks all transmission at the Ethernet level (port level), not just the traffic causing the overrun
     18 
     19 Priority-based flow control:
     20 - enhancement over PAUSE frames
     21 - 8 virtual traffic lanes, each of which can be selectively stopped. The timeout is configurable.
     22 - drawbacks: only 8 lanes, possible deadlocks in large networks, and unfairness
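The difference from a plain PAUSE frame can be sketched as per-lane pause state on the sending port. This is a toy model, not the frame format: the class `PfcPort`, its methods, and the use of floating-point seconds for the timeout are assumptions made for illustration.

```python
class PfcPort:
    """Sender-side view of one Ethernet port with per-priority pause state."""

    NUM_PRIORITIES = 8  # priority-based flow control has 8 lanes

    def __init__(self) -> None:
        # Time until which each lane is paused; 0.0 means "not paused".
        self.paused_until = [0.0] * self.NUM_PRIORITIES

    def _check(self, priority: int) -> None:
        if not 0 <= priority < self.NUM_PRIORITIES:
            raise ValueError(f"priority must be in 0..{self.NUM_PRIORITIES - 1}")

    def pause(self, priority: int, now: float, timeout: float) -> None:
        """Pause a single lane until now + timeout."""
        self._check(priority)
        self.paused_until[priority] = now + timeout

    def can_send(self, priority: int, now: float) -> bool:
        """A lane may transmit once its pause timeout has expired."""
        self._check(priority)
        return now >= self.paused_until[priority]


port = PfcPort()
port.pause(priority=3, now=0.0, timeout=1.0)
print(port.can_send(3, now=0.5))  # False: lane 3 is still paused
print(port.can_send(4, now=0.5))  # True: other lanes are unaffected
print(port.can_send(3, now=1.0))  # True: the timeout has expired
```

A plain PAUSE frame would correspond to pausing all 8 lanes at once.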
     23 
     24 Datacenter TCP (DCTCP):
     25 - pass information about switch queue buildup to senders
     26 - at sender, react by slowing down transmission
     27 - Explicit Congestion Notification (ECN): standard way of signalling "presence of congestion"
     28   - carried in the IP packet header
     29   - for a queue of size N, when queue occupancy goes beyond a threshold K, the switch sets the ECN bit of passing packets to "yes"
     30 - once packets are marked, the receiver echoes the marks back to the sender in its acks
     31   - typical ECN receiver: marks every ack with the ECE flag until the sender responds with the CWR flag
     32   - DCTCP receiver: only marks acks that correspond to ECN-marked packets, so the sender can tell how many packets were marked
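The marking and reaction steps can be sketched as follows. The notes only say the sender slows down; the specific rule used here (a running estimate `alpha` of the fraction of marked packets, with gain `g`, and a window cut of `alpha / 2`) is taken from the published DCTCP design rather than from these notes. The class and function names, the starting window, and the one-packet additive increase are assumptions for illustration.

```python
from dataclasses import dataclass


def switch_marks(queue_len: int, k: int) -> bool:
    """Switch side: mark a passing packet once occupancy exceeds threshold K."""
    return queue_len > k


@dataclass
class DctcpSender:
    cwnd: float = 100.0   # congestion window in packets (assumed start value)
    alpha: float = 0.0    # running estimate of the fraction of marked packets
    g: float = 1 / 16     # estimation gain (assumed value)

    def on_window_acked(self, acked: int, marked: int) -> None:
        """Update once per window of acks; `marked` acks carried ECE."""
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Cut in proportion to the extent of congestion, not always by half.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            # Simplified additive increase of one packet per window.
            self.cwnd += 1.0


sender = DctcpSender()
sender.on_window_acked(acked=10, marked=10)
print(sender.alpha, sender.cwnd)
```

With a small `alpha` the window shrinks only slightly. With `alpha` equal to 1 the cut becomes a halving, the same as a standard TCP reaction.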
     33 
     34 TIMELY:
     35 - uses round-trip time (RTT) as the congestion signal
     36 - RTT is a multi-bit signal that reflects end-to-end congestion through the network → no explicit switch support needed to do marking
     37 - assumes an ack-based protocol (TCP)
     38 - absolute RTTs are not used, only the gradient: positive means rising RTT, so queue buildup; negative means falling RTT, so the queue is draining
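A simplified sketch of gradient-based rate control in the spirit of the bullets above. It is not the full TIMELY algorithm: the smoothing weight, the normalisation by a minimum RTT, the additive step, and the factor `beta` are all assumed parameters, and the names `RttGradient` and `adjust_rate` are hypothetical.

```python
from typing import Optional


class RttGradient:
    """Tracks a smoothed, normalised RTT gradient from successive samples."""

    def __init__(self, min_rtt: float, ewma_weight: float = 0.5) -> None:
        self.min_rtt = min_rtt          # used to normalise the gradient
        self.w = ewma_weight            # weight of the newest RTT difference
        self.prev_rtt: Optional[float] = None
        self.avg_diff = 0.0

    def update(self, rtt: float) -> float:
        if self.prev_rtt is None:
            # The first sample has nothing to compare against.
            self.prev_rtt = rtt
            return 0.0
        diff = rtt - self.prev_rtt
        self.prev_rtt = rtt
        self.avg_diff = (1 - self.w) * self.avg_diff + self.w * diff
        return self.avg_diff / self.min_rtt


def adjust_rate(rate: float, gradient: float,
                add_step: float = 10.0, beta: float = 0.8) -> float:
    """Raise the rate when RTT is flat or falling, cut it when RTT is rising."""
    if gradient <= 0:
        return rate + add_step
    # Clamp the gradient so the multiplicative cut cannot go negative.
    return rate * (1 - beta * min(gradient, 1.0))


tracker = RttGradient(min_rtt=10.0)
rate = 100.0
for rtt in [100.0, 110.0, 110.0, 90.0]:
    rate = adjust_rate(rate, tracker.update(rtt))
    print(rtt, round(rate, 2))
```

Only differences between RTT samples drive the decision, so a path with a large but stable RTT is not penalised.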