datacenter-transport.md (1844B)
+++
title = 'Datacenter transport'
+++
# Datacenter transport
In datacenters, the network is extremely high speed and low latency.

In TCP:
- reliable, in-order delivery with sequence numbers and acks
- don't overrun the receiver (receive window `rwnd`) or the network (congestion window `cwnd`)
- what can be sent is `min(rwnd, cwnd)`

TCP incast problem: a client-facing query might have to collect data from many servers → packet drops because capacity is overrun at shared commodity switches

Ethernet flow control:
- an overwhelmed Ethernet receiver can send a "PAUSE" frame to the sender, upon which the sender stops transmission for a certain amount of time
- designed for end-host overruns, not switches
- blocks all transmission at Ethernet level (port level)

Priority-based flow control:
- enhancement over PAUSE frames
- 8 virtual traffic lanes, each of which can be selectively stopped; the timeout is configurable
- but only 8 lanes, might lead to deadlocks in large networks, and unfairness

Datacenter TCP (DCTCP):
- pass information about switch queue buildup to senders
- at sender, react by slowing down transmission
- Explicit Congestion Notification (ECN): standard way of passing "presence of congestion"
    - part of IP packet header
    - for queue size of N, when queue occupancy goes beyond K, mark passing packet's ECN bit as "yes"
    - after threshold K, start marking packets with ECN
    - typical ECN receiver marks acks with ECE flag, until sender acks back with CWR flag bit
    - DCTCP receiver: only mark acks corresponding to ECN-marked packets

TIMELY:
- use round-trip time (RTT) as the congestion signal
- multi-bit, indicating end-to-end congestion through the network → no explicit switch support needed to do marking
- assumes ack-based protocol (TCP)
- absolute RTTs not used, only the gradient: positive means rising RTT, so queue buildup; negative means the opposite
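
The TIMELY gradient idea above can be sketched in a few lines of Python. This is a simplified illustration, not the full algorithm: the function name `timely_update`, the additive step `delta`, the decrease factor `beta`, and normalizing the gradient by a minimum RTT are assumptions made for the example, and RTT smoothing is left out.

```python
def timely_update(rate, prev_rtt, new_rtt, min_rtt, delta=10.0, beta=0.8):
    """Return a new sending rate from two consecutive RTT samples.

    All parameter names and default values are hypothetical.
    """
    # Gradient of the RTT, normalized so it is unitless.
    gradient = (new_rtt - prev_rtt) / min_rtt
    if gradient <= 0:
        # RTT is flat or falling: queues are draining, so probe for more bandwidth.
        return rate + delta
    # RTT is rising: queues are building, so back off in proportion to the gradient.
    # The gradient is capped at 1.0 so the rate never goes negative.
    return rate * (1 - beta * min(gradient, 1.0))


if __name__ == "__main__":
    rate = 100.0
    for prev, new in [(50.0, 50.0), (50.0, 75.0), (75.0, 60.0)]:
        rate = timely_update(rate, prev, new, min_rtt=50.0)
        print(f"RTT {prev} -> {new}: rate = {rate:.1f}")
```

Because the signal is a number rather than a single bit, the size of the backoff can track how fast the queue is growing.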
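
The DCTCP marking and sender reaction described above can be sketched as follows. The notes only say that the sender slows down; the specific update used here (a running estimate `alpha` of the fraction of marked acks, and a window cut of `alpha / 2`) is the rule commonly given for DCTCP and is not stated in these notes. The function names and the gain `g` are hypothetical.

```python
def mark_packets(queue_depths, k):
    """Switch side: mark a packet when queue occupancy on arrival exceeds K."""
    return [depth > k for depth in queue_depths]


def update_alpha(alpha, ece_flags, g=1 / 16):
    """Sender side: fold the fraction of marked acks into a running estimate.

    A DCTCP receiver echoes marks one-for-one, so ece_flags mirrors the
    marks set by the switch.
    """
    if not ece_flags:
        return alpha  # no acks in this window, keep the old estimate
    fraction = sum(ece_flags) / len(ece_flags)
    return (1 - g) * alpha + g * fraction


def reduce_cwnd(cwnd, alpha):
    """Sender side: shrink the window in proportion to the congestion estimate."""
    return cwnd * (1 - alpha / 2)


if __name__ == "__main__":
    marks = mark_packets([5, 20, 30, 10], k=15)
    alpha = update_alpha(0.0, marks, g=0.5)
    print(marks, alpha, reduce_cwnd(100.0, alpha))
```

With light marking the window shrinks only slightly, while a fully marked window (`alpha` near 1) halves it, much like a standard TCP loss response.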