lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit 2d1ded28e9ce6265deebaeb4e38a4f9571f4d726
parent 403e831d1fd285f6680a87792a460e661bc89b54
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sat, 20 Nov 2021 16:02:03 +0100

ACN notes update

Diffstat:
M content/acn-notes/_index.md | 2 ++
A content/acn-notes/datacenter-networking/fat-tree-topology.png | 0
A content/acn-notes/datacenter-networking/index.md | 8 ++++++++
A content/acn-notes/datacenter-networking/portland.png | 0
A content/acn-notes/datacenter-networking/tree-based-datacenter-network.png | 0
A content/acn-notes/datacenter-transport.md | 38 ++++++++++++++++++++++++++++++++++++++
M content/acn-notes/network-transport.md | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/content/acn-notes/_index.md b/content/acn-notes/_index.md
@@ -7,3 +7,5 @@ title = 'Advanced Computer Networks'
 2. [Networking basics](networking-basics)
 3. [Data structures and algorithms used in network designs](data-structures-and-algorithms-used-in-network-designs)
 4. [Network transport](network-transport)
+5. [Datacenter networking](datacenter-networking)
+6. [Datacenter transport](datacenter-transport)
diff --git a/content/acn-notes/datacenter-networking/fat-tree-topology.png b/content/acn-notes/datacenter-networking/fat-tree-topology.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-networking/index.md b/content/acn-notes/datacenter-networking/index.md
@@ -0,0 +1,8 @@
++++
+title = 'Datacenter networking'
++++
+
+# Datacenter networking
+Why not a single giant switch? Limited port density, broadcast storms, isolation.
+
+Tree-based data center network:
diff --git a/content/acn-notes/datacenter-networking/portland.png b/content/acn-notes/datacenter-networking/portland.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-networking/tree-based-datacenter-network.png b/content/acn-notes/datacenter-networking/tree-based-datacenter-network.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-transport.md b/content/acn-notes/datacenter-transport.md
@@ -0,0 +1,38 @@
++++
+title = 'Datacenter transport'
++++
+# Datacenter transport
+In datacenters, network is extremely high speed and low latency.
+
+In TCP:
+- reliable, in-order delivery with seq numbers and acks
+- don't overrun receiver (receiving window `rwnd`) and network (congestion window `cwnd`)
+  - what can be sent is `min(rwnd, cwnd)`
+
+TCP incast problem: client-facing query might have to collect data from many servers → packet drops because capacity overrun at shared commodity switches
+
+Ethernet flow control:
+- overwhelmed Ethernet receiver can send "PAUSE" frame to sender, upon which sender stops transmission for certain amount of time
+- designed for end-host overruns, not switches
+- blocks all transmission at Ethernet level (port level)
+
+Priority-based flow control:
+- enhancement over PAUSE frames
+- 8 virtual traffic lanes, each can be selectively stopped. timeout is configurable.
+- but only 8 lanes, might lead to deadlocks in large networks, and unfairness
+
+Datacenter TCP (DCTCP):
+- pass information about switch queue buildup to senders
+- at sender, react by slowing down transmission
+- Explicit Congestion Notification: standard way of passing "presence of congestion"
+  - part of IP packet header
+  - for queue size of N, when queue occupancy goes beyond K, mark passing packet's ECN bit as "yes"
+- after threshold K, start marking packets with ECN
+  - typical ECN receiver marks acks with ECE flag, until sender acks back with CWR flag bit
+  - DCTCP receiver: only mark acks corresponding to ECN packet
+
+TIMELY
+- use round trip time (RTT) as congestion signal
+- multi-bit, indicating end-to-end congestion through network → no explicit switch support needed to do marking
+- assumes ack-based protocol (TCP)
+- absolute RTTs not used, only gradient. positive means rising RTT, so queue buildup. negative means opposite.
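Editorial sketch (not part of the commit): the two congestion signals in the notes above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not DCTCP's or TIMELY's actual implementation. The names `Packet`, `SwitchQueue`, and `rtt_gradients` are hypothetical, the queue is counted in packets, and the RTT gradient is taken as the plain difference between consecutive samples with no smoothing.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    ecn_marked: bool = False  # stands in for the ECN bits in the IP header


@dataclass
class SwitchQueue:
    capacity: int   # N: maximum number of queued packets (assumed unit)
    threshold: int  # K: occupancy beyond which passing packets are marked
    packets: deque = field(default_factory=deque)

    def enqueue(self, pkt: Packet) -> bool:
        """Queue a packet; return False if the queue is full and it is dropped."""
        if len(self.packets) >= self.capacity:
            return False
        # K packets are already queued, so this arrival pushes occupancy past K.
        if len(self.packets) >= self.threshold:
            pkt.ecn_marked = True
        self.packets.append(pkt)
        return True


def rtt_gradients(rtt_samples):
    """Differences between consecutive RTT samples.

    Positive values mean RTT is rising (queue buildup); negative means draining.
    """
    return [later - earlier for earlier, later in zip(rtt_samples, rtt_samples[1:])]
```

With `capacity=4` and `threshold=2`, the first two packets pass unmarked, the next two are marked, and a fifth is dropped. The gradient helper only reports the sign and size of RTT changes; how a sender turns that into a rate is left out because the notes do not specify it.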
diff --git a/content/acn-notes/network-transport.md b/content/acn-notes/network-transport.md
@@ -13,3 +13,54 @@ Reaching equilibrium quickly: TCP slow-start
 - upon receiving ACK, increase congestion window by 1
 
 Packet loss is not good indicator of congestion. Instead, provide a model, estimate parameters for the model based on probing, and decide sending rate using the model.
+
+## Multi-path transport
+Multiple paths, e.g. cellular and wifi simultaneously. Higher throughput, failover from one path to another, and seamless mobility.
+
+To not modify apps, present same socket API and expectations.
+
+During TCP 3-way handshake, set option `MP_CAPABLE` in TCP header for multipath.
+
+To add sub-flows, use token hashed from exchanged key, and use HMAC (code) for auth based on exchanged keys.
+Associate sub-flows with existing connection.
+
+Middleboxes: network equipment that applies special operations on path of network packets.
+- examples: firewall, NAT
+- some inspect TCP traffic and check sequence numbers
+- so, need to have per-flow sequence numbers
+
+All sub-flows share same receive buffer and use same receive window.
+
+MPTCP congestion control goals:
+- be fair to TCP at bottleneck links: take as much capacity as TCP at bottleneck link
+- use efficient paths: each connection should send all traffic on least-congested paths, but keep some traffic on alternative paths as probe
+- perform as well as TCP
+
+Congestion control mechanism:
+- congestion window for each sub-flow
+- increase window for sub-flow for each ACK on that path
+- halve window for each drop on that path
+
+## HTTP
+### HTTP/1
+1 round trip to set up TCP, 2 for TLS1.2.
+
+After setup, only one request/response possible at a time, so Head-of-Line (HoL) blocking.
+
+### HTTP/1.1
+Avoids HoL blocking using multiple TCP connections, allowing concurrent requests/responses.
+
+### HTTP/2
+Multiple streams (each for object) multiplexed on same TCP connection.
+
+Even multiple domains can share same TCP connection.
+
+Supports priority of streams, using dependency tree.
+
+But still has HoL blocking on _TCP_ connection -- packet retransmission for one object delays transmissions of others.
+
+## QUIC
+Protocol to make streaming faster.
+
+No round trip needed for setup to a known server, i.e. if crypto keys are not new. And connections survive change of IP address.
+Uses multiple streams over UDP.