commit 2d1ded28e9ce6265deebaeb4e38a4f9571f4d726
parent 403e831d1fd285f6680a87792a460e661bc89b54
Author: Alex Balgavy <alex@balgavy.eu>
Date: Sat, 20 Nov 2021 16:02:03 +0100
ACN notes update
Diffstat:
7 files changed, 99 insertions(+), 0 deletions(-)
diff --git a/content/acn-notes/_index.md b/content/acn-notes/_index.md
@@ -7,3 +7,5 @@ title = 'Advanced Computer Networks'
2. [Networking basics](networking-basics)
3. [Data structures and algorithms used in network designs](data-structures-and-algorithms-used-in-network-designs)
4. [Network transport](network-transport)
+5. [Datacenter networking](datacenter-networking)
+6. [Datacenter transport](datacenter-transport)
diff --git a/content/acn-notes/datacenter-networking/fat-tree-topology.png b/content/acn-notes/datacenter-networking/fat-tree-topology.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-networking/index.md b/content/acn-notes/datacenter-networking/index.md
@@ -0,0 +1,8 @@
++++
+title = 'Datacenter networking'
++++
+
+# Datacenter networking
+Why not connect all servers to a single giant switch? Limited port density, broadcast storms, and lack of isolation.
+
+Tree-based data center network:
diff --git a/content/acn-notes/datacenter-networking/portland.png b/content/acn-notes/datacenter-networking/portland.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-networking/tree-based-datacenter-network.png b/content/acn-notes/datacenter-networking/tree-based-datacenter-network.png
Binary files differ.
diff --git a/content/acn-notes/datacenter-transport.md b/content/acn-notes/datacenter-transport.md
@@ -0,0 +1,38 @@
++++
+title = 'Datacenter transport'
++++
+# Datacenter transport
+In datacenters, the network has extremely high bandwidth and low latency.
+
+In TCP:
+- reliable, in-order delivery using sequence numbers and ACKs
+- avoid overrunning the receiver (receive window `rwnd`) and the network (congestion window `cwnd`)
+ - the sender may have at most `min(rwnd, cwnd)` unacknowledged bytes in flight
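+
+A minimal Python sketch of that limit. It assumes windows are counted in bytes and that data already sent but not yet acknowledged counts against the window. The function name and numbers are illustrative:
+
+```python
+def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
+    """Bytes the sender may still transmit right now.
+
+    The effective window is min(rwnd, cwnd); unacknowledged data counts against it.
+    """
+    window = min(rwnd, cwnd)
+    return max(0, window - bytes_in_flight)
+
+
+# Receiver advertises 64 KB, congestion window is 30 KB, 20 KB still unacked:
+print(usable_window(rwnd=64_000, cwnd=30_000, bytes_in_flight=20_000))  # 10000
+```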
+
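+TCP incast problem: a client-facing query may fan out to many servers whose responses arrive at almost the same time → the synchronized burst overruns the shallow buffers of shared commodity switches, causing packet drops (and TCP retransmission timeouts).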
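+
+A back-of-the-envelope sketch of why incast overruns a port. It uses a deliberately pessimistic, hypothetical model: all responses arrive at once and the port drains nothing during the burst. The sizes in the example are made up for illustration:
+
+```python
+def incast_overflow(num_servers: int, response_bytes: int, buffer_bytes: int) -> int:
+    """Bytes that cannot be queued when all responses hit one switch port at once.
+
+    Worst-case simplification: the buffer must absorb the whole burst,
+    and draining during the burst is ignored.
+    """
+    burst = num_servers * response_bytes
+    return max(0, burst - buffer_bytes)
+
+
+# Hypothetical: 40 workers each return 32 KiB into a port with 256 KiB of buffer.
+print(incast_overflow(40, 32 * 1024, 256 * 1024))  # 1048576 bytes dropped
+```
+
+Even this crude model shows the problem: the drop volume grows linearly with the fan-out, while the buffer stays fixed.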
+
+Ethernet flow control:
+- an overwhelmed Ethernet receiver can send a "PAUSE" frame to the sender, which then stops transmitting for a specified amount of time
+- designed for overruns at end hosts, not at switches
+- blocks all transmission on the link (port level), regardless of which traffic caused the congestion
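+
+For a sense of scale: in IEEE 802.3 MAC Control, the PAUSE frame's 16-bit `pause_time` is expressed in quanta of 512 bit times, so the actual pause depends on link speed. A small sketch (the function and constant names are my own):
+
+```python
+PAUSE_QUANTUM_BITS = 512  # one pause quantum = 512 bit times
+
+
+def pause_duration_s(pause_quanta: int, link_bps: float) -> float:
+    """Seconds a sender stays silent after a PAUSE frame with this pause_time."""
+    if not 0 <= pause_quanta <= 0xFFFF:  # pause_time is a 16-bit field
+        raise ValueError("pause_time must fit in 16 bits")
+    return pause_quanta * PAUSE_QUANTUM_BITS / link_bps
+
+
+# Maximum pause (65535 quanta) on a 10 Gbit/s link:
+print(f"{pause_duration_s(0xFFFF, 10e9) * 1e6:.1f} us")  # 3355.4 us
+```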
+
+Priority-based flow control:
+- enhancement over PAUSE frames
+- 8 virtual traffic lanes (priority classes); each can be paused selectively, and the pause time is configurable
+- but only 8 lanes, so it can lead to deadlocks in large networks, and to unfairness
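+
+A toy model contrasting the two mechanisms. It assumes one sender port with a separate queue per priority class and ignores the per-priority pause timers that real PFC frames carry. The class and method names are hypothetical:
+
+```python
+NUM_PRIORITIES = 8  # PFC defines 8 priority classes per link
+
+
+class EgressPort:
+    """Toy model of one sender port with a queue per priority class."""
+
+    def __init__(self) -> None:
+        self.paused = [False] * NUM_PRIORITIES
+
+    def receive_pause(self) -> None:
+        # Classic PAUSE: the whole link stops, whatever the traffic class.
+        self.paused = [True] * NUM_PRIORITIES
+
+    def receive_pfc(self, priority: int) -> None:
+        # PFC: only the named priority class stops; other lanes keep sending.
+        self.paused[priority] = True
+
+    def can_send(self, priority: int) -> bool:
+        return not self.paused[priority]
+
+
+port = EgressPort()
+port.receive_pfc(priority=3)
+print([port.can_send(p) for p in range(NUM_PRIORITIES)])
+# [True, True, True, False, True, True, True, True]
+```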
+
+Datacenter TCP (DCTCP):
+- pass information about switch queue buildup to senders
+- at sender, react by slowing down transmission in proportion to the extent of congestion
+- Explicit Congestion Notification (ECN): standard way of signalling "presence of congestion"
+ - carried in the IP packet header
+ - for a queue of size N, once queue occupancy exceeds a threshold K, the switch marks each passing packet's ECN field as "congestion experienced"
+- echoing the marks back to the sender:
+ - a standard ECN receiver sets the ECE flag on all ACKs until the sender responds with the CWR flag
+ - a DCTCP receiver sets ECE only on ACKs for packets that arrived marked, so the sender learns what fraction of packets was marked
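+
+A sketch of both halves of DCTCP, following the update rules published for DCTCP (RFC 8257). The switch marks when the instantaneous queue exceeds K. The sender keeps an estimate `alpha` of the marked fraction `F`, updates it once per window as `alpha = (1 - g) * alpha + g * F`, and cuts `cwnd` by a factor of `alpha / 2`. The following simplifications are assumptions: counts are in packets rather than bytes, the increase is a plain +1 per window, and `g = 1/16` is the RFC's suggested default rather than something from these notes:
+
+```python
+def switch_marks(queue_len: int, k: int) -> bool:
+    """Switch: set the ECN 'congestion experienced' mark if the queue exceeds K."""
+    return queue_len > k
+
+
+class DctcpSender:
+    """Sender-side DCTCP reaction to ECN echoes (simplified sketch)."""
+
+    def __init__(self, cwnd: float, g: float = 1 / 16) -> None:
+        self.cwnd = cwnd   # congestion window, in packets
+        self.alpha = 0.0   # running estimate of the fraction of marked packets
+        self.g = g         # EWMA gain
+
+    def on_window_end(self, acked: int, ece_acked: int) -> None:
+        """Called once per window of data (~1 RTT) with that window's ACK counts."""
+        frac = ece_acked / acked if acked else 0.0
+        self.alpha = (1 - self.g) * self.alpha + self.g * frac
+        if ece_acked:
+            # Cut in proportion to congestion: alpha == 1 halves cwnd like TCP.
+            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
+        else:
+            self.cwnd += 1  # no marks in this window: additive increase
+
+
+sender = DctcpSender(cwnd=100)
+sender.on_window_end(acked=100, ece_acked=100)  # every packet marked
+print(sender.alpha, sender.cwnd)  # 0.0625 96.875
+```
+
+Unlike standard TCP, which halves on any ECE, a single fully marked window only trims about 3% here. Sustained marking drives `alpha` towards 1 and the cut towards TCP's halving.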
+
+TIMELY:
+- uses round-trip time (RTT) as the congestion signal
+- RTT is a multi-bit signal reflecting end-to-end queueing along the path → no explicit switch support needed for marking
+- assumes an ACK-based protocol (TCP)
+- rate decisions rely on the RTT gradient rather than absolute RTTs (absolute thresholds serve only as low/high guards): a positive gradient means RTT is rising (queues building up), a negative gradient means queues are draining
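+
+A simplified sketch of TIMELY's gradient-based rate update. RTT differences are smoothed with an EWMA and normalised by a minimum RTT. The rate increases additively when the gradient is non-positive and decreases multiplicatively in proportion to it otherwise. The published algorithm also has low/high absolute-RTT guards and a hyperactive-increase mode, both omitted here. All parameter values are illustrative assumptions, not tuned values:
+
+```python
+class TimelyRate:
+    """Gradient-based rate control in the style of TIMELY (simplified sketch)."""
+
+    def __init__(self, rate_gbps: float, min_rtt_us: float, ewma: float = 0.5,
+                 delta_gbps: float = 0.01, beta: float = 0.8) -> None:
+        self.rate = rate_gbps        # current sending rate
+        self.min_rtt = min_rtt_us    # normalises the gradient (illustrative value)
+        self.ewma = ewma             # smoothing weight for RTT differences
+        self.delta = delta_gbps      # additive increase step
+        self.beta = beta             # multiplicative decrease factor
+        self.prev_rtt = None
+        self.rtt_diff = 0.0
+
+    def on_rtt_sample(self, rtt_us: float) -> float:
+        if self.prev_rtt is None:    # need two samples to form a difference
+            self.prev_rtt = rtt_us
+            return self.rate
+        new_diff = rtt_us - self.prev_rtt
+        self.prev_rtt = rtt_us
+        self.rtt_diff = (1 - self.ewma) * self.rtt_diff + self.ewma * new_diff
+        gradient = self.rtt_diff / self.min_rtt
+        if gradient <= 0:
+            # RTT flat or falling: queues draining, probe for more bandwidth.
+            self.rate += self.delta
+        else:
+            # RTT rising: queues building, back off in proportion to the slope.
+            self.rate = max(self.delta, self.rate * (1 - self.beta * gradient))
+        return self.rate
+
+
+timely = TimelyRate(rate_gbps=5.0, min_rtt_us=20.0)
+for rtt in [20, 20, 30, 40, 35, 30]:
+    print(f"RTT {rtt} us -> rate {timely.on_rtt_sample(rtt):.3f} Gbit/s")
+```
+
+Only differences between consecutive RTT samples drive the decision. A path with a constant but high RTT therefore looks the same as an idle one.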
diff --git a/content/acn-notes/network-transport.md b/content/acn-notes/network-transport.md
@@ -13,3 +13,54 @@ Reaching equilibrium quickly: TCP slow-start
- upon receiving an ACK, increase congestion window by 1
Packet loss is not a good indicator of congestion. Instead, build a model of the network path, estimate its parameters by probing, and decide the sending rate using the model.
+
+## Multi-path transport
+Use multiple paths simultaneously, e.g. cellular and Wi-Fi. Benefits: higher throughput, failover from one path to another, and seamless mobility.
+
+To avoid modifying applications, present the same socket API and semantics as regular TCP.
+
+During the TCP 3-way handshake, hosts signal multipath support with the `MP_CAPABLE` TCP option (and exchange keys).
+
+To add a sub-flow, use a token hashed from the exchanged keys to associate it with the existing connection, and an HMAC computed from the exchanged keys to authenticate it.
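+
+A sketch of the token and join HMAC, following my reading of RFC 8684 (MPTCP v1). The token is the most significant 32 bits of the SHA-256 hash of a host's key. Host A's MP_JOIN HMAC is HMAC-SHA256 keyed with `Key-A + Key-B` over the nonces `R-A + R-B`. Truncation of the HMAC to fit in TCP option space is omitted. The function names and the connection table are hypothetical:
+
+```python
+import hashlib
+import hmac
+import os
+
+
+def token_from_key(key: bytes) -> int:
+    """Connection token: most significant 32 bits of SHA-256(key)."""
+    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
+
+
+def join_mac(sender_key: bytes, receiver_key: bytes,
+             sender_nonce: bytes, receiver_nonce: bytes) -> bytes:
+    """HMAC-SHA256 proving the sender knows both keys from the MP_CAPABLE exchange."""
+    return hmac.new(sender_key + receiver_key, sender_nonce + receiver_nonce,
+                    hashlib.sha256).digest()
+
+
+# MP_CAPABLE handshake: each host contributes a 64-bit key.
+key_a, key_b = os.urandom(8), os.urandom(8)
+# B indexes its connections by its own token (hypothetical table).
+connections_at_b = {token_from_key(key_b): "connection-1"}
+
+# A opens a new sub-flow: MP_JOIN names the connection by B's token,
+# then both sides exchange 32-bit random nonces.
+token_sent = token_from_key(key_b)
+nonce_a, nonce_b = os.urandom(4), os.urandom(4)
+mac_from_a = join_mac(key_a, key_b, nonce_a, nonce_b)
+
+# B recomputes A's HMAC from the keys it already knows before accepting the sub-flow.
+expected = join_mac(key_a, key_b, nonce_a, nonce_b)
+assert hmac.compare_digest(mac_from_a, expected)
+print("sub-flow attached to", connections_at_b[token_sent])
+```
+
+The token only identifies the connection. An attacker who sniffs it still cannot produce a valid HMAC without both keys.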
+
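+Middleboxes: network devices on the packet path that apply special operations to packets beyond plain forwarding.
+- examples: firewalls, NATs
+- some inspect TCP traffic and check that sequence numbers are contiguous
+- so each sub-flow needs its own sequence number space, looking like an ordinary TCP flow; a separate connection-level data sequence number orders data across sub-flows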
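+
+A sketch of the two sequence spaces. It assumes a hypothetical round-robin scheduler and sequence numbers that start at 0 (real TCP uses random initial sequence numbers, and MPTCP carries the mapping in a TCP option). Each sub-flow's numbers stay contiguous, while a connection-level data sequence number (DSN) lets the receiver reorder data across sub-flows:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Segment:
+    subflow: int  # which sub-flow carried the segment
+    ssn: int      # sub-flow sequence number: contiguous within its sub-flow
+    dsn: int      # data sequence number: offset in the connection's byte stream
+    data: bytes
+
+
+def split(stream: bytes, num_subflows: int, mss: int) -> list[Segment]:
+    """Spread the connection's bytes over sub-flows round-robin (hypothetical scheduler)."""
+    next_ssn = [0] * num_subflows
+    segments = []
+    for i, dsn in enumerate(range(0, len(stream), mss)):
+        sf = i % num_subflows
+        chunk = stream[dsn:dsn + mss]
+        segments.append(Segment(sf, next_ssn[sf], dsn, chunk))
+        next_ssn[sf] += len(chunk)  # no gaps within a sub-flow, as middleboxes expect
+    return segments
+
+
+def reassemble(segments: list[Segment]) -> bytes:
+    """Receiver: order data by DSN, whichever sub-flow delivered each piece."""
+    return b"".join(s.data for s in sorted(segments, key=lambda s: s.dsn))
+
+
+segs = split(b"hello multipath!", num_subflows=2, mss=4)
+for s in segs:
+    print(s)
+# Arrival order across sub-flows does not matter for the byte stream:
+assert reassemble(list(reversed(segs))) == b"hello multipath!"
+```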
+
+All sub-flows share the same connection-level receive buffer and receive window.
+
+MPTCP congestion control goals:
+- be fair to TCP at bottleneck links: take no more capacity than a single TCP flow would at a shared bottleneck
+- use efficient paths: send most traffic on the least-congested paths, but keep some traffic on alternative paths as a probe
+- perform at least as well as TCP on the best available path
+
+Congestion control mechanism:
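+- a separate congestion window for each sub-flow
+- for each ACK on a sub-flow's path, increase that sub-flow's window (in coupled designs, by an amount that depends on all sub-flows' windows, so the total is no more aggressive than one TCP flow)
+- halve a sub-flow's window for each drop on its path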
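+
+One concrete way to couple the increases is the Linked Increases Algorithm of RFC 6356. The notes do not say which variant the lecture used, so treat this sketch as one example, with windows in packets and RTTs in any consistent unit. On each ACK on sub-flow `i`, `cwnd_i` grows by `min(alpha / cwnd_total, 1 / cwnd_i)`, where `alpha` depends on all sub-flows' windows and RTTs. A loss halves only that sub-flow's window:
+
+```python
+def lia_alpha(cwnds: list[float], rtts: list[float]) -> float:
+    """RFC 6356 aggressiveness factor (windows in packets, RTTs in any consistent unit)."""
+    total = sum(cwnds)
+    best = max(w / (r * r) for w, r in zip(cwnds, rtts))
+    denom = sum(w / r for w, r in zip(cwnds, rtts)) ** 2
+    return total * best / denom
+
+
+def on_ack(cwnds: list[float], rtts: list[float], i: int) -> None:
+    """Congestion-avoidance increase for one ACK on sub-flow i, coupled across sub-flows."""
+    total = sum(cwnds)
+    cwnds[i] += min(lia_alpha(cwnds, rtts) / total, 1 / cwnds[i])
+
+
+def on_loss(cwnds: list[float], i: int) -> None:
+    """Loss on sub-flow i: halve only that sub-flow's window, as in TCP."""
+    cwnds[i] = max(1.0, cwnds[i] / 2)
+
+
+cwnds, rtts = [10.0, 10.0], [0.05, 0.05]  # two identical paths (illustrative)
+on_ack(cwnds, rtts, 0)
+on_loss(cwnds, 1)
+print(cwnds)
+```
+
+With a single sub-flow, `alpha` works out to 1 and the rule reduces to ordinary TCP's `1 / cwnd` increase per ACK.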
+
+## HTTP
+### HTTP/1
+1 round trip to set up TCP, plus 2 more for a TLS 1.2 handshake.
+
+After setup, only one request/response is possible at a time per connection, so a slow response delays all requests queued behind it: Head-of-Line (HoL) blocking.
+
+### HTTP/1.1
+Works around HoL blocking by opening multiple parallel TCP connections, allowing concurrent requests/responses.
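+
+A rough round-trip count for the two approaches. It assumes every request/response takes exactly one RTT, ignores transfer time and slow start, and lets parallel connections do their handshakes concurrently. The function name and object counts are illustrative:
+
+```python
+import math
+
+TCP_SETUP_RTTS = 1
+TLS12_SETUP_RTTS = 2
+
+
+def page_load_rtts(num_objects: int, connections: int = 1) -> int:
+    """Round trips to fetch num_objects over parallel TCP + TLS 1.2 connections,
+    each carrying one request/response at a time."""
+    setup = TCP_SETUP_RTTS + TLS12_SETUP_RTTS         # handshakes run in parallel
+    per_conn = math.ceil(num_objects / connections)  # objects queue up per connection
+    return setup + per_conn
+
+
+print(page_load_rtts(12))                 # 15: one connection, strictly serial
+print(page_load_rtts(12, connections=6))  # 5: six parallel connections
+```
+
+The price of the speed-up is 6 separate handshakes and 6 congestion controllers competing with each other.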
+
+### HTTP/2
+Multiple streams (one per object) multiplexed over the same TCP connection.
+
+Even requests to multiple domains can share the same TCP connection (when the same server is authoritative for all of them).
+
+Supports stream prioritization, expressed as a dependency tree.
+
+But HoL blocking remains at the _TCP_ level: TCP delivers bytes strictly in order, so while a lost packet of one object is retransmitted, the data of all other streams waits.
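+
+A toy model of that effect. It assumes one TCP connection carrying interleaved HTTP/2 streams, with data handed to the application only up to the first missing segment. The names and segment layout are hypothetical:
+
+```python
+def tcp_deliverable(segments: list[tuple[int, str]], received: set[int]) -> list[str]:
+    """In-order TCP delivery: hand data to the app only up to the first missing segment.
+
+    segments: (seq, stream_label) in sending order on ONE TCP connection.
+    received: sequence numbers that have arrived.
+    """
+    delivered = []
+    for seq, stream in segments:
+        if seq not in received:
+            break  # gap: everything after it waits for the retransmission
+        delivered.append(stream)
+    return delivered
+
+
+# Three HTTP/2 streams interleaved on one connection; segment 1 (stream A) is lost.
+segments = [(0, "A"), (1, "A"), (2, "B"), (3, "C"), (4, "B")]
+print(tcp_deliverable(segments, received={0, 2, 3, 4}))
+# ['A']: streams B and C are stuck behind A's lost segment
+```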
+
+## QUIC
+Transport protocol designed to make web traffic faster (the basis of HTTP/3).
+
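+0 round trips for connection setup to a known server whose crypto keys are cached (1 round trip otherwise, combining the transport and crypto handshakes). Connections survive a change of IP address, since they are identified by connection IDs rather than the IP/port 4-tuple.
+Multiplexes multiple independent streams over UDP, so a lost packet stalls only the stream it belongs to.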
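+
+A toy model of per-stream delivery. Each stream is reassembled on its own, so a lost packet stalls only its stream, unlike a single in-order byte stream. Real QUIC carries STREAM frames with byte offsets; here each packet holds one numbered chunk of one stream, which is a simplification. All names are hypothetical:
+
+```python
+from collections import defaultdict
+
+
+def quic_deliverable(packets: list[tuple[str, int]], received: set[int]) -> dict[str, int]:
+    """Per-stream in-order delivery: a gap only stalls the stream it belongs to.
+
+    packets: (stream_id, chunk_index) in sending order; list index = packet number.
+    received: packet numbers that arrived.
+    Returns how many chunks of each stream can be handed to the application.
+    """
+    arrived = defaultdict(set)
+    for pn, (stream, idx) in enumerate(packets):
+        if pn in received:
+            arrived[stream].add(idx)
+    deliverable = {}
+    for stream in {s for s, _ in packets}:
+        n = 0
+        while n in arrived[stream]:  # contiguous prefix of this stream only
+            n += 1
+        deliverable[stream] = n
+    return deliverable
+
+
+# Three streams interleaved; packet 1 (stream A's second chunk) is lost.
+packets = [("A", 0), ("A", 1), ("B", 0), ("C", 0), ("B", 1)]
+print(sorted(quic_deliverable(packets, received={0, 2, 3, 4}).items()))
+# [('A', 1), ('B', 2), ('C', 1)]
+```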