lectures.alex.balgavy.eu

Lecture notes from university.


+++
title = 'Datacenter networking'
+++

# Datacenter networking
Why not a single giant switch? Limited port density, broadcast storms, and lack of isolation.

Tree-based data center network:

![Diagram of tree-based network](tree-based-datacenter-network.png)

The bottleneck is in the top two layers (core and aggregation).

Performance metrics:
- bisection width: minimum number of links that must be cut to divide the network into two equal halves
- bisection bandwidth: total bandwidth of the links in such a minimum cut
- full bisection bandwidth: one half of the nodes can communicate with the other half simultaneously at full bandwidth

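A brute-force sketch of bisection width: try every balanced bipartition of a tiny example topology and count crossing links. The 4-node ring example is hypothetical; this is only feasible for toy sizes, but it makes the definition concrete (bisection bandwidth is then the sum of the crossing links' bandwidths).

```python
from itertools import combinations

def bisection_width(nodes, links):
    """Minimum number of links crossing any balanced bipartition.
    Brute force over all half-size node subsets -- illustrative only."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        a = set(group)
        # count links with exactly one endpoint in each half
        crossing = sum(1 for u, v in links if (u in a) != (v in a))
        best = crossing if best is None else min(best, crossing)
    return best

# Hypothetical example: a 4-node ring, bisection width 2
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(bisection_width(range(4), ring))  # -> 2
```

If every link carries the same bandwidth B, the bisection bandwidth of the ring above is simply 2B.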
Oversubscription: the ratio of the worst-case required aggregate bandwidth among end-hosts to the total bisection bandwidth of the network topology
- 1:1 -- all hosts can use their full uplink capacity
- 5:1 -- only 20% of host bandwidth may be available in the worst case
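
A worked example of the ratio, with hypothetical rack numbers (host counts, NIC and uplink speeds are made up for illustration):

```python
def oversubscription(num_hosts, host_gbps, num_uplinks, uplink_gbps):
    """Worst-case required host bandwidth : available uplink bandwidth."""
    return (num_hosts * host_gbps) / (num_uplinks * uplink_gbps)

# 40 hosts x 1 Gbps behind 4 x 10 Gbps uplinks -> 1:1 (full bandwidth)
print(oversubscription(40, 1, 4, 10))   # -> 1.0
# 40 hosts x 10 Gbps behind 8 x 10 Gbps uplinks -> 5:1 (20% guaranteed)
print(oversubscription(40, 10, 8, 10))  # -> 5.0
```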

## Fat-tree
Fat-tree topology: emulate a single huge switch with many smaller switches.

![Fat-tree topology diagram](fat-tree-topology.png)

It needs to be backward compatible with IP/Ethernet, so routing algorithms naively choose a single shortest path, leading to bottlenecks. The wiring is also complex.

Addressing:
- 10.0.0.0/8 private address block
- pod switches: 10.pod.switch.1
- core switches: 10.k.j.i, where j and i give the switch's position among the (k/2)² core switches
- hosts: 10.pod.switch.id
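
A sketch enumerating these address blocks for a k-port fat-tree. The index ranges (host ids in [2, k/2+1], core indices j, i in [1, k/2], hosts attached only to the k/2 edge switches per pod) follow the original fat-tree addressing scheme; verify against your own topology before relying on them.

```python
def fat_tree_addresses(k):
    """Enumerate fat-tree addresses for switch port count k:
    pod switches 10.pod.switch.1, core 10.k.j.i, hosts 10.pod.switch.id."""
    addrs = {"pod_switches": [], "core_switches": [], "hosts": []}
    for pod in range(k):
        for switch in range(k):                  # edge + aggregation
            addrs["pod_switches"].append(f"10.{pod}.{switch}.1")
        for switch in range(k // 2):             # hosts hang off edge switches
            for host_id in range(2, k // 2 + 2):
                addrs["hosts"].append(f"10.{pod}.{switch}.{host_id}")
    for j in range(1, k // 2 + 1):               # (k/2)^2 core switches
        for i in range(1, k // 2 + 1):
            addrs["core_switches"].append(f"10.{k}.{j}.{i}")
    return addrs

a = fat_tree_addresses(4)
# k=4: 16 pod switches, 4 core switches, 16 hosts (= k^3/4)
print(len(a["pod_switches"]), len(a["core_switches"]), len(a["hosts"]))  # 16 4 16
```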

Forwarding with a two-level lookup table:
- prefixes are used for forwarding intra-pod traffic
- suffixes are used for forwarding inter-pod traffic
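
A minimal sketch of the two-level lookup: try the prefix table first, and when only the default entry matches, fall through to a suffix match on the destination's host byte. The table contents below (an aggregation switch in a hypothetical k=4 pod 2) and the port numbers are invented for illustration.

```python
import ipaddress

def two_level_lookup(dst, prefixes, suffixes):
    """Prefix table first (assumed sorted longest-prefix-first); a default
    entry with port None falls through to the suffix table."""
    addr = ipaddress.ip_address(dst)
    for prefix, port in prefixes:
        if addr in ipaddress.ip_network(prefix):
            if port is not None:
                return port          # terminating prefix: intra-pod traffic
            break                    # default entry: go to suffix table
    host_byte = int(dst.split(".")[-1])
    for suffix, port in suffixes:    # spread inter-pod traffic by host id
        if host_byte == suffix:
            return port
    return None

# Hypothetical table: pod subnets on ports 0/1, suffixes on uplinks 2/3
prefixes = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1), ("0.0.0.0/0", None)]
suffixes = [(2, 2), (3, 3)]
print(two_level_lookup("10.2.0.2", prefixes, suffixes))  # intra-pod -> 0
print(two_level_lookup("10.3.1.3", prefixes, suffixes))  # inter-pod -> 3
```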

Routing:
- prefixes in the two-level lookup table prevent intra-pod traffic from leaving the pod
- each host-to-host communication has a single static path

Flow collisions can lead to bottlenecks:
- use equal-cost multi-path (ECMP) routing: hash each flow onto one of several equal-cost paths, so each flow keeps a static path
- or flow scheduling: a centralised scheduler assigns flows to paths
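
A sketch of the ECMP idea: hash a flow's 5-tuple to a path index. The 5-tuple values are hypothetical; real switches use hardware hash functions, but the property is the same -- the same flow always maps to the same path, so packets within a flow stay in order, while two large flows can still collide on one path.

```python
import hashlib

def ecmp_path(flow, num_paths):
    """Hash a flow 5-tuple to one of num_paths equal-cost paths."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Hypothetical flow: (src IP, dst IP, src port, dst port, protocol)
flow = ("10.0.1.2", "10.2.0.3", 5000, 80, "tcp")
print(ecmp_path(flow, 4))                       # some path in 0..3
assert ecmp_path(flow, 4) == ecmp_path(flow, 4)  # deterministic per flow
```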

To solve the cabling issue, organize switches into pod racks.

Unaddressed issues:
- no support for seamless VM migration, because IPs are location-dependent
- plug-and-play not possible: IPs are pre-assigned to switches and hosts

## PortLand: layer 2 system
Intuition: separate node location from node identifier.
- the IP address is the node identifier
- the Pseudo MAC (PMAC) is the node location

A fabric manager maintains the IP → PMAC mapping and facilitates fault tolerance.
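
A sketch of how a location-encoding PMAC can be packed and unpacked. The field layout (pod:16, position:8, port:8, vmid:16 within 48 bits) follows my reading of the PortLand design; the example values are hypothetical.

```python
def encode_pmac(pod, position, port, vmid):
    """Pack location fields into a 48-bit PMAC: pod(16).position(8).port(8).vmid(16)."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int.from_bytes(bytes.fromhex(pmac.replace(":", "")), "big")
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)

pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
print(pmac)               # -> 00:02:01:00:00:01
print(decode_pmac(pmac))  # -> (2, 1, 0, 1)
```

Because the PMAC encodes the position in the topology, switches can forward on it hierarchically instead of keeping per-host state.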
Switches self-discover their location by exchanging Location Discovery Messages (LDMs):
- tree level/role: based on neighbor identity
- pod number: obtained from the fabric manager
- position number: aggregation switches help top-of-rack switches choose a unique number

![PortLand workflow](portland.png)