Shared-memory multiprocessors

a multiprocessor system has many processors that can work on different tasks at the same time
in a shared-memory multiprocessor, all processors have access to the same (probably large) memory
memory is distributed across multiple modules, connected by an interconnection network

when memory is physically separated from the processors, all requests go through the network, introducing latency
if memory access latency is the same from every processor, you have a Uniform Memory Access (UMA) multiprocessor (but the latency doesn't magically go away)

to improve performance, put a memory module next to each processor
this leads to a collection of "nodes", each with a processor and a memory module
each node is connected to the network. there is no network latency when a memory request is local, but a remote request has to go through the network
these are Non-Uniform Memory Access (NUMA) multiprocessors (https://youtu.be/jRx5PrAlUdY?t=1m39s); see the latency sketch at the end of these notes

[image: Shared-memory%20multiprocessors.resources/screenshot.png]

Interconnection networks
suitability is judged in terms of:
- bandwidth: capacity of a transmission link to transfer data (bits or bytes per second)
- effective throughput: actual rate of data transfer
- packets: the form of the data (fixed length and specified format, ideally handled in one clock cycle)

types commonly used:
- buses: a set of wires that provides a single shared path for information transfer
  - suitable for a small number of processors (low contention)
  - a simple bus does not allow a new request until the response to the current request has been provided
  - the alternative is a split-transaction bus, where other events can occur between a request and its response (see the bus sketch below)
- ring: point-to-point connections between nodes
  - low-latency option 1: bidirectional ring
    - halves latency, doubles bandwidth (see the hop-count sketch below)
    - increases complexity
  - low-latency option 2: hierarchy of rings
    - an upper-level ring connects the lower-level rings
    - average latency is reduced
    - the upper-level ring may become a bottleneck if the lower-level rings communicate frequently
- crossbar: a direct link between any pair of units
  - used in UMA multiprocessors to connect processors to memory modules
  - enables many simultaneous transfers, as long as no destination receives multiple requests (see the conflict-check sketch below)
- mesh: like a net over all nodes
  - each node connects to its horizontal and vertical neighbours
  - wraparound connections can be introduced at the edges, giving a "torus" (see the neighbour sketch below)
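
a rough way to put numbers on the UMA/NUMA trade-off: the average access time is a weighted mix of local and remote latency. a minimal C sketch, assuming made-up cycle counts (100 cycles local, 300 remote); the real values depend on the machine:

#include <stdio.h>

/* average access time = f_local * t_local + (1 - f_local) * t_remote */
double numa_avg_latency(double f_local, double t_local, double t_remote) {
    return f_local * t_local + (1.0 - f_local) * t_remote;
}

int main(void) {
    double t_local = 100.0;   /* assumed local latency, cycles (illustrative) */
    double t_remote = 300.0;  /* assumed remote latency, cycles (illustrative) */
    for (int i = 5; i <= 10; i++) {
        double f = i / 10.0;  /* fraction of accesses that are local */
        printf("local fraction %.1f -> avg %.0f cycles\n",
               f, numa_avg_latency(f, t_local, t_remote));
    }
    return 0;
}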
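
a toy sketch of the split-transaction idea, assuming a small table of tags for outstanding requests (the tag table and trace output are illustrative, not from the lecture): a request occupies the bus only briefly, and the matching response is found later by its tag:

#include <stdio.h>
#include <stdbool.h>

#define TAGS 4  /* max outstanding requests; an arbitrary choice */

static bool busy[TAGS];  /* which tags still have a pending response */

/* issue a request: grab a free tag and release the bus immediately,
 * so unrelated traffic can use it before the response arrives.
 * returns the tag, or -1 if too many requests are outstanding. */
int issue_request(int addr) {
    for (int t = 0; t < TAGS; t++) {
        if (!busy[t]) {
            busy[t] = true;
            printf("bus: request addr=%d tagged %d\n", addr, t);
            return t;
        }
    }
    return -1;
}

/* deliver the response for a tag, completing the split transaction */
void deliver_response(int tag) {
    busy[tag] = false;
    printf("bus: response for tag %d\n", tag);
}

int main(void) {
    int a = issue_request(100);
    int b = issue_request(200);  /* issued before a's response: allowed */
    deliver_response(a);
    deliver_response(b);
    return 0;
}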
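
to see why a bidirectional ring roughly halves latency, compare average hop counts between all pairs of nodes; the 16-node ring below is an arbitrary example size:

#include <stdio.h>

int main(void) {
    int n = 16;  /* node count is an arbitrary example */
    double uni = 0.0, bi = 0.0;
    int pairs = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++) {
            if (a == b) continue;
            int d = (b - a + n) % n;      /* one-way distance */
            uni += d;
            bi += d < n - d ? d : n - d;  /* shorter of the two directions */
            pairs++;
        }
    printf("unidirectional avg: %.2f hops\n", uni / pairs);
    printf("bidirectional avg:  %.2f hops\n", bi / pairs);
    return 0;
}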
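
the crossbar's condition for simultaneous transfers (no memory module requested by two processors at once) can be checked directly; this sketch assumes each processor posts at most one request per cycle, with -1 meaning idle:

#include <stdio.h>
#include <stdbool.h>

#define UNITS 8  /* number of memory modules; arbitrary example size */

/* a crossbar can service a set of requests in the same cycle only if
 * no module is requested by two sources; dest[i] is the module
 * requested by processor i, or -1 if that processor is idle. */
bool crossbar_conflict_free(const int dest[], int n) {
    bool taken[UNITS] = { false };
    for (int i = 0; i < n; i++) {
        if (dest[i] < 0) continue;         /* idle processor */
        if (taken[dest[i]]) return false;  /* two requests for one module */
        taken[dest[i]] = true;
    }
    return true;
}

int main(void) {
    int ok[]    = { 0, 3, 5, -1 };  /* distinct modules: all proceed      */
    int clash[] = { 0, 3, 3, -1 };  /* module 3 requested twice: conflict */
    printf("%d %d\n", crossbar_conflict_free(ok, 4),
                      crossbar_conflict_free(clash, 4));
    return 0;
}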
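
the only difference between a mesh and a torus is whether neighbour coordinates wrap around at the edges; in this sketch the 4x4 dimensions are arbitrary, and the modulo arithmetic expresses the torus wraparound links:

#include <stdio.h>

#define W 4  /* mesh width and height; 4x4 is an arbitrary example */
#define H 4

/* print the horizontal/vertical neighbours of node (x, y).
 * in a plain mesh, edge nodes simply have fewer neighbours;
 * a torus adds wraparound links via the modulo arithmetic below. */
void neighbours(int x, int y, int torus) {
    int dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
    printf("(%d,%d):", x, y);
    for (int i = 0; i < 4; i++) {
        int nx = x + dx[i], ny = y + dy[i];
        if (torus) {  /* wrap around the edges */
            nx = (nx + W) % W;
            ny = (ny + H) % H;
        } else if (nx < 0 || nx >= W || ny < 0 || ny >= H) {
            continue;  /* off the edge of a plain mesh */
        }
        printf(" (%d,%d)", nx, ny);
    }
    printf("\n");
}

int main(void) {
    neighbours(0, 0, 0);  /* corner of a mesh: 2 neighbours */
    neighbours(0, 0, 1);  /* same node on a torus: 4        */
    return 0;
}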