lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

Summarising data.html (6869B)


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:33 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:26 +0000"/><title>Summarising data</title></head><body><h1>Summarising data</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">data distribution:</span> we want to know what the data looks like</div><div>
a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc.</div></div><h2>Graphical summaries</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-weight: bold;">Frequency distribution (table)</span></div><table style="border-collapse: collapse; min-width: 100%;"><colgroup><col style="width: 130px;"/><col style="width: 130px;"/></colgroup><tbody><tr><td style="width: 130px; padding: 8px; border: 1px solid;">Grade</td><td style="width: 130px; padding: 8px; border: 1px solid;">Frequency</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">5</td><td style="width: 130px; padding: 8px; border: 1px solid;">2</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">6</td><td style="width: 130px; padding: 8px; border: 1px solid;">1</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">7</td><td style="width: 130px; padding: 8px; border: 1px solid;">3</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">8</td><td style="width: 130px; padding: 8px; border: 1px solid;">2</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">9</td><td style="width: 130px; padding: 8px; border: 1px solid;">1</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">10</td><td style="width: 130px; padding: 8px; border: 1px solid;">2</td></tr></tbody></table><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-weight: bold;">Bar chart</span></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/1025F6BC-BC40-466C-8EA5-D814F6ED68E7.png" height="357" width="752"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Pareto bar chart</span></div><div>
bar chart with the categories ordered by decreasing frequency. only meaningful for the nominal level of measurement, where the categories have no natural order</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/DF440523-F774-4FC5-8D67-99AA4637CC66.png" height="293" width="743"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Pie chart</span></div><div>
the size of each slice is proportional to the relative frequency of its category.</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/BB20B4B3-C7EE-4AF9-BAEE-8E1F18E5EDDC.png" height="213" width="230"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Histogram</span></div><div>
the height of each bar shows the frequency of the values that fall in that class (bin). used for quantitative data, so the bars are adjacent.</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/95ACBC8B-9387-4A03-8A89-475258CBC80A.png" height="315" width="789"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Time series</span></div><div>
shows how a quantity varies over time (time on the x axis).</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/0885D1E7-DBA9-4897-89EB-7C5416C48486.png" height="267" width="760"/><br/></div><h2>Descriptive summaries</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">qualitative description:</div><ul><li><div>shape:</div><div>
 </div><div>
<img src="Summarising%20data.resources/5606E2CB-CE6C-4438-9FE8-F9EDA4144CAE.png" height="407" width="783"/></div></li><li><div>location: position on x axis (around 0, around 10, etc.)</div></li><li><div>dispersion: how spread out the values are (a wide graph means large dispersion)</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">numerical description:</div><ul><li><div>location: measure of center
</div></li><ul><li><div>mean: average (sum all values, divide by the number of values)</div></li><li><div>median: sort the values and take the middle one (the average of the two middle values if the count is even)</div></li><li><div>mode: most frequently occurring value (highest frequency)
</div></li><ul><li><div>unimodal: one mode</div></li><li><div>bimodal: two modes</div></li><li><div>multimodal: more than two modes</div></li></ul></ul><li><div>dispersion:
</div></li><ul><li><div>measures of variation
</div></li><ul><li><div>sample standard deviation (how much the values deviate from the mean)
</div></li><ul><li><div>same units as the data (unlike variance, which is in squared units)</div></li><li><div>standard deviation is 
<img src="Summarising%20data.resources/EA168496-8A42-4DF8-AC6C-F0D5811C25BA.png" height="16" width="25"/></div></li><li><div>
<img src="Summarising%20data.resources/91E716FB-58B4-4D1A-921F-73100090781C.png" height="33" width="132"/></div></li><li><div>for population:
<img src="Summarising%20data.resources/1D9573FE-46B1-48A9-BB45-DFE435C80568.png" height="16" width="33"/></div></li></ul><li><div>range
</div></li><ul><li><div>
<img src="Summarising%20data.resources/D1AC7FAB-71A0-4065-A141-87E6BD0B14DA.png" height="11" width="145"/></div></li><li><div>sensitive to extreme values</div></li></ul></ul><li><div>relative standing
</div></li><ul><li><div>percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)</div></li><li><div>IQR: interquartile range = 
<img src="Summarising%20data.resources/212AC4F9-52D4-49F3-B2E5-F7CAEF56AC7F.png" height="14" width="52"/></div></li><li><div>5-number summary: min, Q1, median (Q2), Q3, max
</div></li><ul><li><div>a boxplot is a graph of this summary</div></li><li><div>whiskers are lines extending from the box (by default, no longer than 
<img src="Summarising%20data.resources/F67961D8-6174-4803-8055-23510B9410BF.png" height="14" width="64"/>)</div></li><li><div>outliers: points beyond the whiskers</div><div>
 </div><div>
<img src="Summarising%20data.resources/DB880F6E-D4CA-4F2F-BA06-0A56ECFBDCC0.png" height="234" width="694"/></div></li></ul></ul></ul></ul><div><br/></div></body></html>
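
The numerical descriptions in these notes can be tried out on the grade frequency table from the "Graphical summaries" section. The following is a minimal sketch using Python's standard `statistics` module, not part of the original notes. The formulas in the notes are images, so several things here are assumptions: the sample standard deviation is taken to divide by n - 1 and the population version by n, the whisker limit is taken to be the conventional 1.5 × IQR, and the quartiles use Python's default ("exclusive") method, which may differ slightly from the convention used in the lecture. All variable names are illustrative.

```python
import statistics
from collections import Counter

# Frequency table from the notes: grade -> frequency.
freq_table = {5: 2, 6: 1, 7: 3, 8: 2, 9: 1, 10: 2}

# Expand the table back into the raw list of observations.
grades = [grade for grade, count in freq_table.items() for _ in range(count)]

# Rebuilding the frequency distribution from raw data.
frequencies = Counter(grades)

# Measures of center.
mean = statistics.mean(grades)
median = statistics.median(grades)
modes = statistics.multimode(grades)  # list of all most frequent values

# Measures of variation.
sample_sd = statistics.stdev(grades)       # assumption: divides by n - 1
population_sd = statistics.pstdev(grades)  # assumption: divides by n
data_range = max(grades) - min(grades)

# Relative standing: quartiles and interquartile range.
# Assumption: Python's default quartile method; other conventions exist.
q1, q2, q3 = statistics.quantiles(grades, n=4)
iqr = q3 - q1

# 5-number summary, as drawn by a boxplot.
five_number = (min(grades), q1, median, q3, max(grades))

# Assumed whisker limits (1.5 * IQR beyond the box); points outside are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [g for g in grades if g < lower_fence or g > upper_fence]

print("frequencies:", dict(frequencies))
print("mean:", round(mean, 3), "median:", median, "modes:", modes)
print("sample sd:", round(sample_sd, 3), "population sd:", round(population_sd, 3))
print("range:", data_range, "IQR:", iqr)
print("5-number summary:", five_number)
print("outliers:", outliers)
```

With this data the distribution is unimodal (the only mode is 7), and no grade falls outside the assumed whisker limits, so a boxplot of it would show no outlier points.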