+++
title = 'Summarising data'
template = 'page-math.html'
+++

# Summarising data

**data distribution:** we want to know what the data looks like

a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc.

## Graphical summaries

### Frequency distribution (table)

| Grade | Frequency |
| --- | --- |
| 5 | 2 |
| 6 | 1 |
| 7 | 3 |
| 8 | 2 |
| 9 | 1 |
| 10 | 2 |

### Bar chart
![](1be3b41077a33b1704f30d44a6e6f2a3.png)

### Pareto bar chart
bar chart with categories ordered by descending frequency. only for nominal level of measurement (the categories have no natural order of their own)

![](6d7b91f79d3d9d8dfea9b17bc06a0b94.png)

### Pie chart
size of each slice shows the frequency of its category, relative to the total.

![](c712f8daf000f5fb759e01c0e0cae513.png)

### Histogram
height of each bar shows the frequency of that class (interval of values).

![](f30ee8b3f6ad23ca7a4a2967d3200a47.png)

### Time series
shows a quantity that varies over time.

![](353a35bb43541880822a45b4aedccc33.png)

## Descriptive summaries
qualitative description:

- shape:

    ![](121a30a0247a9ef2c8d6f222df0e39ba.png)

- location: position on x axis (around 0, around 10, etc.)
- dispersion: spread-out graph == large dispersion

numerical description:

- location: measure of center
    - mean: average (sum everything, divide by the number of values)
    - median: sort, find the middle number (with an even number of values, average the two middle ones)
    - mode: most often occurring value (highest frequency)
        - unimodal: unique mode
        - bimodal: two modes
        - multimodal: more than two modes
- dispersion:
    - measures of variation
        - sample standard deviation (how much values deviate from mean)
            - same units as data (unlike variance)
            - standard deviation is $s = \sqrt{s^{2}}$
            - $s^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n-1}$
            - for population: σ², σ
        - range
            - (maximum - minimum)
            - sensitive to extreme values
    - relative standing
        - percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)
        - IQR: interquartile range = (Q3 - Q1)
        - 5-number summary: min, Q1, median (Q2), Q3, max
            - boxplot is graph of this
            - whiskers are lines extending from the box (by default, not more than 1.5 × IQR)
            - outliers: points outside of whiskers

![](2622ba4db3e301150ce401c70344ceba.png)
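
The numerical descriptions above can be tried out on the grades from the frequency table. This is a minimal sketch using only Python's standard `statistics` module; the names `frequencies`, `grades` and `summarise` are made up for illustration. Quartile conventions differ between textbooks and tools, so the Q1/Q3 values here follow the default method of `statistics.quantiles` and may not match other software exactly.

```python
import statistics

# Frequency table from the notes: grade -> frequency
frequencies = {5: 2, 6: 1, 7: 3, 8: 2, 9: 1, 10: 2}

# Expand the table into the raw, sorted list of observations
grades = sorted(g for g, f in frequencies.items() for _ in range(f))


def summarise(data):
    """Return location, dispersion and relative-standing measures for data."""
    # Quartiles with the default ('exclusive') method of statistics.quantiles;
    # other quartile conventions can give slightly different values.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Points further than 1.5 * IQR from the box count as outliers
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),
        "variance": statistics.variance(data),  # sample variance, divides by n - 1
        "stdev": statistics.stdev(data),        # square root of the sample variance
        "range": max(data) - min(data),
        "five_number": (min(data), q1, q2, q3, max(data)),
        "iqr": iqr,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


summary = summarise(grades)
for name, value in summary.items():
    print(f"{name}: {value}")
```

`statistics.variance` and `statistics.stdev` are the sample versions (dividing by n - 1); for a whole population, `statistics.pvariance` and `statistics.pstdev` correspond to σ² and σ.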