+++
title = 'Summarising data'
template = 'page-math.html'
+++

# Summarising data

**data distribution:** what the data looks like.

a good summary shows location, spread, range, extremes, gaps/holes, symmetry, etc.

## Graphical summaries

### Frequency distribution (table)

| Grade | Frequency |
| --- | --- |
| 5 | 2 |
| 6 | 1 |
| 7 | 3 |
| 8 | 2 |
| 9 | 1 |
| 10 | 2 |

### Bar chart
height of each bar shows the frequency of that category.

### Pareto bar chart
a bar chart with the categories ordered by frequency, highest first. only for the nominal level of measurement.

### Pie chart
size of each slice shows the relative frequency (proportion) of that category.

### Histogram
height of each bar shows how many values of numerical data fall in that class interval (bin).

### Time series
shows a quantity that varies over time.

## Descriptive summaries
qualitative description:

- shape: e.g. symmetric or skewed, number of peaks
- location: position on the x-axis (around 0, around 10, etc.)
- dispersion: a more spread-out graph means larger dispersion

numerical description:

- location: measures of center
  - mean: average (sum all values, divide by the number of values)
  - median: sort the values, take the middle one (average the two middle values if $n$ is even)
  - mode: most frequently occurring value (highest frequency)
    - unimodal: one mode
    - bimodal: two modes
    - multimodal: more than two modes
- dispersion: measures of variation
  - sample standard deviation $s$: how much the values typically deviate from the mean
    - same units as the data (unlike the variance)
    - $s = \sqrt{s^{2}}$, where the sample variance is $s^{2} = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}{n-1}$
    - for a population: variance $\sigma^{2} = \frac{\sum_{i=1}^{N}(x_{i} - \mu)^{2}}{N}$ and standard deviation $\sigma = \sqrt{\sigma^{2}}$
  - range
    - maximum - minimum
    - sensitive to extreme values
- relative standing
  - percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)
  - IQR: interquartile range = Q3 - Q1
  - 5-number summary: min, Q1, median (Q2), Q3, max
    - a boxplot is a graph of this
    - whiskers are lines extending from the box (by default, no longer than 1.5 × IQR)
    - outliers: points outside the whiskers
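A minimal sketch of how the frequency distribution table for grades can be built from raw values with Python's standard `collections.Counter`. The raw `grades` list is hypothetical. Only the counts come from the table, so any list with those counts gives the same result. The relative frequencies are the proportions a pie chart's slices would represent. The helper names `frequency_table` and `relative_frequencies` are illustrative.

```python
from collections import Counter

# Hypothetical raw data, chosen only so that its counts match the grade table.
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


def relative_frequencies(values):
    """Return (value, proportion) pairs; the proportions sum to 1."""
    n = len(values)
    return [(value, count / n) for value, count in frequency_table(values)]


# Print the table in the same markdown layout as the notes.
print("| Grade | Frequency |")
print("| --- | --- |")
for grade, freq in frequency_table(grades):
    print(f"| {grade} | {freq} |")
```

For nominal data, `Counter(values).most_common()` returns the categories ordered from most to least frequent, which is the ordering a Pareto bar chart uses.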
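A sketch of the three measures of center written out by hand, so each step matches its description in the notes. The `grades` list is a hypothetical reconstruction of the grade table, and the function names are illustrative. The standard `statistics` module also provides `mean`, `median` and `multimode`.

```python
from collections import Counter


def mean(values):
    # sum everything, then divide by the number of values
    return sum(values) / len(values)


def median(values):
    # sort, then take the middle value; with an even count, average the two middle values
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(values):
    # every value that reaches the highest frequency
    # (if every value occurs once, all of them are returned; some texts say "no mode" instead)
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def modality(values):
    # classify by how many modes there are
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]  # hypothetical, matches the grade table
print(mean(grades), median(grades), modes(grades), modality(grades))
```

For these grades, the mean is 82 / 11 (about 7.45), the median is the 6th sorted value, 7, and 7 is the single mode.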
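The variance and standard deviation formulas can be checked numerically with a small sketch. The data values are hypothetical, chosen so the arithmetic is easy to follow by hand. The mean is 5, and the squared deviations sum to 32, so $s^{2} = 32/7$ and $\sigma^{2} = 32/8 = 4$. The function names are illustrative.

```python
import math


def sample_variance(values):
    """s^2 = sum((x_i - xbar)^2) / (n - 1); needs at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """s = sqrt(s^2), in the same units as the data."""
    return math.sqrt(sample_variance(values))


def population_variance(values):
    """sigma^2: same sum of squared deviations, divided by N instead of n - 1."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def data_range(values):
    """maximum - minimum; a single extreme value can change it a lot."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example values
print(sample_variance(data), sample_sd(data), population_variance(data), data_range(data))
```

The only difference between the sample and population versions here is the divisor. Dividing by $n-1$ makes $s^{2}$ an unbiased estimate of $\sigma^{2}$ when the data are a sample.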
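A sketch of the 5-number summary and the 1.5 × IQR boxplot rule. It assumes the quartiles come from the standard `statistics.quantiles` with `method="inclusive"`. This is only one convention: textbooks and software compute quartiles in several ways, so Q1 and Q3 found by hand (e.g. as medians of the lower and upper halves) can differ slightly. The function names and both data lists are hypothetical.

```python
import statistics


def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles from statistics.quantiles."""
    # method="inclusive" interpolates between data points; other quartile
    # conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def boxplot_stats(values, k=1.5):
    """IQR, whisker ends and outliers using the k * IQR rule (k = 1.5 by default)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # whiskers reach the most extreme data points still inside the fences
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    # everything beyond the fences is drawn as an individual outlier point
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return {"iqr": iqr, "whiskers": (min(inside), max(inside)), "outliers": outliers}


grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]     # hypothetical, matches the grade table
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical

print(five_number_summary(grades))
print(boxplot_stats(with_outlier))
```

With this convention, the grades give (5, 6.5, 7.0, 8.5, 10) and no outliers. In the second list, Q1 = 3.25 and Q3 = 7.75, so the upper fence is 14.5. The upper whisker therefore stops at 9 and 100 is flagged as an outlier. Passing `n=100` to `statistics.quantiles` gives the 99 percentile cut points instead of quartiles.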