lectures.alex.balgavy.eu

Lecture notes from university.

index.md (2311B)


      1 +++
      2 title = 'Summarising data'
      3 template = 'page-math.html'
      4 +++
      5 
      6 # Summarising data
      7 
      8 **data distribution:** which values occur in the data and how often; it tells us what the data looks like
      9 
     10 a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc.
     11 
     12 ## Graphical summaries
     13 
     14 ### Frequency distribution (table)
     15 
     16 | Grade | Frequency |
     17 | --- | --- |
     18 | 5   | 2   |
     19 | 6   | 1   |
     20 | 7   | 3   |
     21 | 8   | 2   |
     22 | 9   | 1   |
     23 | 10  | 2   |
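
A frequency table like this can be built by counting the raw values. A minimal sketch in Python, assuming a hypothetical list `grades` whose values were chosen to match the table (the raw data is not given in the notes):

```python
from collections import Counter

# Hypothetical raw grades, chosen so the counts match the table above
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]

# Count how often each distinct grade occurs
frequency = Counter(grades)

# Print one row per grade, in increasing order of grade
for grade in sorted(frequency):
    print(grade, frequency[grade])
```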
     24 
     25 ### Bar chart
     26 ![](1be3b41077a33b1704f30d44a6e6f2a3.png)
     27 
     28 ### Pareto bar chart
     29 bar chart with the categories ordered by decreasing frequency. only for the nominal level of measurement (the categories have no natural order of their own)
     30 
     31 ![](6d7b91f79d3d9d8dfea9b17bc06a0b94.png)
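
The ordering step behind a Pareto chart can be sketched in a few lines of Python. The data below is made up for illustration (it is not from the lecture), and the text bars only stand in for a real plot:

```python
from collections import Counter

# Hypothetical nominal data: favourite fruit of 10 people
answers = ["apple", "pear", "apple", "kiwi", "apple",
           "pear", "apple", "kiwi", "pear", "plum"]

# most_common() returns (category, count) pairs sorted by decreasing count
pareto_order = Counter(answers).most_common()

# Draw one text bar per category, most frequent first
for category, count in pareto_order:
    print(f"{category:<6} {'#' * count}")
```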
     32 
     33 ### Pie chart
     34 size of each slice shows the relative frequency of that category.
     35 
     36 ![](c712f8daf000f5fb759e01c0e0cae513.png)
     37 
     38 ### Histogram
     39 for quantitative data: values are grouped into classes (intervals), and the height of each bar shows the frequency of that class.
     40 
     41 ![](f30ee8b3f6ad23ca7a4a2967d3200a47.png)
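
The counting behind a histogram can be sketched as follows. The measurements, the class width of 10, and the starting edge of 150 are all assumptions made for this example:

```python
# Hypothetical measurements (not from the notes), e.g. heights in cm
heights = [158, 162, 165, 167, 170, 171, 173, 175, 176, 181, 184, 192]

bin_width = 10   # assumed class width
start = 150      # assumed lower edge of the first class

# Count how many values fall into each class [lower, lower + bin_width)
counts = {}
for h in heights:
    lower = start + ((h - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# One text bar per class; bar length is the class frequency
for lower in sorted(counts):
    print(f"[{lower}, {lower + bin_width}): {'#' * counts[lower]}")
```

Choosing a different class width changes the shape of the histogram, so it is worth trying a few widths before drawing conclusions.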
     42 
     43 ### Time series
     44 shows quantity that varies over time.
     45 
     46 ![](353a35bb43541880822a45b4aedccc33.png)
     47 
     48 ## Descriptive summaries
     49 qualitative description:
     50 
     51 - shape:
     52 
     53     ![](121a30a0247a9ef2c8d6f222df0e39ba.png)
     54 
     55 - location: position on x axis (around 0, around 10, etc.)
     56 - dispersion: spread out graph == large dispersion
     57 
     58 numerical description:
     59 
     60 - location: measure of center
     61     - mean: average (sum everything, divide by the total number)
     62     - median: sort, find the middle number (if the number of values is even, take the average of the two middle numbers)
     63     - mode: most often occurring value (highest frequency)
     64         - unimodal: unique mode
     65         - bimodal: two modes
     66         - multimodal: more than two modes
     67 - dispersion:
     68     - measures of variation
     69         - sample standard deviation (how much values deviate from mean)
     70             - same units as data (unlike variance)
     71             - standard deviation is $\sqrt{s^{2}}$
     72             - $s^{2} = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}{n-1}$
     73             - for population: σ², σ (computed with the population mean μ and N in place of $\bar{x}$ and n − 1)
     74         - range
     75             - (maximum - minimum)
     76             - sensitive to extreme values
     77     - relative standing
     78         - percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)
     79         - IQR: interquartile range = (Q3 - Q1)
     80         - 5-number summary: min, Q1, median (Q2), Q3, max
     81             - boxplot is graph of this
     82             - whiskers are lines from the box to the most extreme data points that lie within 1.5 × IQR of the box (the default limit)
     83             - outliers: points outside of whiskers
     84 
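The measures of center and variation listed above can be computed directly from their definitions. A minimal sketch, assuming a hypothetical list `grades` (values picked to match the frequency table earlier in these notes) and Python 3.8+ for `statistics.multimode`:

```python
import math
import statistics

# Hypothetical sample: the grades from the frequency table
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]
n = len(grades)

# Location
mean = sum(grades) / n                  # sum everything, divide by n
median = statistics.median(grades)      # middle value after sorting
modes = statistics.multimode(grades)    # list of all most frequent values

# Dispersion
s2 = sum((x - mean) ** 2 for x in grades) / (n - 1)   # sample variance
s = math.sqrt(s2)                                     # sample standard deviation
data_range = max(grades) - min(grades)                # maximum - minimum

print(mean, median, modes, s2, s, data_range)
```

A list with one entry in `modes` means the sample is unimodal, two entries means bimodal, and so on. The hand-computed `s2` should agree with `statistics.variance(grades)`, which also divides by n − 1.
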
     85 ![](2622ba4db3e301150ce401c70344ceba.png)
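
The numbers behind a boxplot can be sketched as below. The data is made up for illustration. Quartile conventions differ between textbooks and software, so the `method="inclusive"` option of `statistics.quantiles` (Python 3.8+) is just one possible choice and may not match the convention used in the lecture:

```python
import statistics

# Hypothetical data with one extreme value
data = sorted([7, 2, 5, 9, 5, 8, 6, 10, 7, 19, 7, 8, 10])

# Quartiles (one of several conventions, see the note above)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whiskers may reach at most 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as a separate outlier point
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number = (min(data), q1, q2, q3, max(data))
print(five_number, iqr, (whisker_low, whisker_high), outliers)
```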