lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

Introduction_ Data.md (2195B)


      1 +++
      2 title = 'Introduction: Data'
      3 template = 'page-math.html'
      4 +++
      5 
      6 # Introduction: Data
      7 
      8 statistics: the science of data - collecting, organising, analysing, interpreting, presenting
      9 
     10 sample: a selected subcollection from the population
     11 
     12 ## Collecting sample data
     13 
     14 concepts:
     15 
     16 - variables:
     17     - independent: might cause the effect being studied
     18     - dependent: represents the effect being studied
     19 - confounding: when the effects of several variables are mixed together, so you can’t tell which one is actually causing the effect
     20 
     21 sampling methods:
     22 
     23 - voluntary response: subjects decide to be included
     24 - random: each *member* of the population has an equal probability of being selected
     25 - simple random: each possible *sample of size n* has an equal probability of being selected
     26 - systematic: after starting point, select every k-th member (based on a system)
     27 - convenience: choose what’s convenient
     28 - stratified: split population into subgroups (strata) with the same characteristics, then take a simple random sample from each group
     29 - cluster: split population into clusters, then randomly select some of them and include all members of the selected clusters
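
A rough sketch of four of the sampling methods above, using only Python's standard library. The population (members numbered 1 to 100), the even/odd strata, the clusters of 20 and all the function names are made up for illustration; they are not from the lecture.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random starting point among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then simple random sample within each group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)                # fixed seed so the run is repeatable
population = list(range(1, 101))      # hypothetical population: members 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(
    population, lambda m: "even" if m % 2 == 0 else "odd", 5, rng
)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]  # 5 clusters of 20
clustered = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two: stratified takes a few members from *every* subgroup, cluster takes *all* members from a few subgroups.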
     30 
     31 types of studies:
     32 
     33 - observational study: subjects observed, not modified
     34     - retrospective: data from past
     35     - cross-sectional: data from one point in time
     36     - prospective: data to be collected (future)
     37 - experiment: some treatment applied to subjects
     38     - sometimes control and treatment group
     39     - gotta watch out for placebo and observer effects
     40 
     41 ## Types of data
     42 
     43 What to do with data?
     44 
     45 - parameter: numerical measurement of a *population* (usually written with Greek letters: μ, σ, ...)
     46 - statistic: numerical measurement of a *sample* (usually written with Latin letters: $\bar{x}$, s, ...)
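
To make the parameter/statistic distinction concrete, here is a small sketch with a simulated population. The heights (mean 170, sd 10) and the sample size of 50 are invented numbers, and in practice you rarely have the whole population available like this.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameters: computed from the whole population
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Statistics: computed from a sample drawn from that population
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)       # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)
```

The sample values `x_bar` and `s` will usually be close to `mu` and `sigma` but not equal, and they change every time a different sample is drawn.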
     47 
     48 data can be:
     49 
     50 - qualitative: names or labels (strings)
     51 - quantitative: numbers (ints, floats)
     52     - discrete: countable
     53     - continuous: not countable (on a continuous scale like length, weight, distance)
     54 
     55 you have different levels of measurement:
     56 
     57 - qualitative:
     58     - nominal: no ordering (gender, eye color)
     59     - ordinal: ordering, but differences between categories have no meaning (e.g. a disagree/neutral/agree scale)
     60 - quantitative:
     61     - interval: ordering, differences, but no natural zero point (year of birth, temperatures in F/C)
     62     - ratio: ordering, differences, natural zero point (body length, marathon times)
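
A quick way to see what "no natural zero point" means in practice: on an interval scale, ratios change when you switch units, but on a ratio scale they don't. The temperatures and lengths below are arbitrary example values.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: temperature in C or F (the zero is arbitrary)
a_c, b_c = 10.0, 20.0
ratio_c = b_c / a_c                  # looks like "twice as hot" in Celsius
ratio_f = c_to_f(b_c) / c_to_f(a_c)  # same two temperatures, different ratio in Fahrenheit

# Ratio scale: length (zero really means "no length")
a_m, b_m = 1.5, 3.0
ratio_m = b_m / a_m                  # twice as long in metres
ratio_cm = (b_m * 100) / (a_m * 100) # still twice as long in centimetres
```

So "20°C is twice as hot as 10°C" is not a meaningful statement, while "3 m is twice as long as 1.5 m" is.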