lectures.alex.balgavy.eu

Lecture notes from university.

      1 +++
      2 title = 'Hypothesis testing'
      3 template = 'page-math.html'
      4 +++
      5 
      6 # Hypothesis testing
      7 
      8 If σ is known, use Z scores. If σ is unknown (or the sample size is below 30), use T scores with the sample standard deviation $s_{n}$.
      9 
     10 ## The steps
     11 
     12 1. Choose population parameter
     13 2. Formulate null and alternative hypotheses. Choose significance level.
     14 
     15     - H0: parameter = some value
     16     - HA: depends, can be two-tailed or one-tailed
     17         - one-tailed: param < value or param > value
     18         - two-tailed: param ≠ value
     19 
     20 3. Collect data.
     21 
     22 4. Choose test statistic (based on parameter) and identify its distribution under H0
     23 
     24 5. Calculate value of test statistic.
     25 6. Find p-value, or critical region based on significance.
     26 
     27     - watch out for the critical region. if two-tailed test, have to divide significance by 2 first (α/2 in each tail). a one-tailed test puts all of α in one tail.
     28 
     29 7. Decide whether or not to reject the null hypothesis:
     30 
     31     - p-value:
     32         - if p-value ≤ significance, reject
     33         - otherwise, fail to reject
     34     - critical values:
     35         - if Z-score or T-score not in critical region, fail to reject
     36         - otherwise, reject
     37 
     38 **YOU NEVER ACCEPT HYPOTHESES**
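
The p-value decision rule from the steps above can be sketched for a Z statistic as follows. This is a minimal illustration using only Python's standard library; the function name `z_test_decision` and the `tail` labels are made up for this example and are not part of the notes.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05, tail="two"):
    """Return (p_value, reject) for an observed Z statistic.

    tail is a hypothetical label for HA: "two", "left" or "right".
    """
    std_normal = NormalDist()  # distribution of Z under H0
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        # both tails count, so double the area beyond |z|
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    # reject H0 when p-value <= significance, otherwise fail to reject
    return p_value, p_value <= alpha


p, reject = z_test_decision(2.1, alpha=0.05, tail="two")
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

Note that the function only ever reports "reject" or "fail to reject", matching the rule that hypotheses are never accepted.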
     39 
     40 ## Errors in testing
     41 
     42 |     |     |     |
     43 | --- | --- | --- |
     44 |     | H0 true | H0 false |
     45 | reject H0 | Type I | fine |
     46 | fail to reject H0 | fine | Type II |
     47 
     48 - P(Type I error) = α (significance level)
     49 - P(Type II error) = β (depends on sample size and actual population parameter)
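
To see how β depends on sample size and the actual population parameter, here is a rough sketch for one specific case: a right-tailed Z test for a mean with known σ. The formula β = Φ(z_α − (μ_true − μ₀)/(σ/√n)) is the standard result for that case and is not derived in these notes; the function name and the example numbers are made up.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_right_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """P(fail to reject H0: mu = mu0) when the real mean is mu_true.

    Assumes a right-tailed Z test (HA: mu > mu0) with known sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # edge of the critical region
    shift = (mu_true - mu0) / (sigma / sqrt(n))     # how far the truth moves Z
    return std_normal.cdf(z_crit - shift)


# Hypothetical numbers: same true mean, two different sample sizes.
beta_small_n = type_ii_error_right_tailed(100, 105, sigma=15, n=36)
beta_large_n = type_ii_error_right_tailed(100, 105, sigma=15, n=100)
print(f"beta (n=36)  = {beta_small_n:.4f}")
print(f"beta (n=100) = {beta_large_n:.4f}")
```

With the same α, the larger sample gives a smaller β, and a true mean further from μ₀ does the same.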
     50 
     51 ## Proportion test
     52 
     53 test statistic:
     54 
     55 $Z = \frac{\hat{P}_{n} - p}{\sqrt{\frac{p(1-p)}{n}}}$
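
A small sketch of the proportion test statistic above, where `p0` stands for the value of p under H0. The function name and the sample counts are made up for illustration.

```python
from math import sqrt


def proportion_z(successes, n, p0):
    """Z statistic for H0: p = p0, given the number of successes in n trials."""
    p_hat = successes / n                      # sample proportion
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5
z = proportion_z(60, 100, p0=0.5)
print(f"Z = {z:.3f}")
```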
     56 
     57 ## Mean test
     58 
     59 **Test statistic iff σ known:**
     60 
     61 $Z = \frac{\bar{X}_{n} - \mu}{\frac{\sigma}{\sqrt{n}}}$
     62 
     63 has standard normal distribution under null hypothesis.
     64 
     65 **Test statistic otherwise:**
     66 
     67 basically just replace σ with its estimator $s_n$, so the standard error becomes $\frac{s_n}{\sqrt{n}}$
     68 
     69 $T = \frac{\bar{X}_{n} - \mu}{\frac{s_n}{\sqrt{n}}}$
     70 
     71 has t-distribution with n−1 degrees of freedom under null hypothesis.
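
Both mean test statistics can be computed with the standard library, as in this sketch. The function names and the sample are made up; `statistics.stdev` is the sample standard deviation (dividing by n−1), which plays the role of $s_n$.

```python
from math import sqrt
from statistics import mean, stdev


def mean_z(sample, mu0, sigma):
    """Z statistic for H0: mu = mu0 when sigma is known."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / sqrt(n))


def mean_t(sample, mu0):
    """T statistic and degrees of freedom for H0: mu = mu0, sigma unknown."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(f"Z = {mean_z(sample, mu0=4, sigma=2):.4f}")
t, df = mean_t(sample, mu0=4)
print(f"T = {t:.4f} with {df} degrees of freedom")
```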
     72 
     73 **Confidence interval (1−α) for μ:**
     74 
     75 
     76 $\text{lower, upper} = \bar{x}\_{n} \pm t\_{n-1, \alpha/2} \times \frac{s_n}{\sqrt{n}}$
     77 
     78 What does $t_{n-1, \alpha / 2}$ mean? Well, we need a t-score, with n−1 degrees of freedom. Divide significance by 2 because α is the full area (both tails) and since we’re adding/subtracting a t-score, we want to find the score corresponding to the area in one tail.
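
The interval can be sketched as below. Python's standard library has no t-distribution, so the critical value $t_{n-1, \alpha/2}$ is passed in as a number looked up in a table; the value 2.365 for 7 degrees of freedom and α = 0.05 is such a table value, and the function name and data are made up.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(sample, t_crit):
    """(1 - alpha) CI for mu; t_crit is t_{n-1, alpha/2} from a t-table."""
    n = len(sample)
    margin = t_crit * stdev(sample) / sqrt(n)  # t-score times standard error
    centre = mean(sample)
    return centre - margin, centre + margin


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
# Assumed table value: t_{7, 0.025} is about 2.365 (95% confidence).
lower, upper = mean_confidence_interval(sample, t_crit=2.365)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```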
     79 
     80 ## Two samples
     81 
     82 ### Dependent
     83 
     84 dependent: values in one sample are related to values in the other sample, or form natural matched pairs
     85 
     86 to test, we look at the *difference* of means.
     87 
     88 null hypothesis can be either no difference, or that difference is a certain value. alternative hypothesis can basically be whatever.
     89 
     90 calculate the difference for each matched pair, then have a sample mean of differences $\bar{D}$ and standard deviation of differences $s_{d}$.
     91 
     92 test statistic:
     93 
     94 $T_{d} = \frac{\bar{D} - (\mu_{1} - \mu_{2})}{\frac{s_{d}}{\sqrt{n}}}$
     95 
     96 which under null hypothesis has t-distribution with n−1 degrees of freedom.
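
A sketch of the dependent-samples statistic, assuming the two lists are matched element by element. The names `paired_t` and `mu_d0` (the difference claimed under H0, here $\mu_1 - \mu_2$) and the data are made up for this example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sample1, sample2, mu_d0=0.0):
    """T statistic and degrees of freedom for matched pairs."""
    if len(sample1) != len(sample2):
        raise ValueError("dependent samples must have the same length")
    diffs = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    n = len(diffs)
    d_bar = mean(diffs)   # sample mean of differences
    s_d = stdev(diffs)    # sample standard deviation of differences
    return (d_bar - mu_d0) / (s_d / sqrt(n)), n - 1


after = [12, 14, 10, 14]   # hypothetical measurements
before = [10, 12, 9, 11]
t, df = paired_t(after, before)
print(f"T_d = {t:.4f} with {df} degrees of freedom")
```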
     97 
     98 ### Independent
     99 
    100 independent: no relationship between two samples
    101 
    102 #### Assuming equal σ
    103 
    104 if both samples are randomly drawn from the same population, we assume that σ₁ = σ₂.
    105 
    106 test statistic:
    107 
    108 $T\_{2}^{eq} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s^{2}\_{p}}{n\_{1}} + \frac{s^{2}\_{p}}{n\_{2}}}}$
    109 
    110 which under null hypothesis has t-distribution with $n\_{1} + n\_{2} - 2$ degrees of freedom. the pooled sample variance is:
    111 
    112 $s\_{p}^{2} = \frac{(n\_{1} - 1) s\_{1}^{2} + (n\_{2} - 1) s\_{2}^{2}}{n\_{1} + n\_{2} - 2}$
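
The pooled variance and the equal-σ statistic can be sketched together. The function name, the `delta0` parameter (the value of $\mu_1 - \mu_2$ under H0) and the data are made up; the $n_1 + n_2 - 2$ degrees of freedom is the standard result for the pooled test.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, delta0=0.0):
    """Return (T, pooled variance, degrees of freedom), assuming equal sigma."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (divides by n - 1)
    s2_p = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2) - delta0) / sqrt(s2_p / n1 + s2_p / n2)
    return t, s2_p, n1 + n2 - 2


t, s2_p, df = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # hypothetical data
print(f"pooled variance = {s2_p}, T = {t:.4f}, df = {df}")
```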
    113 
    114 #### Not assuming equal σ
    115 
    116 test statistic:
    117 
    118 $T\_{2} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s\_{1}^{2}}{n\_{1}} + \frac{s\_{2}^{2}}{n\_{2}}}}$
    119 
    120 which under null hypothesis has t-distribution with $\bar{n}$ degrees of freedom. $\bar{n}$ at the exam is the smallest of the two sample sizes.
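
A sketch of the unequal-σ statistic. The degrees of freedom returned here follow the exam rule stated in these notes (the smallest of the two sample sizes), not a more precise approximation; the function name, `delta0` and the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def unpooled_t(sample1, sample2, delta0=0.0):
    """Return (T, degrees of freedom) without assuming equal sigma."""
    n1, n2 = len(sample1), len(sample2)
    se = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = (mean(sample1) - mean(sample2) - delta0) / se
    # Exam rule from the notes: use the smallest of the two sample sizes.
    return t, min(n1, n2)


t, df = unpooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8])  # hypothetical data
print(f"T_2 = {t:.4f}, degrees of freedom used = {df}")
```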
    121 
    122 ## Two proportions
    123 
    124 H0: p1 = p2
    125 
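    126 test statistic, where $\bar{p} = \frac{x\_{1} + x\_{2}}{n\_{1} + n\_{2}}$ is the pooled sample proportion ($x\_{i}$ = number of successes in sample i):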
    127 
    128 $z\_{p} = \frac{(\hat{p}\_{1} - \hat{p}\_{2})}{\sqrt{\frac{\bar{p} (1-\bar{p})}{n\_{1}} + \frac{\bar{p}(1-\bar{p})}{n\_{2}}}}$
    129 
    130 (1−α) CI for p1−p2:
    131 
    132 $(\hat{p}_{1} - \hat{p}_{2}) \pm E$ where
    133 
    134 $E = z\_{\alpha / 2} \times \sqrt{\frac{\hat{p}\_{1} (1-\hat{p}\_{1})}{n\_{1}} + \frac{\hat{p}\_{2} (1-\hat{p}\_{2})}{n\_{2}}}$
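
Both two-proportion formulas in one sketch. It assumes $\bar{p}$ is the pooled sample proportion (total successes over total sample size), which is the usual definition; the function names and counts are made up. The test pools the proportions because H0 says p1 = p2, while the confidence interval uses each sample's own proportion.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # assumed definition of the pooled proportion
    se = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p1_hat - p2_hat) / se


def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """(1 - alpha) confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z_crit * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Hypothetical counts: 60/100 successes versus 40/100 successes
z = two_proportion_z(60, 100, 40, 100)
lower, upper = two_proportion_ci(60, 100, 40, 100)
print(f"z_p = {z:.4f}, 95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```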
    135 