+++
title = 'Hypothesis testing'
template = 'page-math.html'
+++

# Hypothesis testing

If σ is known, use Z-scores. If σ is unknown (or the sample size is below 30), use T-scores with the sample standard deviation $s_{n}$.

## The steps

1. Choose the population parameter.
2. Formulate the null and alternative hypotheses. Choose the significance level.

    - H0: parameter = some value
    - HA: depends, can be two-tailed or one-tailed
        - one-tailed: param < value or param > value
        - two-tailed: param ≠ value

3. Collect data.

4. Choose the test statistic (based on the parameter) and identify its distribution under H0.

5. Calculate the value of the test statistic.
6. Find the p-value, or the critical region based on the significance level.

    - watch out for the critical region: in a two-tailed test, divide the significance level by 2 first, because α is split over both tails.

7. Decide whether or not to reject the null hypothesis:

    - p-value:
        - if p-value ≤ significance level, reject
        - otherwise, fail to reject
    - critical values:
        - if the Z-score or T-score is not in the critical region, fail to reject
        - otherwise, reject

**YOU NEVER ACCEPT HYPOTHESES**

## Errors in testing

| | H0 true | H0 false |
| --- | --- | --- |
| reject H0 | Type I error | correct decision |
| fail to reject H0 | correct decision | Type II error |

- P(Type I error) = α (significance level)
- P(Type II error) = β (depends on sample size and actual population parameter)

## Proportion test

test statistic:

$Z = \frac{\hat{P}_{n} - p}{\sqrt{\frac{p(1-p)}{n}}}$

## Mean test

**Test statistic if σ is known:**

$Z = \frac{\bar{X}_{n} - \mu}{\frac{\sigma}{\sqrt{n}}}$

which has a standard normal distribution under the null hypothesis.

**Test statistic otherwise:**

Replace σ with its estimator $s_n$, so the standard error becomes $\frac{s_n}{\sqrt{n}}$:

$T = \frac{\bar{X}_{n} - \mu}{\frac{s_n}{\sqrt{n}}}$

which has a t-distribution with n−1 degrees of freedom under the null hypothesis.
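
As a rough illustration of the two mean tests above, here is a minimal Python sketch using only the standard library. The data, the "known" σ of 0.3, and the helper names `z_test_mean` and `t_stat_mean` are made up for this example. The standard library has no t-distribution, so the T branch compares against a critical value looked up in a t-table rather than computing a p-value.

```python
import math
import statistics
from statistics import NormalDist


def z_test_mean(sample, mu0, sigma):
    """Z statistic and two-tailed p-value for H0: mu = mu0 (sigma known)."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-tailed p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def t_stat_mean(sample, mu0):
    """T statistic and degrees of freedom for H0: mu = mu0 (sigma unknown)."""
    n = len(sample)
    s_n = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (statistics.mean(sample) - mu0) / (s_n / math.sqrt(n))
    return t, n - 1


# Made-up measurements, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]
ALPHA = 0.05

# Sigma known (0.3 is an assumed value): decide via the p-value.
z, p_value = z_test_mean(sample, mu0=5.0, sigma=0.3)
reject_z = p_value <= ALPHA

# Sigma unknown: decide via the critical region.
t, df = t_stat_mean(sample, mu0=5.0)
T_CRIT = 2.365  # t-table value for 7 degrees of freedom, area 0.025 per tail
reject_t = abs(t) > T_CRIT

print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_z}")
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_t}")
```

Both tests here are two-tailed, which is why the p-value doubles the one-tail area and the critical value uses α/2 per tail.
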

**Confidence interval (1−α) for μ:**

$\text{lower, upper} = \bar{x}\_{n} \pm t\_{n-1, \alpha/2} \times \frac{s_n}{\sqrt{n}}$

What does $t_{n-1, \alpha / 2}$ mean? It is the t-score with n−1 degrees of freedom that leaves an area of α/2 in the upper tail. We divide the significance level by 2 because α is the full area (both tails), and since we add and subtract the t-score, we need the score corresponding to the area in one tail.

## Two samples

### Dependent

dependent: values in one sample are related to values in the other sample, or form natural matched pairs

To test, we look at the *difference* of means.

The null hypothesis is either that there is no difference, or that the difference equals a certain value. The alternative hypothesis can be one-tailed or two-tailed.

Calculate the difference for each pair, then compute the sample mean of the differences $\bar{D}$ and the standard deviation of the differences $s_{d}$.

test statistic:

$T_{d} = \frac{\bar{D} - (\mu_{1} - \mu_{2})}{\frac{s_{d}}{\sqrt{n}}}$

which under the null hypothesis has a t-distribution with n−1 degrees of freedom, where n is the number of pairs.

### Independent

independent: no relationship between the two samples

#### Assuming equal σ

If both samples are randomly drawn from the same population, we assume that σ₁ = σ₂.

test statistic:

$T\_{2}^{eq} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s^{2}\_{p}}{n\_{1}} + \frac{s^{2}\_{p}}{n\_{2}}}}$

the pooled sample variance is:

$s\_{p}^{2} = \frac{(n\_{1} - 1) s\_{1}^{2} + (n\_{2} - 1) s\_{2}^{2}}{n\_{1} + n\_{2} - 2}$

Under the null hypothesis, this statistic has a t-distribution with n₁ + n₂ − 2 degrees of freedom.

#### Not assuming equal σ

test statistic:

$T\_{2} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s\_{1}^{2}}{n\_{1}} + \frac{s\_{2}^{2}}{n\_{2}}}}$

which under the null hypothesis has a t-distribution with $\bar{n}$ degrees of freedom. At the exam, take $\bar{n}$ to be the smaller of the two sample sizes.
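
The three two-sample statistics above (paired, pooled, and unpooled) can be sketched in Python with the standard library. The function names and all data are hypothetical, chosen only to make the arithmetic easy to follow. `delta0` stands for the hypothesised value of μ₁ − μ₂ under H0 (0 for "no difference"). The sketch computes test statistics only; the decision still needs a critical value or p-value from a t-table.

```python
import math
import statistics


def paired_t(first, second, delta0=0.0):
    """T statistic and df for dependent samples (matched pairs)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    return (d_bar - delta0) / (s_d / math.sqrt(n)), n - 1


def pooled_t(x1, x2, delta0=0.0):
    """T statistic and df for independent samples, assuming equal sigma."""
    n1, n2 = len(x1), len(x2)
    var1, var2 = statistics.variance(x1), statistics.variance(x2)
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (statistics.mean(x1) - statistics.mean(x2) - delta0) / se
    return t, n1 + n2 - 2


def unpooled_t(x1, x2, delta0=0.0):
    """T statistic for independent samples, not assuming equal sigma."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(statistics.variance(x1) / n1 + statistics.variance(x2) / n2)
    return (statistics.mean(x1) - statistics.mean(x2) - delta0) / se


# Made-up data, for illustration only.
after = [12, 15, 10, 13]
before = [10, 12, 9, 11]
t_paired, df_paired = paired_t(after, before)

group1 = [12, 14, 16, 18]
group2 = [10, 11, 12, 13, 14]
t_eq, df_eq = pooled_t(group1, group2)
t_uneq = unpooled_t(group1, group2)

print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
print(f"pooled:   t = {t_eq:.3f}, df = {df_eq}")
print(f"unpooled: t = {t_uneq:.3f}")
```

Note that the pooled and unpooled statistics share the same numerator and differ only in the standard error, so they coincide when the two sample variances are equal and the sample sizes match.
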

## Two proportions

H0: p1 = p2

test statistic:

$z\_{p} = \frac{(\hat{p}\_{1} - \hat{p}\_{2})}{\sqrt{\frac{\bar{p} (1-\bar{p})}{n\_{1}} + \frac{\bar{p}(1-\bar{p})}{n\_{2}}}}$

where $\bar{p}$ is the pooled sample proportion: the total number of successes in both samples divided by n₁ + n₂.

(1−α) CI for p1−p2:

$(\hat{p}_{1} - \hat{p}_{2}) \pm E$ where

$E = z\_{\alpha / 2} \times \sqrt{\frac{\hat{p}\_{1} (1-\hat{p}\_{1})}{n\_{1}} + \frac{\hat{p}\_{2} (1-\hat{p}\_{2})}{n\_{2}}}$
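
A minimal Python sketch of the two-proportion test and confidence interval above, using `statistics.NormalDist` from the standard library. The counts and the function names are hypothetical. The test uses the pooled proportion in the standard error (because H0 says p1 = p2), while the confidence interval uses the two separate sample proportions, matching the formulas in this section.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic and two-tailed p-value for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """(1 - alpha) confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 in the upper tail
    margin = z_crit * math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Made-up counts: 60 successes out of 100 versus 45 out of 100.
z, p_value = two_proportion_z(60, 100, 45, 100)
lower, upper = two_proportion_ci(60, 100, 45, 100, alpha=0.05)

print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0 at 0.05: {p_value <= 0.05}")
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```
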