\documentclass[12pt,a4paper,oneside,fleqn]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{hyperref}
\usepackage[left=1in, right=1in, top=1in, bottom=1in]{geometry}

\usepackage{fancyhdr}
\setlength{\headheight}{15.2pt} \pagestyle{fancy}
\rhead{Alex Balgavy}

% \given{A}{B} ("A given B") %
\makeatletter
\newcommand{\@givenstar}[2]{\left(#1\;\middle|\;#2\right)}
\newcommand{\@givennostar}[3][]{#1(#2\;#1|\;#3#1)}
\newcommand{\given}{\@ifstar\@givenstar\@givennostar}
\makeatother
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{A Likelihood Approach to Statistics: Notes}
\author{Alex Balgavy}
\date{April-May 2019}

\begin{document}
\maketitle
\section{Introduction}
When can we say that there is sufficient evidence?
A large issue is the phrasing of conditional probability.
There's a difference between \\
$P\given{\text{winning}}{\text{not committed fraud}}$ and $P\given{\text{committed fraud}}{\text{winning}}$.
The relative probabilities are what matter.

\begin{align*}
H_1, H_2\text{: hypotheses}\\
E\text{: evidence/data}\\
\\
\frac{P\given{H_1}{E}}{P\given{H_2}{E}} &= \frac{P(H_1 \cap E)}{P(H_2 \cap E)} \\
&= \frac{P\given{E}{H_1} P(H_1)}{P\given{E}{H_2} P(H_2)} \\
&= \frac{P\given{E}{H_1}}{P\given{E}{H_2}} \times \frac{P(H_1)}{P(H_2)}\\
\\
\therefore \underbrace{\frac{P\given{H_1}{E}}{P\given{H_2}{E}}}_\text{posterior odds} &= \underbrace{\frac{P\given{E}{H_1}}{P\given{E}{H_2}}}_\text{likelihood ratio} \times \underbrace{\frac{P(H_1)}{P(H_2)}}_\text{prior odds}
\end{align*}

Questions:

\begin{itemize}
\item When do observations support a hypothesis?
\item What does this mean?
\item What should I do next? What should I believe?
\end{itemize}

\textbf{Evidence} is data that \textbf{makes you change your assessment of the hypotheses of interest}.
It doesn't tell you what to believe, but how to change your belief.
What to do depends on the risks and consequences.

The \textbf{likelihood ratio} is \textbf{the extent to which you should change your mind}.

The \textbf{evidence} is what determines the likelihood ratio.

\newpage

\subsection{Exercise 3.13}
$E$: driver tests positive on breathalyzer

$+$: too much alcohol

$-$: below limit

\[
LR = \frac{P \given{E}{+}}{P \given{E}{-}} = \frac{0.99}{0.10} = 9.9
\]

Then:

\begin{align*}
\frac{P \given{+}{E}}{P \given{-}{E}} &= LR \times \text{prior odds} = 9.9 \times \overbrace{\frac{P(+)}{P(-)}}^\text{given in ex.}\\
&= 9.9 \times \frac{0.1}{0.9} = 1.1 \\
\frac{P \given{+}{E}}{P \given{-}{E}} &= \frac{x}{1-x} = 1.1 \\
x &= \frac{11}{21}
\end{align*}

\section{Benchmarking}
How do you quantify the likelihood ratio? Do a benchmark experiment.

Example with two hypotheses:

$H_1$: box has all white balls

$H_2$: box has 50\% white, 50\% black balls

$E$: drawing 5 white balls in a row (with replacement)

\[
\frac{P \given{E}{H_1}}{P \given{E}{H_2}} = \frac{1}{\frac{1}{32}} = 32 = LR
\]

Then, if some experiment has $LR = 357$, compared to the benchmark that is about as strong as drawing 8--9 white balls in a row.
\textit{But} you still can't say whether the situation is $H_1$ or $H_2$, because that depends on the prior odds.
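In general, $n$ white draws in a row give $LR = 2^n$, so any likelihood ratio converts to the benchmark scale by taking $\log_2$; for the example above:

\[
n = \log_2 357 \approx 8.5
\]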
\section{General LR properties}

The likelihood ratio cannot be \textit{wrong}: given the evidence, the LR points a certain way.
But it can be \textit{misleading} and point towards a hypothesis that's not true.

Probability theory depends on the available information.
Imagine placing bets for or against something -- that's a first indication of the probabilities.
\textbf{How often can the LR be misleading, and to what degree?}

Example: throw a pin; it lands pin up with probability
$\begin{cases} H_1: p = \frac{1}{2}\\
H_2: p = \frac{3}{4}
\end{cases}$

\newpage

One throw:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c |}
\hline
\textbf{outcome} & \textbf{LR} \\ \hline
up & $(\frac{1}{2})/(\frac{3}{4}) = \frac{2}{3}$ \\ \hline
down & $(\frac{1}{2})/(\frac{1}{4}) = 2$ \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

If $H_1$ is true, the average LR is \[
\frac{1}{2} \times \frac{2}{3} + \frac{1}{2} \times 2 = \frac{1}{3} + 1 = \frac{4}{3}
\]

If $H_2$ is true, the average LR is \[
\frac{3}{4} \times \frac{2}{3} + \frac{1}{4} \times 2 = 1
\]

With two throws, the average LR will be (checked below):

\begin{itemize}
\item if $H_1$ true, $(\frac{4}{3})^2$
\item if $H_2$ true, 1
\end{itemize}

Average LR: $\sum P(\text{outcome}) \times LR_\text{outcome}$

If we compute the LR for $H_1$ vs $H_2$, then:

\begin{itemize}
\item If $H_1$ is true, on average $LR > 1$
\item If $H_2$ is true, on average $LR = 1$
\item The following always holds:
\[ \frac{P \given{LR = x}{H_1}}{P \given{LR = x}{H_2}} = x \]
\end{itemize}
The LR is a sufficient statistic for the two hypotheses; you won't learn more from seeing the evidence itself.
So if we know $LR(E)$, we don't need to know $E$.

The probability of misleading evidence is bounded: \[
P \given{LR_{H_1, H_2} (E) \geq k}{H_2} \leq \frac{1}{k}
\] regardless of $H_1$ and $H_2$.

LR is not additive, but multiplicative:

\begin{align*}
LR(E_1, E_2) &= LR(E_1) \times LR \given{E_2}{E_1} \\
&= LR(E_1) \times LR(E_2) &&\text{[if independent]}
\\\\
\log(LR(E_1, E_2)) &= \log(LR(E_1)) + \log(LR \given{E_2}{E_1})
\\\\
\log(LR) &> 0: \quad \text{support $H_1$}\\
\log(LR) &= 0: \quad \text{no evidence either way}\\
\log(LR) &< 0: \quad \text{support $H_2$}
\end{align*}
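Quick check of the two-throw claim under $H_1$: each sequence's LR is the product of the per-throw LRs, so

\[
\underbrace{\frac{1}{4} \left(\frac{2}{3}\right)^2}_{\text{up, up}} + \underbrace{\frac{1}{2} \left(\frac{2}{3} \times 2\right)}_{\text{up, down (either order)}} + \underbrace{\frac{1}{4} \times 2^2}_{\text{down, down}} = \frac{1}{9} + \frac{2}{3} + 1 = \frac{16}{9} = \left(\frac{4}{3}\right)^2
\]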
\subsection{Exercise 4.1}
\subsubsection{What's the likelihood ratio in favour of the accused's guilt?}

\begin{align*}
P \given{E}{H_1} &= 1 \\
P \given{E}{H_2} &= \frac{1}{10000} \\
LR &= \frac{1}{\frac{1}{10000}} = 10000
\end{align*}

\subsubsection{How can the value be used?}
All you can do with the LR is update the prior odds by multiplying.

\subsubsection{What difference would it make if there were a 1\% chance of matching results when in reality they are different?}
\begin{align*}
P\given{E}{H_1} &= 1 \quad \text{unchanged} \\
P\given{E}{H_2} &= 0.01 \\
LR &= \frac{1}{0.01} = 100
\end{align*}

\subsubsection{Extra: what if, in 1\% of cases, the lab mistakenly says there is no match when there really is one?}
\begin{align*}
P\given{E}{H_1} &= 0.99 \\
P\given{E}{H_2} &= \frac{1}{10000} \quad \text{unchanged from original} \\
LR &= \frac{0.99}{\frac{1}{10000}} = 9900
\end{align*}

\section{Assignment 1 Review}
Definitions:

\begin{align*}
H(1)&: \quad \text{All cards in deck are labelled 1}\\
H(i)&: \quad \text{All cards in deck are labelled $i$}\\
H_n&: \quad \text{Deck is normal}\\
E&: \quad \text{Choosing a card with label 1}
\end{align*}

How do you derive the result directly?
Here $p = P(H_n)$ is the prior probability of a normal deck, and each $H(i)$ has prior probability $\frac{1-p}{52}$.

\begin{align*}
P\given{E}{H(1)} &= 1 \\
P\given{E}{\text{not } H(1)} &= \frac{P(E \cap \text{not } H(1))}{P(\text{not } H(1))} \\
&= \frac{p \times \frac{1}{52}}{1-\frac{1-p}{52}} &&\text{[normal deck, and choosing a 1 out of it]} \\
&= \frac{p}{52-(1-p)} &&\text{[multiply by 52]} \\
&= \frac{p}{51+p} \\
LR &= \frac{P\given{E}{H(1)}}{P\given{E}{\text{not } H(1)}} \\
&= \frac{1}{\frac{p}{51+p}} \\
&= \frac{51+p}{p}
\end{align*}
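The limiting cases behave as they should:

\[
\lim_{p \to 1} \frac{51+p}{p} = 52, \qquad \lim_{p \to 0} \frac{51+p}{p} = \infty
\]

If the deck is almost certainly normal, the alternative to $H(1)$ is essentially just $H_n$, and the LR is the familiar $1 / \frac{1}{52} = 52$. If a trick deck is almost certain, ``not $H(1)$'' is almost surely some $H(i)$ with $i \neq 1$, under which drawing a 1 is impossible.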
\section{From Data to Decision}
The question now is, ``what should I do?''

\subsection{With prior probabilities: Bayes rule}
Example -- nuchal scan of a fetus, to assess the probability of trisomy 21.
The scan produces evidence $E$. Hypotheses $H_1$: trisomy 21, $H_2$: no trisomy 21. $P(H_1)$ is given, based on the age of the mother:

\[
\text{Young mother:} \qquad \frac{P(H_1)}{P(H_2)} = \frac{1}{10000}, \quad \text{action A1 if } LR \geq 40
\]
\[
\text{Old mother:} \qquad \frac{P(H_1)}{P(H_2)} = \frac{1}{5}, \quad \text{action A1 if } LR \geq \frac{1}{50}
\]

Then compute posterior odds:

\[
\frac{P\given{H_1}{E}}{P\given{H_2}{E}} = LR \times \frac{P(H_1)}{P(H_2)}
\]

If the posterior odds are large enough, make a decision:

\begin{itemize}
\item \textbf{A1:} Further testing, or
\item \textbf{A2:} no action
\end{itemize}

In the Netherlands, ``large enough'' means posterior odds $\geq \frac{1}{250}$ (which gives the LR thresholds above).

For young mothers, no more tests unless there is strong evidence for trisomy 21.
For old mothers, do the further test unless there is strong evidence against trisomy 21.
The result depends not just on the LR, but on the product of the LR with the prior odds.

\subsection{Without prior probabilities: frequentist approach}
Suppose, if $H_1$ (the ``null hypothesis'') is true, we want to take action A1, and if $H_2$ is true, action A2.
The options are:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| c | c | c |}
\hline
& \textbf{A1} & \textbf{A2} \\ \hline
\textbf{H1} & true positive, sensitivity & false negative, type I error \\ \hline
\textbf{H2} & false positive, type II error & true negative, specificity \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

Often we like to control the probability of a type I error, called $\alpha$ ($P\given{\text{decide A2}}{H_1}$). $\beta$ is the probability of a type II error ($P\given{\text{decide A1}}{H_2}$).

\subsubsection{Decision procedure}
\begin{enumerate}
\item Make probability distributions for the evidence $E$ you will gather
\item Define a way to decide -- a rejection region $R$ (a subset of all possible evidence)
\item If the evidence $E$ turns out to be in $R$, reject $H_1$ (choose action A2). Otherwise, choose A1.
\end{enumerate}

Such that:

\begin{itemize}
\item If $H_1$ is true, $P\given{E \in R}{H_1} = \alpha$ (fixed, often 0.05).
\item Preferably, $P\given{E \in R}{H_2} = 1 - \beta$ is as large as possible (this is sometimes called the ``power'' of the test).
\end{itemize}

\subsubsection{Example}
The thumbtack again, now with $H_1: p = \frac{1}{4}$, collecting data from 30 throws.
Let $X$ be the number of successes; we observe $X = x$.
How do you choose the rejection region $R$?
One way is to put the most unlikely outcomes into $R$, stopping before their total probability gets too large.
E.g. $R = \{ 0, 1, 2, 3, 13, 14, 15, \ldots, 30 \}$, which gives $\alpha \approx 0.03$.

Without an alternative, $\beta$ does not exist. So, take $H_2: p = \frac{3}{4}$.

If there are $x$ successes, then

\begin{align*}
LR(x) &= \frac{P\given{X = x}{p = \frac{1}{4}}}{P\given{X = x}{p = \frac{3}{4}}} \\
&= \frac{\binom{30}{x} (\frac{1}{4})^x (\frac{3}{4})^{30-x} }{\binom{30}{x} (\frac{3}{4})^x (\frac{1}{4})^{30-x} } &&\text{[binomial distribution $X \sim B\big(30, \frac{1}{4}\big)$]} \\
&= \frac{\frac{1}{4^x}\frac{3^{30-x}}{4^{30-x}}}{\frac{3^x}{4^x}\frac{1}{4^{30-x}}} \\
&= \frac{\frac{3^{30-x}}{4^{30}}}{\frac{3^x}{4^{30}}} \\
&= \frac{3^{30-x}}{3^x} \\
&= 3^{30-2x}
\end{align*}

We also have the symmetry property

\[
P \given{X = x}{p = \frac{1}{4}} = P \given{X = 30 - x}{p = \frac{3}{4}}
\]
Based on the result, then decide:

\begin{itemize}
\item If $x < 15$, $LR > 1$ and supports $H_1: p = \frac{1}{4}$
\item If $x = 15$, $LR = 1$
\item If $x > 15$, $LR < 1$ and supports $H_2: p = \frac{3}{4}$
\end{itemize}

The table for this binomial distribution with the corresponding LRs is

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r | r | r |}
\hline
\textbf{LR} & \textbf{Rejection region R} & $\boldsymbol{\alpha}$ & $\boldsymbol{\beta}$ & $\boldsymbol{\alpha + \beta}$\\ \hline
$LR \leq 729$ & $x \geq 12$ & $0.0506$ & $\approx 0$ & $0.0506$ \\ \hline
$LR \leq 81$ & $x \geq 13$ & $0.0215$ & $\approx 0$ & $0.0215$ \\ \hline
$LR \leq 9$ & $x \geq 14$ & $0.0081$ & $0.0002$ & $0.0083$ \\ \hline
$LR \leq 1$ & $x \geq 15$ & $0.0027$ & $0.0008$ & $0.0035$ \\ \hline
$LR \leq \frac{1}{9}$ & $x \geq 16$ & $0.0008$ & $0.0027$ & $0.0035$ \\ \hline
$LR \leq \frac{1}{81}$ & $x \geq 17$ & $0.0002$ & $0.0081$ & $0.0083$ \\ \hline
& etc. & & & \\ \hline
& $\{0, 1, 2, 3, 13, \ldots, 30\}$ & $\approx 0.03$ & $\approx 0$ & $\approx 0.03$ \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}
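The $\alpha$ and $\beta$ columns are just tail probabilities of the two binomials; a minimal sketch to reproduce them, assuming Python with SciPy available:

\begin{verbatim}
from scipy.stats import binom

n = 30
for c in range(12, 18):
    # reject H1 when x >= c, i.e. when LR <= 3^(30 - 2c)
    alpha = 1 - binom.cdf(c - 1, n, 1/4)  # P(X >= c     | p = 1/4)
    beta = binom.cdf(c - 1, n, 3/4)       # P(X <= c - 1 | p = 3/4)
    print(c, round(alpha, 4), round(beta, 4), round(alpha + beta, 4))
\end{verbatim}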
\subsubsection{Neyman-Pearson lemma}
Suppose an LR threshold is used for decision making, e.g.
\[
R_t = \{ E \mid LR(E) \leq t \} \qquad \text{($t$ is the threshold, any number)}
\]

Then you get error rates
\begin{align*}
\begin{cases}
\alpha_t &= P \given{LR(E) \leq t}{H_1} \\
\beta_t &= P \given{LR(E) > t}{H_2} \qquad \left(\leq \frac{1}{t}\right)
\end{cases}
\end{align*}

Suppose I have another procedure with rejection region $R$ and error rates $\alpha_R$ and $\beta_R$.
If $\alpha_R < \alpha_t$, then $\beta_R > \beta_t$.

So, the LR is optimal in the sense that it is impossible to improve upon \textit{both} $\alpha_t$ and $\beta_t$ at the same time.
Therefore, there is no conceptual reason to use a different procedure (though there may be a practical one).

With $t = 1$, the sum $\alpha + \beta$ is minimal.

In the example, with $\alpha = 0.05$, we get $t = 729$.
\[
R_{729} = \{ x \geq 12 \}
\]

That is, if $x = 12$, $LR = 729$: the evidence supports $H_1$, but $H_1$ is rejected.

Error rates are predictive; they belong to a procedure for decision making:

\begin{itemize}
\item If $H_1$ true: probability $\alpha$ of error.
\item If $H_2$ true: probability $\beta$ of error.
\end{itemize}

It's not true that if you decide for $H_1$, there is a probability $\alpha$ that \textit{you} made an error!

\subsubsection{Frequentist vs Bayesian statistics}
\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| c | c |}
\hline
\textbf{Frequentist} & \textbf{Bayesian} \\ \hline
no priors & priors \\
predicting data & explaining data \\
LRs for decision making & LRs for updating odds, \textit{then} decision making
\\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

If you have no priors and no good way to estimate them, it may be better to go with the frequentist approach and accept the errors that come with it.

\section{Neyman-Pearson}
To recap, the LR ``decides'' which hypothesis best explains the data.
Data-driven hypotheses are allowed: since the posterior odds identity holds, a high LR is compensated by small prior odds.

Procedure:

\begin{enumerate}
\item Choose $\alpha$
\item Choose $t$ such that
\[
P \given{LR \leq t}{H_1} = \alpha
\]
\item Choose $A_1$ if $LR_{H_1, H_2}(E) > t$, otherwise $A_2$
\end{enumerate}

This means that we may choose $A_2$ even while there is evidence for $H_1$.

\subsection{Example (building on the binomial experiment from the previous lecture)}
$H_1: \theta = \frac{1}{4}, \quad H_2: \theta = \frac{3}{4}, \quad \alpha = 0.05$

Choose $A_2$ if $LR \leq 729$ ($\#\text{successes} \geq 12$), even though such an LR can still support $H_1$. Why? Because you insist on a small $\alpha$.

\section{What if only the final decision is given?}
What happens if you only get ``an expert's opinion'' -- the final decision they took?
You can still figure out the evidential value.
Let $E_1$ be the report that the expert chose $A_1$, and $E_2$ that they chose $A_2$:

\[
\frac{P \given{E_1}{H_1}}{P \given{E_1}{H_2}} = \frac{1-\alpha}{\beta} \qquad \qquad
\frac{P \given{E_2}{H_1}}{P \given{E_2}{H_2}} = \frac{\alpha}{1-\beta}
\]

If $\beta$ is small, the first LR increases and you get high evidential value.
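For example, with the conventional error rates $\alpha = 0.05$ and $\beta = 0.2$ (illustrative numbers, not from the lecture):

\[
\frac{P \given{E_1}{H_1}}{P \given{E_1}{H_2}} = \frac{0.95}{0.2} = 4.75 \qquad \qquad
\frac{P \given{E_2}{H_1}}{P \given{E_2}{H_2}} = \frac{0.05}{0.8} = \frac{1}{16}
\]

So a bare ``I chose $A_2$'' from such an expert carries an LR of 16 in favour of $H_2$, regardless of how strong the underlying data were.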
\section{P-values: what's wrong with them?}
\subsection{Example: researchers' experiments}
The goal is to disprove success probability $p = \frac{1}{2}$.
20 experiments, the result is 14 successes.
$\alpha = 0.05$, $H_1: \enspace p = \frac{1}{2}$, $H_2: \enspace p \neq \frac{1}{2}$.

Compute $P \given{\geq 14 \cup \leq 6}{H_1} \approx 0.115$.
Since this is $> \alpha$, it is not significant, so we can't reject $H_1$.
But 15 successes would have done it, with probability $0.0414$.
So do 20 more trials, with 19 successes.
Then $P \given{\geq 33 \cup \leq 7}{H_1} \approx 0.0000423$.

But rejecting $H_1$ is incorrect here!
After 40 experiments, the level-0.05 region is $\leq 13 \cup \geq 27$, with $P \given{\geq 27 \cup \leq 13}{H_1} \leq 0.05$.
The researcher's total critical region is:

\begin{itemize}
\item $\leq 5 \cup \geq 15$ after 20 experiments
\item $\leq 13 \cup \geq 27$ after 40 experiments
\end{itemize}

Its total probability under $H_1$ is $> 0.05$.

So what if we do 20 experiments, possibly stopping after 10?
Reject $H_1$ if either:

\begin{itemize}
\item after 10 experiments, $\geq 9 \cup \leq 1$ successes
\item after 20 experiments, $\geq 16 \cup \leq 4$ successes
\end{itemize}

The probability of rejecting under $H_1$ is $\leq 0.05$ (see the sketch at the end of this subsection).

Then, the results are: after 10, 3 successes; after 20, 5 successes. So $H_1$ is not rejected.

Another researcher only looks after 20 experiments, so for them, 5 successes means reject!

It's strange that we are using probabilities of outcomes we never saw to interpret the evidence.
The LR approach doesn't have this problem.
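To check that the peeking procedure stays below $\alpha = 0.05$ under $H_1$, you can enumerate it directly; a minimal sketch in Python:

\begin{verbatim}
from math import comb

def pmf(n, k, p=0.5):
    # P(k successes in n trials) under H1: p = 1/2
    return comb(n, k) * p**k * (1 - p)**(n - k)

stop = {0, 1, 9, 10}  # reject after the first 10 trials on these counts
p_reject = sum(pmf(10, k) for k in stop)
for k in range(11):  # counts after 10 trials where we continue
    if k in stop:
        continue
    for j in range(11):  # successes in the second batch of 10
        if k + j <= 4 or k + j >= 16:
            p_reject += pmf(10, k) * pmf(10, j)
print(p_reject)  # around 0.03, below 0.05
\end{verbatim}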
\subsection{Example: ability to see color}
You have 20 colors.
Experiment 1: for people that don't see green, reject $p = \frac{1}{2}$ if $\#\text{successes} \in \{0,1,9,10\}$.
Experiment 2: $H_1: \enspace p = \frac{1}{2}$ for all colors; reject $H_1$ if at least one person gets 0 or 10.

Result: the experiment with green has 9 successes, the experiments with all other colors land in $\{1,2,\ldots,9\}$.
What then?
Reject and don't reject at the same time?

Other experiments should not have an effect on the evidential value of an experiment.

\subsection{Example: one-tailed vs. two-tailed}
Take $p$ to be an unknown success probability.
$\alpha = 0.05$, 100 experiments.
$H: \enspace p = \frac{1}{2}$, reject $H$ if $\#\text{successes} \leq 39 \cup \geq 61$.
$H': \enspace p \leq \frac{1}{2}$, reject $H'$ if $\#\text{successes} \geq 59$.

Suppose 60 successes. Reject $p \leq \frac{1}{2}$ but not $p = \frac{1}{2}$? Wtf?

\subsection{Example: changing alpha}
$H: \enspace p = \frac{1}{2}$, 40 experiments, $\alpha = 0.05$.
$P \given{\geq 29 \cup \leq 11}{H} \approx 0.006$.

The researcher sees that $\alpha = 0.01$ would also have worked, so they claim to reject $H$ at level $\alpha = 0.01$.

This is wrong! $\alpha$ belongs to the whole experiment; it does not relate to an individual outcome.
By changing $\alpha$ from experiment to experiment, it loses the \textit{only} interpretation it has.

\section{P-values of LRs}
Suppose $LR = 47$.
The p-value is then $P \given{LR \geq 47}{H_2}$.
The idea is that, if the p-value is very small, then the LR of 47 is extreme for $H_2$.
If it's large, then the 47 is `normal' for $H_2$.

However, this still has no \textit{evidential} value.
The LR measures the strength of evidence.
The p-value tells you how rare such an LR is.
But once you have the evidence, it doesn't matter how frequently evidence of that strength occurs.

\subsection{Example: genomes}
Two people with genomes $g_1$, $g_2$. $H_1$: siblings, $H_2$: unrelated.

You can take different types of LRs:

\begin{align*}
LR_{H_1, H_2} (g_1, g_2) &= \frac{P \given{g_1, g_2}{H_1}}{P \given{g_1,g_2}{H_2}} \\
LR' &= \frac{P \given{g_2}{g_1, H_1}}{P \given{g_2}{g_1, H_2}} \\
LR'' &= \frac{P \given{g_1}{g_2, H_1}}{P \given{g_1}{g_2, H_2}}
\end{align*}

These are all basically the same.
Take the notation $p_1 = P(g_1)$, $p_2 = P(g_2)$,
$p_1(g_2) = P(g_1 \text{ for a sibling of someone with } g_2)$,
$p_2(g_1) = P(g_2 \text{ for a sibling of someone with } g_1)$.

Then we can rewrite the LRs from above:

\begin{align*}
LR_{H_1, H_2} (g_1, g_2) &= \frac{p_1 p_2(g_1)}{p_1 p_2} = \frac{p_2(g_1)}{p_2} \\
LR' &= \frac{p_2 (g_1)}{p_2} \\
LR'' &= \frac{p_1 (g_2)}{p_1} = \frac{p_2 (g_1)}{p_2}
\end{align*}
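To see why these all coincide: one genome on its own says nothing about relatedness, so $P\given{g_1}{H_1} = P(g_1) = p_1$, and under $H_2$ the genomes are independent. Factoring the joint probability:

\[
LR_{H_1, H_2}(g_1, g_2) = \frac{P\given{g_1}{H_1} \, P\given{g_2}{g_1, H_1}}{P(g_1) \, P(g_2)} = \frac{p_1 \, p_2(g_1)}{p_1 \, p_2} = \frac{p_2(g_1)}{p_2} = LR'
\]

and symmetrically for $LR''$.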
The p-values will be different, though: they depend on which genome is held fixed, and how frequently that genome occurs.
This would lead to different actions, even though the LRs are identical, and thus so is the evidence.

\subsection{Example: disease and test results}
Take $H_1: \enspace \text{disease present}$, $H_2: \enspace \text{disease absent}$.

Experiment 1:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r |}
\hline
\textbf{} & \textbf{+} & \textbf{-} \\ \hline
$H_1$ & 0.94 & 0.06 \\ \hline
$H_2$ & 0.02 & 0.98 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\begin{align*}
LR(+) &= \frac{P \given{+}{H_1}}{P \given{+}{H_2}} = \frac{0.94}{0.02} = 47 \\
LR(-) &= \frac{P \given{-}{H_1}}{P \given{-}{H_2}} = \frac{0.06}{0.98} \approx \frac{1}{16}
\end{align*}

Experiment 2 (``0'' means the experiment is not carried out):

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r | r |}
\hline
\textbf{} & \textbf{+} & \textbf{0} & \textbf{-} \\ \hline
$H_1$ & 0.47 & 0.5 & 0.03 \\ \hline
$H_2$ & 0.01 & 0.5 & 0.49 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\begin{align*}
LR(+) &= 47 \\
LR(-) &\approx \frac{1}{16}
\end{align*}

Experiment 3 (``*'' is a negative result or no experiment):

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r |}
\hline
\textbf{} & \textbf{+} & \textbf{*} \\ \hline
$H_1$ & 0.47 & 0.53 \\ \hline
$H_2$ & 0.01 & 0.99 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\[
LR(+) = 47
\]

All of the LRs are the same.
So essentially, if a ``+'' is obtained, the evidential value is always the same no matter how it was obtained.

Per experiment, $P \given{LR \geq 47}{H_2}$ is:

\begin{enumerate}
\item 0.02
\item 0.01
\item 0.01
\end{enumerate}

These are not all the same!

The p-value relates to the \textit{entire} procedure; that's why it's not the same.
The LR relates to an individual outcome, so it's always the same.

\section{Why confidence intervals are similarly fucked}
Recall testing $H_1$ vs $H_2$: define a rejection region $R$, s.t. if the sampled data are in $R$, you ``reject $H_1$'' (take some action).
Otherwise, do not reject.

P-values define $R$ in terms of what might happen if $H_1$ is true, s.t. the total probability for data to be in $R$ is $\alpha$.
The point is that you can't interpret data in $R$ as evidence against $H_1$.

Neyman-Pearson: defining $R$ using an LR threshold $t$, $R = \{ E \mid LR(E) \leq t\}$, gives you optimality.

Why p-values suck (recap):

\begin{itemize}
\item they do not measure the strength of evidence in $E$ against $H_1$
\item they are ambiguous (several ways of defining them)
\item the probability $\alpha$ is a property of the procedure that you do (how data are gathered), not of the obtained data
\end{itemize}

\subsection{Confidence intervals}
Say we have a model (e.g. a Binomial distribution) that generates the data and has an unknown parameter $\theta$ that we want to estimate.
Example: $\theta$ is the mean height of people, model $N(\theta, \sigma^2)$.

A CI at level $1-\alpha$ consists of two functions of the data, $\theta_{min}$ and $\theta_{max}$, such that if $\theta$ is the true value of the parameter of interest, it lies between $\theta_{min} (E)$ and $\theta_{max} (E)$ with probability $1-\alpha$ under repeated sampling of $E$.

\subsubsection{Commonly encountered 95\% CI}
For data from $N(\theta, \sigma^2)$, if I sample $n$ points $x_1,\ldots,x_n$, estimate $\theta$ by \[
\hat{\theta} = \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i
\]

And take as 95\% CI $\big[ \overline{x} - 1.96 \frac{\sigma}{\sqrt{n}}, \quad \overline{x} + 1.96 \frac{\sigma}{\sqrt{n}} \big]$.
$1.96$ is the corresponding z-score.

Why? It gives the smallest 95\% CI for such data.

\subsubsection{Binomial data (2 possible outcomes)}
Data $x_1,\ldots,x_n$, interested in the ``success probability'' $p = P(x_i = 1)$.

With success probability $p$, in $n$ points $x_1,\ldots,x_n$ there are $k$ successes (ones) with probability \[
P (X = k) = \binom{n}{k} p^k (1-p)^{n-k}
\] (the Binomial distribution).

A 95\% CI can be computed with this, but a good approximation is \[
\theta_{min,max} = \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\] where $\hat{p} = \frac{k}{n}$.

The CI at level $1-\alpha$ contains exactly the values that would not lead to rejection at significance level $\alpha$ (i.e. p-value $\geq \alpha$).

\subsection{Problems with CIs}
CIs suffer from the same problems as p-values:

\begin{itemize}
\item $\alpha$ is a property of the procedure, not of any realized outcome
\item ambiguity: lots of choices possible
\end{itemize}

\subsubsection{Example}
We want to estimate $\theta$, and gather data $x$ with

\[
P(x \mid \theta) = \begin{cases} \frac{1}{2} \quad x = \theta \\ \frac{1}{2} \quad x = \theta+1 \end{cases}
\]

Gather two points $x_1, x_2$. Define the CI as $\big[ \theta_{min} = \min(x_1, x_2), \quad \theta_{max} = \max(x_1, x_2) \big]$.

This is a 75\% CI: it misses $\theta$ only when $x_1 = x_2 = \theta + 1$, which has probability $\frac{1}{4}$.

But if the data are $x_1 = 28$, $x_2 = 29$, then the CI is $[28, 29]$ and it definitely contains $\theta$ (which must be 28).
If $x_1 = x_2 = 30$, then the CI is $[30, 30]$, but $\theta$ could be 29 or 30.
If those two values of $\theta$ are equally likely, there is only a 50\% chance that the CI contains $\theta$.
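A small simulation makes the pathology concrete (a minimal sketch with $\theta$ fixed at an arbitrary value): overall coverage is the promised 75\%, but conditional on what you actually observed, width-1 intervals always contain $\theta$ while width-0 intervals contain it only half the time.

\begin{verbatim}
import random

def interval(theta):
    # two draws, each theta or theta + 1 with probability 1/2
    xs = [theta + random.randint(0, 1) for _ in range(2)]
    return min(xs), max(xs)

theta, trials, hits = 30, 100_000, 0
eq = [0, 0]  # [count, hits] for the width-0 case (x1 == x2)
for _ in range(trials):
    lo, hi = interval(theta)
    covered = lo <= theta <= hi
    hits += covered
    if lo == hi:
        eq[0] += 1
        eq[1] += covered
print(hits / trials)  # ~0.75
print(eq[1] / eq[0])  # ~0.5
\end{verbatim}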
\subsubsection{Example}
With $n$ points from the normal distribution $N(\mu, \sigma^2)$, the 95\% CI is $\overline{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$.

Let $n=1$ or $n=100$, each with probability $\frac{1}{2}$.
What's a good 95\% CI?

\begin{enumerate}
\item If $n=1$, the 95\% CI is $x_1 \pm 1.96 \sigma$.
If $n=100$, the 95\% CI is $\overline{x} \pm 1.96 \frac{\sigma}{10}$.
But you can do better.
\item If $n=1$, $x_1 \pm 1.62\sigma$ (a 91\% CI).
If $n=100$, $\overline{x} \pm 2.72 \frac{\sigma}{10}$ (a 99\% CI).
Overall, this is also a 95\% CI.
\end{enumerate}

Why number 2? Compare the expected widths of the intervals:

\begin{enumerate}
\item $\frac{1}{2}(2 \times 1.96 \sigma) + \frac{1}{2}(2 \times 1.96 \frac{\sigma}{10}) = 1.96\sigma + 0.196\sigma = 2.156\sigma$
\item $\frac{1}{2}(2 \times 1.62\sigma) + \frac{1}{2}(2 \times 2.72 \frac{\sigma}{10}) = 1.62\sigma + 0.272\sigma = 1.892\sigma$
\end{enumerate}

\subsubsection{Example}
Heart/lung problems with newborns.
Conventional medical treatment (CMT) is not very adequate; the survival rate is not precisely known, but around 20\%.
A new, promising treatment, ECMO, has a survival rate estimated to be possibly around 80\%.
Study ECMO vs CMT.
How large should the study be?

Take $n$ patients; the number of recoveries is $x$.

Say we test $\theta_{ECMO} = 0.8$ vs $\theta_{ECMO} = 0.2$.
If there are $x$ recoveries, the LR is

\begin{align*}
\frac{P \given{x}{\theta = 0.8}}{P \given{x}{\theta = 0.2}} &= \frac{\binom{n}{x} (0.8)^x (0.2)^{n-x}}{\binom{n}{x} (0.2)^x (0.8)^{n-x}} \\
&= \frac{4^x}{4^{n-x}} = 4^{2x-n} \\
&= 2^{4x-2n}
\end{align*}

If I want $LR \geq 32 = 2^5$, I need $4x - 2n \geq 5$. So $x \geq \frac{2n+5}{4}$.

We can compute the probability of getting sufficiently strong evidence in favor of the true hypothesis, the probability of strongly misleading evidence, or the probability of not obtaining strong evidence.

Now suppose 13 out of 17 recoveries. What does this say about $\theta_{ECMO}$?

We could CI that shit, but CIs have problems.

The best estimate is $\theta_{ECMO} = \frac{13}{17} \approx 0.76$.
How much better is this than $\theta = 0.5$? For the hypothesis $\theta = 0.8$:

\[
\frac{P \given{\text{13 out of 17}}{\theta = 0.8}}{P \given{\text{13 out of 17}}{\theta = 0.5}} = 11.5
\]

An alternative is a likelihood interval (LI). E.g. the $\frac{1}{32}$ LI is the set of all values $\theta_0$ such that the LR for $\theta = \frac{13}{17}$ vs $\theta = \theta_0$ is at most 32.

\section{Notes from Ioannidis}
These are notes from the class discussing the article ``Why Most Published Research Findings Are False'' by Ioannidis.

$S$ : significant result ($p < 0.05$).

\begin{align*}
\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}} &= \frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} \times \underbrace{\frac{P(\overline{H_0})}{P(H_0)}}_\text{$R$ in the article} \\
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} &= \frac{1-\beta}{\alpha}
\end{align*}

So we need $\frac{1-\beta}{\alpha} \times R > 1$ for a significant finding to be more likely true than false.

In the notation of Ioannidis, $(1-\beta)R > \alpha$.

Odds $\frac{P(\overline{H_0})}{P(H_0)} = R$ are equivalent to \begin{gather*}
P (\overline{H_0}) = \frac{R}{R+1} \\
P (H_0) = \frac{1}{R+1}
\end{gather*}

Out of a total number $c$ of research questions, the counts are then:

\begin{align*}
\begin{cases}
c \frac{R}{R+1} \quad \text{if $\overline{H_0}$ true} \longrightarrow
\begin{cases}
S(\overline{H_0}\text{ true}) = (1-\beta) c \frac{R}{R+1} \\
\overline{S}(\overline{H_0}\text{ true}) = \beta c \frac{R}{R+1}
\end{cases}
\\
c \frac{1}{R+1} \quad \text{if $H_0$ true} \longrightarrow
\begin{cases}
S(H_0\text{ true}) = \alpha c \frac{1}{R+1} \\
\overline{S}(H_0\text{ true}) = (1-\alpha) c \frac{1}{R+1}
\end{cases}
\end{cases}
\end{align*}

Ioannidis takes $\beta = 0.2$, $\alpha = 0.05$. So \[
LR(S) = \frac{1-0.2}{0.05} = \frac{0.8}{0.05} = 16
\]
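Concretely, with an illustrative prior of my own choosing: in a field where only 1 in 11 tested hypotheses is true, $R = \frac{1}{10}$, so

\[
\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}} = 16 \times \frac{1}{10} = 1.6, \qquad P \given{\overline{H_0}}{S} = \frac{1.6}{1.6 + 1} \approx 0.62
\]

Even before any bias, almost 40\% of the significant findings in such a field would be false.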
\subsection{Bias}
Bias is when you get more significant findings than warranted by the data.
E.g. you try to `clean up the data'.
But then your original error rates don't apply anymore.

Originally, \begin{gather*}
P \given{S}{H_0} = \alpha \\
P \given{S}{\overline{H_0}} = 1-\beta
\end{gather*}

Now, \begin{gather*}
P \given{S}{H_0} = \alpha + (1-\alpha)u \\
P \given{S}{\overline{H_0}} = (1-\beta) + \beta u
\end{gather*} where $u$ is the probability of data becoming significant when they are not.

With bias, the LR of $S$ for $\overline{H_0}$ vs $H_0$ becomes \[
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} = \frac{1-\beta+\beta u}{\alpha + (1-\alpha) u}
\]

$PPV = P \given{\overline{H_0}}{S}$.
Plotting the posterior odds $\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}}$ (y-axis) against $u$ (x-axis) shows how bias erodes the evidential value.

\vspace{1em}
Suppose there are several teams:

\begin{itemize}
\item all with the same research question
\item all with the same $\alpha$ and $\beta$
\item a result is published as soon as at least 1 team finds a statistically significant result
\end{itemize}

$S$ : at least one of the $n$ teams has $p < 0.05$. \[
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} = \frac{1 - \beta^n}{1 - (1-\alpha)^n}
\]

As $n$ goes to infinity, this LR tends to 1.

Corollaries:
\begin{itemize}
\item smaller studies = less likely for findings to be true
\item smaller effect sizes = less likely for findings to be true
\item greater number and less selection of tested relationships = less likely for findings to be true
\item greater flexibility = less likely for findings to be true
\end{itemize}

\section{The Paradox of the Ravens (Hempel)}
H: all ravens are black.
This is equivalent to saying ``all non-black things are non-ravens''.
So an observation of a non-black non-raven should be evidence for H.

Suppose two vases, one with only ravens (R, $n_R$ of them), and one with only non-ravens (NR, $n_{NR}$ of them).
$P_R$ is the probability of black in the raven vase, $P_{NR}$ the probability of black in the non-raven vase.

$X$ is a draw from R. $H_A: P_R = 1$, $H_B: P_R = p < 1$.

Evidence: $X$ is black.

\begin{align*}
LR_{A,B} (E) &= \frac{P \given{X \text{ is black}}{A}}{P \given{X \text{ is black}}{B}} \\
&= \frac{1}{p} > 1
\end{align*}

So this is evidence that all ravens are black.

$Y$ is a draw from NR. Evidence: $Y$ is white.

\begin{align*}
LR_{A,B} (E') &= \frac{P \given{Y \text{ is white}}{A}}{P \given{Y \text{ is white}}{B}} \\
&= 1 \quad \text{[A and B only concern R, not NR]}
\end{align*}

That's not evidence for H. It's neutral.

\subsection{But there's a big but}
What if we do this:

\begin{enumerate}
\item Mix all the things
\item Choose a non-black object from the mix
\item The non-black object turns out to have come from NR (it is a non-raven)
\item Claim this is evidence for H
\end{enumerate}

Let $Z$ be the vase the chosen object came from, R or NR.
Under $A$, every non-black object is a non-raven, so $P \given{Z = NR}{A} = 1$.

\begin{align*}
LR_{A,B} (Z = NR) &= \frac{P \given{Z = NR}{A}}{P \given{Z = NR}{B}} \\
&= \frac{1}{P \given{Z = NR}{B}} \\
P \given{Z = NR}{B} &= \frac{\text{number of non-black objects in NR}}{\text{total number of non-black objects}} \\
&= \frac{n_{NR} (1-P_{NR})}{n_{NR} (1-P_{NR}) + n_R (1-P_R)} \\
\therefore LR_{A,B} (Z = NR) &= \frac{1}{P \given{Z = NR}{B}} \\
&= \frac{n_{NR} (1-P_{NR}) + n_R (1-P_R)}{n_{NR} (1-P_{NR})} \\
&= 1 + \frac{n_R (1-P_R)}{n_{NR} (1-P_{NR})} \\
&> 1, \text{ since under $B$, } P_R < 1
\end{align*}

So this \textit{is} evidence for H! (Though not very strong evidence.)

So there are two ways of sampling, and which one you use \textit{is} definitely relevant.
If you select something that you \textit{know} is not a raven, and see that it's not black, that's not evidence.
If you randomly select something that's not black, and see that it's not a raven, it \textit{is} evidence.
\end{document}