\documentclass[12pt,a4paper,oneside,fleqn]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{hyperref}
\usepackage[left=1in, right=1in, top=1in, bottom=1in]{geometry}

\usepackage{fancyhdr}
\setlength{\headheight}{15.2pt} \pagestyle{fancy}
\rhead{Alex Balgavy}

% \given{A}{B} ("A given B") %
\makeatletter
\newcommand{\@givenstar}[2]{\left(#1\;\middle|\;#2\right)}
\newcommand{\@givennostar}[3][]{#1(#2\;#1|\;#3#1)}
\newcommand{\given}{\@ifstar\@givenstar\@givennostar}
\makeatother
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{A Likelihood Approach to Statistics: Notes}
\author{Alex Balgavy}
\date{April-May 2019}

\begin{document}
\maketitle
\section{Introduction}
When can we say that there is sufficient evidence?
A large issue is the phrasing of conditional probability.
There's a difference between \\
$P\given{\text{winning}}{\text{not committed fraud}}$ and $P\given{\text{committed fraud}}{\text{winning}}$.
The relative probabilities are what matter.

\begin{align*}
H_1, H_2\text{: hypotheses}\\
E\text{: evidence/data}\\
\\
\frac{P\given{H_1}{E}}{P\given{H_2}{E}} &= \frac{P(H_1 \cap E)}{P(H_2 \cap E)} \\
&= \frac{P\given{E}{H_1} P(H_1)}{P\given{E}{H_2} P(H_2)} \\
&= \frac{P\given{E}{H_1}}{P\given{E}{H_2}} \times \frac{P(H_1)}{P(H_2)}\\
\\
\therefore \underbrace{\frac{P\given{H_1}{E}}{P\given{H_2}{E}}}_\text{posterior odds} &= \underbrace{\frac{P\given{E}{H_1}}{P\given{E}{H_2}}}_\text{likelihood ratio} \times \underbrace{\frac{P(H_1)}{P(H_2)}}_\text{prior odds}
\end{align*}

Questions:

\begin{itemize}
\item When do observations support a hypothesis?
\item What does this mean?
\item What should I do next? What should I believe?
\end{itemize}

\textbf{Evidence} is data that \textbf{makes you change your assessment of the hypotheses of interest}.
It doesn't tell you what to believe, but how to change your belief.
What to do depends on the risks and consequences.

The \textbf{likelihood ratio} is \textbf{the extent to which you should change your mind}.

The \textbf{evidence} is what determines the likelihood ratio.

\newpage

\subsection{Exercise 3.13}
$E$: driver tests positive on breathalyzer

$+$: too much alcohol

$-$: below limit

\[
LR = \frac{P \given{E}{+}}{P \given{E}{-}} = \frac{0.99}{0.10} = 9.9
\]

Then:

\begin{align*}
\frac{P \given{+}{E}}{P \given{-}{E}} &= LR \times \text{prior odds} = 9.9 \times \overbrace{\frac{P(+)}{P(-)}}^\text{given in ex.}\\
&= 9.9 \times \frac{0.1}{0.9} = 1.1 \\
\frac{P \given{+}{E}}{P \given{-}{E}} &= \frac{x}{1-x} = 1.1 \\
x &= \frac{11}{21}
\end{align*}

\section{Benchmarking}
How do you quantify the likelihood ratio? Do a benchmark experiment.

Example with two hypotheses:

$H_1$: box has all white balls

$H_2$: box has 50\% white, 50\% black balls

$E$: drawing 5 white balls in a row (with replacement)

\[
\frac{P \given{E}{H_1}}{P \given{E}{H_2}} = \frac{1}{\frac{1}{32}} = 32 = LR
\]

Then, if some experiment has $LR = 357$, compared to the benchmark that is about as strong as drawing 8--9 white balls in a row.
\textit{But} you still can't say whether the situation is $H_1$ or $H_2$, because that depends on the prior odds.
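In general, $n$ white draws in a row give $LR = 2^n$, so any likelihood ratio converts to the benchmark scale by taking $\log_2$; for the example above:

\[
n = \log_2 357 \approx 8.5
\]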
\section{General LR properties}

The likelihood ratio cannot be \textit{wrong}: given the evidence, the LR points a certain way.
But it can be \textit{misleading} and point towards a hypothesis that's not true.

Probability theory depends on the available information.
Imagine placing bets for or against something -- that's a first indication of the probabilities.
\textbf{How often can the LR be misleading, and to what degree?}

Example: throw a pin; it lands pin up with probability
$\begin{cases} H_1: p = \frac{1}{2}\\
H_2: p = \frac{3}{4}
\end{cases}$

\newpage

One throw:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c |}
\hline
\textbf{outcome} & \textbf{LR} \\ \hline
up & $(\frac{1}{2})/(\frac{3}{4}) = \frac{2}{3}$ \\ \hline
down & $(\frac{1}{2})/(\frac{1}{4}) = 2$ \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

If $H_1$ is true, the average LR is \[
\frac{1}{2} \times \frac{2}{3} + \frac{1}{2} \times 2 = \frac{1}{3} + 1 = \frac{4}{3}
\]

If $H_2$ is true, the average LR is \[
\frac{3}{4} \times \frac{2}{3} + \frac{1}{4} \times 2 = 1
\]

With two throws, the average LR will be (checked below):

\begin{itemize}
\item if $H_1$ true, $(\frac{4}{3})^2$
\item if $H_2$ true, 1
\end{itemize}

Average LR: $\sum P(\text{outcome}) \times LR_\text{outcome}$

If we compute the LR for $H_1$ vs $H_2$, then:

\begin{itemize}
\item If $H_1$ is true, on average $LR > 1$
\item If $H_2$ is true, on average $LR = 1$
\item The following always holds:
\[ \frac{P \given{LR = x}{H_1}}{P \given{LR = x}{H_2}} = x \]
\end{itemize}
The LR is a sufficient statistic for the two hypotheses; you won't learn more from seeing the evidence itself.
So if we know $LR(E)$, we don't need to know $E$.

The probability of misleading evidence is bounded: \[
P \given{LR_{H_1, H_2} (E) \geq k}{H_2} \leq \frac{1}{k}
\] regardless of $H_1$ and $H_2$.

LR is not additive, but multiplicative:

\begin{align*}
LR(E_1, E_2) &= LR(E_1) \times LR \given{E_2}{E_1} \\
&= LR(E_1) \times LR(E_2) &&\text{[if independent]}
\\\\
\log(LR(E_1, E_2)) &= \log(LR(E_1)) + \log(LR \given{E_2}{E_1})
\\\\
\log(LR) &> 0: \quad \text{support $H_1$}\\
\log(LR) &= 0: \quad \text{no evidence either way}\\
\log(LR) &< 0: \quad \text{support $H_2$}
\end{align*}
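Quick check of the two-throw claim under $H_1$: each sequence's LR is the product of the per-throw LRs, so

\[
\underbrace{\frac{1}{4} \left(\frac{2}{3}\right)^2}_{\text{up, up}} + \underbrace{\frac{1}{2} \left(\frac{2}{3} \times 2\right)}_{\text{up, down (either order)}} + \underbrace{\frac{1}{4} \times 2^2}_{\text{down, down}} = \frac{1}{9} + \frac{2}{3} + 1 = \frac{16}{9} = \left(\frac{4}{3}\right)^2
\]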
\subsection{Exercise 4.1}
\subsubsection{What's the likelihood ratio in favour of the accused's guilt?}

\begin{align*}
P \given{E}{H_1} &= 1 \\
P \given{E}{H_2} &= \frac{1}{10000} \\
LR &= \frac{1}{\frac{1}{10000}} = 10000
\end{align*}

\subsubsection{How can the value be used?}
All you can do with the LR is update the prior odds by multiplying.

\subsubsection{What difference would it make if there were a 1\% chance of matching results when in reality they are different?}
\begin{align*}
P\given{E}{H_1} &= 1 \quad \text{unchanged} \\
P\given{E}{H_2} &= 0.01 \\
LR &= \frac{1}{0.01} = 100
\end{align*}

\subsubsection{Extra: what if, in 1\% of cases, the lab mistakenly says there is no match when there really is one?}
\begin{align*}
P\given{E}{H_1} &= 0.99 \\
P\given{E}{H_2} &= \frac{1}{10000} \quad \text{unchanged from original} \\
LR &= \frac{0.99}{\frac{1}{10000}} = 9900
\end{align*}

\section{Assignment 1 Review}
Definitions:

\begin{align*}
H(1)&: \quad \text{All cards in deck are labelled 1}\\
H(i)&: \quad \text{All cards in deck are labelled $i$}\\
H_n&: \quad \text{Deck is normal}\\
E&: \quad \text{Choosing a card with label 1}
\end{align*}

How do you derive the result directly?
Here $p = P(H_n)$ is the prior probability of a normal deck, and each $H(i)$ has prior probability $\frac{1-p}{52}$.

\begin{align*}
P\given{E}{H(1)} &= 1 \\
P\given{E}{\text{not } H(1)} &= \frac{P(E \cap \text{not } H(1))}{P(\text{not } H(1))} \\
&= \frac{p \times \frac{1}{52}}{1-\frac{1-p}{52}} &&\text{[normal deck, and choosing a 1 out of it]} \\
&= \frac{p}{52-(1-p)} &&\text{[multiply by 52]} \\
&= \frac{p}{51+p} \\
LR &= \frac{P\given{E}{H(1)}}{P\given{E}{\text{not } H(1)}} \\
&= \frac{1}{\frac{p}{51+p}} \\
&= \frac{51+p}{p}
\end{align*}
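The limiting cases behave as they should:

\[
\lim_{p \to 1} \frac{51+p}{p} = 52, \qquad \lim_{p \to 0} \frac{51+p}{p} = \infty
\]

If the deck is almost certainly normal, the alternative to $H(1)$ is essentially just $H_n$, and the LR is the familiar $1 / \frac{1}{52} = 52$. If a trick deck is almost certain, ``not $H(1)$'' is almost surely some $H(i)$ with $i \neq 1$, under which drawing a 1 is impossible.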
\section{From Data to Decision}
The question now is, ``what should I do?''

\subsection{With prior probabilities: Bayes rule}
Example -- nuchal scan of a fetus, to assess the probability of trisomy 21.
The scan produces evidence $E$. Hypotheses $H_1$: trisomy 21, $H_2$: no trisomy 21. $P(H_1)$ is given, based on the age of the mother:

\[
\text{Young mother:} \qquad \frac{P(H_1)}{P(H_2)} = \frac{1}{10000}, \quad \text{action A1 if } LR \geq 40
\]
\[
\text{Old mother:} \qquad \frac{P(H_1)}{P(H_2)} = \frac{1}{5}, \quad \text{action A1 if } LR \geq \frac{1}{50}
\]

Then compute posterior odds:

\[
\frac{P\given{H_1}{E}}{P\given{H_2}{E}} = LR \times \frac{P(H_1)}{P(H_2)}
\]

If the posterior odds are large enough, make a decision:

\begin{itemize}
\item \textbf{A1:} Further testing, or
\item \textbf{A2:} no action
\end{itemize}

In the Netherlands, ``large enough'' means posterior odds $\geq \frac{1}{250}$ (which gives the LR thresholds above).

For young mothers, no more tests unless there is strong evidence for trisomy 21.
For old mothers, do the further test unless there is strong evidence against trisomy 21.
The result depends not just on the LR, but on the product of the LR with the prior odds.

\subsection{Without prior probabilities: frequentist approach}
Suppose, if $H_1$ (the ``null hypothesis'') is true, we want to take action A1, and if $H_2$ is true, action A2.
The options are:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| c | c | c |}
\hline
& \textbf{A1} & \textbf{A2} \\ \hline
\textbf{H1} & true positive, sensitivity & false negative, type I error \\ \hline
\textbf{H2} & false positive, type II error & true negative, specificity \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

Often we like to control the probability of a type I error, called $\alpha$ ($P\given{\text{decide A2}}{H_1}$). $\beta$ is the probability of a type II error ($P\given{\text{decide A1}}{H_2}$).

\subsubsection{Decision procedure}
\begin{enumerate}
\item Make probability distributions for the evidence $E$ you will gather
\item Define a way to decide -- a rejection region $R$ (a subset of all possible evidence)
\item If the evidence $E$ turns out to be in $R$, reject $H_1$ (choose action A2). Otherwise, choose A1.
\end{enumerate}

Such that:

\begin{itemize}
\item If $H_1$ is true, $P\given{E \in R}{H_1} = \alpha$ (fixed, often 0.05).
\item Preferably, $P\given{E \in R}{H_2} = 1 - \beta$ is as large as possible (this is sometimes called the ``power'' of the test).
\end{itemize}

\subsubsection{Example}
The thumbtack again, now with $H_1: p = \frac{1}{4}$, collecting data from 30 throws.
Let $X$ be the number of successes; we observe $X = x$.
How do you choose the rejection region $R$?
One way is to put the most unlikely outcomes into $R$, stopping before their total probability gets too large.
E.g. $R = \{ 0, 1, 2, 3, 13, 14, 15, \ldots, 30 \}$, which gives $\alpha \approx 0.03$.

Without an alternative, $\beta$ does not exist. So, take $H_2: p = \frac{3}{4}$.

If there are $x$ successes, then

\begin{align*}
LR(x) &= \frac{P\given{X = x}{p = \frac{1}{4}}}{P\given{X = x}{p = \frac{3}{4}}} \\
&= \frac{\binom{30}{x} (\frac{1}{4})^x (\frac{3}{4})^{30-x} }{\binom{30}{x} (\frac{3}{4})^x (\frac{1}{4})^{30-x} } &&\text{[binomial distribution $X \sim B\big(30, \frac{1}{4}\big)$]} \\
&= \frac{\frac{1}{4^x}\frac{3^{30-x}}{4^{30-x}}}{\frac{3^x}{4^x}\frac{1}{4^{30-x}}} \\
&= \frac{\frac{3^{30-x}}{4^{30}}}{\frac{3^x}{4^{30}}} \\
&= \frac{3^{30-x}}{3^x} \\
&= 3^{30-2x}
\end{align*}

We also have the symmetry property

\[
P \given{X = x}{p = \frac{1}{4}} = P \given{X = 30 - x}{p = \frac{3}{4}}
\]
Based on the result, then decide:

\begin{itemize}
\item If $x < 15$, $LR > 1$ and supports $H_1: p = \frac{1}{4}$
\item If $x = 15$, $LR = 1$
\item If $x > 15$, $LR < 1$ and supports $H_2: p = \frac{3}{4}$
\end{itemize}

The table for this binomial distribution with the corresponding LRs is

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r | r | r |}
\hline
\textbf{LR} & \textbf{Rejection region R} & $\boldsymbol{\alpha}$ & $\boldsymbol{\beta}$ & $\boldsymbol{\alpha + \beta}$\\ \hline
$LR \leq 729$ & $x \geq 12$ & $0.0506$ & $\approx 0$ & $0.0506$ \\ \hline
$LR \leq 81$ & $x \geq 13$ & $0.0215$ & $\approx 0$ & $0.0215$ \\ \hline
$LR \leq 9$ & $x \geq 14$ & $0.0081$ & $0.0002$ & $0.0083$ \\ \hline
$LR \leq 1$ & $x \geq 15$ & $0.0027$ & $0.0008$ & $0.0035$ \\ \hline
$LR \leq \frac{1}{9}$ & $x \geq 16$ & $0.0008$ & $0.0027$ & $0.0035$ \\ \hline
$LR \leq \frac{1}{81}$ & $x \geq 17$ & $0.0002$ & $0.0081$ & $0.0083$ \\ \hline
& etc. & & & \\ \hline
& $\{0, 1, 2, 3, 13, \ldots, 30\}$ & $\approx 0.03$ & $\approx 0$ & $\approx 0.03$ \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}
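The $\alpha$ and $\beta$ columns are just tail probabilities of the two binomials; a minimal sketch to reproduce them, assuming Python with SciPy available:

\begin{verbatim}
from scipy.stats import binom

n = 30
for c in range(12, 18):
    # reject H1 when x >= c, i.e. when LR <= 3^(30 - 2c)
    alpha = 1 - binom.cdf(c - 1, n, 1/4)  # P(X >= c     | p = 1/4)
    beta = binom.cdf(c - 1, n, 3/4)       # P(X <= c - 1 | p = 3/4)
    print(c, round(alpha, 4), round(beta, 4), round(alpha + beta, 4))
\end{verbatim}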
\subsubsection{Neyman-Pearson lemma}
Suppose an LR threshold is used for decision making, e.g.
\[
R_t = \{ E \mid LR(E) \leq t \} \qquad \text{($t$ is the threshold, any number)}
\]

Then you get error rates
\begin{align*}
\begin{cases}
\alpha_t &= P \given{LR(E) \leq t}{H_1} \\
\beta_t &= P \given{LR(E) > t}{H_2} \qquad \left(\leq \frac{1}{t}\right)
\end{cases}
\end{align*}

Suppose I have another procedure with rejection region $R$ and error rates $\alpha_R$ and $\beta_R$.
If $\alpha_R < \alpha_t$, then $\beta_R > \beta_t$.

So, the LR is optimal in the sense that it is impossible to improve upon \textit{both} $\alpha_t$ and $\beta_t$ at the same time.
Therefore, there is no conceptual reason to use a different procedure (though there may be a practical one).

With $t = 1$, the sum $\alpha + \beta$ is minimal.

In the example, with $\alpha = 0.05$, we get $t = 729$.
\[
R_{729} = \{ x \geq 12 \}
\]

That is, if $x = 12$, $LR = 729$: the evidence supports $H_1$, but $H_1$ is rejected.

Error rates are predictive; they belong to a procedure for decision making:

\begin{itemize}
\item If $H_1$ true: probability $\alpha$ of error.
\item If $H_2$ true: probability $\beta$ of error.
\end{itemize}

It's not true that if you decide for $H_1$, there is a probability $\alpha$ that \textit{you} made an error!

\subsubsection{Frequentist vs Bayesian statistics}
\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| c | c |}
\hline
\textbf{Frequentist} & \textbf{Bayesian} \\ \hline
no priors & priors \\
predicting data & explaining data \\
LRs for decision making & LRs for updating odds, \textit{then} decision making
\\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

If you have no priors and no good way to estimate them, it may be better to go with the frequentist approach and accept the errors that come with it.

\section{Neyman-Pearson}
To recap, the LR ``decides'' which hypothesis best explains the data.
Data-driven hypotheses are allowed: since the posterior odds identity holds, a high LR is compensated by small prior odds.

Procedure:

\begin{enumerate}
\item Choose $\alpha$
\item Choose $t$ such that
\[
P \given{LR \leq t}{H_1} = \alpha
\]
\item Choose $A_1$ if $LR_{H_1, H_2}(E) > t$, otherwise $A_2$
\end{enumerate}

This means that we may choose $A_2$ even while there is evidence for $H_1$.

\subsection{Example (building on the binomial experiment from the previous lecture)}
$H_1: \theta = \frac{1}{4}, \quad H_2: \theta = \frac{3}{4}, \quad \alpha = 0.05$

Choose $A_2$ if $LR \leq 729$ ($\#\text{successes} \geq 12$), even though such an LR can still support $H_1$. Why? Because you insist on a small $\alpha$.

\section{What if only the final decision is given?}
What happens if you only get ``an expert's opinion'' -- the final decision they took?
You can still figure out the evidential value.
Let $E_1$ be the report that the expert chose $A_1$, and $E_2$ that they chose $A_2$:

\[
\frac{P \given{E_1}{H_1}}{P \given{E_1}{H_2}} = \frac{1-\alpha}{\beta} \qquad \qquad
\frac{P \given{E_2}{H_1}}{P \given{E_2}{H_2}} = \frac{\alpha}{1-\beta}
\]

If $\beta$ is small, the first LR increases and you get high evidential value.
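For example, with the conventional error rates $\alpha = 0.05$ and $\beta = 0.2$ (illustrative numbers, not from the lecture):

\[
\frac{P \given{E_1}{H_1}}{P \given{E_1}{H_2}} = \frac{0.95}{0.2} = 4.75 \qquad \qquad
\frac{P \given{E_2}{H_1}}{P \given{E_2}{H_2}} = \frac{0.05}{0.8} = \frac{1}{16}
\]

So a bare ``I chose $A_2$'' from such an expert carries an LR of 16 in favour of $H_2$, regardless of how strong the underlying data were.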
\section{P-values: what's wrong with them?}
\subsection{Example: researchers' experiments}
The goal is to disprove success probability $p = \frac{1}{2}$.
20 experiments, the result is 14 successes.
$\alpha = 0.05$, $H_1: \enspace p = \frac{1}{2}$, $H_2: \enspace p \neq \frac{1}{2}$.

Compute $P \given{\geq 14 \cup \leq 6}{H_1} \approx 0.115$.
Since this is $> \alpha$, it is not significant, so we can't reject $H_1$.
But 15 successes would have done it, with probability $0.0414$.
So do 20 more trials, with 19 successes.
Then $P \given{\geq 33 \cup \leq 7}{H_1} \approx 0.0000423$.

But rejecting $H_1$ is incorrect here!
After 40 experiments, the level-0.05 region is $\leq 13 \cup \geq 27$, with $P \given{\geq 27 \cup \leq 13}{H_1} \leq 0.05$.
The researcher's total critical region is:

\begin{itemize}
\item $\leq 5 \cup \geq 15$ after 20 experiments
\item $\leq 13 \cup \geq 27$ after 40 experiments
\end{itemize}

Its total probability under $H_1$ is $> 0.05$.

So what if we do 20 experiments, possibly stopping after 10?
Reject $H_1$ if either:

\begin{itemize}
\item after 10 experiments, $\geq 9 \cup \leq 1$ successes
\item after 20 experiments, $\geq 16 \cup \leq 4$ successes
\end{itemize}

The probability of rejecting under $H_1$ is $\leq 0.05$ (see the sketch at the end of this subsection).

Then, the results are: after 10, 3 successes; after 20, 5 successes. So $H_1$ is not rejected.

Another researcher only looks after 20 experiments, so for them, 5 successes means reject!

It's strange that we are using probabilities of outcomes we never saw to interpret the evidence.
The LR approach doesn't have this problem.
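To check that the peeking procedure stays below $\alpha = 0.05$ under $H_1$, you can enumerate it directly; a minimal sketch in Python:

\begin{verbatim}
from math import comb

def pmf(n, k, p=0.5):
    # P(k successes in n trials) under H1: p = 1/2
    return comb(n, k) * p**k * (1 - p)**(n - k)

stop = {0, 1, 9, 10}  # reject after the first 10 trials on these counts
p_reject = sum(pmf(10, k) for k in stop)
for k in range(11):  # counts after 10 trials where we continue
    if k in stop:
        continue
    for j in range(11):  # successes in the second batch of 10
        if k + j <= 4 or k + j >= 16:
            p_reject += pmf(10, k) * pmf(10, j)
print(p_reject)  # around 0.03, below 0.05
\end{verbatim}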
\subsection{Example: ability to see color}
You have 20 colors.
Experiment 1: for people that don't see green, reject $p = \frac{1}{2}$ if $\#\text{successes} \in \{0,1,9,10\}$.
Experiment 2: $H_1: \enspace p = \frac{1}{2}$ for all colors; reject $H_1$ if at least one person gets 0 or 10.

Result: the experiment with green has 9 successes, the experiments with all other colors land in $\{1,2,\ldots,9\}$.
What then?
Reject and don't reject at the same time?

Other experiments should not have an effect on the evidential value of an experiment.

\subsection{Example: one-tailed vs. two-tailed}
Take $p$ to be an unknown success probability.
$\alpha = 0.05$, 100 experiments.
$H: \enspace p = \frac{1}{2}$, reject $H$ if $\#\text{successes} \leq 39 \cup \geq 61$.
$H': \enspace p \leq \frac{1}{2}$, reject $H'$ if $\#\text{successes} \geq 59$.

Suppose 60 successes. Reject $p \leq \frac{1}{2}$ but not $p = \frac{1}{2}$? Wtf?

\subsection{Example: changing alpha}
$H: \enspace p = \frac{1}{2}$, 40 experiments, $\alpha = 0.05$.
$P \given{\geq 29 \cup \leq 11}{H} \approx 0.006$.

The researcher sees that $\alpha = 0.01$ would also have worked, so they claim to reject $H$ at level $\alpha = 0.01$.

This is wrong! $\alpha$ belongs to the whole experiment; it does not relate to an individual outcome.
By changing $\alpha$ from experiment to experiment, it loses the \textit{only} interpretation it has.

\section{P-values of LRs}
Suppose $LR = 47$.
The p-value is then $P \given{LR \geq 47}{H_2}$.
The idea is that, if the p-value is very small, then the LR of 47 is extreme for $H_2$.
If it's large, then the 47 is `normal' for $H_2$.

However, this still has no \textit{evidential} value.
The LR measures the strength of evidence.
The p-value tells you how rare such an LR is.
But once you have the evidence, it doesn't matter how frequently evidence of that strength occurs.

\subsection{Example: genomes}
Two people with genomes $g_1$, $g_2$. $H_1$: siblings, $H_2$: unrelated.

You can take different types of LRs:

\begin{align*}
LR_{H_1, H_2} (g_1, g_2) &= \frac{P \given{g_1, g_2}{H_1}}{P \given{g_1,g_2}{H_2}} \\
LR' &= \frac{P \given{g_2}{g_1, H_1}}{P \given{g_2}{g_1, H_2}} \\
LR'' &= \frac{P \given{g_1}{g_2, H_1}}{P \given{g_1}{g_2, H_2}}
\end{align*}

These are all basically the same.
Take the notation $p_1 = P(g_1)$, $p_2 = P(g_2)$,
$p_1(g_2) = P(g_1 \text{ for a sibling of someone with } g_2)$,
$p_2(g_1) = P(g_2 \text{ for a sibling of someone with } g_1)$.

Then we can rewrite the LRs from above:

\begin{align*}
LR_{H_1, H_2} (g_1, g_2) &= \frac{p_1 p_2(g_1)}{p_1 p_2} = \frac{p_2(g_1)}{p_2} \\
LR' &= \frac{p_2 (g_1)}{p_2} \\
LR'' &= \frac{p_1 (g_2)}{p_1} = \frac{p_2 (g_1)}{p_2}
\end{align*}
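To see why these all coincide: one genome on its own says nothing about relatedness, so $P\given{g_1}{H_1} = P(g_1) = p_1$, and under $H_2$ the genomes are independent. Factoring the joint probability:

\[
LR_{H_1, H_2}(g_1, g_2) = \frac{P\given{g_1}{H_1} \, P\given{g_2}{g_1, H_1}}{P(g_1) \, P(g_2)} = \frac{p_1 \, p_2(g_1)}{p_1 \, p_2} = \frac{p_2(g_1)}{p_2} = LR'
\]

and symmetrically for $LR''$.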
The p-values will be different, though: they depend on which genome is held fixed, and how frequently that genome occurs.
This would lead to different actions, even though the LRs are identical, and thus so is the evidence.

\subsection{Example: disease and test results}
Take $H_1: \enspace \text{disease present}$, $H_2: \enspace \text{disease absent}$.

Experiment 1:

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r |}
\hline
\textbf{} & \textbf{+} & \textbf{-} \\ \hline
$H_1$ & 0.94 & 0.06 \\ \hline
$H_2$ & 0.02 & 0.98 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\begin{align*}
LR(+) &= \frac{P \given{+}{H_1}}{P \given{+}{H_2}} = \frac{0.94}{0.02} = 47 \\
LR(-) &= \frac{P \given{-}{H_1}}{P \given{-}{H_2}} = \frac{0.06}{0.98} \approx \frac{1}{16}
\end{align*}

Experiment 2 (``0'' means the experiment is not carried out):

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r | r |}
\hline
\textbf{} & \textbf{+} & \textbf{0} & \textbf{-} \\ \hline
$H_1$ & 0.47 & 0.5 & 0.03 \\ \hline
$H_2$ & 0.01 & 0.5 & 0.49 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\begin{align*}
LR(+) &= 47 \\
LR(-) &\approx \frac{1}{16}
\end{align*}

Experiment 3 (``*'' is a negative result or no experiment):

\vspace{1em}
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{| l | c | r |}
\hline
\textbf{} & \textbf{+} & \textbf{*} \\ \hline
$H_1$ & 0.47 & 0.53 \\ \hline
$H_2$ & 0.01 & 0.99 \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1}
\vspace{1em}

\[
LR(+) = 47
\]

All of the LRs are the same.
So essentially, if a ``+'' is obtained, the evidential value is always the same no matter how it was obtained.

Per experiment, $P \given{LR \geq 47}{H_2}$ is:

\begin{enumerate}
\item 0.02
\item 0.01
\item 0.01
\end{enumerate}

These are not all the same!

The p-value relates to the \textit{entire} procedure; that's why it's not the same.
The LR relates to an individual outcome, so it's always the same.

\section{Why confidence intervals are similarly fucked}
Recall testing $H_1$ vs $H_2$: define a rejection region $R$, s.t. if the sampled data are in $R$, you ``reject $H_1$'' (take some action).
Otherwise, do not reject.

P-values define $R$ in terms of what might happen if $H_1$ is true, s.t. the total probability for data to be in $R$ is $\alpha$.
The point is that you can't interpret data in $R$ as evidence against $H_1$.

Neyman-Pearson: defining $R$ using an LR threshold $t$, $R = \{ E \mid LR(E) \leq t\}$, gives you optimality.

Why p-values suck (recap):

\begin{itemize}
\item they do not measure the strength of evidence in $E$ against $H_1$
\item they are ambiguous (several ways of defining them)
\item the probability $\alpha$ is a property of the procedure that you do (how data are gathered), not of the obtained data
\end{itemize}

\subsection{Confidence intervals}
Say we have a model (e.g. a Binomial distribution) that generates the data and has an unknown parameter $\theta$ that we want to estimate.
Example: $\theta$ is the mean height of people, model $N(\theta, \sigma^2)$.

A CI at level $1-\alpha$ consists of two functions of the data, $\theta_{min}$ and $\theta_{max}$, such that if $\theta$ is the true value of the parameter of interest, it lies between $\theta_{min} (E)$ and $\theta_{max} (E)$ with probability $1-\alpha$ under repeated sampling of $E$.

\subsubsection{Commonly encountered 95\% CI}
For data from $N(\theta, \sigma^2)$, if I sample $n$ points $x_1,\ldots,x_n$, estimate $\theta$ by \[
\hat{\theta} = \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i
\]

And take as 95\% CI $\big[ \overline{x} - 1.96 \frac{\sigma}{\sqrt{n}}, \quad \overline{x} + 1.96 \frac{\sigma}{\sqrt{n}} \big]$.
$1.96$ is the corresponding z-score.

Why? It gives the smallest 95\% CI for such data.

\subsubsection{Binomial data (2 possible outcomes)}
Data $x_1,\ldots,x_n$, interested in the ``success probability'' $p = P(x_i = 1)$.

With success probability $p$, in $n$ points $x_1,\ldots,x_n$ there are $k$ successes (ones) with probability \[
P (X = k) = \binom{n}{k} p^k (1-p)^{n-k}
\] (the Binomial distribution).

A 95\% CI can be computed with this, but a good approximation is \[
\theta_{min,max} = \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\] where $\hat{p} = \frac{k}{n}$.

The CI at level $1-\alpha$ contains exactly the values that would not lead to rejection at significance level $\alpha$ (i.e. p-value $\geq \alpha$).

\subsection{Problems with CIs}
CIs suffer from the same problems as p-values:

\begin{itemize}
\item $\alpha$ is a property of the procedure, not of any realized outcome
\item ambiguity: lots of choices possible
\end{itemize}

\subsubsection{Example}
We want to estimate $\theta$, and gather data $x$ with

\[
P(x \mid \theta) = \begin{cases} \frac{1}{2} \quad x = \theta \\ \frac{1}{2} \quad x = \theta+1 \end{cases}
\]

Gather two points $x_1, x_2$. Define the CI as $\big[ \theta_{min} = \min(x_1, x_2), \quad \theta_{max} = \max(x_1, x_2) \big]$.

This is a 75\% CI: it misses $\theta$ only when $x_1 = x_2 = \theta + 1$, which has probability $\frac{1}{4}$.

But if the data are $x_1 = 28$, $x_2 = 29$, then the CI is $[28, 29]$ and it definitely contains $\theta$ (which must be 28).
If $x_1 = x_2 = 30$, then the CI is $[30, 30]$, but $\theta$ could be 29 or 30.
If those two values of $\theta$ are equally likely, there is only a 50\% chance that the CI contains $\theta$.
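A small simulation makes the pathology concrete (a minimal sketch with $\theta$ fixed at an arbitrary value): overall coverage is the promised 75\%, but conditional on what you actually observed, width-1 intervals always contain $\theta$ while width-0 intervals contain it only half the time.

\begin{verbatim}
import random

def interval(theta):
    # two draws, each theta or theta + 1 with probability 1/2
    xs = [theta + random.randint(0, 1) for _ in range(2)]
    return min(xs), max(xs)

theta, trials, hits = 30, 100_000, 0
eq = [0, 0]  # [count, hits] for the width-0 case (x1 == x2)
for _ in range(trials):
    lo, hi = interval(theta)
    covered = lo <= theta <= hi
    hits += covered
    if lo == hi:
        eq[0] += 1
        eq[1] += covered
print(hits / trials)  # ~0.75
print(eq[1] / eq[0])  # ~0.5
\end{verbatim}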
\subsubsection{Example}
With $n$ points from the normal distribution $N(\mu, \sigma^2)$, the 95\% CI is $\overline{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$.

Let $n=1$ or $n=100$, each with probability $\frac{1}{2}$.
What's a good 95\% CI?

\begin{enumerate}
\item If $n=1$, the 95\% CI is $x_1 \pm 1.96 \sigma$.
If $n=100$, the 95\% CI is $\overline{x} \pm 1.96 \frac{\sigma}{10}$.
But you can do better.
\item If $n=1$, $x_1 \pm 1.62\sigma$ (a 91\% CI).
If $n=100$, $\overline{x} \pm 2.72 \frac{\sigma}{10}$ (a 99\% CI).
Overall, this is also a 95\% CI.
\end{enumerate}

Why number 2? Compare the expected widths of the intervals:

\begin{enumerate}
\item $\frac{1}{2}(2 \times 1.96 \sigma) + \frac{1}{2}(2 \times 1.96 \frac{\sigma}{10}) = 1.96\sigma + 0.196\sigma = 2.156\sigma$
\item $\frac{1}{2}(2 \times 1.62\sigma) + \frac{1}{2}(2 \times 2.72 \frac{\sigma}{10}) = 1.62\sigma + 0.272\sigma = 1.892\sigma$
\end{enumerate}

\subsubsection{Example}
Heart/lung problems with newborns.
Conventional medical treatment (CMT) is not very adequate; the survival rate is not precisely known, but around 20\%.
A new, promising treatment, ECMO, has a survival rate estimated to be possibly around 80\%.
Study ECMO vs CMT.
How large should the study be?

Take $n$ patients; the number of recoveries is $x$.

Say we test $\theta_{ECMO} = 0.8$ vs $\theta_{ECMO} = 0.2$.
If there are $x$ recoveries, the LR is

\begin{align*}
\frac{P \given{x}{\theta = 0.8}}{P \given{x}{\theta = 0.2}} &= \frac{\binom{n}{x} (0.8)^x (0.2)^{n-x}}{\binom{n}{x} (0.2)^x (0.8)^{n-x}} \\
&= \frac{4^x}{4^{n-x}} = 4^{2x-n} \\
&= 2^{4x-2n}
\end{align*}

If I want $LR \geq 32 = 2^5$, I need $4x - 2n \geq 5$. So $x \geq \frac{2n+5}{4}$.

We can compute the probability of getting sufficiently strong evidence in favor of the true hypothesis, the probability of strongly misleading evidence, or the probability of not obtaining strong evidence.

Now suppose 13 out of 17 recoveries. What does this say about $\theta_{ECMO}$?

We could CI that shit, but CIs have problems.

The best estimate is $\theta_{ECMO} = \frac{13}{17} \approx 0.76$.
How much better is this than $\theta = 0.5$? For the hypothesis $\theta = 0.8$:

\[
\frac{P \given{\text{13 out of 17}}{\theta = 0.8}}{P \given{\text{13 out of 17}}{\theta = 0.5}} = 11.5
\]

An alternative is a likelihood interval (LI). E.g. the $\frac{1}{32}$ LI is the set of all values $\theta_0$ such that the LR for $\theta = \frac{13}{17}$ vs $\theta = \theta_0$ is at most 32.

\section{Notes from Ioannidis}
These are notes from the class discussing the article ``Why Most Published Research Findings Are False'' by Ioannidis.

$S$ : significant result ($p < 0.05$).

\begin{align*}
\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}} &= \frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} \times \underbrace{\frac{P(\overline{H_0})}{P(H_0)}}_\text{$R$ in the article} \\
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} &= \frac{1-\beta}{\alpha}
\end{align*}

So we need $\frac{1-\beta}{\alpha} \times R > 1$ for a significant finding to be more likely true than false.

In the notation of Ioannidis, $(1-\beta)R > \alpha$.

Odds $\frac{P(\overline{H_0})}{P(H_0)} = R$ are equivalent to \begin{gather*}
P (\overline{H_0}) = \frac{R}{R+1} \\
P (H_0) = \frac{1}{R+1}
\end{gather*}

Out of a total number $c$ of research questions, the counts are then:

\begin{align*}
\begin{cases}
c \frac{R}{R+1} \quad \text{if $\overline{H_0}$ true} \longrightarrow
\begin{cases}
S(\overline{H_0}\text{ true}) = (1-\beta) c \frac{R}{R+1} \\
\overline{S}(\overline{H_0}\text{ true}) = \beta c \frac{R}{R+1}
\end{cases}
\\
c \frac{1}{R+1} \quad \text{if $H_0$ true} \longrightarrow
\begin{cases}
S(H_0\text{ true}) = \alpha c \frac{1}{R+1} \\
\overline{S}(H_0\text{ true}) = (1-\alpha) c \frac{1}{R+1}
\end{cases}
\end{cases}
\end{align*}

Ioannidis takes $\beta = 0.2$, $\alpha = 0.05$. So \[
LR(S) = \frac{1-0.2}{0.05} = \frac{0.8}{0.05} = 16
\]
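Concretely, with an illustrative prior of my own choosing: in a field where only 1 in 11 tested hypotheses is true, $R = \frac{1}{10}$, so

\[
\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}} = 16 \times \frac{1}{10} = 1.6, \qquad P \given{\overline{H_0}}{S} = \frac{1.6}{1.6 + 1} \approx 0.62
\]

Even before any bias, almost 40\% of the significant findings in such a field would be false.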
\subsection{Bias}
Bias is when you get more significant findings than warranted by the data.
E.g. you try to `clean up the data'.
But then your original error rates don't apply anymore.

Originally, \begin{gather*}
P \given{S}{H_0} = \alpha \\
P \given{S}{\overline{H_0}} = 1-\beta
\end{gather*}

Now, \begin{gather*}
P \given{S}{H_0} = \alpha + (1-\alpha)u \\
P \given{S}{\overline{H_0}} = (1-\beta) + \beta u
\end{gather*} where $u$ is the probability of data becoming significant when they are not.

With bias, the LR of $S$ for $\overline{H_0}$ vs $H_0$ becomes \[
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} = \frac{1-\beta+\beta u}{\alpha + (1-\alpha) u}
\]

$PPV = P \given{\overline{H_0}}{S}$.
Plotting the posterior odds $\frac{P \given{\overline{H_0}}{S}}{P \given{H_0}{S}}$ (y-axis) against $u$ (x-axis) shows how bias erodes the evidential value.

\vspace{1em}
Suppose there are several teams:

\begin{itemize}
\item all with the same research question
\item all with the same $\alpha$ and $\beta$
\item a result is published as soon as at least 1 team finds a statistically significant result
\end{itemize}

$S$ : at least one of the $n$ teams has $p < 0.05$. \[
\frac{P \given{S}{\overline{H_0}}}{P \given{S}{H_0}} = \frac{1 - \beta^n}{1 - (1-\alpha)^n}
\]

As $n$ goes to infinity, this LR tends to 1.

Corollaries:
\begin{itemize}
\item smaller studies = less likely for findings to be true
\item smaller effect sizes = less likely for findings to be true
\item greater number and less selection of tested relationships = less likely for findings to be true
\item greater flexibility = less likely for findings to be true
\end{itemize}

\section{The Paradox of the Ravens (Hempel)}
H: all ravens are black.
This is equivalent to saying ``all non-black things are non-ravens''.
So an observation of a non-black non-raven should be evidence for H.

Suppose two vases, one with only ravens (R, $n_R$ of them), and one with only non-ravens (NR, $n_{NR}$ of them).
$P_R$ is the probability of black in the raven vase, $P_{NR}$ the probability of black in the non-raven vase.

$X$ is a draw from R. $H_A: P_R = 1$, $H_B: P_R = p < 1$.

Evidence: $X$ is black.

\begin{align*}
LR_{A,B} (E) &= \frac{P \given{X \text{ is black}}{A}}{P \given{X \text{ is black}}{B}} \\
&= \frac{1}{p} > 1
\end{align*}

So this is evidence that all ravens are black.

$Y$ is a draw from NR. Evidence: $Y$ is white.

\begin{align*}
LR_{A,B} (E') &= \frac{P \given{Y \text{ is white}}{A}}{P \given{Y \text{ is white}}{B}} \\
&= 1 \quad \text{[A and B only concern R, not NR]}
\end{align*}

That's not evidence for H. It's neutral.

\subsection{But there's a big but}
What if we do this:

\begin{enumerate}
\item Mix all the things
\item Choose a non-black object from the mix
\item The non-black object turns out to have come from NR (it is a non-raven)
\item Claim this is evidence for H
\end{enumerate}

Let $Z$ be the vase the chosen object came from, R or NR.
Under $A$, every non-black object is a non-raven, so $P \given{Z = NR}{A} = 1$.

\begin{align*}
LR_{A,B} (Z = NR) &= \frac{P \given{Z = NR}{A}}{P \given{Z = NR}{B}} \\
&= \frac{1}{P \given{Z = NR}{B}} \\
P \given{Z = NR}{B} &= \frac{\text{number of non-black objects in NR}}{\text{total number of non-black objects}} \\
&= \frac{n_{NR} (1-P_{NR})}{n_{NR} (1-P_{NR}) + n_R (1-P_R)} \\
\therefore LR_{A,B} (Z = NR) &= \frac{1}{P \given{Z = NR}{B}} \\
&= \frac{n_{NR} (1-P_{NR}) + n_R (1-P_R)}{n_{NR} (1-P_{NR})} \\
&= 1 + \frac{n_R (1-P_R)}{n_{NR} (1-P_{NR})} \\
&> 1, \text{ since under $B$, } P_R < 1
\end{align*}

So this \textit{is} evidence for H! (Though not very strong evidence.)

So there are two ways of sampling, and which one you use \textit{is} definitely relevant.
If you select something that you \textit{know} is not a raven, and see that it's not black, that's not evidence.
If you randomly select something that's not black, and see that it's not a raven, it \textit{is} evidence.
\end{document}