+++
title = 'Relationships between variables'
template = 'page-math.html'
+++

# Relationships between variables
a relationship between variables can be investigated, but causality can’t be established.
graphically, you can use scatterplots:

![](6de852d30c13f092f1d0954f4d21c2c6.png)
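
For instance, a minimal sketch of such a scatterplot in Python (the data here is made up purely for illustration; numpy and matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical data: a roughly linear relationship with some noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)

plt.scatter(x, y)  # each point is one (x, y) observation
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```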

## Correlation

correlation: if values of two variables are somehow associated with each other

- positive: higher values of variable 1 are usually associated with higher values of variable 2
- negative: higher values of variable 1 are usually associated with lower values of variable 2

the relationship is linear if the plotted points basically form a straight line.
the population linear correlation coefficient is ρ.
the sample linear correlation coefficient (the estimator for ρ) is:

$r = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_{i} - \bar{x})(y_{i} - \bar{y})}{s_{x} s_{y}}$

interpreting r:

- r = 1: perfect positive linear relationship
- r > 0: positive linear relationship
- r ≈ 0: no linear relationship (doesn’t mean no relationship!!)
- r < 0: negative linear relationship
- r = −1: perfect negative linear relationship
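
As a sanity check, a minimal sketch of computing r directly from the formula above (hypothetical data; numpy’s built-in `corrcoef` is used only as a cross-check):

```python
import numpy as np

def sample_r(x, y):
    # sample standard deviations (ddof=1 gives the n-1 denominator)
    sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / ((n - 1) * sx * sy)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(sample_r(x, y))           # close to 1: strong positive linear relationship
print(np.corrcoef(x, y)[0, 1])  # should agree
```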

### Testing ρ = 0

test statistic:

$T_{p} = \frac{R - \rho}{\sqrt{\frac{1 - R^{2}}{n-2}}}$

which, under H₀: ρ = 0, has a t-distribution with n−2 degrees of freedom.
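
A sketch of this test in Python (hypothetical data; scipy’s `pearsonr` performs the equivalent test and is shown only as a cross-check):

```python
import numpy as np
from scipy import stats

def test_rho_zero(x, y):
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r / np.sqrt((1 - r**2) / (n - 2))  # realisation of T_p with rho = 0
    p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value, n-2 df
    return t, p

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(test_rho_zero(x, y))
print(stats.pearsonr(x, y))  # same p-value
```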

## Regression

if there’s a correlation, the points can be described by a line
$y_{i} = \beta_{0} + \beta_{1} x_{i} + \text{error}_{i}$

the regression equation is $\hat{y} = b_{0} + b_{1} x$

where b₀ and b₁ are least-squares estimates of β₀ and β₁

you want values that satisfy the least-squares property (i.e. minimise $\sum_{i} (\text{observed} - \text{model})^{2}$)

$\begin{aligned}
b_{1} &= r \frac{s_{y}}{s_{x}} &&\text{(the slope)} \\\\
b_{0} &= \bar{y} - b_{1} \bar{x} &&\text{(the y intercept)}
\end{aligned}$
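
A minimal sketch of these estimates in Python (hypothetical data; `np.polyfit` is used only as a cross-check):

```python
import numpy as np

def least_squares(x, y):
    r = np.corrcoef(x, y)[0, 1]
    b1 = r * np.std(y, ddof=1) / np.std(x, ddof=1)  # slope
    b0 = y.mean() - b1 * x.mean()                   # y intercept
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = least_squares(x, y)
print(b0, b1)               # regression equation: y-hat = b0 + b1 * x
print(np.polyfit(x, y, 1))  # [b1, b0], should agree
```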

### Testing linearity

Test:
- H0: β1 = 0
- HA: β1 ≠ 0

The score is:

$t_{\beta} = \frac{b_{1}}{s_{b_{1}}}$

(a realisation of the test statistic $T_{\beta}$, which has a t-distribution with n−2 degrees of freedom under H₀)
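
As a sketch, scipy’s `linregress` reports b₁, its standard error $s_{b_{1}}$, and the p-value for exactly this test:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
t_beta = res.slope / res.stderr  # the score t_beta = b1 / s_b1
print(t_beta, res.pvalue)        # pvalue tests H0: beta1 = 0
```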

### Coefficient of determination

The coefficient of determination is the proportion of variation in the y variable that the regression equation can explain:

$r^{2} = \frac{\text{explained variation}}{\text{total variation}}$
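
A sketch of computing r² both ways, as explained over total variation and as the square of the correlation coefficient (the two agree for simple linear regression):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
explained = np.sum((y_hat - y.mean())**2)  # variation explained by the line
total = np.sum((y - y.mean())**2)          # total variation in y
print(explained / total)
print(np.corrcoef(x, y)[0, 1]**2)          # same value
```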

### Residuals

To check for a fixed standard deviation, make a residual plot.
Residuals are estimates for the errors.

residual: the difference between the observed $y\_{i}$ and the predicted value $\hat{y}\_{i} = b\_{0} + b\_{1} x\_{i}$

$\text{residual}\_{i} = y\_{i} - \hat{y}\_{i} = y\_{i} - (b\_{0} + b\_{1} x\_{i})$

A residual plot is a scatterplot of the residuals against the x values. There should be no obvious pattern in the residuals.

![](4670b5bf474343b006017ea93ea64fdb.png)
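
A minimal sketch of such a residual plot (hypothetical data; matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)  # observed minus predicted

plt.scatter(x, residuals)       # should look like a patternless band around 0
plt.axhline(0, linestyle='--')
plt.xlabel('x')
plt.ylabel('residual')
plt.show()
```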