lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit f658739aadcc8700a7715b6ff0593be73f014053
parent 8d47712b875cc2f61eedf99ba01c6c57ca003e1f
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sun, 28 Mar 2021 00:01:43 +0100

Statistical methods notes

Diffstat:
Mcontent/_index.md | 2+-
Dcontent/stats-notes/Continuous probability distribution.html | 15---------------
Acontent/stats-notes/Continuous probability distribution.md | 40++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/Continuous probability distribution.resources/2F7F0CC9-0E00-43EB-BCDA-0440E725BE36.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/6EA63762-9823-494B-9DD3-1623E3226929.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/70078989-8600-4CFE-9E4E-7164D0773ED4.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/97D6FF41-BFE0-4FAC-A44E-A37D3A25BC97.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/D01B4919-6913-4897-A0C3-FD8809571DF8.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/D9A79AAB-2E2B-4D2A-B10D-BAA9DE2DCC39.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/EA4BF44C-A90C-434D-A8A9-ED2F5ED756CB.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/F5864D1C-0372-40B3-B8ED-6183687DA8E5.png | 0
Dcontent/stats-notes/Continuous probability distribution.resources/F9C40EC3-A503-48D4-BB6A-9BA2A98D78B1.png | 0
Dcontent/stats-notes/Discrete probability distributions.html | 14--------------
Acontent/stats-notes/Discrete probability distributions.md | 26++++++++++++++++++++++++++
Dcontent/stats-notes/Discrete probability distributions.resources/848A0942-2B6E-4B1D-B9FD-37838A13558E.png | 0
Dcontent/stats-notes/Discrete probability distributions.resources/9D698922-6835-4DF6-A48F-B660154609EE.png | 0
Dcontent/stats-notes/Discrete probability distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png | 0
Dcontent/stats-notes/Discrete probability distributions.resources/AFCADD4F-0AD1-4C3F-A2B9-6039157E6FD8.png | 0
Dcontent/stats-notes/Hypothesis testing.html | 33---------------------------------
Dcontent/stats-notes/Hypothesis testing.resources/0E385596-9840-4183-A55A-D0FFEE289664.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/1C23AC31-18FC-4D37-915D-E1247246C251.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/263C0E18-D9DB-4CB0-BB4A-43F1827F4108.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/2B02CEB6-9584-4118-9A46-205A09FC1DB9.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/3414AB71-8448-4162-8DE8-DED5892D0A44.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/3B12EF72-73BF-4CF2-857B-A6DBD194DA91.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/597ED042-0B26-423F-84E4-83B05096226F.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/7A1BCF62-8BA7-41B5-AF4A-F683F306D4CA.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/7EFFA0D7-E377-4B0A-96D5-AD032CC72F30.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/8E7AA75B-D32B-4AAF-B0B0-E56A570B091D.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/933656F9-8F0F-4893-9A44-4A01FBAF7807.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/9BB450C5-1027-455B-9017-181596524970.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/B1078169-092F-4C44-85A6-DB841B1886D8.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/B2717D7E-C672-4C00-B996-39660928F65A.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/B9EA8A3D-3510-4F48-A8D2-119684783B2D.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/C8BDC62C-982B-4A3C-B727-B79C999AEC3D.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/D3588BEF-39B7-40E6-BF7F-64884EA6BB58.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/EBEF3D5C-C99B-49C2-B6D5-D6692FA541F5.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/EE0ABF4A-E76D-4C1C-8D99-24B80A2476AB.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/F36A9926-CAC5-4B8F-891A-B311A4D065E3.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/FAEED6E9-9545-4D2C-8EC6-9A22C7EC43AD.png | 0
Dcontent/stats-notes/Hypothesis testing.resources/FD339E08-317E-4225-B2E8-6349C7EF8937.png | 0
Dcontent/stats-notes/Introduction: Data.html | 13-------------
Dcontent/stats-notes/Introduction: Data.resources/0D52418B-F420-4868-AEB2-BF252B84BC51.png | 0
Dcontent/stats-notes/Introduction: Data.resources/D876FCB8-9423-49FF-97D9-D75D8E9DCDAF.png | 0
Acontent/stats-notes/Introduction_ Data.md | 62++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/Probability intro.html | 31-------------------------------
Acontent/stats-notes/Probability intro.md | 87+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/Probability intro.resources/060DFC41-4A01-45EA-80A9-55FACD3529B3.png | 0
Dcontent/stats-notes/Probability intro.resources/10FF60DB-0B9C-42D1-8F63-1771C85CC905.png | 0
Dcontent/stats-notes/Probability intro.resources/18BD58AF-261C-4798-95B0-1B0D1FD84316.png | 0
Dcontent/stats-notes/Probability intro.resources/1FD39172-02F8-4B50-A391-53475092F1BB.png | 0
Dcontent/stats-notes/Probability intro.resources/224D8EB6-BD63-43E5-BDA1-E6EEF63D3988.png | 0
Dcontent/stats-notes/Probability intro.resources/30C6C2F5-24D9-497C-8CF0-E5F30E56C453.png | 0
Dcontent/stats-notes/Probability intro.resources/318A5966-6BCE-42F9-BE72-B9B36A6798D7.png | 0
Dcontent/stats-notes/Probability intro.resources/35164666-FA75-49AB-A84D-6C6FD626E6D1.png | 0
Dcontent/stats-notes/Probability intro.resources/3FD28422-E813-4F7F-8E34-1F23E59D42C5.png | 0
Dcontent/stats-notes/Probability intro.resources/4F559F81-FF2E-497B-9DFA-EC210A10C01F.png | 0
Dcontent/stats-notes/Probability intro.resources/4FDC25D6-CF44-49AC-96BA-1C64C4C2795D.png | 0
Dcontent/stats-notes/Probability intro.resources/73C45B52-3233-4850-A675-FC9270B18B4C.png | 0
Dcontent/stats-notes/Probability intro.resources/82DA0CAB-1ECF-4D59-9F2A-D1234AE1EA04.png | 0
Dcontent/stats-notes/Probability intro.resources/846310A1-4C13-42A3-9F9E-33AD89AA7E0C.png | 0
Dcontent/stats-notes/Probability intro.resources/863448C3-05B6-42E8-99F5-EAD2A017F781.png | 0
Dcontent/stats-notes/Probability intro.resources/8947E446-03F4-4FE2-A919-B60AC846569F.png | 0
Dcontent/stats-notes/Probability intro.resources/9451F0BE-1227-40F8-90F9-38B0DA866992.png | 0
Dcontent/stats-notes/Probability intro.resources/994FA04E-5C98-4E29-8FED-B3F3C95B0563.png | 0
Dcontent/stats-notes/Probability intro.resources/A1FE0B75-7AA4-49D8-9C1C-1830554F6ED3.png | 0
Dcontent/stats-notes/Probability intro.resources/B4FA55EC-6128-4BD7-B20F-607AD531BE96.png | 0
Dcontent/stats-notes/Probability intro.resources/BDA4CF06-F628-462F-8605-7C6BCD8F711E.png | 0
Dcontent/stats-notes/Probability intro.resources/C0823957-9584-4C60-A9E8-9D414EF6B99B.png | 0
Dcontent/stats-notes/Probability intro.resources/C5248918-FC1B-4F58-A428-6AAE812CD000.png | 0
Dcontent/stats-notes/Probability intro.resources/CD098AB5-85D4-400E-8AB7-8B06F5600D5E.png | 0
Dcontent/stats-notes/Probability intro.resources/DB38D8AB-300C-4A63-BDFC-96B0BCF653C0.png | 0
Dcontent/stats-notes/Probability intro.resources/E0C52A8D-5988-4E5F-9DD3-C583246D76B0.png | 0
Dcontent/stats-notes/Probability intro.resources/E52D5FEF-A442-4256-83C9-190BC0FA2DE7.png | 0
Dcontent/stats-notes/Probability intro.resources/E67A439E-4A1D-4171-B02F-07C8B905048C.png | 0
Dcontent/stats-notes/Relationships between variables.html | 16----------------
Dcontent/stats-notes/Relationships between variables.resources/18E0D7AC-18DF-4AB8-B3F5-384049971D0A.png | 0
Dcontent/stats-notes/Relationships between variables.resources/1D4C0EE4-B01F-40B8-B032-F153ABFDB821.png | 0
Dcontent/stats-notes/Relationships between variables.resources/1E0FEC5E-6CCD-43CF-9C55-94EE42D1B04B.png | 0
Dcontent/stats-notes/Relationships between variables.resources/351CFA24-FF67-4EB2-9251-87FFDEE4120E.png | 0
Dcontent/stats-notes/Relationships between variables.resources/710FB15A-F331-4DC1-97D6-28F8F343C01D.png | 0
Dcontent/stats-notes/Relationships between variables.resources/86406A4F-1BB7-48E5-B1DF-0F3E45144091.png | 0
Dcontent/stats-notes/Relationships between variables.resources/8E22EE8D-B71D-416D-8415-4ECBBC119876.png | 0
Dcontent/stats-notes/Relationships between variables.resources/B57D5FB5-3C42-4448-9176-CED13640FC5B.png | 0
Dcontent/stats-notes/Relationships between variables.resources/D34A4C4D-E461-45E7-AEFB-37860E5126A6.png | 0
Dcontent/stats-notes/Relationships between variables.resources/E70E7CBD-825E-4743-9AAB-CF98B110B48B.png | 0
Rcontent/stats-notes/Relationships between variables.resources/3A61D08D-CDAF-400A-A557-A3F3F1F65F68.png -> content/stats-notes/Relationships between variables/4670b5bf474343b006017ea93ea64fdb.png | 0
Rcontent/stats-notes/Relationships between variables.resources/F12DF64D-BE0B-4F8A-8C6E-7D70FE8156EB.png -> content/stats-notes/Relationships between variables/6de852d30c13f092f1d0954f4d21c2c6.png | 0
Acontent/stats-notes/Relationships between variables/index.md | 88+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/Sampling distributions & estimators.html | 14--------------
Acontent/stats-notes/Sampling distributions & estimators.md | 25+++++++++++++++++++++++++
Dcontent/stats-notes/Sampling distributions & estimators.resources/02603FCF-F8A7-4685-ADDB-F61C740E6D5E.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/46623A86-1281-4C69-A02B-E97D9AE1E158.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/552E86FD-22CC-4049-9704-493C7CC73AA5.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/94A94B8B-C601-4EEA-8405-1A493EA0DD51.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/B849CBF7-AC34-4B58-BABA-D4C32843ED83.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/C79EBBB9-CF36-42A1-AA47-922A5328AF6E.png | 0
Dcontent/stats-notes/Sampling distributions & estimators.resources/F6E65F44-3F47-42A6-BF45-A777BA29888E.png | 0
Dcontent/stats-notes/Summarising data.html | 27---------------------------
Dcontent/stats-notes/Summarising data.resources/1D9573FE-46B1-48A9-BB45-DFE435C80568.png | 0
Dcontent/stats-notes/Summarising data.resources/212AC4F9-52D4-49F3-B2E5-F7CAEF56AC7F.png | 0
Dcontent/stats-notes/Summarising data.resources/91E716FB-58B4-4D1A-921F-73100090781C.png | 0
Dcontent/stats-notes/Summarising data.resources/D1AC7FAB-71A0-4065-A141-87E6BD0B14DA.png | 0
Dcontent/stats-notes/Summarising data.resources/EA168496-8A42-4DF8-AC6C-F0D5811C25BA.png | 0
Dcontent/stats-notes/Summarising data.resources/F67961D8-6174-4803-8055-23510B9410BF.png | 0
Rcontent/stats-notes/Summarising data.resources/5606E2CB-CE6C-4438-9FE8-F9EDA4144CAE.png -> content/stats-notes/Summarising data/121a30a0247a9ef2c8d6f222df0e39ba.png | 0
Rcontent/stats-notes/Summarising data.resources/1025F6BC-BC40-466C-8EA5-D814F6ED68E7.png -> content/stats-notes/Summarising data/1be3b41077a33b1704f30d44a6e6f2a3.png | 0
Rcontent/stats-notes/Summarising data.resources/DB880F6E-D4CA-4F2F-BA06-0A56ECFBDCC0.png -> content/stats-notes/Summarising data/2622ba4db3e301150ce401c70344ceba.png | 0
Rcontent/stats-notes/Summarising data.resources/0885D1E7-DBA9-4897-89EB-7C5416C48486.png -> content/stats-notes/Summarising data/353a35bb43541880822a45b4aedccc33.png | 0
Rcontent/stats-notes/Summarising data.resources/DF440523-F774-4FC5-8D67-99AA4637CC66.png -> content/stats-notes/Summarising data/6d7b91f79d3d9d8dfea9b17bc06a0b94.png | 0
Rcontent/stats-notes/Summarising data.resources/BB20B4B3-C7EE-4AF9-BAEE-8E1F18E5EDDC.png -> content/stats-notes/Summarising data/c712f8daf000f5fb759e01c0e0cae513.png | 0
Rcontent/stats-notes/Summarising data.resources/95ACBC8B-9387-4A03-8A89-475258CBC80A.png -> content/stats-notes/Summarising data/f30ee8b3f6ad23ca7a4a2967d3200a47.png | 0
Acontent/stats-notes/Summarising data/index.md | 85+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/TOC: Statistical Methods.resources/931F8049-746F-4C2F-AE8B-3FD457025035.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/4314E768-B529-4221-BA2A-8D03A5F4E7EE.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/6CFA449D-6B83-4CEC-8CB3-1D4F849B6809.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/89031541-AB87-4E37-AB8D-104952DB11FE.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/8B2F81A1-DA9F-48F7-8096-535BAA746FD5.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/A8B8F47E-3843-496F-A6FF-A2C3107D7898.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/BE430A6B-D948-4F60-AEA1-ECCFF1757DE6.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/C59C5FD9-E7E1-43BA-B91E-9004B43AD0C8.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/EAF66274-C9BF-4730-9345-03CD12405C24.png | 0
Dcontent/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/F5A39700-7BFA-4611-B15F-B4B87688B65A.png | 0
Dcontent/stats-notes/Testing characteristics of samples.html | 15---------------
Acontent/stats-notes/Testing characteristics of samples.md | 82+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Acontent/stats-notes/_index.md | 25+++++++++++++++++++++++++
Acontent/stats-notes/hypothesis-testing.md | 136+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dcontent/stats-notes/index.html | 122-------------------------------------------------------------------------------
Dcontent/stats-notes/overview-slides.pdf | 0
Dcontent/stats-notes/sitewide.css | 33---------------------------------
130 files changed, 657 insertions(+), 334 deletions(-)

diff --git a/content/_index.md b/content/_index.md @@ -24,7 +24,7 @@ title = "Alex's university course notes" ## Subject notes: Year 2 * [Data Structures & Algorithms](dsa-notes/) -* [Statistical Methods](https://thezeroalpha.github.io/stats-notes) +* [Statistical Methods](stats-notes) * [Operating Systems](https://thezeroalpha.github.io/os-notes) * [Intelligent Systems](is-notes/) * [Linear Algebra](lin-algebra-notes/) diff --git a/content/stats-notes/Continuous probability distribution.html b/content/stats-notes/Continuous probability distribution.html @@ -1,14 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.231714248657227"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:44:23 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.98817026635058"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:45 +0000"/><title>Continuous probability distribution</title></head><body><h1>Continuous probability distribution</h1><h2>Normal distribution</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Notation:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Continuous%20probability%20distribution.resources/6EA63762-9823-494B-9DD3-1623E3226929.png" height="20" width="95"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Percentile rules:</div><ul><li><div>68%: within one standard deviation from mean</div></li><li><div>95%: within two standard deviations from mean</div></li><li><div>99.7%: 
within three standard deviations from mean</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">To find -<img src="Continuous%20probability%20distribution.resources/97D6FF41-BFE0-4FAC-A44E-A37D3A25BC97.png" height="16" width="60"/>:</div><ol><li><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Find z score for x:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Continuous%20probability%20distribution.resources/F5864D1C-0372-40B3-B8ED-6183687DA8E5.png" height="31" width="69"/></span><br/></div></li><li style=""><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Look up the cumulative probability for z.</div></li><li style=""><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"> -<img src="Continuous%20probability%20distribution.resources/2F7F0CC9-0E00-43EB-BCDA-0440E725BE36.png" height="16" width="136"/>. So that’s your answer.</div></li></ol><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Z scores come from distribution -<img src="Continuous%20probability%20distribution.resources/D01B4919-6913-4897-A0C3-FD8809571DF8.png" height="16" width="71"/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Also:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Continuous%20probability%20distribution.resources/D9A79AAB-2E2B-4D2A-B10D-BAA9DE2DCC39.png" height="18" width="187"/></span><br/></div><h3>Central limit theorem</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">If you take sample size -<img src="Continuous%20probability%20distribution.resources/F9C40EC3-A503-48D4-BB6A-9BA2A98D78B1.png" height="13" width="41"/>, sample mean has approx normal distribution:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img 
src="Continuous%20probability%20distribution.resources/70078989-8600-4CFE-9E4E-7164D0773ED4.png" height="24" width="101"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">useful sometimes:  -<img src="Continuous%20probability%20distribution.resources/EA4BF44C-A90C-434D-A8A9-ED2F5ED756CB.png" height="38" width="73"/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">If the population is already normally distributed, the sample is always normally distributed for any n.</div><h3>How do you know if something is normal?</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>Use a QQ plot. Put sample quantiles on y axis, theoretical quantiles on x axis.</div><div> -If there’s a linear correlation, sample is normal.</div><div> -In general, you can use QQ plots to compare two distributions/samples.</div></div><div><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Continuous probability distribution.md b/content/stats-notes/Continuous probability distribution.md @@ -0,0 +1,40 @@ ++++ +title = 'Continuous probability distribution' +template = 'page-math.html' ++++ +# Continuous probability distribution + +## Normal distribution + +Notation: +$X \sim N(\mu, \sigma^{2})$ + +Percentile rules: +- 68%: within one standard deviation from mean +- 95%: within two standard deviations from mean +- 99.7%: within three standard deviations from mean + +To find P(X ≤ x): +1. Find z score for x: $z = \frac{x - \mu}{\sigma}$ +2. Look up the cumulative probability for z. +3. P(X ≤ x) = P(Z ≤ z). So that’s your answer. 
+
+Z scores come from distribution $Z \sim N(0,1)$
+
+Also: P(X > x) = 1 - P(X ≤ x)
+
+### Central limit theorem
+
+If you take sample size n ≥ 30, sample mean has approx normal distribution:
+
+$\bar{X} \sim N(\mu, \frac{\sigma^{2}}{n})$
+
+useful sometimes: the standard deviation of the sample mean is $\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^{2}}{n}}$
+
+If the population is already normally distributed, the sample mean is normally distributed for any n.
+
+### How do you know if something is normal?
+
+Use a QQ plot. Put sample quantiles on y axis, theoretical quantiles on x axis.
+If the points lie roughly on a straight line, the sample is approximately normal.
+In general, you can use QQ plots to compare two distributions/samples.
diff --git a/content/stats-notes/Continuous probability distribution.resources/2F7F0CC9-0E00-43EB-BCDA-0440E725BE36.png b/content/stats-notes/Continuous probability distribution.resources/2F7F0CC9-0E00-43EB-BCDA-0440E725BE36.png
Binary files differ.
diff --git a/content/stats-notes/Continuous probability distribution.resources/6EA63762-9823-494B-9DD3-1623E3226929.png b/content/stats-notes/Continuous probability distribution.resources/6EA63762-9823-494B-9DD3-1623E3226929.png
Binary files differ.
diff --git a/content/stats-notes/Continuous probability distribution.resources/70078989-8600-4CFE-9E4E-7164D0773ED4.png b/content/stats-notes/Continuous probability distribution.resources/70078989-8600-4CFE-9E4E-7164D0773ED4.png
Binary files differ.
diff --git a/content/stats-notes/Continuous probability distribution.resources/97D6FF41-BFE0-4FAC-A44E-A37D3A25BC97.png b/content/stats-notes/Continuous probability distribution.resources/97D6FF41-BFE0-4FAC-A44E-A37D3A25BC97.png
Binary files differ.
diff --git a/content/stats-notes/Continuous probability distribution.resources/D01B4919-6913-4897-A0C3-FD8809571DF8.png b/content/stats-notes/Continuous probability distribution.resources/D01B4919-6913-4897-A0C3-FD8809571DF8.png
Binary files differ.
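The z-score lookup and the central limit theorem from the continuous probability distribution notes above can be sketched in Python. This is only an illustration and not part of the commit: the function names are made up for this example, and the standard library's `statistics.NormalDist` stands in for a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def prob_at_most(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed through the z score."""
    z = (x - mu) / sigma          # step 1: z score for x
    return NormalDist().cdf(z)    # steps 2-3: P(X <= x) = P(Z <= z), Z ~ N(0, 1)


def prob_above(x, mu, sigma):
    """P(X > x) = 1 - P(X <= x)."""
    return 1 - prob_at_most(x, mu, sigma)


def sample_mean_prob_at_most(x_bar, mu, sigma, n):
    """P(sample mean <= x_bar), using the central limit theorem.

    The sample mean is treated as normal with mean mu and
    standard deviation sigma / sqrt(n).
    """
    standard_error = sigma / sqrt(n)
    return prob_at_most(x_bar, mu, standard_error)


if __name__ == "__main__":
    mu, sigma = 100, 15  # hypothetical population parameters
    within_one_sd = prob_at_most(mu + sigma, mu, sigma) - prob_at_most(mu - sigma, mu, sigma)
    print(f"within one standard deviation: {within_one_sd:.4f}")
    print(f"P(X > 130): {prob_above(130, mu, sigma):.4f}")
    print(f"P(mean of 36 values <= 102.5): {sample_mean_prob_at_most(102.5, mu, sigma, 36):.4f}")
```

The first printed value should land close to the 68% rule from the notes; the population parameters are arbitrary and chosen only to make the arithmetic easy to follow.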
diff --git a/content/stats-notes/Continuous probability distribution.resources/D9A79AAB-2E2B-4D2A-B10D-BAA9DE2DCC39.png b/content/stats-notes/Continuous probability distribution.resources/D9A79AAB-2E2B-4D2A-B10D-BAA9DE2DCC39.png Binary files differ. diff --git a/content/stats-notes/Continuous probability distribution.resources/EA4BF44C-A90C-434D-A8A9-ED2F5ED756CB.png b/content/stats-notes/Continuous probability distribution.resources/EA4BF44C-A90C-434D-A8A9-ED2F5ED756CB.png Binary files differ. diff --git a/content/stats-notes/Continuous probability distribution.resources/F5864D1C-0372-40B3-B8ED-6183687DA8E5.png b/content/stats-notes/Continuous probability distribution.resources/F5864D1C-0372-40B3-B8ED-6183687DA8E5.png Binary files differ. diff --git a/content/stats-notes/Continuous probability distribution.resources/F9C40EC3-A503-48D4-BB6A-9BA2A98D78B1.png b/content/stats-notes/Continuous probability distribution.resources/F9C40EC3-A503-48D4-BB6A-9BA2A98D78B1.png Binary files differ. diff --git a/content/stats-notes/Discrete probability distributions.html b/content/stats-notes/Discrete probability distributions.html @@ -1,13 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:27 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:36 +0000"/><title>Discrete probability distributions</title></head><body><h1>Discrete probability distributions</h1><div style="margin-top: 1em; 
margin-bottom: 1em;-en-paragraph:true;"><div>experiment: possible outcomes, probability of outcome</div><div> -random variable: possible values, probability of value</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Tables, e.g. for dice:</div><table style="border-collapse: collapse; min-width: 100%;"><colgroup><col style="width: 130px;"/><col style="width: 130px;"/></colgroup><tbody><tr><td style="width: 130px; padding: 8px; border: 1px solid;">x</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/848A0942-2B6E-4B1D-B9FD-37838A13558E.png" height="16" width="60"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">1</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">2</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">3</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">4</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">5</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img 
src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">6</td><td style="width: 130px; padding: 8px; border: 1px solid;"><div> -<img src="Discrete%20probability%20distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png" height="31" width="8"/></div></td></tr></tbody></table><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div> -<img src="Discrete%20probability%20distributions.resources/9D698922-6835-4DF6-A48F-B660154609EE.png" height="16" width="34"/>: sum of (x times its probability)</div><div><br/></div><div><span style="font-size: 16px;"> -<img src="Discrete%20probability%20distributions.resources/AFCADD4F-0AD1-4C3F-A2B9-6039157E6FD8.png" height="48" width="239"/></span></div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">in english: to get variance, sum the (square of each value times its probability) and subtract the square of the mean.</div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Discrete probability distributions.md b/content/stats-notes/Discrete probability distributions.md @@ -0,0 +1,26 @@ ++++ +title = 'Discrete probability distributions' +template = 'page-math.html' ++++ + +# Discrete probability distributions + +experiment: possible outcomes, probability of outcome +random variable: possible values, probability of value +Tables, e.g. 
for dice:
+
+<table>
+ <tr><th>x</th><th>P(X = x)</th></tr>
+ <tr><td>1</td><td>$\frac{1}{6}$</td></tr>
+ <tr><td>2</td><td>$\frac{1}{6}$</td></tr>
+ <tr><td>3</td><td>$\frac{1}{6}$</td></tr>
+ <tr><td>4</td><td>$\frac{1}{6}$</td></tr>
+ <tr><td>5</td><td>$\frac{1}{6}$</td></tr>
+ <tr><td>6</td><td>$\frac{1}{6}$</td></tr>
+</table>
+
+E(X): sum of (x times its probability)
+
+$Var(X) = \sum_{i=1}^{k} \lbrack x_{i}^{2} P(X = x_{i}) \rbrack - \mu^{2}$
+
+In English: to get the variance, sum the (square of each value times its probability) and subtract the square of the mean, μ = E(X).
diff --git a/content/stats-notes/Discrete probability distributions.resources/848A0942-2B6E-4B1D-B9FD-37838A13558E.png b/content/stats-notes/Discrete probability distributions.resources/848A0942-2B6E-4B1D-B9FD-37838A13558E.png
Binary files differ.
diff --git a/content/stats-notes/Discrete probability distributions.resources/9D698922-6835-4DF6-A48F-B660154609EE.png b/content/stats-notes/Discrete probability distributions.resources/9D698922-6835-4DF6-A48F-B660154609EE.png
Binary files differ.
diff --git a/content/stats-notes/Discrete probability distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png b/content/stats-notes/Discrete probability distributions.resources/A5980C16-FDAB-4A91-BAA7-659F13C7F697.png
Binary files differ.
diff --git a/content/stats-notes/Discrete probability distributions.resources/AFCADD4F-0AD1-4C3F-A2B9-6039157E6FD8.png b/content/stats-notes/Discrete probability distributions.resources/AFCADD4F-0AD1-4C3F-A2B9-6039157E6FD8.png
Binary files differ.
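The expectation and variance rules from the discrete probability distributions notes above can be checked on the dice table with a short Python sketch. It is an illustration only and not part of the commit: the helper names are invented here, and `fractions.Fraction` is used so the results stay exact.

```python
from fractions import Fraction


def expectation(dist):
    """E(X): sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): sum of (x^2 times its probability), minus the square of the mean."""
    mu = expectation(dist)
    return sum(x**2 * p for x, p in dist.items()) - mu**2


# Fair six-sided die: every face has probability 1/6, as in the table.
die = {x: Fraction(1, 6) for x in range(1, 7)}

if __name__ == "__main__":
    print(expectation(die))  # 7/2
    print(variance(die))     # 35/12
```

A dictionary mapping each value to its probability mirrors the two-column table in the notes, so the same helpers work for any finite distribution whose probabilities sum to 1.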
diff --git a/content/stats-notes/Hypothesis testing.html b/content/stats-notes/Hypothesis testing.html @@ -1,33 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:30 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:28:09 +0000"/><title>Hypothesis testing</title></head><body><h1>Hypothesis testing</h1><div style="-en-paragraph:true;">If -<img src="Hypothesis%20testing.resources/7A1BCF62-8BA7-41B5-AF4A-F683F306D4CA.png" height="8" width="8"/> is known, use Z scores. If not, use T scores and -<img src="Hypothesis%20testing.resources/9BB450C5-1027-455B-9017-181596524970.png" height="10" width="13"/> (or if sample size is below 30).</div><div style="-en-paragraph:true;"><br/></div><h2>The steps</h2><ol><li><div style="-en-paragraph:true;">Choose population parameter</div></li><li><div style="-en-paragraph:true;">Formulate null and alternative hypotheses. 
Choose significance level.</div></li><ul><li><div style="-en-paragraph:true;">H0: parameter = some value</div></li><li><div style="-en-paragraph:true;">HA: depends, can be two-tailed or one-tailed</div></li><ul><li><div>one-tailed: param &lt; value or param -&gt; -value</div></li><li><div>two-tailed: param ≠ value</div></li></ul></ul><li><div style="-en-paragraph:true;">Collect data.</div></li><li><div style="-en-paragraph:true;">Choose test statistic (based on parameter) and identify its distribution under H0H_0H0</div></li><li><div style="-en-paragraph:true;">Calculate value of test statistic.</div></li><li><div style="-en-paragraph:true;">Find p-value, or critical region based on significance.</div></li></ol><ul><li><div>watch out for the critical region. if two-tailed test, have to divide significance by 2 first.</div></li></ul><ol start="7"><li><div>Decide whether or not to reject the null hypothesis: -</div></li><ul><li><div>p-value: -</div></li><ul><li><div>if p-value ≤ significance, reject</div></li><li><div>otherwise, fail to reject</div></li></ul><li><div>critical values: -</div></li><ul><li><div>if Z-score or T-score not in critical region, fail to reject</div></li><li><div>otherwise, reject</div></li></ul></ul></ol><div style="-en-paragraph:true;"><span style="font-weight: bold;"><br/></span></div><div style="-en-paragraph:true;"><span style="font-weight: bold;">YOU NEVER ACCEPT HYPOTHESES</span></div><div style="-en-paragraph:true;"><span style="font-weight: bold;"><br/></span></div><h2>Errors in testing</h2><table style="border-collapse: collapse; min-width: 100%;"><colgroup><col style="width: 130px;"/><col style="width: 130px;"/><col style="width: 130px;"/></colgroup><tbody><tr><td style="width: 130px; padding: 8px; border: 1px solid;" /><td style="width: 130px; padding: 8px; border: 1px solid;">H0 true</td><td style="width: 130px; padding: 8px; border: 1px solid;">H0 false</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px 
solid;">reject H0</td><td style="width: 130px; padding: 8px; border: 1px solid;">Type I</td><td style="width: 130px; padding: 8px; border: 1px solid;">fine</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">not reject H0</td><td style="width: 130px; padding: 8px; border: 1px solid;">fine</td><td style="width: 130px; padding: 8px; border: 1px solid;">type II</td></tr></tbody></table><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<br/><img src="Hypothesis%20testing.resources/933656F9-8F0F-4893-9A44-4A01FBAF7807.png" height="18" width="273"/></span></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<br><img src="Hypothesis%20testing.resources/3414AB71-8448-4162-8DE8-DED5892D0A44.png" height="18" width="561"/></span></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"><br/></span></div><h2>Proportion test</h2><div style="-en-paragraph:true;">test statistic:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/0E385596-9840-4183-A55A-D0FFEE289664.png" height="56" width="94"/></span></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"><br/></span></div><h2>Mean test</h2><div style="-en-paragraph:true;"><span style="font-weight: bold;">Test statistic </span><span style="font-style: italic; font-weight: bold;">iff</span><span style="font-weight: bold;"> </span><b><span style="-en-paragraph:true;">σ</span></b><b> </b><span style="font-weight: bold;"> known:</span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/597ED042-0B26-423F-84E4-83B05096226F.png" height="46" width="86"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">has standard normal distribution under null hypothesis.</div><div style="-en-paragraph:true;"><div><span 
style="font-weight: bold;"><br/></span></div><div><span style="font-weight: bold;">Test statistic otherwise:</span></div><div> -basically just replace σ with its estimator  -<img src="Hypothesis%20testing.resources/1C23AC31-18FC-4D37-915D-E1247246C251.png" height="31" width="21"/></div></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/3B12EF72-73BF-4CF2-857B-A6DBD194DA91.png" height="46" width="87"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">has t-distribution with n−1 degrees of freedom under null hypothesis.</div><div style="-en-paragraph:true;"><span style="font-weight: bold;"><br/></span></div><div style="-en-paragraph:true;"><span style="font-weight: bold;">Confidence interval (1−α) for μ:</span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/FAEED6E9-9545-4D2C-8EC6-9A22C7EC43AD.png" height="35" width="245"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">What does -<img src="Hypothesis%20testing.resources/EE0ABF4A-E76D-4C1C-8D99-24B80A2476AB.png" height="15" width="48"/> mean? Well, we need a t-score, with n−1 degrees of freedom. Divide significance by 2 because α is the full area (both tails) and since we’re adding/subtracting a t-score, we want to find the score corresponding to the area in one tail.</div><div style="-en-paragraph:true;"><br/></div><h2>Two samples</h2><h3>Dependent</h3><div style="-en-paragraph:true;">dependent: values in one sample are related to values in the other sample, or form natural matched pairs</div><div style="-en-paragraph:true;"><div>to test, we look at the <span style="font-style: italic;">difference</span> of means.</div><div> -null hypothesis can be either no difference, or that difference is a certain value. 
alternative hypothesis can basically be whatever.</div></div><div style="-en-paragraph:true;">calculate the differences for each x, then have a sample mean of differences -<img src="Hypothesis%20testing.resources/C8BDC62C-982B-4A3C-B727-B79C999AEC3D.png" height="14" width="12"/> and standard deviation of differences -<img src="Hypothesis%20testing.resources/7EFFA0D7-E377-4B0A-96D5-AD032CC72F30.png" height="10" width="13"/>.</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">test statistic:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/FD339E08-317E-4225-B2E8-6349C7EF8937.png" height="46" width="140"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">which under null hypothesis has t-distribution with n−1 degrees of freedom.</div><h3>Independent</h3><div style="-en-paragraph:true;">independent: no relationship between two samples</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><div><span style="font-weight: bold;">Assuming equal σ</span></div><div><br/></div><div> -if sample randomly drawn from same population, we assume that -<img src="Hypothesis%20testing.resources/B2717D7E-C672-4C00-B996-39660928F65A.png" height="10" width="46"/>.</div></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">test statistic:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/8E7AA75B-D32B-4AAF-B0B0-E56A570B091D.png" height="53" width="206"/></span></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"><br/></span></div><div style="-en-paragraph:true;">the pooled sample variance is:</div><div style="-en-paragraph:true;"><span style="font-size: 16px;"><br/></span></div><div style="-en-paragraph:true;"><span style="font-size: 
16px;"> -<img src="Hypothesis%20testing.resources/263C0E18-D9DB-4CB0-BB4A-43F1827F4108.png" height="40" width="200"/></span></div><div style="-en-paragraph:true;"><span style="font-weight: bold;"><br/></span></div><div style="-en-paragraph:true;"><span style="font-weight: bold;">Not assuming equal σ</span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">test statistic:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/2B02CEB6-9584-4118-9A46-205A09FC1DB9.png" height="53" width="198"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">which under null hypothesis has t-distribution with -<img src="Hypothesis%20testing.resources/B1078169-092F-4C44-85A6-DB841B1886D8.png" height="10" width="8"/> degrees of freedom. -<img src="Hypothesis%20testing.resources/B1078169-092F-4C44-85A6-DB841B1886D8.png" height="10" width="8"/> at the exam is the smallest of the two sample sizes.</div><div style="-en-paragraph:true;"><br/></div><h2>Two proportions</h2><div style="-en-paragraph:true;">H0: p<sub>1</sub> = p<sub>2</sub></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">test statistic:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Hypothesis%20testing.resources/D3588BEF-39B7-40E6-BF7F-64884EA6BB58.png" height="53" width="160"/></span></div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;">(1−α) CI for p<sub>1</sub>−p<sub>2</sub>:</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"> -<img src="Hypothesis%20testing.resources/B9EA8A3D-3510-4F48-A8D2-119684783B2D.png" height="16" width="82"/> where</div><div style="-en-paragraph:true;"><br/></div><div style="-en-paragraph:true;"><span style="font-size: 16px;"> -<img 
src="Hypothesis%20testing.resources/F36A9926-CAC5-4B8F-891A-B311A4D065E3.png" height="50" width="266"/></span></div></body></html> diff --git a/content/stats-notes/Hypothesis testing.resources/0E385596-9840-4183-A55A-D0FFEE289664.png b/content/stats-notes/Hypothesis testing.resources/0E385596-9840-4183-A55A-D0FFEE289664.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/1C23AC31-18FC-4D37-915D-E1247246C251.png b/content/stats-notes/Hypothesis testing.resources/1C23AC31-18FC-4D37-915D-E1247246C251.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/263C0E18-D9DB-4CB0-BB4A-43F1827F4108.png b/content/stats-notes/Hypothesis testing.resources/263C0E18-D9DB-4CB0-BB4A-43F1827F4108.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/2B02CEB6-9584-4118-9A46-205A09FC1DB9.png b/content/stats-notes/Hypothesis testing.resources/2B02CEB6-9584-4118-9A46-205A09FC1DB9.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/3414AB71-8448-4162-8DE8-DED5892D0A44.png b/content/stats-notes/Hypothesis testing.resources/3414AB71-8448-4162-8DE8-DED5892D0A44.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/3B12EF72-73BF-4CF2-857B-A6DBD194DA91.png b/content/stats-notes/Hypothesis testing.resources/3B12EF72-73BF-4CF2-857B-A6DBD194DA91.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/597ED042-0B26-423F-84E4-83B05096226F.png b/content/stats-notes/Hypothesis testing.resources/597ED042-0B26-423F-84E4-83B05096226F.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/7A1BCF62-8BA7-41B5-AF4A-F683F306D4CA.png b/content/stats-notes/Hypothesis testing.resources/7A1BCF62-8BA7-41B5-AF4A-F683F306D4CA.png Binary files differ. 
diff --git a/content/stats-notes/Hypothesis testing.resources/7EFFA0D7-E377-4B0A-96D5-AD032CC72F30.png b/content/stats-notes/Hypothesis testing.resources/7EFFA0D7-E377-4B0A-96D5-AD032CC72F30.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/8E7AA75B-D32B-4AAF-B0B0-E56A570B091D.png b/content/stats-notes/Hypothesis testing.resources/8E7AA75B-D32B-4AAF-B0B0-E56A570B091D.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/933656F9-8F0F-4893-9A44-4A01FBAF7807.png b/content/stats-notes/Hypothesis testing.resources/933656F9-8F0F-4893-9A44-4A01FBAF7807.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/9BB450C5-1027-455B-9017-181596524970.png b/content/stats-notes/Hypothesis testing.resources/9BB450C5-1027-455B-9017-181596524970.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/B1078169-092F-4C44-85A6-DB841B1886D8.png b/content/stats-notes/Hypothesis testing.resources/B1078169-092F-4C44-85A6-DB841B1886D8.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/B2717D7E-C672-4C00-B996-39660928F65A.png b/content/stats-notes/Hypothesis testing.resources/B2717D7E-C672-4C00-B996-39660928F65A.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/B9EA8A3D-3510-4F48-A8D2-119684783B2D.png b/content/stats-notes/Hypothesis testing.resources/B9EA8A3D-3510-4F48-A8D2-119684783B2D.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/C8BDC62C-982B-4A3C-B727-B79C999AEC3D.png b/content/stats-notes/Hypothesis testing.resources/C8BDC62C-982B-4A3C-B727-B79C999AEC3D.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/D3588BEF-39B7-40E6-BF7F-64884EA6BB58.png b/content/stats-notes/Hypothesis testing.resources/D3588BEF-39B7-40E6-BF7F-64884EA6BB58.png Binary files differ. 
diff --git a/content/stats-notes/Hypothesis testing.resources/EBEF3D5C-C99B-49C2-B6D5-D6692FA541F5.png b/content/stats-notes/Hypothesis testing.resources/EBEF3D5C-C99B-49C2-B6D5-D6692FA541F5.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/EE0ABF4A-E76D-4C1C-8D99-24B80A2476AB.png b/content/stats-notes/Hypothesis testing.resources/EE0ABF4A-E76D-4C1C-8D99-24B80A2476AB.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/F36A9926-CAC5-4B8F-891A-B311A4D065E3.png b/content/stats-notes/Hypothesis testing.resources/F36A9926-CAC5-4B8F-891A-B311A4D065E3.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/FAEED6E9-9545-4D2C-8EC6-9A22C7EC43AD.png b/content/stats-notes/Hypothesis testing.resources/FAEED6E9-9545-4D2C-8EC6-9A22C7EC43AD.png Binary files differ. diff --git a/content/stats-notes/Hypothesis testing.resources/FD339E08-317E-4225-B2E8-6349C7EF8937.png b/content/stats-notes/Hypothesis testing.resources/FD339E08-317E-4225-B2E8-6349C7EF8937.png Binary files differ. 
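The testing steps in the removed `Hypothesis testing.html` (compute the test statistic, find the p-value or critical value, then reject or fail to reject) can be sketched for the mean test with known σ. The test statistic in the notes is an image, so the standard form z = (x̄ − μ0) / (σ / √n) is assumed here; the function name `z_test_mean`, its parameters, and the sample numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a mean with known sigma (illustrative sketch)."""
    # Assumed test statistic: standard normal under H0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-tailed p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Two-tailed critical value: significance is divided by 2 first.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 if p-value <= significance, otherwise fail to reject.
    reject = p_value <= alpha
    return z, p_value, z_crit, reject


# Made-up numbers: sample mean 52, H0 mean 50, sigma 6, n = 36.
z, p_value, z_crit, reject = z_test_mean(52, 50, 6, 36)
print(z, p_value, z_crit, reject)
```

Both decision rules from the notes agree here: the p-value is below the significance level, and |z| lies beyond the critical value, so the null hypothesis is rejected (never "accepted" in the opposite case).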
diff --git a/content/stats-notes/Introduction: Data.html b/content/stats-notes/Introduction: Data.html @@ -1,12 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:31 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:21 +0000"/><title>Introduction: Data</title></head><body><h1>Introduction: Data</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>statistics: the science of data - collecting, organising, analysing, interpreting, presenting</div><div> -sample: a selected subcollection from the population</div></div><h2>Collecting sample data</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">concepts:</div><ul><li><div>variables: -</div></li><ul><li><div>independent: might cause the effect being studied</div></li><li><div>dependent: represents the effect being studied</div></li></ul><li><div>confounding: when there’s too many variables and you have no clue wtf is causing the effect</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">sampling methods:</div><ul><li><div>voluntary response: subjects decide to be included</div></li><li><div>random: each <span style="font-style: italic;">member</span> from population has equal probability to be selected</div></li><li><div>simple random: each <span style="font-style: italic;">sample of size n</span> has equal probability to be 
selected</div></li><li><div>systematic: after starting point, select every k-th member (based on a system)</div></li><li><div>convenience: choose what’s convenient</div></li><li><div>startified: split population into subgroups with same characteristics, simple random sample each group</div></li><li><div>cluster: split population into clusters, then randomly select some of them</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">types of studies:</div><ul><li><div>observational study: subjects observed, not modified -</div></li><ul><li><div>retrospective: data from past</div></li><li><div>cross-sectional: data from one point in time</div></li><li><div>prospective: data to be collected (future)</div></li></ul><li><div>experiment: some treatment applied to subjects -</div></li><ul><li><div>sometimes control and treatment group</div></li><li><div>gotta watch out for placebo and observer effects</div></li></ul></ul><h2>Types of data</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">What to do with data?</div><ul><li><div>parameter: numerical measurement of <span style="font-style: italic;">population</span> (in Greek: -<img src="Introduction%3A%20Data.resources/D876FCB8-9423-49FF-97D9-D75D8E9DCDAF.png" height="11" width="52"/>)</div></li><li><div>statistic: numerical measurement of <span style="font-style: italic;">sample</span> (in English: -<img src="Introduction%3A%20Data.resources/0D52418B-F420-4868-AEB2-BF252B84BC51.png" height="13" width="50"/>)</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">data can be:</div><ul><li><div>qualitative: names or labels (strings)</div></li><li><div>quantitative: numbers (ints, floats) -</div></li><ul><li><div>discrete: countable</div></li><li><div>continuous: not countable (on a continuous scale like length, weight, distance)</div></li></ul></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">you have different levels of 
measurement:</div><ul><li><div>qualitative: -</div></li><ul><li><div>nominal: no ordering (gender, eye color)</div></li><li><div>ordinal: ordering, but differences between categories have no meaning (e.g. agree/disagree)</div></li></ul><li><div>quantitative: -</div></li><ul><li><div>interval: ordering, differences, but no natural zero point (year of birth, temperatures in F/C)</div></li><li><div>ratio: ordering, differences, natural zero point (body length, marathon times)</div></li></ul></ul><div><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Introduction: Data.resources/0D52418B-F420-4868-AEB2-BF252B84BC51.png b/content/stats-notes/Introduction: Data.resources/0D52418B-F420-4868-AEB2-BF252B84BC51.png Binary files differ. diff --git a/content/stats-notes/Introduction: Data.resources/D876FCB8-9423-49FF-97D9-D75D8E9DCDAF.png b/content/stats-notes/Introduction: Data.resources/D876FCB8-9423-49FF-97D9-D75D8E9DCDAF.png Binary files differ. diff --git a/content/stats-notes/Introduction_ Data.md b/content/stats-notes/Introduction_ Data.md @@ -0,0 +1,62 @@ ++++ +title = 'Introduction: Data' +template = 'page-math.html' ++++ + +# Introduction: Data + +statistics: the science of data - collecting, organising, analysing, interpreting, presenting + +sample: a selected subcollection from the population + +## Collecting sample data + +concepts: + +- variables: + - independent: might cause the effect being studied + - dependent: represents the effect being studied +- confounding: when there’s too many variables and you have no clue wtf is causing the effect + +sampling methods: + +- voluntary response: subjects decide to be included +- random: each *member* from population has equal probability to be selected +- simple random: each *sample of size n* has equal probability to be selected +- systematic: after starting point, select every k-th member (based on a system) +- convenience: choose what’s convenient +- stratified: split population 
into subgroups with same characteristics, simple random sample each group +- cluster: split population into clusters, then randomly select some of them + +types of studies: + +- observational study: subjects observed, not modified + - retrospective: data from past + - cross-sectional: data from one point in time + - prospective: data to be collected (future) +- experiment: some treatment applied to subjects + - sometimes control and treatment group + - gotta watch out for placebo and observer effects + +## Types of data + +What to do with data? + +- parameter: numerical measurement of *population* (in Greek: μ, σ, ...) +- statistic: numerical measurement of *sample* (in English: $\bar{x}$, s, ...) + +data can be: + +- qualitative: names or labels (strings) +- quantitative: numbers (ints, floats) + - discrete: countable + - continuous: not countable (on a continuous scale like length, weight, distance) + +you have different levels of measurement: + +- qualitative: + - nominal: no ordering (gender, eye color) + - ordinal: ordering, but differences between categories have no meaning (e.g. 
agree/disagree) +- quantitative: + - interval: ordering, differences, but no natural zero point (year of birth, temperatures in F/C) + - ratio: ordering, differences, natural zero point (body length, marathon times) diff --git a/content/stats-notes/Probability intro.html b/content/stats-notes/Probability intro.html @@ -1,30 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:31 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:31 +0000"/><title>Probability intro</title></head><body><h1>Probability intro</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>sample space: set of all possible outcomes</div><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/82DA0CAB-1ECF-4D59-9F2A-D1234AE1EA04.png" height="16" width="115"/></span></div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>event: collection of outcomes (capital letters)</div><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/E0C52A8D-5988-4E5F-9DD3-C583246D76B0.png" height="18" width="267"/></span></div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>probability measure: value between 0 and 1</div><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/BDA4CF06-F628-462F-8605-7C6BCD8F711E.png" height="34" width="153"/></span></div></div><ul><li><div>P(A) = 
0: event is impossible</div></li><li><div>P(A) = 1: event is certain</div></li></ul><div><br/></div><h2>Determining probability</h2><ol><li><div>Estimate with relative frequency:</div></li></ol><div style="margin-top: 1em; margin-bottom: 1em; margin-left: 40px;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/3FD28422-E813-4F7F-8E34-1F23E59D42C5.png" height="99" width="343"/></span><br/></div><ol start="2"><li><div>Theoretical approach: make a probability model</div></li><li><div>Subjective approach: estimate P(A) based on intuition/experience</div></li></ol><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Finding P(A) for discrete case:</div><ol><li><div>Find sample space <span style="font-size: 16px;"><img src="Probability%20intro.resources/35164666-FA75-49AB-A84D-6C6FD626E6D1.png" height="13" width="11"/></span></div></li><li><div>Determine probabilities -<img src="Probability%20intro.resources/DB38D8AB-300C-4A63-BDFC-96B0BCF653C0.png" height="16" width="30"/> for all  -<img src="Probability%20intro.resources/C5248918-FC1B-4F58-A428-6AAE812CD000.png" height="13" width="37"/></div></li><ul><li><div>if all equally likely, then -<img src="Probability%20intro.resources/318A5966-6BCE-42F9-BE72-B9B36A6798D7.png" height="16" width="75"/> where N is number of outcomes in  -<img src="Probability%20intro.resources/4FDC25D6-CF44-49AC-96BA-1C64C4C2795D.png" height="12" width="10"/></div></li></ul><li><div>Determine which outcomes are in A</div></li><li><div>Compute P(A) by </div></li></ol><div style="margin-left: 40px;"><span style="font-size: 16px;"><br/></span></div><div style="margin-left: 40px;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/4F559F81-FF2E-497B-9DFA-EC210A10C01F.png" height="37" width="137"/></span></div><div style=""><font style="font-size: 14px;"><br/></font></div><h2>Probability rules:</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">“At 
least one”:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/73C45B52-3233-4850-A675-FC9270B18B4C.png" height="18" width="210"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Addition rule (A and B):</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/8947E446-03F4-4FE2-A919-B60AC846569F.png" height="18" width="268"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Complement (not A):</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/994FA04E-5C98-4E29-8FED-B3F3C95B0563.png" height="19" width="121"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/30C6C2F5-24D9-497C-8CF0-E5F30E56C453.png" height="19" width="212"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Conditional probability (B given A):</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/10FF60DB-0B9C-42D1-8F63-1771C85CC905.png" height="40" width="145"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/CD098AB5-85D4-400E-8AB7-8B06F5600D5E.png" height="18" width="199"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/18BD58AF-261C-4798-95B0-1B0D1FD84316.png" height="19" width="144"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span 
style="font-size: 16px;"> -<img src="Probability%20intro.resources/B4FA55EC-6128-4BD7-B20F-607AD531BE96.png" height="19" width="156"/></span> <span style="font-weight: bold;">&nbsp;&nbsp;&nbsp;&nbsp;(NOT IF COMPLEMENT IS IN CONDITION)</span></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Disjoint events (mutually exclusive):</div><ul><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/863448C3-05B6-42E8-99F5-EAD2A017F781.png" height="18" width="181"/></span></div></li><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/E52D5FEF-A442-4256-83C9-190BC0FA2DE7.png" height="18" width="96"/></span></div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Independent events:</div><ul><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/1FD39172-02F8-4B50-A391-53475092F1BB.png" height="18" width="181"/></span></div></li><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/A1FE0B75-7AA4-49D8-9C1C-1830554F6ED3.png" height="18" width="111"/></span></div></li></ul><div><span style="font-size: 16px;"><br/></span></div><h2>Bayes theorem</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Forget that complicated-ass formula. 
You literally never need to use it.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">For example, given these values:</div><ul><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/224D8EB6-BD63-43E5-BDA1-E6EEF63D3988.png" height="18" width="86"/></span></div></li><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/E67A439E-4A1D-4171-B02F-07C8B905048C.png" height="19" width="87"/></span></div></li><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/9451F0BE-1227-40F8-90F9-38B0DA866992.png" height="18" width="97"/></span></div></li><li><div><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/060DFC41-4A01-45EA-80A9-55FACD3529B3.png" height="19" width="105"/></span></div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">You need to calculate -<img src="Probability%20intro.resources/846310A1-4C13-42A3-9F9E-33AD89AA7E0C.png" height="16" width="48"/>. Use conditional probability and do some rewriting:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Probability%20intro.resources/C0823957-9584-4C60-A9E8-9D414EF6B99B.png" height="312" width="338"/></span><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Probability intro.md b/content/stats-notes/Probability intro.md @@ -0,0 +1,87 @@ ++++ +title = 'Probability intro' +template = 'page-math.html' ++++ + +# Probability intro + +sample space: set of all possible outcomes +- Ω = {1,2,3,4,5,6} + +event: collection of outcomes (capital letters) +- A = {even number thrown} = {2,4,6} + +probability measure: value between 0 and 1 +- $P(A) = P(\lbrace 2,4,6 \rbrace) = \frac{1}{2}$ +- P(A) = 0: event is impossible +- P(A) = 1: event is certain + +## Determining probability + +1. 
Estimate with relative frequency: + $\begin{aligned} + P(A) &= \frac{\text{number of occurrences of A}}{\text{number of times procedure was repeated}} \\\\ + &= \frac{\text{successes}}{\text{total number of tries}} + \end{aligned}$ + +2. Theoretical approach: make a probability model +3. Subjective approach: estimate P(A) based on intuition/experience + +Finding P(A) for discrete case: +1. Find sample space Ω + +2. Determine probabilities P(ω) for all ω ∈ Ω + + - if all equally likely, then P(ω) = 1/N where N is number of outcomes in Ω + +3. Determine which outcomes are in A +4. Compute P(A) by + +$P(A) = \sum_{\omega :\; \omega \in A} P(\omega)$ + +## Probability rules: + +“At least one”: P(at least one) = 1 - P(none) + +Addition rule (A or B): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ + +Complement (not A): +- $P(\bar{A}) = 1 - P(A)$ +- $P(A) = P(B \cap A) + P(\bar{B} \cap A)$ + +Conditional probability (B given A): +- $P(B | A) = \frac{P(A \cap B)}{P(A)}$ +- $P(A \cap B) = P(A | B) \times P(B)$ +- $P(B) = P(B | A) \times P(A) + P(B | \bar{A}) \times P(\bar{A})$ +- $P(B | A) + P(\bar{B} | A) = 1$ (**NOT IF COMPLEMENT IS IN CONDITION**) + +Disjoint events (mutually exclusive): +- $P(A \cup B) = P(A) + P(B)$ +- $P(A \cap B) = 0$ + +Independent events: +- $P(A \cap B) = P(A) \times P(B)$ +- $P(B|A) = P(B)$ + +## Bayes theorem + +Forget that complicated-ass formula. You literally never need to use it. +For example, given these values: +- P(A) = 0.01 +- $P(\bar{A})$ = 0.99 +- $P(X|A)$ = 0.9 +- $P(X|\bar{A})$ = 0.08 + +You need to calculate $P(A|X)$. 
Use conditional probability and do some rewriting: + +$\begin{aligned} +P(A|X) &= \frac{P(A \cap X)}{P(X)}\\\\ +P(X) &= P(A \cap X) + P(\bar{A} \cap X)\\\\ +\therefore P(A|X) &= \frac{P(A \cap X)}{P(A \cap X) + P(\bar{A} \cap X)} \\\\ +P(A \cap X) &= P(X \cap A) \\\\ + &= P(X|A) \times P(A) \\\\ +\therefore P(A|X) &= \frac{P(X|A) \times P(A)}{P(X|A) \times P(A) + P(X|\bar{A}) \times P(\bar{A})} \\\\ + &= \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.08 \times 0.99} \\\\ + &= 0.1020408163 \approx 0.1 +\end{aligned}$ + diff --git a/content/stats-notes/Probability intro.resources/060DFC41-4A01-45EA-80A9-55FACD3529B3.png b/content/stats-notes/Probability intro.resources/060DFC41-4A01-45EA-80A9-55FACD3529B3.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/10FF60DB-0B9C-42D1-8F63-1771C85CC905.png b/content/stats-notes/Probability intro.resources/10FF60DB-0B9C-42D1-8F63-1771C85CC905.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/18BD58AF-261C-4798-95B0-1B0D1FD84316.png b/content/stats-notes/Probability intro.resources/18BD58AF-261C-4798-95B0-1B0D1FD84316.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/1FD39172-02F8-4B50-A391-53475092F1BB.png b/content/stats-notes/Probability intro.resources/1FD39172-02F8-4B50-A391-53475092F1BB.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/224D8EB6-BD63-43E5-BDA1-E6EEF63D3988.png b/content/stats-notes/Probability intro.resources/224D8EB6-BD63-43E5-BDA1-E6EEF63D3988.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/30C6C2F5-24D9-497C-8CF0-E5F30E56C453.png b/content/stats-notes/Probability intro.resources/30C6C2F5-24D9-497C-8CF0-E5F30E56C453.png Binary files differ. 
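The rewriting in the Bayes example from `Probability intro.md` can be checked numerically. This is a minimal sketch that follows the same steps (joint probabilities first, then their sum as P(X)); the function name `posterior` and its parameter names are hypothetical, not part of the notes.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|X) given P(A), P(X|A) and P(X|not A)."""
    # P(A and X) = P(X|A) * P(A)
    joint_a = likelihood * prior
    # P(not A and X) = P(X|not A) * P(not A), with P(not A) = 1 - P(A)
    joint_not_a = false_positive_rate * (1 - prior)
    # P(X) = P(A and X) + P(not A and X)
    evidence = joint_a + joint_not_a
    # P(A|X) = P(A and X) / P(X)
    return joint_a / evidence


# Values from the notes: P(A) = 0.01, P(X|A) = 0.9, P(X|not A) = 0.08
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.08)
print(round(p, 4))  # 0.102
```

Writing the denominator as a sum of two joint probabilities mirrors the derivation in the notes, so no separate formula has to be memorised.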
diff --git a/content/stats-notes/Probability intro.resources/318A5966-6BCE-42F9-BE72-B9B36A6798D7.png b/content/stats-notes/Probability intro.resources/318A5966-6BCE-42F9-BE72-B9B36A6798D7.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/35164666-FA75-49AB-A84D-6C6FD626E6D1.png b/content/stats-notes/Probability intro.resources/35164666-FA75-49AB-A84D-6C6FD626E6D1.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/3FD28422-E813-4F7F-8E34-1F23E59D42C5.png b/content/stats-notes/Probability intro.resources/3FD28422-E813-4F7F-8E34-1F23E59D42C5.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/4F559F81-FF2E-497B-9DFA-EC210A10C01F.png b/content/stats-notes/Probability intro.resources/4F559F81-FF2E-497B-9DFA-EC210A10C01F.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/4FDC25D6-CF44-49AC-96BA-1C64C4C2795D.png b/content/stats-notes/Probability intro.resources/4FDC25D6-CF44-49AC-96BA-1C64C4C2795D.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/73C45B52-3233-4850-A675-FC9270B18B4C.png b/content/stats-notes/Probability intro.resources/73C45B52-3233-4850-A675-FC9270B18B4C.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/82DA0CAB-1ECF-4D59-9F2A-D1234AE1EA04.png b/content/stats-notes/Probability intro.resources/82DA0CAB-1ECF-4D59-9F2A-D1234AE1EA04.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/846310A1-4C13-42A3-9F9E-33AD89AA7E0C.png b/content/stats-notes/Probability intro.resources/846310A1-4C13-42A3-9F9E-33AD89AA7E0C.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/863448C3-05B6-42E8-99F5-EAD2A017F781.png b/content/stats-notes/Probability intro.resources/863448C3-05B6-42E8-99F5-EAD2A017F781.png Binary files differ. 
diff --git a/content/stats-notes/Probability intro.resources/8947E446-03F4-4FE2-A919-B60AC846569F.png b/content/stats-notes/Probability intro.resources/8947E446-03F4-4FE2-A919-B60AC846569F.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/9451F0BE-1227-40F8-90F9-38B0DA866992.png b/content/stats-notes/Probability intro.resources/9451F0BE-1227-40F8-90F9-38B0DA866992.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/994FA04E-5C98-4E29-8FED-B3F3C95B0563.png b/content/stats-notes/Probability intro.resources/994FA04E-5C98-4E29-8FED-B3F3C95B0563.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/A1FE0B75-7AA4-49D8-9C1C-1830554F6ED3.png b/content/stats-notes/Probability intro.resources/A1FE0B75-7AA4-49D8-9C1C-1830554F6ED3.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/B4FA55EC-6128-4BD7-B20F-607AD531BE96.png b/content/stats-notes/Probability intro.resources/B4FA55EC-6128-4BD7-B20F-607AD531BE96.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/BDA4CF06-F628-462F-8605-7C6BCD8F711E.png b/content/stats-notes/Probability intro.resources/BDA4CF06-F628-462F-8605-7C6BCD8F711E.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/C0823957-9584-4C60-A9E8-9D414EF6B99B.png b/content/stats-notes/Probability intro.resources/C0823957-9584-4C60-A9E8-9D414EF6B99B.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/C5248918-FC1B-4F58-A428-6AAE812CD000.png b/content/stats-notes/Probability intro.resources/C5248918-FC1B-4F58-A428-6AAE812CD000.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/CD098AB5-85D4-400E-8AB7-8B06F5600D5E.png b/content/stats-notes/Probability intro.resources/CD098AB5-85D4-400E-8AB7-8B06F5600D5E.png Binary files differ. 
diff --git a/content/stats-notes/Probability intro.resources/DB38D8AB-300C-4A63-BDFC-96B0BCF653C0.png b/content/stats-notes/Probability intro.resources/DB38D8AB-300C-4A63-BDFC-96B0BCF653C0.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/E0C52A8D-5988-4E5F-9DD3-C583246D76B0.png b/content/stats-notes/Probability intro.resources/E0C52A8D-5988-4E5F-9DD3-C583246D76B0.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/E52D5FEF-A442-4256-83C9-190BC0FA2DE7.png b/content/stats-notes/Probability intro.resources/E52D5FEF-A442-4256-83C9-190BC0FA2DE7.png Binary files differ. diff --git a/content/stats-notes/Probability intro.resources/E67A439E-4A1D-4171-B02F-07C8B905048C.png b/content/stats-notes/Probability intro.resources/E67A439E-4A1D-4171-B02F-07C8B905048C.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.html b/content/stats-notes/Relationships between variables.html @@ -1,15 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:31 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:28:13 +0000"/><title>Relationships between variables</title></head><body><h1>Relationships between variables</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">relationship can be investigated, causality can’t.</div><div style="margin-top: 1em; margin-bottom: 
1em;-en-paragraph:true;">graphically, you can use scatterplots:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Relationships%20between%20variables.resources/F12DF64D-BE0B-4F8A-8C6E-7D70FE8156EB.png" height="562" width="578"/><br/></div><h2>Correlation</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">correlation: if values of two variables are somehow associated with each other</div><ul><li><div>positive: higher values of variable 1 are usually associated with higher values of variable 2</div></li><li><div>negative: higher values of variable 1 are usually associated with lower values of variable 2</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">linear if the plotted points are basically a straight line.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">population linear correlation coefficient is ρ.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">sample linear correlation coefficient (estimator for ρ\rhoρ) is:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Relationships%20between%20variables.resources/8E22EE8D-B71D-416D-8415-4ECBBC119876.png" height="40" width="251"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">interpreting r:</div><ul><li><div>r = 1: perfect positive linear relationship</div></li><li><div>r &gt;0: positive linear relationship</div></li><li><div>r ≈ 0: no linear relationship (doesn’t mean no relationship!!)</div></li><li><div>r &lt; 0: negative linear relationship</div></li><li><div>r = −1: perfect negative linear relationship</div></li></ul><h3>Testing ρ = 0</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">test statistic:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img 
src="Relationships%20between%20variables.resources/1D4C0EE4-B01F-40B8-B032-F153ABFDB821.png" height="51" width="93"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">has under H0: ρ = 0 a t-distribution with n−2 degrees of freedom.</div><h2>Regression</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">if there’s a correlation, points can be described by line  -<img src="Relationships%20between%20variables.resources/86406A4F-1BB7-48E5-B1DF-0F3E45144091.png" height="14" width="142"/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">regression equation is  -<img src="Relationships%20between%20variables.resources/E70E7CBD-825E-4743-9AAB-CF98B110B48B.png" height="14" width="75"/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">where b<sub>0</sub> and b<sub>1</sub> are least-squares estimates of β<sub>0</sub> and β<sub>1</sub></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">you want values that satisfy least-squares property (i.e. 
minimise -<img src="Relationships%20between%20variables.resources/18E0D7AC-18DF-4AB8-B3F5-384049971D0A.png" height="32" width="144"/>)</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Relationships%20between%20variables.resources/351CFA24-FF67-4EB2-9251-87FFDEE4120E.png" height="56" width="216"/></span><br/></div><h3>Testing linearity</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Test:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">H0: β<sub>1</sub> = 0</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">HA: β<sub>1</sub> ≠ 0</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">The score is:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Relationships%20between%20variables.resources/710FB15A-F331-4DC1-97D6-28F8F343C01D.png" height="39" width="58"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">(realisation of test statistic T<sub>β</sub> that has t-distribution with n−2 degrees of freedom under H0)</div><h3>Coefficient of determination</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Coefficient of determination is proportion of variation in y variable that regression equation can explain:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Relationships%20between%20variables.resources/B57D5FB5-3C42-4448-9176-CED13640FC5B.png" height="35" width="171"/></span><br/></div><h3>Residuals</h3><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>To check for a fixed standard deviation, make a residual plot.</div><div> -Residuals are estimates for the errors.</div><div> -residual: difference between observed y<sub>i</sub> and predicted value  -<img 
src="Relationships%20between%20variables.resources/D34A4C4D-E461-45E7-AEFB-37860E5126A6.png" height="14" width="83"/></div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Relationships%20between%20variables.resources/1E0FEC5E-6CCD-43CF-9C55-94EE42D1B04B.png" height="18" width="258"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">A residual plot is scatterplot of residuals against x values. Should be no obvious pattern in residuals.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Relationships%20between%20variables.resources/3A61D08D-CDAF-400A-A557-A3F3F1F65F68.png" height="422" width="847"/><br/></div><div><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Relationships between variables.resources/18E0D7AC-18DF-4AB8-B3F5-384049971D0A.png b/content/stats-notes/Relationships between variables.resources/18E0D7AC-18DF-4AB8-B3F5-384049971D0A.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/1D4C0EE4-B01F-40B8-B032-F153ABFDB821.png b/content/stats-notes/Relationships between variables.resources/1D4C0EE4-B01F-40B8-B032-F153ABFDB821.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/1E0FEC5E-6CCD-43CF-9C55-94EE42D1B04B.png b/content/stats-notes/Relationships between variables.resources/1E0FEC5E-6CCD-43CF-9C55-94EE42D1B04B.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/351CFA24-FF67-4EB2-9251-87FFDEE4120E.png b/content/stats-notes/Relationships between variables.resources/351CFA24-FF67-4EB2-9251-87FFDEE4120E.png Binary files differ. 
diff --git a/content/stats-notes/Relationships between variables.resources/710FB15A-F331-4DC1-97D6-28F8F343C01D.png b/content/stats-notes/Relationships between variables.resources/710FB15A-F331-4DC1-97D6-28F8F343C01D.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/86406A4F-1BB7-48E5-B1DF-0F3E45144091.png b/content/stats-notes/Relationships between variables.resources/86406A4F-1BB7-48E5-B1DF-0F3E45144091.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/8E22EE8D-B71D-416D-8415-4ECBBC119876.png b/content/stats-notes/Relationships between variables.resources/8E22EE8D-B71D-416D-8415-4ECBBC119876.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/B57D5FB5-3C42-4448-9176-CED13640FC5B.png b/content/stats-notes/Relationships between variables.resources/B57D5FB5-3C42-4448-9176-CED13640FC5B.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/D34A4C4D-E461-45E7-AEFB-37860E5126A6.png b/content/stats-notes/Relationships between variables.resources/D34A4C4D-E461-45E7-AEFB-37860E5126A6.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/E70E7CBD-825E-4743-9AAB-CF98B110B48B.png b/content/stats-notes/Relationships between variables.resources/E70E7CBD-825E-4743-9AAB-CF98B110B48B.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/3A61D08D-CDAF-400A-A557-A3F3F1F65F68.png b/content/stats-notes/Relationships between variables/4670b5bf474343b006017ea93ea64fdb.png Binary files differ. diff --git a/content/stats-notes/Relationships between variables.resources/F12DF64D-BE0B-4F8A-8C6E-7D70FE8156EB.png b/content/stats-notes/Relationships between variables/6de852d30c13f092f1d0954f4d21c2c6.png Binary files differ. 
diff --git a/content/stats-notes/Relationships between variables/index.md b/content/stats-notes/Relationships between variables/index.md @@ -0,0 +1,88 @@ ++++ +title = 'Relationships between variables' +template = 'page-math.html' ++++ + +# Relationships between variables + +relationship can be investigated, causality can’t. +graphically, you can use scatterplots: + +![](6de852d30c13f092f1d0954f4d21c2c6.png) + +## Correlation + +correlation: if values of two variables are somehow associated with each other + +- positive: higher values of variable 1 are usually associated with higher values of variable 2 +- negative: higher values of variable 1 are usually associated with lower values of variable 2 + +linear if the plotted points are basically a straight line. +population linear correlation coefficient is ρ. +sample linear correlation coefficient (estimator for ρ) is: + +$r = \frac{1}{n-1} \times \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{s_{x} s_{y}}$ + +interpreting r: + +- r = 1: perfect positive linear relationship +- r > 0: positive linear relationship +- r ≈ 0: no linear relationship (doesn’t mean no relationship!!) +- r < 0: negative linear relationship +- r = −1: perfect negative linear relationship + +### Testing ρ = 0 + +test statistic: + +$T_{\rho} = \frac{R - \rho}{\sqrt{\frac{1 - R^{2}}{n-2}}}$ + +under H0: ρ = 0, this has a t-distribution with n−2 degrees of freedom. + +## Regression + +if there’s a correlation, points can be described by line +$y_{i} = \beta_{0} + \beta_{1} x_{i} + error_{i}$ + +regression equation is $\hat{y} = b_{0} + b_{1} x$ + +where b₀ and b₁ are least-squares estimates of β₀ and β₁ + +you want values that satisfy least-squares property (i.e. 
minimise $\sum_{i} (observed - model)^{2}$) + + +$\begin{aligned} +b_{1} &= r \frac{s_{y}}{s_{x}} &&\text{(the slope)} \\\\ +b_0 &= \bar{y} - b_{1} \bar{x} &&\text{(the y intercept)} +\end{aligned}$ + +### Testing linearity + +Test: +- H0: β₁ = 0 +- HA: β₁ ≠ 0 + +The score is: + +$t_{\beta} = \frac{b_{1}}{s_{b_{1}}}$ + +(realisation of test statistic $T_{\beta}$ that has a t-distribution with n−2 degrees of freedom under H₀) + +### Coefficient of determination + +Coefficient of determination is the proportion of variation in the y variable that the regression equation can explain: + +$r^{2} = \frac{\text{explained variation}}{\text{total variation}}$ + +### Residuals + +To check for a fixed standard deviation, make a residual plot. +Residuals are estimates for the errors. + +residual: difference between observed $y\_{i}$ and predicted value $\hat{y}\_{i} = b\_{0} + b\_{1} x\_{i}$ + +$residual\_{i} = y\_{i} - \hat{y}\_{i} = y\_{i} - (b\_{0} + b\_{1} x\_{i})$ + +A residual plot is a scatterplot of residuals against x values. There should be no obvious pattern in the residuals. 
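The correlation, regression and residual formulas in these notes can be tried out on a small data set. This is a rough sketch using only the Python standard library; the data values and the name `fit_line` are made up for illustration and do not come from the lecture.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Return (r, b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 in the denominator)
    # sample linear correlation coefficient
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x               # slope
    b0 = y_bar - b1 * x_bar          # y intercept
    return r, b0, b1


# Hypothetical data, roughly on a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r, b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
r_squared = r ** 2                   # coefficient of determination

print(f"r = {r:.4f}, b0 = {b0:.2f}, b1 = {b1:.2f}, r^2 = {r_squared:.4f}")
```

Plotting `residuals` against `xs` gives the residual plot described in the notes; for a least-squares fit with an intercept, the residuals sum to (numerically) zero.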
+ +![](4670b5bf474343b006017ea93ea64fdb.png) diff --git a/content/stats-notes/Sampling distributions & estimators.html b/content/stats-notes/Sampling distributions & estimators.html @@ -1,13 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:31 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:55 +0000"/><title>Sampling distributions &amp; estimators</title></head><body><h1>Sampling distributions &amp; estimators</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>sampling distribution of sample mean: probability distribution of random variable  -<img src="Sampling%20distributions%20&amp;%20estimators.resources/552E86FD-22CC-4049-9704-493C7CC73AA5.png" height="16" width="19"/></div><div> -sampling distribution of sample proportion: probability distribution of  -<img src="Sampling%20distributions%20&amp;%20estimators.resources/02603FCF-F8A7-4685-ADDB-F61C740E6D5E.png" height="17" width="15"/></div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">a sample proportion is  -<img src="Sampling%20distributions%20&amp;%20estimators.resources/F6E65F44-3F47-42A6-BF45-A777BA29888E.png" height="31" width="176"/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"> -<img src="Sampling%20distributions%20&amp;%20estimators.resources/C79EBBB9-CF36-42A1-AA47-922A5328AF6E.png" height="31" width="124"/>, with p the 
number of successes</div><h2>Confidence intervals</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">a way to estimate stuff. e.g. a 95% confidence interval means we are 95% confident that this interval has a true value of -<img src="Sampling%20distributions%20&amp;%20estimators.resources/46623A86-1281-4C69-A02B-E97D9AE1E158.png" height="11" width="8"/>.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Sampling%20distributions%20&amp;%20estimators.resources/94A94B8B-C601-4EEA-8405-1A493EA0DD51.png" height="35" width="109"/></span><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div>Z is the Z-score for the confidence level you want (find this with a table).</div><div> -The margin of error is whatever you add to/subtract from the sample mean.</div><div> -To get -<img src="Sampling%20distributions%20&amp;%20estimators.resources/B849CBF7-AC34-4B58-BABA-D4C32843ED83.png" height="10" width="13"/>, you can use the central limit theorem.</div></div><div><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Sampling distributions & estimators.md b/content/stats-notes/Sampling distributions & estimators.md @@ -0,0 +1,25 @@ ++++ +title = 'Sampling distributions & estimators' +template = 'page-math.html' ++++ + +# Sampling distributions & estimators + +sampling distribution of sample mean: probability distribution of random variable $\bar{X}_{n}$ + +sampling distribution of sample proportion: probability distribution of $\hat{P}_{n}$ + +a sample proportion is $\frac{\text{number of successes}}{\text{total number of observations}}$ + +$\hat{P}_{n} \sim N(p, \frac{p(1-p)}{n})$ (approximately, for large n), with p the population proportion (the probability of success) + +## Confidence intervals + +a way to estimate stuff. e.g. a 95% confidence interval means we are 95% confident that this interval contains the true value of μ. 
+ +$CI = \bar{x}_{n} \pm z \frac{s_n}{\sqrt{n}} $ + +Z is the Z-score for the confidence level you want (find this with a table). +The margin of error is whatever you add to/subtract from the sample mean. + +To get $s_{n}$, you can use the central limit theorem. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/02603FCF-F8A7-4685-ADDB-F61C740E6D5E.png b/content/stats-notes/Sampling distributions & estimators.resources/02603FCF-F8A7-4685-ADDB-F61C740E6D5E.png Binary files differ. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/46623A86-1281-4C69-A02B-E97D9AE1E158.png b/content/stats-notes/Sampling distributions & estimators.resources/46623A86-1281-4C69-A02B-E97D9AE1E158.png Binary files differ. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/552E86FD-22CC-4049-9704-493C7CC73AA5.png b/content/stats-notes/Sampling distributions & estimators.resources/552E86FD-22CC-4049-9704-493C7CC73AA5.png Binary files differ. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/94A94B8B-C601-4EEA-8405-1A493EA0DD51.png b/content/stats-notes/Sampling distributions & estimators.resources/94A94B8B-C601-4EEA-8405-1A493EA0DD51.png Binary files differ. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/B849CBF7-AC34-4B58-BABA-D4C32843ED83.png b/content/stats-notes/Sampling distributions & estimators.resources/B849CBF7-AC34-4B58-BABA-D4C32843ED83.png Binary files differ. diff --git a/content/stats-notes/Sampling distributions & estimators.resources/C79EBBB9-CF36-42A1-AA47-922A5328AF6E.png b/content/stats-notes/Sampling distributions & estimators.resources/C79EBBB9-CF36-42A1-AA47-922A5328AF6E.png Binary files differ. 
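As a rough illustration of the confidence interval formula in "Sampling distributions & estimators", the sketch below computes a z-based interval with the Python standard library. The sample values and the function name `confidence_interval` are hypothetical; `NormalDist().inv_cdf` stands in for looking up the Z-score in a table.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Return (low, high) for the mean, using x-bar ± z * s_n / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    s_n = stdev(sample)                            # sample standard deviation
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% level
    margin = z * s_n / sqrt(n)                     # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9]
low, high = confidence_interval(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval is symmetric around the sample mean, and the half-width is the margin of error mentioned in the notes. For a sample this small a t-based interval would usually be preferred; the z version is used here only to match the formula as written.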
diff --git a/content/stats-notes/Sampling distributions & estimators.resources/F6E65F44-3F47-42A6-BF45-A777BA29888E.png b/content/stats-notes/Sampling distributions & estimators.resources/F6E65F44-3F47-42A6-BF45-A777BA29888E.png Binary files differ. diff --git a/content/stats-notes/Summarising data.html b/content/stats-notes/Summarising data.html @@ -1,26 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.208069801330566"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 00:43:33 +0000"/><meta name="latitude" content="52.30035400390625"/><meta name="longitude" content="4.988170682800604"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 01:27:26 +0000"/><title>Summarising data</title></head><body><h1>Summarising data</h1><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">data distribution:</span> we want to know what the data looks like</div><div> -a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc.</div></div><h2>Graphical summaries</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-weight: bold;">Frequency distribution (table)</span></div><table style="border-collapse: collapse; min-width: 100%;"><colgroup><col style="width: 130px;"/><col style="width: 130px;"/></colgroup><tbody><tr><td style="width: 130px; padding: 8px; border: 1px solid;">Grade</td><td style="width: 130px; padding: 8px; border: 1px solid;">Frequency</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">5</td><td style="width: 130px; 
padding: 8px; border: 1px solid;">2</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">6</td><td style="width: 130px; padding: 8px; border: 1px solid;">1</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">7</td><td style="width: 130px; padding: 8px; border: 1px solid;">3</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">8</td><td style="width: 130px; padding: 8px; border: 1px solid;">2</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">9</td><td style="width: 130px; padding: 8px; border: 1px solid;">1</td></tr><tr><td style="width: 130px; padding: 8px; border: 1px solid;">10</td><td style="width: 130px; padding: 8px; border: 1px solid;">2</td></tr></tbody></table><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-weight: bold;">Bar chart</span></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/1025F6BC-BC40-466C-8EA5-D814F6ED68E7.png" height="357" width="752"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Pareto bar chart</span></div><div> -orders categories based on frequency. 
only for nominal level of measurement</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/DF440523-F774-4FC5-8D67-99AA4637CC66.png" height="293" width="743"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Pie chart</span></div><div> -size of pieces of pie shows frequency of category.</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/BB20B4B3-C7EE-4AF9-BAEE-8E1F18E5EDDC.png" height="213" width="230"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Histogram</span></div><div> -size of bar shows frequency of that category.</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/95ACBC8B-9387-4A03-8A89-475258CBC80A.png" height="315" width="789"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><div><span style="font-weight: bold;">Time series</span></div><div> -shows quantity that varies over time.</div></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><img src="Summarising%20data.resources/0885D1E7-DBA9-4897-89EB-7C5416C48486.png" height="267" width="760"/><br/></div><h2>Descriptive summaries</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">qualitative description:</div><ul><li><div>shape:</div><div> - </div><div> -<img src="Summarising%20data.resources/5606E2CB-CE6C-4438-9FE8-F9EDA4144CAE.png" height="407" width="783"/></div></li><li><div>location: position on x axis (around 0, around 10, etc.)</div></li><li><div>dispersion: spread out graph == large dispersion</div></li></ul><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">numerical description:</div><ul><li><div>location: measure of center -</div></li><ul><li><div>mean: average (sum 
everything, divide by the total number)</div></li><li><div>median: sort, find the middle number</div></li><li><div>mode: most often occurring value (highest frequency) -</div></li><ul><li><div>unimodal: unique mode</div></li><li><div>bimodal: two modes</div></li><li><div>multimodal: more than two modes</div></li></ul></ul><li><div>dispersion: -</div></li><ul><li><div>measures of variation -</div></li><ul><li><div>sample standard deviation (how much values deviate from mean) -</div></li><ul><li><div>same units as data (unlike variance)</div></li><li><div>standard deviation is  -<img src="Summarising%20data.resources/EA168496-8A42-4DF8-AC6C-F0D5811C25BA.png" height="16" width="25"/></div></li><li><div> -<img src="Summarising%20data.resources/91E716FB-58B4-4D1A-921F-73100090781C.png" height="33" width="132"/></div></li><li><div>for population: -<img src="Summarising%20data.resources/1D9573FE-46B1-48A9-BB45-DFE435C80568.png" height="16" width="33"/></div></li></ul><li><div>range -</div></li><ul><li><div> -<img src="Summarising%20data.resources/D1AC7FAB-71A0-4065-A141-87E6BD0B14DA.png" height="11" width="145"/></div></li><li><div>sensitive to extreme values</div></li></ul></ul><li><div>relative standing -</div></li><ul><li><div>percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)</div></li><li><div>IQR: interquartile range =  -<img src="Summarising%20data.resources/212AC4F9-52D4-49F3-B2E5-F7CAEF56AC7F.png" height="14" width="52"/></div></li><li><div>5-number summary: min, Q1, median (Q2), Q3, max -</div></li><ul><li><div>boxplot is graph of this</div></li><li><div>whiskers are lines from box (by default, not more than -<img src="Summarising%20data.resources/F67961D8-6174-4803-8055-23510B9410BF.png" height="14" width="64"/>)</div></li><li><div>outliers: points outside of whiskers</div><div> - </div><div> -<img src="Summarising%20data.resources/DB880F6E-D4CA-4F2F-BA06-0A56ECFBDCC0.png" height="234" 
width="694"/></div></li></ul></ul></ul></ul><div><br/></div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Summarising data.resources/1D9573FE-46B1-48A9-BB45-DFE435C80568.png b/content/stats-notes/Summarising data.resources/1D9573FE-46B1-48A9-BB45-DFE435C80568.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/212AC4F9-52D4-49F3-B2E5-F7CAEF56AC7F.png b/content/stats-notes/Summarising data.resources/212AC4F9-52D4-49F3-B2E5-F7CAEF56AC7F.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/91E716FB-58B4-4D1A-921F-73100090781C.png b/content/stats-notes/Summarising data.resources/91E716FB-58B4-4D1A-921F-73100090781C.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/D1AC7FAB-71A0-4065-A141-87E6BD0B14DA.png b/content/stats-notes/Summarising data.resources/D1AC7FAB-71A0-4065-A141-87E6BD0B14DA.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/EA168496-8A42-4DF8-AC6C-F0D5811C25BA.png b/content/stats-notes/Summarising data.resources/EA168496-8A42-4DF8-AC6C-F0D5811C25BA.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/F67961D8-6174-4803-8055-23510B9410BF.png b/content/stats-notes/Summarising data.resources/F67961D8-6174-4803-8055-23510B9410BF.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/5606E2CB-CE6C-4438-9FE8-F9EDA4144CAE.png b/content/stats-notes/Summarising data/121a30a0247a9ef2c8d6f222df0e39ba.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/1025F6BC-BC40-466C-8EA5-D814F6ED68E7.png b/content/stats-notes/Summarising data/1be3b41077a33b1704f30d44a6e6f2a3.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/DB880F6E-D4CA-4F2F-BA06-0A56ECFBDCC0.png b/content/stats-notes/Summarising data/2622ba4db3e301150ce401c70344ceba.png Binary files differ. 
diff --git a/content/stats-notes/Summarising data.resources/0885D1E7-DBA9-4897-89EB-7C5416C48486.png b/content/stats-notes/Summarising data/353a35bb43541880822a45b4aedccc33.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/DF440523-F774-4FC5-8D67-99AA4637CC66.png b/content/stats-notes/Summarising data/6d7b91f79d3d9d8dfea9b17bc06a0b94.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/BB20B4B3-C7EE-4AF9-BAEE-8E1F18E5EDDC.png b/content/stats-notes/Summarising data/c712f8daf000f5fb759e01c0e0cae513.png Binary files differ. diff --git a/content/stats-notes/Summarising data.resources/95ACBC8B-9387-4A03-8A89-475258CBC80A.png b/content/stats-notes/Summarising data/f30ee8b3f6ad23ca7a4a2967d3200a47.png Binary files differ. diff --git a/content/stats-notes/Summarising data/index.md b/content/stats-notes/Summarising data/index.md @@ -0,0 +1,85 @@ ++++ +title = 'Summarising data' +template = 'page-math.html' ++++ + +# Summarising data + +**data distribution:** we want to know what the data looks like + +a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc. + +## Graphical summaries + +### Frequency distribution (table) + +| Grade | Frequency | +| --- | --- | +| 5 | 2 | +| 6 | 1 | +| 7 | 3 | +| 8 | 2 | +| 9 | 1 | +| 10 | 2 | + +### Bar chart +![](1be3b41077a33b1704f30d44a6e6f2a3.png) + +### Pareto bar chart +orders categories based on frequency. only for nominal level of measurement + +![](6d7b91f79d3d9d8dfea9b17bc06a0b94.png) + +### Pie chart +size of pieces of pie shows frequency of category. + +![](c712f8daf000f5fb759e01c0e0cae513.png) + +### Histogram +size of bar shows frequency of that category. + +![](f30ee8b3f6ad23ca7a4a2967d3200a47.png) + +### Time series +shows quantity that varies over time. 
+ +![](353a35bb43541880822a45b4aedccc33.png) + +## Descriptive summaries +qualitative description: + +- shape: + + ![](121a30a0247a9ef2c8d6f222df0e39ba.png) + +- location: position on x axis (around 0, around 10, etc.) +- dispersion: spread out graph == large dispersion + +numerical description: + +- location: measure of center + - mean: average (sum everything, divide by the total number) + - median: sort, find the middle number + - mode: most often occurring value (highest frequency) + - unimodal: unique mode + - bimodal: two modes + - multimodal: more than two modes +- dispersion: + - measures of variation + - sample standard deviation (how much values deviate from mean) + - same units as data (unlike variance) + - standard deviation is $\sqrt{s^{2}}$ + - $s^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n-1}$ + - for population: σ², σ + - range + - (maximum - minimum) + - sensitive to extreme values + - relative standing + - percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3) + - IQR: interquartile range = (Q3 - Q1) + - 5-number summary: min, Q1, median (Q2), Q3, max + - boxplot is graph of this + - whiskers are lines from box (by default, not more than 1.5 × IQR) + - outliers: points outside of whiskers + +![](2622ba4db3e301150ce401c70344ceba.png) diff --git a/content/stats-notes/TOC: Statistical Methods.resources/931F8049-746F-4C2F-AE8B-3FD457025035.png b/content/stats-notes/TOC: Statistical Methods.resources/931F8049-746F-4C2F-AE8B-3FD457025035.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/4314E768-B529-4221-BA2A-8D03A5F4E7EE.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/4314E768-B529-4221-BA2A-8D03A5F4E7EE.png Binary files differ. 
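The numerical descriptions in the Summarising data notes (mean, median, mode, sample variance with the n − 1 denominator, range, IQR and the 1.5 × IQR whisker rule) can be sketched in Python. This is a minimal illustration and not part of the original notes: the `grades` list is expanded from the frequency table in those notes, and the quartiles use the default method of `statistics.quantiles`, which is only one of several conventions, so other software may report slightly different Q1 and Q3.

```python
import math
import statistics

# Grades expanded from the frequency table (5 twice, 6 once, 7 three times, ...).
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]

# Location: mean, median, mode(s).
mean = statistics.mean(grades)
median = statistics.median(grades)
modes = statistics.multimode(grades)  # one element means the data is unimodal

# Dispersion: sample variance s^2 = sum((x_i - mean)^2) / (n - 1), and s = sqrt(s^2).
n = len(grades)
sample_variance = sum((x - mean) ** 2 for x in grades) / (n - 1)
sample_std = math.sqrt(sample_variance)
data_range = max(grades) - min(grades)

# Relative standing: quartiles, IQR, and the default boxplot whisker limits.
q1, q2, q3 = statistics.quantiles(grades, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in grades if x < lower_fence or x > upper_fence]

print("mean", mean, "median", median, "modes", modes)
print("s^2", sample_variance, "s", sample_std, "range", data_range)
print("five-number summary", min(grades), q1, q2, q3, max(grades))
print("outliers", outliers)
```

The variance is written out by hand so the n − 1 denominator is visible; `statistics.variance` computes the same quantity.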
diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/6CFA449D-6B83-4CEC-8CB3-1D4F849B6809.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/6CFA449D-6B83-4CEC-8CB3-1D4F849B6809.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/89031541-AB87-4E37-AB8D-104952DB11FE.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/89031541-AB87-4E37-AB8D-104952DB11FE.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/8B2F81A1-DA9F-48F7-8096-535BAA746FD5.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/8B2F81A1-DA9F-48F7-8096-535BAA746FD5.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/A8B8F47E-3843-496F-A6FF-A2C3107D7898.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/A8B8F47E-3843-496F-A6FF-A2C3107D7898.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/BE430A6B-D948-4F60-AEA1-ECCFF1757DE6.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/BE430A6B-D948-4F60-AEA1-ECCFF1757DE6.png Binary files differ. 
diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/C59C5FD9-E7E1-43BA-B91E-9004B43AD0C8.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/C59C5FD9-E7E1-43BA-B91E-9004B43AD0C8.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/EAF66274-C9BF-4730-9345-03CD12405C24.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/EAF66274-C9BF-4730-9345-03CD12405C24.png Binary files differ. diff --git a/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/F5A39700-7BFA-4611-B15F-B4B87688B65A.png b/content/stats-notes/Testing characteristics of samples (goodness-of-fit, independence, homogeneity).resources/F5A39700-7BFA-4611-B15F-B4B87688B65A.png Binary files differ. 
diff --git a/content/stats-notes/Testing characteristics of samples.html b/content/stats-notes/Testing characteristics of samples.html @@ -1,14 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html><head><link rel="stylesheet" href="sitewide.css"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.6 (457297)"/><meta name="altitude" content="-4.235000133514404"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2018-12-16 19:24:24 +0000"/><meta name="latitude" content="52.30033088657014"/><meta name="longitude" content="4.988105169232488"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2018-12-16 19:35:28 +0000"/><title>Testing characteristics of samples (goodness-of-fit, independence, homogeneity)</title></head><body><h1>Testing characteristics of samples (goodness-of-fit, independence, homogeneity)</h1><h2>Goodness-of-fit</h2><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Checks if observed freq. 
distribution fits a claimed distribution.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Sample size n with k different categories.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"> -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/A8B8F47E-3843-496F-A6FF-A2C3107D7898.png" height="14" width="332"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"> -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/BE430A6B-D948-4F60-AEA1-ECCFF1757DE6.png" height="14" width="402"/><br/></div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-style: italic;">O</span><span style="vertical-align: sub; font-style: italic;">i</span> is observed frequency count of category <span style="font-style: italic;">i</span>. -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/6CFA449D-6B83-4CEC-8CB3-1D4F849B6809.png" height="14" width="69"/> is the expected frequency count.</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;">Test statistic is:</div><div style="margin-top: 1em; margin-bottom: 1em;-en-paragraph:true;"><span style="font-size: 16px;"> -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/89031541-AB87-4E37-AB8D-104952DB11FE.png" height="45" width="149"/></span><br/></div><div>and has approximately a chi-square distribution with k − 1 degrees of freedom under the null hypothesis.</div><div><br/></div><div>Critical value: </div><ul><li><div>reject null hypothesis if -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/F5A39700-7BFA-4611-B15F-B4B87688B65A.png" height="20" width="76"/> </div></li><li><div>P value: reject null hypothesis if  -<img 
src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/EAF66274-C9BF-4730-9345-03CD12405C24.png" height="18" width="97"/><br/></div></li></ul><div><br/></div><div>test is right-tailed since we need large values of test statistic (even if hypothesis is undirected).</div><div><br/></div><h2>Test of independence</h2><div>When: two variables in a <i>single sample</i></div><div><br/></div><div>you have a contingency table with r row categories and c column categories. checking to see if columns and variables are dependent.</div><div><br/></div><div>H0: row and column variables are independent </div><div>HA: row and column variables are dependent</div><div><br/></div><div>test statistic:</div><div><br/></div><div><span style="font-size: 16px;"> -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/4314E768-B529-4221-BA2A-8D03A5F4E7EE.png" height="45" width="133"/></span></div><div><br/></div><div>has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.</div><div><br/></div><div>reject null hypothesis if  -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/8B2F81A1-DA9F-48F7-8096-535BAA746FD5.png" height="20" width="113"/></div><div><br/></div><h2>Test of homogeneity</h2><div>When: comparing <i>two or more samples</i> to see if they have the same proportions of characteristics.</div><div><br/></div><div>r different populations (rows) and c different categories (columns) of some variable checking for proportions of a characteristic in the populations.</div><div><br/></div><div>H0: different populations have same proportions of some characteristics </div><div>HA: different populations don’t have the same proportions of some characteristics.</div><div><br/></div><div>test statistic:</div><div><br/></div><div><span style="font-size: 16px;"> -<img 
src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/4314E768-B529-4221-BA2A-8D03A5F4E7EE.png" height="45" width="133"/></span></div><div><br/></div><div>has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.</div><div><br/></div><div>reject H0 if observed  -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/8B2F81A1-DA9F-48F7-8096-535BAA746FD5.png" height="20" width="113"/></div><div><br/></div><h2>Fisher’s exact test for 2-by-2 contingency table</h2><div>either: </div><ul><li><div>H0: row and column variables are independent </div></li><li><div>HA: occurrence of “first column category” is more common in group of “first row category” than in group of “second row category”</div></li></ul><div>or: </div><ul><li><div>H0: populations have same proportion of one characteristic </div></li><li><div>HA: the proportion of the characteristic is bigger/smaller in one population</div></li></ul><div><br/></div><div>test statistic: frequency count in cell (1,1) has under H0 and given marginals a hypergeometric distribution </div><div>parameters  -<img src="Testing%20characteristics%20of%20samples%20(goodness-of-fit,%20independence,%20homogeneity).resources/C59C5FD9-E7E1-43BA-B91E-9004B43AD0C8.png" height="16" width="415"/></div><div><br/></div><div>guess we don’t need to know how to do this manually.</div></body></html>- \ No newline at end of file diff --git a/content/stats-notes/Testing characteristics of samples.md b/content/stats-notes/Testing characteristics of samples.md @@ -0,0 +1,82 @@ ++++ +title = 'Testing characteristics of samples (goodness-of-fit, independence, homogeneity)' +template = 'page-math.html' ++++ + +# Testing characteristics of samples (goodness-of-fit, independence, homogeneity) + +## Goodness-of-fit + +Checks if observed freq. distribution fits a claimed distribution. 
+Sample size n with k different categories. + +Hypotheses: +- $H_{0}$: frequency counts agree with claimed distribution +- $H_{A}$: frequency counts do not agree with the claimed distribution + +$O_{i}$ is observed frequency count of category *i*. $E_{i} = n \times p_{i}$ is the expected frequency count, where $p_{i}$ is the claimed probability of category *i*. + +Test statistic is: +$\chi^{2} = \sum_{i=1}^{k}\frac{(O_{i} - E_{i})^{2}}{E_{i}}$ + +and has approximately a chi-square distribution with k − 1 degrees of freedom under the null hypothesis. + +Critical value: + +- reject null hypothesis if $\chi^{2} > \chi^{2}_{k-1, \alpha}$ +- P value: reject null hypothesis if $P(\chi^{2} \geq x^{2}) < \alpha$ + +test is right-tailed since only large values of the test statistic are evidence against the null hypothesis (even if hypothesis is undirected). + +## Test of independence + +When: two variables in a *single sample* + +you have a contingency table with r row categories and c column categories. checking to see if the row and column variables are dependent. + +H0: row and column variables are independent +HA: row and column variables are dependent + +test statistic: + +$\chi^2 = \sum_{cells} \frac{(O-E)^{2}}{E}$ + +has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom. + +reject null hypothesis if $\chi^{2} > \chi^{2}_{(r-1)(c-1), \alpha}$ + +## Test of homogeneity + +When: comparing two or more samples to see if they have the same proportions of characteristics. + +r different populations (rows) and c different categories (columns) of some variable; checking for proportions of a characteristic in the populations. + +H0: different populations have same proportions of some characteristics + +HA: different populations don’t have the same proportions of some characteristics. 
+ +test statistic: + +$\chi^{2} = \sum_{cells} \frac{(O-E)^2}{E}$ + +has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom. 
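The statistic $\sum_{cells} \frac{(O-E)^{2}}{E}$ shared by the independence and homogeneity tests can be sketched as below. This is an illustration under stated assumptions and not from the notes: the expected count of a cell is taken as (row total × column total) / grand total, which is the standard choice but is not spelled out in the text; the 2×2 table is made-up data; and 3.841 is the tabulated critical value $\chi^{2}_{1, 0.05}$.

```python
def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Assumed expected count under H0: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df


# Hypothetical 2x2 table: rows are two groups, columns are two categories.
observed_counts = [[20, 30],
                   [30, 20]]
chi2, df = chi_square_statistic(observed_counts)

CRITICAL_VALUE_DF1_ALPHA_005 = 3.841  # from a chi-square table, df = 1, alpha = 0.05
reject_h0 = chi2 > CRITICAL_VALUE_DF1_ALPHA_005  # right-tailed test

print(chi2, df, reject_h0)
```

Every expected count here is 25, so each cell contributes 25/25 = 1 and the statistic is 4 with 1 degree of freedom.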
+ +reject H0 if observed $\chi^{2} > \chi^{2}_{(r-1)(c-1),\alpha}$ + +## Fisher’s exact test for 2-by-2 contingency table + +either: + +- H0: row and column variables are independent +- HA: occurrence of “first column category” is more common in group of “first row category” than in group of “second row category” + +or: + +- H0: populations have same proportion of one characteristic +- HA: the proportion of the characteristic is bigger/smaller in one population + +test statistic: frequency count in cell (1,1) has under H0 and given marginals a hypergeometric distribution + +parameters n = (first row total), N = (grand total), and k = (first column total) + +guess we don’t need to know how to do this manually. diff --git a/content/stats-notes/_index.md b/content/stats-notes/_index.md @@ -0,0 +1,25 @@ ++++ +title = 'Statistical Methods' ++++ + +# Statistical Methods + +1. [Introduction: Data](introduction-data) +2. [Summarising data](summarising-data) +3. [Probability intro](probability-intro) +4. [Discrete probability distributions](discrete-probability-distributions) +5. [Continuous probability distribution](continuous-probability-distribution) +6. [Sampling distributions & estimators](sampling-distributions-estimators) +7. [Hypothesis testing](hypothesis-testing) +8. [Relationships between variables](relationships-between-variables) +9. [Testing characteristics of samples (goodness-of-fit, independence, homogeneity)](testing-characteristics-of-samples) + +Overview of statistical methods: + +<table> +<tr><th></th><th>Categorical data</th><th>Numerical data</th></tr> +<tr><td>inference about one population</td><td>confidence interval for p.<br>z test for one proportion p. 
goodness-of-fit test</td><td>confidence interval for μ.<br>t test for mean</td></tr> +<tr><td>inference about two populations</td><td>confidence interval for p₁ - p₂<br>Z test for two proportions</td><td>t test for matched pairs<br>t test for independent samples</td></tr> +<tr><td>relationship between two variables</td><td>chi-square test of independence<br>Fisher’s exact test</td><td>t test of correlation<br>simple linear regression</td></tr> +<tr><td>comparing more than 2 populations</td><td>chi-square test for homogeneity</td><td> </td></tr> +</table> diff --git a/content/stats-notes/hypothesis-testing.md b/content/stats-notes/hypothesis-testing.md @@ -0,0 +1,136 @@ ++++ +title = 'Hypothesis testing' +template = 'page-math.html' ++++ + +# Hypothesis testing + +If σ is known, use Z scores. If σ is not known (or the sample size is below 30), use T scores with the sample standard deviation $s_{n}$. + +## The steps + +1. Choose population parameter +2. Formulate null and alternative hypotheses. Choose significance level. + + - H0: parameter = some value + - HA: depends, can be two-tailed or one-tailed + - one-tailed: param < value or param > value + - two-tailed: param ≠ value + +3. Collect data. + +4. Choose test statistic (based on parameter) and identify its distribution under H0 + +5. Calculate value of test statistic. +6. Find p-value, or critical region based on significance. + + - watch out for the critical region. if two-tailed test, have to divide significance by 2 first (α/2 in each tail). + +7. 
Decide whether or not to reject the null hypothesis: + + - p-value: + - if p-value ≤ significance, reject + - otherwise, fail to reject + - critical values: + - if Z-score or T-score not in critical region, fail to reject + - otherwise, reject + +**YOU NEVER ACCEPT HYPOTHESES** + +## Errors in testing + +| | H0 true | H0 false | +| --- | --- | --- | +| reject H0 | Type I | fine | +| not reject H0 | fine | Type II | + +- P(Type I error) = α (significance level) +- P(Type II error) = β (depends on sample size and actual population parameter) + +## Proportion test + +test statistic: + +$Z = \frac{\hat{P}_{n} - p}{\sqrt{\frac{p(1-p)}{n}}}$ + +## Mean test + +**Test statistic iff σ known:** + +$Z = \frac{\bar{X}_{n} - \mu}{\frac{\sigma}{\sqrt{n}}}$ + +has standard normal distribution under null hypothesis. + +**Test statistic otherwise:** + +basically just replace σ with its estimator $s_n$ + +$T = \frac{\bar{X}_{n} - \mu}{\frac{s_n}{\sqrt{n}}}$ + +has t-distribution with n−1 degrees of freedom under null hypothesis. + +**Confidence interval (1−α) for μ:** + + +$\text{lower, upper} = \bar{x}\_{n} \pm t\_{n-1, \alpha/2} \times \frac{s_n}{\sqrt{n}}$ + +What does $t_{n-1, \alpha / 2}$ mean? Well, we need a t-score, with n−1 degrees of freedom. Divide significance by 2 because α is the full area (both tails) and since we’re adding/subtracting a t-score, we want to find the score corresponding to the area in one tail. + +## Two samples + +### Dependent + +dependent: values in one sample are related to values in the other sample, or form natural matched pairs + +to test, we look at the mean of the *differences* within pairs. + +null hypothesis can be either no difference, or that difference is a certain value. alternative hypothesis can be one-tailed or two-tailed. + +calculate the difference for each pair, then take the sample mean of the differences $\bar{D}$ and the standard deviation of the differences $s_{d}$. 
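The matched-pairs procedure can be sketched in Python as follows. This is a minimal illustration, not part of the notes: the function name, the `hypothesised_difference` parameter (the value of μ₁ − μ₂ under the null hypothesis) and the sample data are all hypothetical, and the statistic is the mean difference minus the hypothesised difference, divided by $s_{d} / \sqrt{n}$, with n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(first, second, hypothesised_difference=0.0):
    """Return (t, degrees_of_freedom) for two matched samples of equal length."""
    if len(first) != len(second):
        raise ValueError("matched samples must have the same length")

    # One difference per matched pair.
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)

    d_bar = statistics.mean(differences)  # sample mean of the differences
    s_d = statistics.stdev(differences)   # sample standard deviation (n - 1 denominator)

    t = (d_bar - hypothesised_difference) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same four subjects, after and before a treatment.
after = [12, 15, 10, 13]
before = [10, 12, 9, 11]
t_value, df = paired_t_statistic(after, before)

print(t_value, df)
```

The differences here are 2, 3, 1, 2, so the mean difference is 2 and $s_{d}^{2} = 2/3$; the resulting t-score is then compared against a t-distribution with 3 degrees of freedom.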
+ +test statistic: + +$T_{d} = \frac{\bar{D} - (\mu_{1} - \mu_{2})}{\frac{s_{d}}{\sqrt{n}}}$ + +which under null hypothesis has t-distribution with n−1 degrees of freedom. + +### Independent + +independent: no relationship between two samples + +#### Assuming equal σ + +if sample randomly drawn from same population, we assume that σ₁ = σ₂. + +test statistic: + +$T\_{2}^{eq} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s^{2}\_{p}}{n\_{1}} + \frac{s^{2}\_{p}}{n\_{2}}}}$ + +the pooled sample variance is: + +$s\_{p}^{2} = \frac{(n\_{1} - 1) s\_{1}^{2} + (n\_{2} - 1) s\_{2}^{2}}{n\_{1} + n\_{2} - 2}$ + +#### Not assuming equal σ + +test statistic: + +$T\_{2} = \frac{(\bar{X}\_{1} - \bar{X}\_{2}) - (\mu\_{1} - \mu\_{2})}{\sqrt{\frac{s\_{1}^{2}}{n\_{1}} + \frac{s\_{2}^{2}}{n\_{2}}}}$ + +which under null hypothesis has t-distribution with $\bar{n}$ degrees of freedom. $\bar{n}$ at the exam is the smallest of the two sample sizes. + +## Two proportions + +H0: p1 = p2 + +test statistic: + +$z\_{p} = \frac{(\hat{p}\_{1} - \hat{p}\_{2})}{\sqrt{\frac{\bar{p} (1-\bar{p})}{n\_{1}} + \frac{\bar{p}(1-\bar{p})}{n\_{2}}}}$ + +(1−α) CI for p1−p2: + +$(\hat{p}_{1} - \hat{p}_{2}) \pm E$ where + +$E = z\_{\alpha / 2} \times \sqrt{\frac{\hat{p}\_{1} (1-\hat{p}\_{1})}{n\_{1}} + \frac{\hat{p}\_{2} (1-\hat{p}\_{2})}{n\_{2}}}$ + +P(Type I error) = α (significance level) diff --git a/content/stats-notes/index.html b/content/stats-notes/index.html @@ -1,122 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html> - -<head> - <link rel="stylesheet" href="sitewide.css"> - <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> - <meta name="exporter-version" content="Evernote Mac 7.6 (457297)" /> - <meta name="altitude" content="-4.231714248657227" /> - <meta name="author" content="Alex Balgavy" /> - <meta name="created" 
content="2018-12-16 00:43:36 +0000" /> - <meta name="latitude" content="52.30035400390625" /> - <meta name="longitude" content="4.98817026635058" /> - <meta name="source" content="desktop.mac" /> - <meta name="updated" content="2018-12-16 01:28:18 +0000" /> - <title>TOC: Statistical Methods</title> -</head> - -<body> - <nav> -<a href="http://thezeroalpha.github.io">Homepage</a> -</nav> - - <h1>TOC: Statistical Methods</h1> - <h3 align="center">Alex Balgavy</h3> - <ol> - <li> - <div><a href="Introduction%3A Data.html">Introduction: Data</a></div> - </li> - <li> - <div><a href="Summarising data.html">Summarising data</a></div> - </li> - <li> - <div><a href="Probability intro.html">Probability intro</a></div> - </li> - <li> - <div><a href="Discrete probability distributions.html">Discrete probability distributions</a></div> - </li> - <li> - <div><a href="Continuous probability distribution.html">Continuous probability distribution</a></div> - </li> - <li> - <div><a href="Sampling distributions &amp; estimators.html">Sampling distributions &amp; estimators</a></div> - </li> - <li> - <div><a href="Hypothesis testing.html">Hypothesis testing</a></div> - </li> - <li> - <div><a href="Relationships between variables.html">Relationships between variables</a></div> - </li> - <li> - <div><a href="Testing characteristics of samples.html">Testing characteristics of samples</a></div> - </li> - </ol> - <div style="margin-top: 1em;margin-bottom: 1em;-en-paragraph:true;">Overview of statistical methods:</div> - <table style="border-collapse: collapse; min-width: 100%;"> - <colgroup> - <col style="width: 130px;" /> - <col style="width: 130px;" /> - <col style="width: 130px;" /> - </colgroup> - <tbody> - <tr> - <td style="width: 130px; padding: 8px; border: 1px solid;"> </td> - <td style="width: 130px; padding: 8px; border: 1px solid;">Categorical data</td> - <td style="width: 130px; padding: 8px; border: 1px solid;">Numerical data</td> - </tr> - <tr> - <td style="width: 130px; 
padding: 8px; border: 1px solid;">inference about one population</td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>confidence interval for p.</div> - <div> - z test for one proportion p. goodness-of-fit test</div> - </td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>confidence interval for μ.</div> - <div> - t test for mean</div> - </td> - </tr> - <tr> - <td style="width: 130px; padding: 8px; border: 1px solid;">inference about two populations</td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>confidence interval for  - <img src="TOC%3A%20Statistical%20Methods.resources/931F8049-746F-4C2F-AE8B-3FD457025035.png" height="11" width="44" /> - <div> - Z test for two proportions</div> - </td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>t test for matched pairs</div> - <div> - t test for independent samples</div> - </td> - </tr> - <tr> - <td style="width: 130px; padding: 8px; border: 1px solid;">relationship between two variables</td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>chi-square test of independence</div> - <div> - Fisher’s exact test</div> - </td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div>t test of correlation</div> - <div> - simple linear regression</div> - </td> - </tr> - <tr> - <td style="width: 130px; padding: 8px; border: 1px solid;">comparing more than 2 populations</td> - <td style="width: 130px; padding: 8px; border: 1px solid;">chi-square test for homogeneity</td> - <td style="width: 130px; padding: 8px; border: 1px solid;"> - <div><br /></div> - </td> - </tr> - </tbody> - </table> - <div><br /></div> - <div><a href="overview-slides.pdf">Overview slides from the last lecture.</a></div> - <div><br /></div> -</body> - -</html> diff --git a/content/stats-notes/overview-slides.pdf b/content/stats-notes/overview-slides.pdf Binary files differ. 
diff --git a/content/stats-notes/sitewide.css b/content/stats-notes/sitewide.css @@ -1,33 +0,0 @@ -@charset 'UTF-8'; -@font-face{font-family:'FontAwesome';src:url('font/fontawesome-webfont.eot?v=4.0.1');src:url('font/fontawesome-webfont.eot?#iefix&v=4.0.1') format('embedded-opentype'),url('font/fontawesome-webfont.woff?v=4.0.1') format('woff'),url('font/fontawesome-webfont.ttf?v=4.0.1') format('truetype'),url('font/fontawesome-webfont.svg?v=4.0.1#fontawesomeregular') format('svg');font-weight:normal;font-style:normal} - -body { - margin: 0px; - padding: 1em; - background: #282B37; - font-family: 'Lato', sans-serif; - font-size: 12pt; - font-weight: 300; - line-height: 1.5em; - color: #8A8A8A; - padding-left: 50px; -} -h1 { - margin: 0px; - padding: 0px; - font-weight: 300; - text-align: center; -} -h3 { - font-style: italic; -} -a { - color: #D1551F; - } -a:hover { - color: #AF440F; -} - strong { - font-weight: 700; - color: #2A2A2A; - }