lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

Methodology.html (57601B)


      1 
      2 				<!DOCTYPE html>
      3 				<html>
      4 					<head>
      5 						<meta charset="UTF-8">
      6 
      7 						<title>Methodology</title>
      8 					<link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head>
      9 					<body>
     10 
     11 <div id="rendered-md"><h1 id="ml-methodology">ML: Methodology</h1>
     12 <nav class="table-of-contents"><ul><li><a href="#ml-methodology">ML: Methodology</a><ul><li><a href="#performing-an-experiment">Performing an experiment</a><ul><li><a href="#what-if-you-need-to-test-many-models">What if you need to test many models?</a></li><li><a href="#the-modern-recipe">The modern recipe</a></li><li><a href="#cross-validation">Cross-validation</a></li></ul></li><li><a href="#what-to-report">What to report</a><ul><li><a href="#classification">Classification</a><ul><li><a href="#whats-a-good-error-5">What&#39;s a good error (5%)?</a></li><li><a href="#performance-metrics">Performance metrics</a><ul><li><a href="#confusion-matrix-contingency-table">Confusion matrix (contingency table)</a></li><li><a href="#precision-and-recall">Precision and recall</a></li></ul></li></ul></li><li><a href="#regression">Regression</a></li><li><a href="#errors-confidence-intervals">Errors &amp; confidence intervals</a></li></ul></li><li><a href="#the-no-free-lunch-theorem-and-principle">The no-free-lunch theorem and principle</a></li><li><a href="#cleaning-your-data">Cleaning your data</a><ul><li><a href="#missing-data">Missing data</a></li><li><a href="#outliers">Outliers</a></li><li><a href="#class-imbalance">Class imbalance</a></li></ul></li><li><a href="#choosing-features">Choosing features</a></li><li><a href="#normalisation-standardisation">Normalisation &amp; standardisation</a><ul><li><a href="#normalisation">Normalisation</a></li><li><a href="#standardisation">Standardisation</a></li><li><a href="#whitening">Whitening</a></li></ul></li><li><a href="#dimensionality-reduction">Dimensionality reduction</a></li></ul></li></ul></nav><h2 id="performing-an-experiment">Performing an experiment</h2>
     13 <p>Never judge your performance on the training data (or you'll fail the<br>
     14 course and life).</p>
     15 <p>The proportion of training to test data is not what matters; the <em>absolute<br>
     16 size</em> of the test data is. Aim for at least 500 examples in the<br>
     17 test data (ideally 10 000 or more).</p>
     18 <h3 id="what-if-you-need-to-test-many-models">What if you need to test many models?</h3>
     19 <p>e.g. k-nearest neighbours, which classifies a point based on the classification of its k nearest neighbours.<br>
     20 Every value of k is a different model, so you need data (besides the training and test sets) to compare them on.</p>
     21 <h3 id="the-modern-recipe">The modern recipe</h3>
     22 <ol>
     23 <li>Split data into train, validation, and test data. Sample randomly,<br>
     24 at least 500 examples in test set.</li>
     25 <li>Choose model, hyperparameters, etc. only based on the training set.<br>
     26 Test on validation. Don't use test set for anything.</li>
     27 <li>State the hypothesis.</li>
     28 <li>During the final run, train on training + validation data.</li>
     29 <li>Test hypothesis <em>once</em> on the test data. Usually at the<br>
     30 very end of the project, when you write report/paper.</li>
     31 </ol>
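<p>Step 1 of the recipe can be sketched as follows (a minimal illustration; the function name <code>split_data</code> and the sizes are made up):</p>

```python
import random

def split_data(data, n_val, n_test, seed=0):
    """Shuffle, then carve off validation and test sets of fixed
    absolute size; everything else is training data."""
    data = list(data)
    random.Random(seed).shuffle(data)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

# at least 500 test examples; here we use 1000
train, val, test = split_data(range(10_000), n_val=1_000, n_test=1_000)
```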
     32 <p>Don't re-use test data:</p>
     33 <ul>
     34 <li>you'd pick the wrong model</li>
     35 <li>it would inflate your performance estimate</li>
     36 </ul>
     37 <p>For temporal data, you'll probably want to keep the data ordered by<br>
     38 time and put the latest part in the test set, so you never train on data from after the test points.</p>
     39 <p>Which hyperparameters to try?</p>
     40 <ul>
     41 <li>trial-and-error (via intuition)</li>
     42 <li>grid search: define finite set of values for each hyperparam, try<br>
     43 all combinations</li>
     44 <li>random search</li>
     45 </ul>
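<p>Grid search in a nutshell (a sketch; the toy <code>evaluate</code> function stands in for training a model and measuring its validation error):</p>

```python
import itertools

def grid_search(grid, evaluate):
    """Try every combination of hyperparameter values and keep the one
    with the lowest validation score."""
    names = list(grid)
    best, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best, best_score = params, score
    return best, best_score

# toy validation error, minimised at k=5, p=2
best, score = grid_search(
    {"k": [1, 3, 5, 7], "p": [1, 2]},
    evaluate=lambda params: abs(params["k"] - 5) + abs(params["p"] - 2),
)
```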
     46 <h3 id="cross-validation">Cross-validation</h3>
     47 <p>You still split your data, but every run, a different slice becomes the<br>
     48 validation data. Then you average the results for the final result.</p>
     49 <p>If it's temporal data, you might want to do walk-forward validation,<br>
     50 where you always expand your data slices forward in time.</p>
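<p>Generating the folds for plain (non-temporal) cross-validation might look like this (a sketch working on indices only, no actual training):</p>

```python
def k_fold_splits(n, k):
    """Yield (train_indices, val_indices) pairs: each slice of the
    data serves as validation data exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_splits(10, 5))
```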
     51 <h2 id="what-to-report">What to report</h2>
     52 <h3 id="classification">Classification</h3>
     53 <h4 id="whats-a-good-error-5">What's a good error (5%)?</h4>
     54 <p>It depends, just like in every class so far:</p>
     55 <ul>
     56 <li>Class imbalance: how much more likely is a positive example than a<br>
     57 negative example?</li>
     58 <li>Cost imbalance: how much worse is mislabeled positive than<br>
     59 mislabeled negative? e.g. how bad is it to mark a real email as spam<br>
     60 vs letting a spam message into your inbox?</li>
     61 </ul>
     62 <h4 id="performance-metrics">Performance metrics</h4>
     63 <h5 id="confusion-matrix-contingency-table">Confusion matrix (contingency table)</h5>
     64 <p>Metrics for a single classifier.</p>
     65 <p>The margins give four totals: the actual number of each class present in<br>
     66 the data, and the number of each class predicted by the classifier.</p>
     67 <p><img src="_resources/6fa59a4013a0431a9561c4b00b29e8b9.png" alt=""></p>
     68 <h5 id="precision-and-recall">Precision and recall</h5>
     69 <p>Also for a single classifier.</p>
     70 <ul>
     71 <li>Precision: proportion of returned positives that are<br>
     72 <em>actually</em> positive</li>
     73 <li>Recall: proportion of existing positives that the classifier found</li>
     74 </ul>
     75 <p><img src="_resources/e395f5797bc3479090c5c7128b77f074.png" alt=""></p>
     76 <p>You can then calculate rates:</p>
     77 <ul>
     78 <li>True positive rate (TPR): proportion of actual positives that we<br>
     79 classified correctly</li>
     80 <li>False positive rate (FPR): proportion of actual negatives that we<br>
     81 misclassified as positive</li>
     82 </ul>
     83 <p><img src="_resources/4a756d1ddc51411cbb13957c08b20a8f.png" alt=""></p>
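<p>Computing these metrics from the four confusion-matrix counts (a sketch; the counts here are invented):</p>

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall (= true positive rate) and false positive
    rate from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),  # of returned positives, how many are real
        "recall": tp / (tp + fn),     # of real positives, how many we found (TPR)
        "fpr": fp / (fp + tn),        # of real negatives, how many we mislabelled
    }

m = metrics(tp=30, fp=10, fn=20, tn=40)
```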
     84 <p>ROC (receiver-operating characteristics) space: plot true positives<br>
     85 against false positives. the best classifier is in the top left corner.</p>
     86 <p><img src="_resources/aba7a57e16944be7b654c26df0acae65.png" alt=""></p>
     87 <p>Ranking classifier: also gives a score of how negative/positive a point<br>
     88 is.</p>
     89 <ul>
     90 <li>turning classifier into ranking classifier:
     91 <ul>
     92 <li>for linear classifier, measure distance from decision boundary,<br>
     93 and now you can scale classifier from timid to bold by moving<br>
     94 the decision boundary</li>
     95 <li>for tree classifier: sort by class proportion in each segment</li>
     96 </ul>
     97 </li>
     98 <li>ranking errors: one error for every pair of instances that's ranked<br>
     99 wrongly (a negative point is ranked more positively than a positive<br>
    100 point)</li>
    101 </ul>
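<p>Counting ranking errors directly (a sketch; one minus this error fraction is the area under the ROC curve):</p>

```python
def ranking_error_rate(scores, labels):
    """Fraction of (positive, negative) pairs where the negative point
    gets a higher score than the positive one; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    errors = sum(1 for p in pos for n in neg if n > p)
    ties = sum(1 for p in pos for n in neg if n == p)
    return (errors + 0.5 * ties) / (len(pos) * len(neg))
```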
    102 <p>Coverage matrix: shows what happens to TPR and FPR if we move threshold<br>
    103 from right to left (more or less identical to ROC space)</p>
    104 <p>If we draw line between two classifiers, we can create classifier for<br>
    105 every point on that line by picking output of one of the classifiers at<br>
    106 random. E.g. with 50/50 probability, end up halfway between the two. The<br>
    107 area under the curve of classifiers we can create (&quot;convex hull&quot;) is<br>
    108 a good indication of the quality of a classifier -- the bigger this area, the<br>
    109 more useful classifiers we can achieve. A good way to compare classifiers<br>
    110 with class or cost imbalance, if we're unsure of our preferences.</p>
    111 <h3 id="regression">Regression</h3>
    112 <p>Loss function: mean squared errors<br>
    113 (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mo>∑</mo><mi>i</mi></msub><mo stretchy="false">(</mo><mover accent="true"><msub><mi>y</mi><mi>i</mi></msub><mo>^</mo></mover><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">\frac{1}{n} \sum_i (\hat{y_i} - y_i)^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span 
style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord accent"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.19444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span>)</p>
    114 <p>Evaluation function: root mean squared error<br>
    115 (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mo>∑</mo><mi>i</mi></msub><mo stretchy="false">(</mo><mover accent="true"><msub><mi>y</mi><mi>i</mi></msub><mo>^</mo></mover><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{\frac{1}{n} \sum_i (\hat{y_i} - y_i)^2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.84em;vertical-align:-0.604946em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.235054em;"><span class="svg-align" style="top:-3.8em;"><span class="pstrut" style="height:3.8em;"></span><span class="mord" style="padding-left:1em;"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span 
class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord accent"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.19444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.195054em;"><span class="pstrut" style="height:3.8em;"></span><span class="hide-tail" style="min-width:1.02em;height:1.8800000000000001em;"><svg width='400em' height='1.8800000000000001em' viewBox='0 0 400000 1944' preserveAspectRatio='xMinYMin slice'><path d='M983 90
    116 l0 -0
    117 c4,-6.7,10,-10,18,-10 H400000v40
    118 H1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7
    119 s-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744
    120 c-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30
    121 c26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722
    122 c56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5
    123 c53.7,-170.3,84.5,-266.8,92.5,-289.5z
    124 M1001 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.604946em;"><span></span></span></span></span></span></span></span></span>)</p>
    125 <ul>
    126 <li>you may want to report this: it's minimised at the same places as MSE,<br>
    127 but has the same units as the original output value, so it's easier to<br>
    128 interpret</li>
    129 </ul>
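<p>The two formulas above, as code (a minimal sketch):</p>

```python
import math

def mse(predictions, targets):
    """Mean squared error: the loss that's minimised."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def rmse(predictions, targets):
    """Root mean squared error: minimised in the same places as MSE,
    but in the units of the target, so easier to interpret."""
    return math.sqrt(mse(predictions, targets))
```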
    130 <p>Bias: distance from true MSE (which is unknown) to the optimum MSE.</p>
    131 <ul>
    132 <li>high bias: model doesn't fit generating distribution.<br>
    133 &quot;underfitting&quot;</li>
    134 <li>reduce by increasing model capacity or features</li>
    135 </ul>
    136 <p>Variance: spread of different experiments' MSE around the true MSE</p>
    137 <ul>
    138 <li>high variance: high model capacity, sensitivity to random<br>
    139 fluctuations. &quot;overfitting&quot;</li>
    140 <li>reduce by reducing model capacity, adding regularization, reducing<br>
    141 tree depth</li>
    142 </ul>
    143 <p>specifically for k-NN regression: increasing k increases bias and<br>
    144 decreases variance</p>
    145 <p>Dartboard example:</p>
    146 <p><img src="_resources/748a8a36136244d9bc6e3c5c8e4060cb.png" alt=""></p>
    147 <h3 id="errors-confidence-intervals">Errors &amp; confidence intervals</h3>
    148 <p>Statistics tries to answer: can observed results be attributed to <em>real<br>
    149 characteristics</em> of the models, or are they observed <em>by<br>
    150 chance</em>?</p>
    151 <p>If you see error bars, the author has to indicate what they mean --<br>
    152 there's no convention.</p>
    153 <p>Standard deviation: measure of spread, variance</p>
    154 <p>Standard error, confidence interval: measure of confidence</p>
    155 <p>If the population distribution is normal, the standard error of the mean<br>
    156 is calculated by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mi>σ</mi><msqrt><mi>n</mi></msqrt></mfrac></mrow><annotation encoding="application/x-tex">\frac{\sigma}{\sqrt{n}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.233392em;vertical-align:-0.538em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.695392em;"><span style="top:-2.6258665em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8059050000000001em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mtight" style="padding-left:0.833em;"><span class="mord mathdefault mtight">n</span></span></span><span style="top:-2.765905em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702
    157 c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
    158 c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
    159 c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
    160 s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
    161 c69,-144,104.5,-217.7,106.5,-221
    162 l0 -0
    163 c5.3,-9.3,12,-14,20,-14
    164 H400000v40H845.2724
    165 s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
    166 c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
    167 M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.234095em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.538em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>(because the<br>
    168 standardised sample mean follows a t distribution when σ is estimated from the sample)</p>
    169 <p>Re confidence intervals: the correct phrasing is &quot;if we repeat the<br>
    170 experiment many times, computing the confidence interval each time, the<br>
    171 true mean would be inside the interval in 95% of those experiments&quot;</p>
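<p>A minimal sketch of the standard error and a 95% confidence interval (using the normal critical value 1.96 for simplicity; for small samples a t critical value would be more exact):</p>

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Mean plus/minus z times the standard error of the mean."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

lo, hi = confidence_interval([8.0, 10.0, 12.0, 10.0])
```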
    172 <p>Use statistics in ML to show confidence and spread.</p>
    173 <h2 id="the-no-free-lunch-theorem-and-principle">The no-free-lunch theorem and principle</h2>
    174 <p>Answer to question &quot;what is the best ML method/model in general?&quot;</p>
    175 <p>Theorem: &quot;any two optimization algorithms are equivalent when their<br>
    176 performance is averaged across all possible problems&quot;</p>
    177 <p>i.e. you can't say shit in general.</p>
    178 <p>A few outs:</p>
    179 <ul>
    180 <li>universal distribution: the datasets for which our methods work are<br>
    181 the likely ones</li>
    182 <li>Occam's razor: the simplest solution/explanation is often the best</li>
    183 </ul>
    184 <p>Principle: there is no single best learning method; whether an algorithm<br>
    185 is good depends on the domain</p>
    186 <p>Inductive bias: the aspects of a learning algorithm that, implicitly or<br>
    187 explicitly, make it suitable for certain problems and thereby unsuitable<br>
    188 for others</p>
    189 <h2 id="cleaning-your-data">Cleaning your data</h2>
    190 <h3 id="missing-data">Missing data</h3>
    191 <p>Simplest way: remove features for which values are missing. Maybe they're not important (probably, hopefully).</p>
    192 <p>Or remove instances (rows) with missing data. The problem: if the data wasn't corrupted uniformly at random, removing rows with missing values changes the data distribution. An example is people refusing to answer certain survey questions.</p>
    193 <p>Generally, think about the real-world use case -- can you also expect missing data there?</p>
    194 <ul>
    195 <li>if yes: keep them in test set, make a model that can consume them</li>
    196 <li>if no: try to get a test set without missing values, test methods for completing data only in the training set</li>
    197 </ul>
    198 <p>Guessing the missing data (&quot;imputation&quot;):</p>
    199 <ul>
    200 <li>categorical: use the <dfn title="the value that occurs most often">mode</dfn></li>
    201 <li>numerical: use the mean</li>
    202 <li>or, make the feature a target value and train a model</li>
    203 </ul>
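<p>Mode/mean imputation as a sketch (here missing values are represented as <code>None</code>):</p>

```python
import statistics

def impute(column, kind):
    """Fill missing (None) entries with the mode (categorical) or the
    mean (numerical), computed from the observed values only."""
    observed = [v for v in column if v is not None]
    if kind == "categorical":
        fill = statistics.mode(observed)
    else:
        fill = statistics.mean(observed)
    return [fill if v is None else v for v in column]
```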
    204 <h3 id="outliers">Outliers</h3>
    205 <p>Are they mistakes?</p>
    206 <ul>
    207 <li>Yes: deal with them.</li>
    208 <li>No: leave them alone, check model for strong assumptions of normally distributed data</li>
    209 </ul>
    210 <p>Can we expect them in production?</p>
    211 <ul>
    212 <li>Yes: make sure model can deal with them</li>
    213 <li>No: remove them, get a test dataset representing production</li>
    214 </ul>
    215 <p>Watch out for MSE: it's based on an assumption of normally distributed randomness. If you get data with big outliers, it fucks up.</p>
    216 <h3 id="class-imbalance">Class imbalance</h3>
    217 <p><dfn title="i.e. how much more likely is a positive example than a negative example?">Class imbalance</dfn> is a problem, but how do you improve training?</p>
    218 <ul>
    219 <li>Use a big test set</li>
    220 <li>Don't rely on accuracy -- try ROC plots, precision-recall plots, AUC, look at confusion matrix...</li>
    221 <li>Resample training data
    222 <ul>
    223 <li>oversample: sample with replacements. leads to more data, but creates duplicates and increases likelihood of overfitting.</li>
    224 <li>undersample: doesn't lead to duplicates, but you throw away data. might be useful for multiple-pass algorithms</li>
    225 </ul>
    226 </li>
    227 <li>Use data augmentation for minority class
    228 <ul>
    229 <li>oversample minority with new data derived from existing data</li>
    230 <li>example: SMOTE, which generates new minority-class points by interpolating between a minority point and one of its nearest minority-class neighbours</li>
    231 </ul>
    232 </li>
    233 </ul>
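<p>Plain oversampling with replacement might look like this (a sketch; apply it to the training data only, never to the test set):</p>

```python
import random

def oversample(points, labels, seed=0):
    """Resample minority classes with replacement until every class is
    as large as the biggest one. Creates duplicates, so it increases
    the risk of overfitting."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(points, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out += [(x, y) for x in xs + extra]
    return out

balanced = oversample([1, 2, 3, 4, 5], [0, 0, 0, 0, 1])
```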
    234 <h2 id="choosing-features">Choosing features</h2>
    235 <p>Even if the data is a table, you shouldn't just use its columns directly as features.<br>
    236 Some algorithms work only on numeric features, some only on categorical, some on both.</p>
    237 <p>Converting between categoric/numeric:</p>
    238 <ul>
    239 <li>numeric to categoric - you're bound to lose information, but it might be tolerable</li>
    240 <li>categoric to numeric
    241 <ul>
    242 <li>integer coding: make everything an integer - imposes false ordering on unordered data. generally not a good idea.</li>
    243 <li>one-hot coding: one categorical feature becomes several numeric features. for each element, you say whether or not the feature applies (0 or 1).</li>
    244 </ul>
    245 </li>
    246 </ul>
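<p>One-hot coding as a sketch:</p>

```python
def one_hot(values):
    """Replace one categorical feature with one 0/1 column per
    category, avoiding the false ordering of integer coding."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```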
    247 <p>Expanding features: adding extra features derived from existing features (which can improve performance).<br>
    248 For example, when you have results that don't fit on a line, but <em>do</em> fit on a curve, you can add a derived feature x².<br>
    249 If we don't have any intuition for extra features to add, just add all cross products, or use functions like sin/log.</p>
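<p>Adding all cross products (which includes squares such as x²) as a sketch:</p>

```python
def expand_features(rows):
    """Append every pairwise product of the existing features,
    including each feature's square."""
    out = []
    for row in rows:
        extra = [row[i] * row[j]
                 for i in range(len(row))
                 for j in range(i, len(row))]
        out.append(list(row) + extra)
    return out
```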
    250 <h2 id="normalisation-standardisation">Normalisation &amp; standardisation</h2>
    251 <p>Create a uniform scale.</p>
    252 <h3 id="normalisation">Normalisation</h3>
    253 <p>Fit to [0,1].<br>
    254 Scales the data linearly, smallest point becomes zero, largest point becomes 1:<br>
    255 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>χ</mi><mo>←</mo><mfrac><mrow><mi>χ</mi><mo>−</mo><msub><mi>χ</mi><mrow><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></mrow><mrow><msub><mi>χ</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub><mo>−</mo><msub><mi>χ</mi><mi>min</mi><mo>⁡</mo></msub></mrow></mfrac></mrow><annotation encoding="application/x-tex">\chi \leftarrow \frac{\chi - \chi_{min}}{\chi_{max} - \chi_{\min}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord mathdefault">χ</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.335547em;vertical-align:-0.481108em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.854439em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight">χ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16454285714285719em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">m</span><span class="mord mathdefault mtight">a</span><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.143em;"><span></span></span></span></span></span></span><span class="mbin mtight">−</span><span class="mord mtight"><span class="mord mathdefault mtight">χ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3340428571428572em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mtight">m</span><span class="mtight">i</span><span class="mtight">n</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">χ</span><span class="mbin mtight">−</span><span class="mord mtight"><span class="mord mathdefault mtight">χ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">m</span><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.481108em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p>
    256 <h3 id="standardisation">Standardisation</h3>
    257 <p>Fit to 1D standard normal distribution.<br>
    258 Rescale data so mean becomes zero, standard deviation becomes 1. Make it look like the data came from a standard normal distribution.<br>
    259 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>χ</mi><mo>←</mo><mfrac><mrow><mi>χ</mi><mo>−</mo><mi>μ</mi></mrow><mi>σ</mi></mfrac></mrow><annotation encoding="application/x-tex">\chi \leftarrow \frac{\chi - \mu}{\sigma}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord mathdefault">χ</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.199439em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.854439em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">χ</span><span class="mbin mtight">−</span><span class="mord mathdefault mtight">μ</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p>
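<p>A matching numpy sketch for standardisation (hypothetical data):</p>

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])

# Standardisation per feature: x <- (x - mu) / sigma
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma

print(X_std.mean(axis=0))  # ~0 per feature
print(X_std.std(axis=0))   # ~1 per feature
```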
    260 <h3 id="whitening">Whitening</h3>
    261 <p>Fit to multivariate standard normal distribution.<br>
    262 If the data is correlated, you don't end up with a spherical shape after normalising/standardising. So you have to choose a different basis (coordinate system) for the points.</p>
    263 <p>Back to linear algebra - choose a basis</p>
    264 <p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>B</mi><mo>=</mo><mrow><mo fence="true">[</mo><mtable rowspacing="0.15999999999999992em" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mi>c</mi></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mi>d</mi></mstyle></mtd></mtr></mtable><mo fence="true">]</mo></mrow><mo>=</mo><mrow><mo fence="true">[</mo><mtable rowspacing="0.15999999999999992em" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>1.26</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>−</mo><mn>0.3</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.9</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0.5</mn></mstyle></mtd></mtr></mtable><mo fence="true">]</mo></mrow></mrow><annotation encoding="application/x-tex">B = \begin{bmatrix} c &amp; d \end{bmatrix} = \begin{bmatrix} 1.26 &amp; -0.3 \\ 0.9 &amp; 0.5 \end{bmatrix}
    265 </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05017em;">B</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.20001em;vertical-align:-0.35001em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size1">[</span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8500000000000001em;"><span style="top:-3.01em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault">c</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.35000000000000003em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:0.5em;"></span><span class="arraycolsep" style="width:0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8500000000000001em;"><span style="top:-3.01em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.35000000000000003em;"><span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size1">]</span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span 
class="base"><span class="strut" style="height:2.40003em;vertical-align:-0.95003em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">[</span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em;"><span style="top:-3.61em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span><span class="mord">.</span><span class="mord">2</span><span class="mord">6</span></span></span><span style="top:-2.4099999999999997em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">0</span><span class="mord">.</span><span class="mord">9</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.9500000000000004em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:0.5em;"></span><span class="arraycolsep" style="width:0.5em;"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em;"><span style="top:-3.61em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">−</span><span class="mord">0</span><span class="mord">.</span><span class="mord">3</span></span></span><span style="top:-2.4099999999999997em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">0</span><span class="mord">.</span><span class="mord">5</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.9500000000000004em;"><span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">]</span></span></span></span></span></span></span></p>
    266 <p>Then if you want to convert a coordinate from this basis to the standard basis, multiply <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>B</mi><mi>x</mi></mrow><annotation encoding="application/x-tex">Bx</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05017em;">B</span><span class="mord mathdefault">x</span></span></span></span>. If you want to convert a standard coordinate to this basis, multiply <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>B</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mi>x</mi></mrow><annotation encoding="application/x-tex">B^{-1} x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05017em;">B</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span></span></span></span>.</p>
    267 <p>Since the inverse of a matrix is computationally expensive, prefer orthonormal bases (the basis vectors are <def title="perpendicular to each other">orthogonal</def> and <def title="have length 1">normal</def>). Because then <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>B</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo>=</mo><msup><mi>B</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">B^{-1} = B^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05017em;">B</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05017em;">B</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span>, and the 
transpose is much easier to compute.</p>
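<p>A small numpy check of this property, using a rotation matrix as a hypothetical orthonormal basis:</p>

```python
import numpy as np

# The columns of a 2D rotation matrix are perpendicular unit vectors,
# so they form an orthonormal basis (angle chosen arbitrarily).
theta = np.deg2rad(30)
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For an orthonormal basis, the transpose equals the inverse.
print(np.allclose(np.linalg.inv(B), B.T))  # True

# Round trip: into the basis with B^T (cheap), back out with B.
x = np.array([2.0, 1.0])
coords = B.T @ x
print(np.allclose(B @ coords, x))  # True
```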
    268 <p>Steps:</p>
    269 <ol>
    270 <li>Compute sample mean <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mo>∑</mo><mi>i</mi></msub><msub><mi>x</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">m = \frac{1}{n} \sum_i x_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">m</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> and sample covariance <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mo>=</mo><mfrac><mn>1</mn><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></mfrac><mi>X</mi><msup><mi>X</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">S = \frac{1}{n-1} X X^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">S</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2484389999999999em;vertical-align:-0.403331em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.403331em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span> (where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false">[</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>x</mi><mi>n</mi></msub><mo 
stretchy="false">]</mo><mo>−</mo><mi>m</mi></mrow><annotation encoding="application/x-tex">X = [x_1, \dots, x_n] -m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span 
class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">]</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">m</span></span></span></span>).</li>
    271 <li>Find a matrix A such that <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mo>=</mo><mi>A</mi><msup><mi>A</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">S = AA^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">S</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord mathdefault">A</span><span class="mord"><span class="mord mathdefault">A</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span>:<br>
    272 * Cholesky decomposition<br>
    273 * Singular value decomposition<br>
    274 * Matrix square root</li>
    275 <li>Whiten the data: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>←</mo><msup><mi>A</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>m</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x \leftarrow A^{-1} (x-m)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">A</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">m</span><span class="mclose">)</span></span></span></span></li>
    276 </ol>
    277 <p>So whitening means we choose new basis vectors for a coordinate system where the features are not correlated, and variance is 1 in every direction.</p>
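<p>The whitening steps above can be sketched in numpy (hypothetical correlated data; rows are samples here, whereas the notes put samples in columns):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated data: rows are samples.
X = rng.standard_normal((1000, 2)) @ np.array([[2.0, 0.0],
                                               [1.5, 0.5]])

m = X.mean(axis=0)                         # 1. sample mean
S = np.cov(X, rowvar=False)                #    sample covariance, 1/(n-1) normalised

A = np.linalg.cholesky(S)                  # 2. factor S = A A^T (Cholesky)

X_white = np.linalg.solve(A, (X - m).T).T  # 3. x <- A^{-1}(x - m) for every sample

# Whitened data: zero mean, identity covariance.
print(np.cov(X_white, rowvar=False).round(3))
```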
    278 <h2 id="dimensionality-reduction">Dimensionality reduction</h2>
    279 <p>Opposite of feature expansion: reducing the number of features in the data by deriving new features from the old ones, hopefully without losing essential information.</p>
    280 <p>Good for efficiency, reducing variance of model performance, and visualisation.</p>
    281 <p>Principal component analysis (PCA): whitening with some extra properties. After applying it, you throw away all but the first k dimensions, and get a very good projection of the data down to k dimensions.</p>
    282 <ol>
    283 <li>Mean-center the data</li>
    284 <li>Compute sample covariance S</li>
    285 <li>Compute singular value decomposition: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>U</mi><mi>Z</mi><msup><mi>U</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">UZU^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="mord mathdefault" style="margin-right:0.07153em;">Z</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><br>
    286 * In practice, the SVD is usually computed directly from the data matrix X (or from A), rather than from S<br>
    287 * Z is diagonal; its diagonal values, sorted from largest to smallest, are the eigenvalues of S.<br>
    288 * U is an orthonormal basis, whose columns are the corresponding eigenvectors of S.
    289 <ul>
    290 <li>Eigenvectors: a matrix transforms vectors, with some getting stretched and rotated. If a vector only gets stretched/flipped, but its direction doesn't change, it's an eigenvector. Translating to math, if Au = λu, u is an eigenvector, and λ is its corresponding scalar eigenvalue.</li>
    291 </ul>
    292 </li>
    293 <li>Transform the data: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>←</mo><msup><mi>U</mi><mi>T</mi></msup><mi>x</mi></mrow><annotation encoding="application/x-tex">x \leftarrow U^T x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span></span></span></span>. To whiten, also divide each coordinate by the square root of the corresponding diagonal entry of Z</li>
    294 <li>Discard all but the first k features (keep only the features corresponding to the largest eigenvalues)</li>
    295 </ol>
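<p>A minimal numpy sketch of these PCA steps (hypothetical data; np.linalg.eigh is used here in place of a general SVD, since S is symmetric):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 correlated features, rows are samples.
X = rng.standard_normal((500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                              [1.0, 1.0, 0.0],
                                              [0.5, 0.2, 0.1]])

Xc = X - X.mean(axis=0)                  # 1. mean-center
S = np.cov(Xc, rowvar=False)             # 2. sample covariance

# 3. S = U Z U^T; eigh returns eigenvalues ascending, so sort largest-first.
eigvals, U = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

# 4./5. Transform (x <- U^T x per sample) and keep the first k components.
k = 2
X_reduced = Xc @ U[:, :k]
print(X_reduced.shape)  # (500, 2)
```

<p>The variance of the first component equals the largest eigenvalue, which is why discarding the last dimensions loses the least information.</p>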
    296 </div></div>
    297 					</body>
    298 				</html>