commit 2a25ac3aaac20fdb56d1e23efc0e20ea332d41c4
parent 1b9e984eabeb038eec9709760dab4d384025b66e
Author: Alex Balgavy <alex@balgavy.eu>
Date: Tue, 22 Dec 2020 22:21:28 +0100

ML notes

Diffstat:
81 files changed, 2137 insertions(+), 2079 deletions(-)
diff --git a/content/_index.md b/content/_index.md
@@ -15,7 +15,7 @@ title = "Alex's university course notes"
 ## Subject notes: Year 3
 * [Equational Programming](equational-notes/)
-* [Machine Learning](https://thezeroalpha.github.io/ml-notes) **(unfinished)**
+* [Machine Learning](ml-notes/) **(unfinished)**
 * [Automata & Complexity](automata-complexity-notes/) **(unfinished)**
 * [Philosophy](https://thezeroalpha.github.io/philosophy-notes)
diff --git a/content/ml-notes/.nojekyll b/content/ml-notes/.nojekyll
diff --git a/content/ml-notes/Deep learning.html b/content/ml-notes/Deep learning.html
@@ -1,332 +0,0 @@
-
- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="UTF-8">
-
- <title>Deep learning</title>
- <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head>
- <body>
-
-<div id="rendered-md"><h1 id="deep-learning">Deep learning</h1>
-<nav class="table-of-contents"><ul><li><a href="#deep-learning">Deep learning</a><ul><li><a href="#deep-learning-systems-autodiff-engines">Deep learning systems (autodiff engines)</a><ul><li><a href="#tensors">Tensors</a></li><li><a href="#functions-on-tensors">Functions on tensors</a></li></ul></li><li><a href="#backpropagation-revisited">Backpropagation revisited</a><ul><li><a href="#multivariate-chain-rule">Multivariate chain rule</a></li><li><a href="#backpropagation-with-tensors-matrix-calculus">Backpropagation with tensors - matrix calculus</a><ul><li><a href="#example">Example:</a></li></ul></li></ul></li><li><a href="#making-deep-neural-nets-work">Making deep neural nets work</a><ul><li><a href="#overcoming-vanishing-gradients">Overcoming vanishing gradients</a></li><li><a href="#minibatch-gradient-descent">Minibatch gradient descent</a></li><li><a href="#optimizers">Optimizers</a><ul><li><a href="#momentum">Momentum</a></li><li><a href="#nesterov-momentum">Nesterov momentum</a></li><li><a href="#adam">Adam</a></li></ul></li><li><a
href="#regularizers">Regularizers</a><ul><li><a href="#l2-regularizer">L2 regularizer</a></li><li><a href="#l1-regulariser">L1 regulariser</a></li><li><a href="#dropout-regularisation">Dropout regularisation</a></li></ul></li></ul></li><li><a href="#convolutional-neural-networks">Convolutional neural networks</a></li><li><a href="#deep-learning-vs-machine-learning">Deep learning vs machine learning</a></li><li><a href="#generators">Generators</a></li><li><a href="#generative-adversarial-networks">Generative adversarial networks</a><ul><li><a href="#vanilla-gans">Vanilla GANs</a></li><li><a href="#conditional-gans">Conditional GANs</a></li><li><a href="#cyclegan">CycleGAN</a></li><li><a href="#stylegan">StyleGAN</a></li><li><a href="#what-can-we-do-with-a-generator">What can we do with a generator?</a></li></ul></li><li><a href="#autoencoders">Autoencoders</a><ul><li><a href="#turning-an-autoencoder-into-a-generator">Turning an autoencoder into a generator</a></li></ul></li><li><a href="#variational-autoencoders">Variational autoencoders</a></li></ul></li></ul></nav><h2 id="deep-learning-systems-autodiff-engines">Deep learning systems (autodiff engines)</h2>
-<h3 id="tensors">Tensors</h3>
-<p>To scale up backpropagation, we want to move from operations on scalars to operations on tensors.</p>
-<p>Tensor: a generalisation of vectors and matrices to higher dimensions. For example, a 2-tensor (a matrix)<br>
-has two dimensions, and a 4-tensor has four.</p>
-<p>You can represent data as a tensor. For example, an RGB image is a 3-tensor holding the red,<br>
-green, and blue values of each pixel.</p>
-<h3 id="functions-on-tensors">Functions on tensors</h3>
-<p>Functions have inputs and outputs, all of which are tensors.</p>
-<p>Each function implements:</p>
-<ul>
-<li><code class="inline-code">forward(...)</code>: compute the outputs, given the inputs</li>
-<li><code class="inline-code">backward(...)</code>: compute the gradients with respect to the inputs, given the gradients with respect to the<br>
-outputs</li>
-</ul>
-<p>The modules we chain together are defined in a computation graph:</p>
-<p><img src="_resources/bf4ec9fe629e41389da29c0de7efb63d.png" alt=""></p>
-<p>A deep learning system uses this graph to execute a computation (forward pass),<br>
-and then uses backpropagation to compute the gradients of the output with respect to the data nodes<br>
-(backward pass).</p>
-<p>An autodiff engine:</p>
-<ul>
-<li>performs computations by chaining functions</li>
-<li>keeps track of all computation in a computation graph</li>
-<li>when the computation is done, walks backward through the computation graph to do<br>
-backpropagation</li>
-<li>uses eager evaluation: builds the graph as the computation is performed</li>
-</ul>
-<h2 id="backpropagation-revisited">Backpropagation revisited</h2>
-<p>Functions can have any number of inputs and outputs, which must all be tensors.</p>
-<p>The final output must be a scalar (i.e. we always take the derivative of a scalar<br>
-function, such as the loss).</p>
-<h3 id="multivariate-chain-rule">Multivariate chain rule</h3>
-<p>How do you take derivatives when variables aren't scalars?</p>
-<p>Multiple inputs:</p>
-<p><img src="_resources/edfac0f6027c40c9a9e012e658f54d68.png" alt=""></p>
-<p>How do you find the derivative when a function has two inputs? Use the multivariate chain rule,<br>
-i.e.
take the derivative along the path through each input separately, then sum the results.</p>
-<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>c</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>c</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>a</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>a</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac><mo>+</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>c</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>b</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>b</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{\partial c}{\partial x} = \frac{\partial c}{\partial a} \frac{\partial a}{\partial x} + \frac{\partial c}{\partial b} \frac{\partial b}{\partial x}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span
class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">c</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">a</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">c</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" 
style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">a</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span 
class="mord mathdefault mtight">c</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">b</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p>
-<h3 id="backpropagation-with-tensors-matrix-calculus">Backpropagation with tensors - matrix calculus</h3>
-<p>Start with scalar derivatives: the derivative of one output element with respect to one input element (pick any<br>
-pair).</p>
-<p>Tensor derivative: put all possible scalar derivatives into a tensor.</p>
-<p>But how should this tensor be arranged/ordered?</p>
-<p>Solution: rather than building and arranging the full tensor derivative, accumulate the gradient product directly:</p>
-<p>forward(x): given input x, compute output y</p>
-<p>backward(l<sub>y</sub>): given <span class="katex"><span class="katex-mathml"><math
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>l</mi><mi>y</mi></msub><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>y</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">l_{y} = \frac{\partial loss}{\partial y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.980548em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.01968em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.3612159999999998em;vertical-align:-0.481108em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" 
style="margin-right:0.03588em;">y</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span><span class="mord mathdefault mtight">o</span><span class="mord mathdefault mtight">s</span><span class="mord mathdefault mtight">s</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.481108em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>, compute<br>
-<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>y</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>y</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{\partial loss}{\partial y} \frac{\partial y}{\partial x}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.4133239999999998em;vertical-align:-0.481108em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"
style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span><span class="mord mathdefault mtight">o</span><span class="mord mathdefault mtight">s</span><span class="mord mathdefault mtight">s</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.481108em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>.</p>
-<p>Convention: the gradient with respect to A has the same shape as A.</p>
-<h4 id="example">Example:</h4>
-<p>Let:</p>
-<ul>
-<li>k = Wx + b</li>
-<li>forward(W, x, b): compute Wx + b</li>
-<li>backward(l<sub>k</sub>): compute <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>W</mi></mrow></mfrac><mo separator="true">,</mo><mspace width="1em"/><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac><mo separator="true">,</mo><mspace width="1em"/><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>b</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{\partial l}{\partial k} \frac{\partial k}{\partial
-W}, \quad \frac{\partial l}{\partial k} \frac{\partial k}{\partial x}, \quad
-\frac{\partial l}{\partial k} \frac{\partial k}{\partial b}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut"
style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:1em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" 
style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:1em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen 
nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li>
-</ul>
-<p>Steps:</p>
-<ol>
-<li>work out the scalar derivative for a single element of W: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>W</mi><mn>23</mn></msub></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{\partial l}{\partial k} \frac{\partial
-k}{\partial W_{23}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.325208em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-left:-0.13889em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord 
mtight">3</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li>
-<li>apply the multivariate chain rule: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>W</mi><mn>23</mn></msub></mrow></mfrac><mo>=</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>k</mi><mn>2</mn></msub></mrow></mfrac><msub><mi>x</mi><mn>3</mn></msub></mrow><annotation encoding="application/x-tex">\frac{\partial l}{\partial k} \frac{\partial
-k}{\partial W_{23}} = ...
= \frac{\partial l}{\partial k_{2}} x_{3}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.325208em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-left:-0.13889em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.36687em;vertical-align:0em;"></span><span class="mord">.</span><span class="mord">.</span><span class="mord">.</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.325208em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-left:-0.03148em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord 
mtight"><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></li> -<li>now we know that <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>W</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow></mfrac><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>k</mi><mi>i</mi></msub></mrow></mfrac><msub><mi>x</mi><mi>j</mi></msub></mrow><annotation encoding="application/x-tex">\frac{\partial l}{\partial k} \frac{\partial k}{\partial W_{ij}} = \frac{\partial l}{\partial k_{i}} x_{j}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.4224279999999998em;vertical-align:-0.5423199999999999em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:-0.13889em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2818857142857143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" 
style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.5423199999999999em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.325208em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:-0.03148em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" 
style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></li> -<li>so, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>W</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>l</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>k</mi></mrow></mfrac><msup><mi>x</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">\frac{\partial l}{\partial k} \frac{\partial k}{\partial W} = \frac{\partial l}{\partial k} x^{T}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" 
style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2251079999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03148em;">k</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.01968em;">l</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span></span></li> -</ol> -<h2 id="making-deep-neural-nets-work">Making deep neural nets work</h2> -<h3 id="overcoming-vanishing-gradients">Overcoming vanishing gradients</h3> -<p>If the weights of the network are initialized too high, activations will hit the rightmost<br> -part of the sigmoid, where it is nearly flat, so the local gradient for each node will be very close to zero. So the<br> -network won't start learning.</p> -<p>If they are too negative, activations hit the leftmost part of the sigmoid, and we get the same<br> -problem.</p> -<p><img src="_resources/8c2398ce91ed4694abf679c536b2cf61.png" alt=""></p> -<p>ReLU preserves derivatives for nodes whose activations it lets through. It kills<br> -derivatives for nodes that produce a negative value, but as long as the network is<br> -properly initialised, roughly half of the values in a batch will produce<br> -positive input for ReLU.</p> -<p>There is still a risk that during training, the network moves to a configuration where<br> -a neuron produces negative input for every instance in the data.
In that case,<br> -we end up with a dead neuron - its gradient will always be zero, and no weights below<br> -that neuron will change anymore (unless they also feed into a non-dead neuron).</p> -<p>Initialization:</p> -<ul> -<li>assume that the layer input is roughly distributed so that its mean is 0 and<br> -variance is 1 in every direction (standardise/normalise data so this is true<br> -for first layer)</li> -<li>initialisation is designed to pick a random matrix that keeps these properties<br> -(approximately) true for the layer's output, and so for the next layer's input</li> -</ul> -<h3 id="minibatch-gradient-descent">Minibatch gradient descent</h3> -<p>Like stochastic gradient descent, but with small batches of instances instead<br> -of single instances.</p> -<ul> -<li>smaller batches: closer to stochastic gradient descent, noisier, less<br> -parallelism</li> -<li>bigger batches: more like regular gradient descent, more parallelism, limited<br> -by memory</li> -</ul> -<p>In general, stay between 16 and 128 instances per batch.</p> -<h3 id="optimizers">Optimizers</h3> -<h4 id="momentum">Momentum</h4> -<p>If gradient descent is a hiker in a snowstorm, momentum gradient descent is a<br> -boulder rolling down the hill.</p> -<p>The gradient doesn't affect its movement directly, but acts as a force on the moving<br> -object.
If gradient is zero, updates continue in the same direction, but slowed<br> -down by a 'friction constant' (μ).</p> -<p>Regular gradient descent: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mo>←</mo><mi>w</mi><mo>−</mo><mi>η</mi><mi mathvariant="normal">∇</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">(</mo><mi>w</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">w \leftarrow w - \eta \nabla loss(w)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">η</span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mclose">)</span></span></span></span></p> -<p>With momentum:</p> -<ul> -<li><span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>v</mi><mo>←</mo><mi>μ</mi><mi>v</mi><mo>−</mo><mi>η</mi><mi mathvariant="normal">∇</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">(</mo><mi>w</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">v \leftarrow \mu v - \eta \nabla loss(w)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">v</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.7777700000000001em;vertical-align:-0.19444em;"></span><span class="mord mathdefault">μ</span><span class="mord mathdefault" style="margin-right:0.03588em;">v</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">η</span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mclose">)</span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mo>←</mo><mi>w</mi><mo>+</mo><mi>v</mi></mrow><annotation encoding="application/x-tex">w \leftarrow w + v</annotation></semantics></math></span><span class="katex-html" 
aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">v</span></span></span></span></li> -</ul> -<h4 id="nesterov-momentum">Nesterov momentum</h4> -<p>In regular momentum, the actual step taken is the sum of two vectors: the momentum step<br> -(in the direction we took last iteration) and the gradient step (in the direction of<br> -steepest descent at the current point).</p> -<p>Nesterov momentum instead evaluates the gradient after the momentum step, since we are taking that step<br> -anyway. This makes the gradient step a bit more accurate.</p> -<h4 id="adam">Adam</h4> -<p>Combines the idea of momentum with the idea that each weight should have its own<br> -learning rate.</p> -<p>Normalize gradients: for each parameter, keep a running mean m and an uncentered<br> -variance v of its gradient.
Subtract m/(√v + ε), scaled by the learning rate η, instead of the raw gradient.</p> -<p>Calculations:</p> -<ul> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi><mo>←</mo><msub><mi>β</mi><mn>1</mn></msub><mo>∗</mo><mi>m</mi><mo>+</mo><mo stretchy="false">(</mo><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>1</mn></msub><mo stretchy="false">)</mo><mi mathvariant="normal">∇</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">(</mo><mi>w</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">m \leftarrow \beta_{1} * m + (1 - \beta_{1}) \nabla loss(w)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">m</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05278em;">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.05278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" 
style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault">m</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05278em;">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.05278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mclose">)</span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>v</mi><mo>←</mo><msub><mi>β</mi><mn>2</mn></msub><mo>∗</mo><mi>v</mi><mo>+</mo><mo 
stretchy="false">(</mo><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>2</mn></msub><mo stretchy="false">)</mo><mo stretchy="false">(</mo><mi mathvariant="normal">∇</mi><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">(</mo><mi>w</mi><mo stretchy="false">)</mo><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">v \leftarrow \beta_{2} * v + (1 - \beta_{2}) (\nabla loss(w))^{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">v</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05278em;">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.05278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">v</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05278em;">β</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.05278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mopen">(</span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mclose">)</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord 
mtight">2</span></span></span></span></span></span></span></span></span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mo>←</mo><mi>w</mi><mo>−</mo><mi>η</mi><mfrac><mi>m</mi><mrow><msqrt><mi>v</mi></msqrt><mo>+</mo><mi>ϵ</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">w \leftarrow w - \eta \frac{m}{\sqrt{v} + \epsilon}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.233392em;vertical-align:-0.538em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">η</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.695392em;"><span style="top:-2.6258665em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8059050000000001em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mtight" 
style="padding-left:0.833em;"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">v</span></span></span><span style="top:-2.765905em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.234095em;"><span></span></span></span></span></span><span class="mbin mtight">+</span><span class="mord mathdefault mtight">ϵ</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">m</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.538em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li> -</ul> -<h3 id="regularizers">Regularizers</h3> -<p>The bigger your model is, the bigger the capacity for overfitting.</p> -<p>Regularizers pull the model back towards simpler models, but don't eliminate<br> -more complex solutions.</p> -<h4 
id="l2-regularizer">L2 regularizer</h4> -<p>"Simpler means smaller parameters"</p> -<p>Take all params, stick them in one vector ("θ"). Then <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi><mi>o</mi><mi>s</mi><msub><mi>s</mi><mrow><mi>r</mi><mi>e</mi><mi>g</mi></mrow></msub><mo>=</mo><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo>+</mo><mi>λ</mi><mi mathvariant="normal">∥</mi><mi>θ</mi><mi mathvariant="normal">∥</mi></mrow><annotation encoding="application/x-tex">loss_{reg} = loss + -\lambda \|\theta\|</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.980548em;vertical-align:-0.286108em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord"><span class="mord mathdefault">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">r</span><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">g</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.77777em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" 
style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">λ</span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mord">∥</span></span></span></span></p> -<p>Models with bigger weights get higher loss, but if it's worth it (i.e. original<br> -loss decreases enough), they can still beat simpler models.</p> -<p>If you have a bowl where you want to roll a marble to the lowest point, L2 loss<br> -is like tipping the bowl slightly to the right (shifting the lowest point).</p> -<h4 id="l1-regulariser">L1 regulariser</h4> -<p>"Simpler means smaller parameters and more zero parameters"</p> -<p>lp norm: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∥</mi><mi>θ</mi><msup><mi mathvariant="normal">∥</mi><mi>p</mi></msup><mo>=</mo><mroot><mrow><msup><mi>w</mi><mi>p</mi></msup><mo>+</mo><msup><mi>b</mi><mi>p</mi></msup></mrow><mi>p</mi></mroot></mrow><annotation encoding="application/x-tex">\|\theta\|^{p} = \sqrt[p]{w^{p}+b^{p}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mord"><span class="mord">∥</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">p</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.14944500000000005em;"></span><span class="mord sqrt"><span class="root"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6599459999999999em;"><span style="top:-2.944666em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size6 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">p</span></span></span></span></span></span></span></span><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.890555em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.590392em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">p</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">b</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.590392em;"><span 
style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">p</span></span></span></span></span></span></span></span></span></span></span><span style="top:-2.850555em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.14944500000000005em;"><span></span></span></span></span></span></span></span></span> <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo>←</mo><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo>+</mo><mi>λ</mi><mi mathvariant="normal">∥</mi><mi>θ</mi><msup><mi mathvariant="normal">∥</mi><mn>1</mn></msup></mrow><annotation encoding="application/x-tex">loss \leftarrow loss + -\lambda \|\theta\|^{1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span 
class="mord mathdefault">s</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.77777em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord mathdefault">λ</span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mord"><span class="mord">∥</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span></span></p> -<p>If you have a bowl where you want to roll a marble to the lowest point, L1 loss<br> -is like using a square bowl -- if it has groves along dimensions, marble is<br> -likely to end up in one of the grooves.</p> -<h4 id="dropout-regularisation">Dropout regularisation</h4> -<p>"Simpler means more robust; during training, randomly disable hidden units"</p> -<p>During training, remove hidden and input nodes, each with probability p. 
This<br> -prevents co-adaptation -- multiple neurons firing together in specific<br> -combinations.</p> -<p>The analogy is if you can learn how to do a task repeatedly whilst drunk, you<br> -should be able to do the task sober. So basically, do all of the practice exams<br> -while drunk, and then you'll ace the final while sober (or you'll fail and<br> -disprove all of machine learning, choose your destiny). But if anyone asks, I<br> -didn't tell you to do that.</p> -<h2 id="convolutional-neural-networks">Convolutional neural networks</h2> -<p>Disclaimer: I'm gonna revise these notes, the prof basically covered all of CNN<br> -theory in ten minutes lol. So I don't have much here atm.</p> -<p>Hidden layer has shape of another image, with more channels.</p> -<p>Hidden nodes only wired to nearby nodes in the previous layer.</p> -<p>Weights are shared, each hidden node has same weights as the previous layer.</p> -<p>Maxpooling reduces image dimensions.</p> -<h2 id="deep-learning-vs-machine-learning">Deep learning vs machine learning</h2> -<p>In ML, you chain things together. But chaining modules that are 99% accurate<br> -doesn't mean the whole pipeline is 99% accurate, as error accumulates.</p> -<p>In deep learning, make each module differentiable - ensure that we can work out<br> -<strong>local</strong> gradient, so we can train pipeline as a whole using backpropagation.<br> -This is "end-to-end learning".</p> -<p>It's a lower level of abstraction, giving you smaller building blocks.</p> -<h2 id="generators">Generators</h2> -<p>Visual shorthand:</p> -<p><img src="_resources/4f24499ecda0424abfc6b408bf663267.png" alt=""></p> -<p>How do you turn neural network into probability distribution?</p> -<ul> -<li> -<p>option 1: take output and interpret it as parameters of multivariate normal (μ, Σ)</p> -<ul> -<li>if output has high dimensions, take Σ to be diagonal matrix</li> -<li>allows network to communicate how sure it's about the output (i.e. 
smaller variances in Σ mean it's more sure)</li> -<li>allows sampling from the generator, and computing prob density <img src="_resources/45614363f80f489eb6424d1ba48915a8.png" alt="">x</li> -</ul> -</li> -<li> -<p>option 2: start with an MVN, sample vector from it, feed that vector to the NN, and look at what comes out</p> -<ul> -<li> -<p>cannot easily compute prob density for an instance</p> -</li> -<li> -<p>can easily sample</p> -<p><img src="_resources/1efde6bbc5484b4481db40d089140c0b.png" alt=""></p> -</li> -</ul> -</li> -<li> -<p>option 3: both. i.e., sample input from standard MVN, interpret output as another MVN, then sample from that.</p> -<ul> -<li>input is called z</li> -<li>space of inputs is the latent space</li> -<li>naive approach: sample random point from data, sample point from model, train on how close they are. loss could be any distance between tensors, like mean-square error -<ul> -<li>doesn't work -- mode collapse.</li> -<li>if a generated point is close to a mode, the model should be rewarded, but since it's also far away from some other points, we might compute the loss to a different point</li> -<li>the different modes (areas of high prob) of data distr end up being averaged into a single point</li> -<li>we want network to imagine details, not average over all possibilities</li> -</ul> -</li> -</ul> -<p><img src="_resources/1a5455e19f984f23b9cc90fb4d99d59c.png" alt=""></p> -</li> -</ul> -<p>How do you 'fix' mode collapse?</p> -<h2 id="generative-adversarial-networks">Generative adversarial networks</h2> -<p>If you can generate adversarial examples (i.e. 
try to break your network), you can also add them to the dataset and then retrain your network.</p> -<p>Generator: takes input sampled from standard MVN, produces image</p> -<p>Discriminator: takes image, classifies as Pos (real) or Neg (fake)</p> -<h3 id="vanilla-gans">Vanilla GANs</h3> -<p>Training discriminator:</p> -<ul> -<li>feed examples from positive class</li> -<li>train it to classify them as Pos (just nudge the weights with backpropagation)</li> -<li>sample images from generator, train it to make them negative</li> -</ul> -<p>Training generator:</p> -<ul> -<li>freeze discriminator</li> -<li>train weights of generator to produce images that the discriminator labels as positive</li> -</ul> -<h3 id="conditional-gans">Conditional GANs</h3> -<p>If we want network to generate output probabilistically. i.e., the network has to fill in realistic details.</p> -<p>Make the generator a function, taking input and mapping it to output. Uses randomness to imagine specific output details.</p> -<p>Feed discriminator:</p> -<ul> -<li>either input/output pair from data, which it should classify as real</li> -<li>or input from data with output generated by generator, which it should classify as fake</li> -</ul> -<p>Training generator in two ways:</p> -<ol type="a"> - <li>freeze weights of discriminator, train generator to produce stuff that the discriminator will classify as real</li> - <li>feed it an input from data, backpropagate on corresponding output using L1 loss</li> -</ol> -<p>Only works if input and output matched; for some tasks, only have unmatched bags of images in two domains. Can't randomly match because mode collapse. So what do?</p> -<h3 id="cyclegan">CycleGAN</h3> -<p>Add "cycle consistency term" to loss function.</p> -<p>E.g. 
in horse-to-zebra example, if transform horse to zebra and back, result should be close to original image.</p> -<p>So, new goal:</p> -<ul> -<li>train horse-to-zebra transformer and zebra-to-horse transformer, such that</li> -<li>horse-discriminator can't tell generated horses (and zebras) from real ones</li> -<li>cycle consistency loss for both combined is low</li> -</ul> -<p>Think of generators doing steganography (hiding info in pictures). For example, hiding a horse inside a zebra (picture, obviously).</p> -<h3 id="stylegan">StyleGAN</h3> -<p>Feed the network the latent vector at each layer.</p> -<p>Since deconvolution starts with low resolution, high level description of image, feeding it latent vector at each layer allows it to use different parts of the vector to describe different aspects of the image ("styles").</p> -<p>Network also receives separate extra random noise per layer, which allows it to make random choices.</p> -<p>Then generate image for destination, but for a few layers (bottom, middle, or top) we use source latent vector instead.</p> -<h3 id="what-can-we-do-with-a-generator">What can we do with a generator?</h3> -<p>Gotta fill this in.</p> -<h2 id="autoencoders">Autoencoders</h2> -<p>A type of neural network that tries to make output as close to input as possible, but there is a middle layer (smaller than input) that functions as a bottleneck.</p> -<p>After network is trained, that layer becomes a compressed representation of the input.</p> -<p><img src="_resources/e935de30948c46dfabccb5d24b5e1a5e.png" alt=""></p> -<p>blue layer is latent representation of input. If autoencoder works well, expect to see similar images clustered together.</p> -<p>To find direction in latent space that we can use to make someone smile, we label instances as smiling and nonsmiling, and draw vector between their respective means. 
That's called the smiling vector (god I can't take this shit seriously)</p> -<h3 id="turning-an-autoencoder-into-a-generator">Turning an autoencoder into a generator</h3> -<p>How:</p> -<ul> -<li>train an autoencoder</li> -<li>encode the data to latent variables Z</li> -<li>fit MVN to Z</li> -<li>sample from the MVN</li> -<li>"decode" the sample</li> -</ul> -<p>But we're training for reconstruction error, and then turning result into autoencoder. Can we train for maximum likelihood directly?</p> -<h2 id="variational-autoencoders">Variational autoencoders</h2> -<p>Force decoder to also decode points near z correctly, and force latent distribution of data towards N(0,1). Can be derived from first principles.</p> -<p>Approximate P(z | z,θ) with neural network, and make that the q function.</p> -<p>Want to choose parameters θ (weights of neural network) to maximise log likelihood of data.</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>ln</mi><mo></mo><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow><mo>=</mo><mi>L</mi><mo stretchy="false">(</mo><mi>q</mi><mo separator="true">,</mo><mi>θ</mi><mo stretchy="false">)</mo><mo>+</mo><mi>K</mi><mi>L</mi><mo stretchy="false">(</mo><mi>q</mi><mo separator="true">,</mo><mi>p</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\ln{P(x|\theta)} = L(q, \theta) + KL(q,p)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" 
style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">p</span><span class="mclose">)</span></span></span></span> with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><mi>z</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo separator="true">,</mo><mi>θ</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P = P(z|x,\theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.04398em;">z</span><span class="mord">∣</span><span class="mord mathdefault">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span>.</p> -<ul> -<li>q(z|x) any approximation to P(z|x)</li> -<li>KL(q, p) - Kullback-Leibler divergence</li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo stretchy="false">(</mo><mi>q</mi><mo separator="true">,</mo><mi>θ</mi><mo stretchy="false">)</mo><mo>=</mo><msub><mi>E</mi><mi>q</mi></msub><mi>ln</mi><mo></mo><mfrac><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>z</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow><mrow><mi>q</mi><mo stretchy="false">(</mo><mi>z</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">L(q, \theta) = E_{q} \ln{\frac{P(x,z|\theta)}{q(z|x)}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span 
class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.53em;vertical-align:-0.52em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05764em;">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.04398em;">z</span><span class="mord mtight">∣</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span 
style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">x</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.04398em;">z</span><span class="mord mtight">∣</span><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.52em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></li> -</ul> -<p>We can't marginalize out hidden variable z, or compute probability over z given x. Instead, use approximation on prob of z given x, called q, and optimise both probability of x given z and z given x.</p> -<p><img src="_resources/650e25dac37b4b4db20998694f3f6146.png" alt=""></p> -<p>Solves mode collapse, because we map input to latent space and back to data space, so we know which instance the generated output should look like.</p> -<p>Sorry guys this lecture was hard to follow, I'll finish this part up when I revise for exams.</p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/1a5455e19f984f23b9cc90fb4d99d59c.png b/content/ml-notes/Deep learning/1a5455e19f984f23b9cc90fb4d99d59c.png Binary files differ. diff --git a/content/ml-notes/_resources/1efde6bbc5484b4481db40d089140c0b.png b/content/ml-notes/Deep learning/1efde6bbc5484b4481db40d089140c0b.png Binary files differ. diff --git a/content/ml-notes/_resources/45614363f80f489eb6424d1ba48915a8.png b/content/ml-notes/Deep learning/45614363f80f489eb6424d1ba48915a8.png Binary files differ. 
diff --git a/content/ml-notes/_resources/4f24499ecda0424abfc6b408bf663267.png b/content/ml-notes/Deep learning/4f24499ecda0424abfc6b408bf663267.png Binary files differ. diff --git a/content/ml-notes/_resources/650e25dac37b4b4db20998694f3f6146.png b/content/ml-notes/Deep learning/650e25dac37b4b4db20998694f3f6146.png Binary files differ. diff --git a/content/ml-notes/_resources/8c2398ce91ed4694abf679c536b2cf61.png b/content/ml-notes/Deep learning/8c2398ce91ed4694abf679c536b2cf61.png Binary files differ. diff --git a/content/ml-notes/_resources/bf4ec9fe629e41389da29c0de7efb63d.png b/content/ml-notes/Deep learning/bf4ec9fe629e41389da29c0de7efb63d.png Binary files differ. diff --git a/content/ml-notes/_resources/e935de30948c46dfabccb5d24b5e1a5e.png b/content/ml-notes/Deep learning/e935de30948c46dfabccb5d24b5e1a5e.png Binary files differ. diff --git a/content/ml-notes/_resources/edfac0f6027c40c9a9e012e658f54d68.png b/content/ml-notes/Deep learning/edfac0f6027c40c9a9e012e658f54d68.png Binary files differ. diff --git a/content/ml-notes/Deep learning/index.md b/content/ml-notes/Deep learning/index.md @@ -0,0 +1,399 @@ ++++ +title = 'Deep learning' +template = 'page-math.html' ++++ +# Deep learning +## Deep learning systems (autodiff engines) + +### Tensors + +To scale up backpropagation, want to move from operations on scalars to tensors. + +Tensor: generalisation of vectors/matrices to higher dimensions. e.g. a 2-tensor +has two dimensions, a 4-tensor has 4 dimensions. + +You can represent data as a tensor. e.g. an RGB image is a 3-tensor of the red, +green, and blue values for each pixel. + +### Functions on tensors + +Functions have inputs and outputs, all of which are tensors. 
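Here is a minimal sketch of both ideas in plain Python. It uses nested lists as a stand-in for a real tensor library, which is an assumption made only for illustration. It shows:

- a tiny 2×2 RGB image stored as a 3-tensor (height × width × channels);
- a batch of such images as a 4-tensor;
- a hypothetical `to_grayscale` function whose input and output are both tensors.

The `shape` helper is also illustrative; libraries such as NumPy or PyTorch track shapes for you.

```python
def shape(t):
    """Return the shape of a tensor stored as nested Python lists."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None  # descend into the first element
    return tuple(dims)


# A tiny 2x2 RGB image as a 3-tensor: height x width x channels (R, G, B).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

# A batch of images adds one more dimension, giving a 4-tensor.
batch = [image, image]


def to_grayscale(img):
    """A function on tensors: a 3-tensor (h, w, 3) in, a 2-tensor (h, w) out."""
    return [[sum(pixel) / 3 for pixel in row] for row in img]


gray = to_grayscale(image)
print(shape(image))  # (2, 2, 3)
print(shape(batch))  # (2, 2, 2, 3)
print(shape(gray))   # (2, 2)
```

Averaging the channels is just one simple choice of tensor-to-tensor function. The point is that the output is again a tensor, with its own shape.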
+
+They implement:
+
+- `forward(...)`: compute the outputs, given the inputs
+- `backward(...)`: compute the gradients over the inputs, given the gradients over the
+  outputs
+
+The modules we chain together are defined in a computation graph:
+
+![](bf4ec9fe629e41389da29c0de7efb63d.png)
+
+A deep learning system uses this graph to execute a computation (forward pass),
+and then uses backpropagation to compute the gradient of the output with respect
+to each data node (backward pass).
+
+Autodiff engine:
+
+- performs computation by chaining functions
+- keeps track of all computation in a computation graph
+- when the computation is done, walks backward through the computation graph for
+  backpropagation
+- eager evaluation: builds the graph as we perform the computation
+
+## Backpropagation revisited
+
+Functions can have any number of inputs and outputs, which must be tensors.
+
+The final output must be a scalar (i.e. we always take the derivative of a scalar
+function).
+
+### Multivariate chain rule
+
+How do you take derivatives when variables aren't scalars?
+
+Multiple inputs:
+
+![](edfac0f6027c40c9a9e012e658f54d68.png)
+
+How do you find the derivative with two inputs? Use the multivariate chain rule,
+i.e. take the derivative along each path from x to c, then sum them.
+
+$\frac{\partial c}{\partial x} = \frac{\partial c}{\partial a} \frac{\partial a}{\partial x} + \frac{\partial c}{\partial b} \frac{\partial b}{\partial x}$
+
+### Backpropagation with tensors - matrix calculus
+
+Start with scalar derivatives: one output over one input (pick an arbitrary
+one).
+
+Tensor derivative: put all possible scalar derivatives into a tensor.
+
+But how to arrange/order the tensor?
+
+Solution: accumulate the gradient product. Each function multiplies the gradient it receives by its own local derivative, so the full tensor derivative never has to be stored.
+
+forward(x): given input x, compute output y
+
+backward($l_{y}$): given $l_{y} = \frac{\partial loss}{\partial y}$, compute
+$\frac{\partial loss}{\partial y} \frac{\partial y}{\partial x}$.
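The following is a minimal NumPy sketch of the `forward`/`backward` interface above. It makes a few assumptions: there is a single input vector with no batch dimension, and a hypothetical `Linear` class computes k = Wx + b, which is the function worked through in the example that follows. `backward` receives $l_k$ and returns one gradient per input, each with the same shape as that input. A finite-difference check on one weight confirms the analytic gradient. The squared-sum loss is an illustrative choice and does not come from the lecture.

```python
import numpy as np


class Linear:
    """Hypothetical module for k = W x + b with forward/backward methods."""

    def forward(self, W, x, b):
        # Remember the inputs: backward needs them to form the local derivatives.
        self.W, self.x = W, x
        return W @ x + b

    def backward(self, l_k):
        # l_k = dloss/dk has the same shape as k (gradient-shape convention).
        l_W = np.outer(l_k, self.x)  # dloss/dW_ij = l_k[i] * x[j], i.e. l_k x^T
        l_x = self.W.T @ l_k         # dloss/dx_j = sum_i l_k[i] * W_ij
        l_b = l_k                    # dk/db is the identity
        return l_W, l_x, l_b


rng = np.random.default_rng(0)
W, x, b = rng.normal(size=(3, 4)), rng.normal(size=4), rng.normal(size=3)

layer = Linear()
k = layer.forward(W, x, b)
# Toy scalar loss (illustrative): loss = sum(k ** 2), so dloss/dk = 2k.
l_W, l_x, l_b = layer.backward(2 * k)

# Finite-difference check on a single weight, W[1, 2].
eps = 1e-6
W_shifted = W.copy()
W_shifted[1, 2] += eps
numeric = (np.sum((W_shifted @ x + b) ** 2) - np.sum(k ** 2)) / eps
print(numeric, l_W[1, 2])  # the two values should agree closely
```

Each module only needs its cached inputs and the incoming gradient. This is why an autodiff engine can chain arbitrary modules without ever building the full tensor derivative.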
+
+convention: the gradient of A has the same shape as A
+
+#### Example:
+
+Let:
+
+- k = Wx + b
+- forward(W, x, b): compute Wx + b
+- backward($l_{k}$): compute $\frac{\partial l}{\partial k} \frac{\partial k}{\partial
+  W}, \quad \frac{\partial l}{\partial k} \frac{\partial k}{\partial x}, \quad
+  \frac{\partial l}{\partial k} \frac{\partial k}{\partial b}$
+
+Steps:
+
+1. work out a single scalar derivative: $\frac{\partial l}{\partial k} \frac{\partial
+k}{\partial W_{23}}$
+2. apply the multivariate chain rule; since $k_{i} = \sum_{j} W_{ij} x_{j} + b_{i}$, only $k_{2}$ depends on $W_{23}$: $\frac{\partial l}{\partial k} \frac{\partial
+k}{\partial W_{23}} = \sum_{i} \frac{\partial l}{\partial k_{i}} \frac{\partial k_{i}}{\partial W_{23}} = \frac{\partial l}{\partial k_{2}} x_{3}$
+3. now we know that $\frac{\partial l}{\partial k} \frac{\partial k}{\partial W_{ij}} =
+\frac{\partial l}{\partial k_{i}} x_{j}$
+4. so, $\frac{\partial l}{\partial k} \frac{\partial k}{\partial W} = \frac{\partial
+l}{\partial k} x^{T}$
+
+## Making deep neural nets work
+
+### Overcoming vanishing gradients
+
+If the weights of the network are initialized too large, the inputs to the sigmoid will hit its flat rightmost
+part, so the local gradient for each node will be very close to zero and
+the network won't start learning.
+
+If they are too negative, the inputs hit the flat leftmost part of the sigmoid, and we get the same
+problem.
+
+![](8c2398ce91ed4694abf679c536b2cf61.png)
+
+ReLU preserves the derivatives of nodes whose activations it lets through. It kills
+the derivatives of nodes that produce a negative value, but as long as the network is
+properly initialised, around half of the values in a batch will typically produce
+positive input for the ReLU.
+
+There is still a risk that during training, the network will move to a configuration where
+a neuron produces negative input for every instance in the data. In that case, we
+end up with a dead neuron: its gradient will always be zero, so no weights below
+that neuron will change anymore (unless they also feed into a non-dead neuron).
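This small NumPy sketch makes the vanishing-gradient argument above concrete. It is an illustration and does not come from the lecture. It computes the local derivatives of the sigmoid and of ReLU at a few inputs. The sigmoid's derivative is at most 0.25 and is nearly zero for large |z|. ReLU's derivative is exactly 1 for positive inputs and 0 otherwise, and that zero is also what makes a neuron that only ever sees negative input "dead".

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    # Local derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25 (at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    # Local derivative of ReLU: 1 where the input is let through, 0 where it is cut off.
    return (z > 0).astype(float)


z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(z))  # tiny at z = -10 and z = 10, largest (0.25) at z = 0
print(relu_grad(z))     # [0. 0. 0. 1. 1.]

# Backpropagation through a chain of sigmoids multiplies these local derivatives,
# so even in the best case (z = 0 at every layer) 10 layers shrink the gradient to:
print(0.25 ** 10)       # about 9.5e-07
```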
+
+Initialization:
+
+- assume that the layer input is roughly distributed so that its mean is 0 and
+  variance is 1 in every direction (standardise/normalise the data so this is true
+  for the first layer)
+- initialisation is designed to pick a random weight matrix that keeps these properties
+  true for the layer's output
+
+### Minibatch gradient descent
+
+Like stochastic gradient descent, but with small batches of instances instead
+of single instances.
+
+- smaller batches: closer to stochastic gradient descent, noisier, less
+  parallelism
+- bigger batches: more like regular gradient descent, more parallelism, limit
+  is memory
+
+In general, stay between 16 and 128 instances.
+
+### Optimizers
+
+#### Momentum
+
+If gradient descent is a hiker in a snowstorm, momentum gradient descent is a
+boulder rolling down the hill.
+
+The gradient doesn't affect its movement directly, but acts as a force on a moving
+object. If the gradient is zero, updates continue in the same direction, but are slowed
+down by a 'friction constant' (μ).
+
+Regular gradient descent: $w \leftarrow w - \eta \nabla loss(w)$
+
+With momentum:
+
+- $v \leftarrow \mu v - \eta \nabla loss(w)$
+- $w \leftarrow w + v$
+
+#### Nesterov momentum
+
+In regular momentum, the actual step taken is the sum of two vectors: the momentum step
+(in the direction we took last iteration) and the gradient step (in the direction of
+steepest descent at the current point).
+
+Nesterov momentum evaluates the gradient after the momentum step, since we are taking that step
+anyway. This makes the gradient step a bit more accurate.
+
+#### Adam
+
+Combines the idea of momentum with the idea that each weight should have its own
+learning rate.
+
+Normalize gradients: for each parameter, keep a running mean m and a running uncentered variance v
+of its gradient, and step along m scaled by $1/\sqrt{v}$ instead of along the raw gradient.
+
+Calculations:
+
+- $m \leftarrow \beta_{1} m + (1 - \beta_{1}) \nabla loss(w)$
+- $v \leftarrow \beta_{2} v + (1 - \beta_{2}) (\nabla loss(w))^{2}$
+- $w \leftarrow w - \eta \frac{m}{\sqrt{v} + \epsilon}$
+
+### Regularizers
+
+The bigger your model is, the bigger its capacity for overfitting.
+
+Regularizers pull the model back towards simpler models, but don't eliminate
+more complex solutions.
+
+#### L2 regularizer
+
+"Simpler means smaller parameters"
+
+Take all params and stick them in one vector ("θ"). Then $loss_{reg} = loss +
+\lambda \|\theta\|_{2}$
+
+Models with bigger weights get a higher loss, but if it's worth it (i.e. the original
+loss decreases enough), they can still beat simpler models.
+
+If you have a bowl where you want to roll a marble to the lowest point, the L2 term
+is like tipping the bowl slightly to the right (shifting the lowest point).
+
+#### L1 regulariser
+
+"Simpler means smaller parameters and more zero parameters"
+
+lp norm (for $\theta = (w, b)$): $\|\theta\|_{p} = \sqrt[p]{|w|^{p} + |b|^{p}}$, and $loss_{reg} = loss +
+\lambda \|\theta\|_{1}$
+
+If you have a bowl where you want to roll a marble to the lowest point, the L1 term
+is like using a square bowl: it has grooves along the dimensions, so the marble is
+likely to end up in one of the grooves (where some parameters are zero).
+
+#### Dropout regularisation
+
+"Simpler means more robust; during training, randomly disable hidden units"
+
+During training, remove hidden and input nodes, each with probability p. This
+prevents co-adaptation, i.e. multiple neurons firing together in specific
+combinations.
+
+The analogy: if you can learn how to do a task repeatedly whilst drunk, you
+should be able to do the task sober. So basically, do all of the practice exams
+while drunk, and then you'll ace the final while sober (or you'll fail and
+disprove all of machine learning, choose your destiny). But if anyone asks, I
+didn't tell you to do that.
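The update rules from the Optimizers section above (plain gradient descent, momentum, Nesterov momentum and Adam) can be compared in one NumPy sketch. The loss is a hypothetical, badly scaled quadratic with curvature 1 in one direction and 25 in the other. Step sizes, μ, the β values and the step counts are arbitrary illustrative choices, so the printout is not a benchmark; in particular, Adam runs for more steps than the others. The Nesterov variant uses the common formulation that evaluates the gradient at the look-ahead point $w + \mu v$. The Adam function follows the three calculations listed above literally. The published Adam algorithm also applies a bias correction to m and v, which is left out here.

```python
import numpy as np

CURVATURE = np.array([1.0, 25.0])  # toy, badly scaled quadratic (hypothetical)


def loss(w):
    return 0.5 * np.sum(CURVATURE * w ** 2)


def grad(w):
    return CURVATURE * w


def gradient_descent(w, eta=0.03, steps=100):
    for _ in range(steps):
        w = w - eta * grad(w)                 # w <- w - eta * grad loss(w)
    return w


def momentum(w, eta=0.03, mu=0.9, steps=100, nesterov=False):
    v = np.zeros_like(w)
    for _ in range(steps):
        # Nesterov: evaluate the gradient after the momentum step we take anyway.
        g = grad(w + mu * v) if nesterov else grad(w)
        v = mu * v - eta * g                  # v <- mu v - eta * grad
        w = w + v                             # w <- w + v
    return w


def adam(w, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m = np.zeros_like(w)                      # running mean of the gradient
    v = np.zeros_like(w)                      # running uncentered variance
    for _ in range(steps):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        w = w - eta * m / (np.sqrt(v) + eps)  # per-parameter normalised step
    return w


w0 = np.array([1.0, 1.0])
for name, w in [("gradient descent", gradient_descent(w0)),
                ("momentum", momentum(w0)),
                ("nesterov", momentum(w0, nesterov=True)),
                ("adam", adam(w0))]:
    print(f"{name:>16}: loss = {loss(w):.2e}")
```

On this toy loss, plain gradient descent crawls along the low-curvature direction. Momentum keeps accumulating velocity in that direction and ends at a lower loss after the same number of steps. Adam divides each parameter's step by that parameter's own $\sqrt{v}$, so both directions move at a similar pace despite the 25x difference in gradient scale.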
+
+## Convolutional neural networks
+
+Disclaimer: I'm gonna revise these notes, the prof basically covered all of CNN
+theory in ten minutes lol. So I don't have much here atm.
+
+Each hidden layer has the shape of another image, with more channels.
+
+Hidden nodes are only wired to nearby nodes in the previous layer.
+
+Weights are shared: every hidden node in the same channel uses the same weights (the kernel), applied to its own neighbourhood in the previous layer.
+
+Maxpooling reduces the image dimensions.
+
+## Deep learning vs machine learning
+
+In ML, you chain things together. But chaining modules that are each 99% accurate
+doesn't mean the whole pipeline is 99% accurate, as errors accumulate.
+
+In deep learning, make each module differentiable, i.e. ensure that we can work out its
+**local** gradient, so we can train the pipeline as a whole using backpropagation.
+This is "end-to-end learning".
+
+It's a lower level of abstraction, giving you smaller building blocks.
+
+## Generators
+
+Visual shorthand:
+
+![](4f24499ecda0424abfc6b408bf663267.png)
+
+How do you turn a neural network into a probability distribution?
+
+- option 1: take the output and interpret it as the parameters of a multivariate normal (μ, Σ)
+  - if the output has high dimensions, take Σ to be a diagonal matrix
+  - allows the network to communicate how sure it is about the output (i.e. smaller variances in Σ mean it's more sure)
+  - allows sampling from the generator, and computing the probability density ![](45614363f80f489eb6424d1ba48915a8.png)
+- option 2: start with an MVN, sample a vector from it, feed that vector to the NN, and look at what comes out
+  - cannot easily compute the probability density for an instance
+
+  - can easily sample
+
+    ![](1efde6bbc5484b4481db40d089140c0b.png)
+- option 3: both, i.e. sample the input from a standard MVN, interpret the output as another MVN, then sample from that.
+  - the input is called z
+  - the space of inputs is the latent space
+  - naive approach: sample a random point from the data, sample a point from the model, and train on how close they are. The loss could be any distance between tensors, like mean-squared error
+  - doesn't work: mode collapse.
+  - if a generated point is close to a mode, the model should be rewarded, but since it's also far away from some other points, we might compute the loss to a different point
+  - the different modes (areas of high probability) of the data distribution end up being averaged into a single point
+  - we want the network to imagine details, not average over all possibilities
+
+  ![](1a5455e19f984f23b9cc90fb4d99d59c.png)
+
+How do you 'fix' mode collapse?
+
+## Generative adversarial networks
+
+If you can generate adversarial examples (i.e. try to break your network), you can also add them to the dataset and then retrain your network.
+
+Generator: takes an input sampled from a standard MVN, produces an image
+
+Discriminator: takes an image, classifies it as Pos (real) or Neg (fake)
+
+### Vanilla GANs
+
+Training the discriminator:
+
+- feed it examples from the positive class
+- train it to classify them as Pos (just nudge the weights with backpropagation)
+- sample images from the generator, train it to classify them as Neg
+
+Training the generator:
+
+- freeze the discriminator
+- train the weights of the generator to produce images that the discriminator labels as positive
+
+### Conditional GANs
+
+Used when we want the network to generate output probabilistically, i.e. the network has to fill in realistic details.
+
+Make the generator a function, taking an input and mapping it to an output. It uses randomness to imagine specific output details.
+
+Feed the discriminator:
+
+- either an input/output pair from the data, which it should classify as real
+- or an input from the data with an output generated by the generator, which it should classify as fake
+
+Train the generator in two ways:
+<ol type="a">
+  <li>freeze the weights of the discriminator, train the generator to produce stuff that the discriminator will classify as real</li>
+  <li>feed it an input from the data, backpropagate on the corresponding output using L1 loss</li>
+</ol>
+
+This only works if inputs and outputs are matched; for some tasks, we only have unmatched bags of images in two domains. We can't match them randomly because of mode collapse. So what do we do?
+
+### CycleGAN
+
+Add a "cycle consistency term" to the loss function.
+
+E.g. in the horse-to-zebra example, if we transform a horse to a zebra and back, the result should be close to the original image.
+
+So, new goal:
+
+- train a horse-to-zebra transformer and a zebra-to-horse transformer, such that
+- the discriminators can't tell generated horses (and zebras) from real ones
+- the cycle consistency loss for both combined is low
+
+Think of the generators as doing steganography (hiding info in pictures). For example, hiding a horse inside a zebra (picture, obviously).
+
+### StyleGAN
+
+Feed the network the latent vector at each layer.
+
+Since deconvolution starts with a low-resolution, high-level description of the image, feeding it the latent vector at each layer allows it to use different parts of the vector to describe different aspects of the image ("styles").
+
+The network also receives separate extra random noise per layer, which allows it to make random choices.
+
+Style mixing: generate the image from the destination's latent vector, but for a few layers (bottom, middle, or top) use the source's latent vector instead.
+
+### What can we do with a generator?
+
+Gotta fill this in.
+
+## Autoencoders
+
+A type of neural network that tries to make its output as close to its input as possible, but with a middle layer (smaller than the input) that functions as a bottleneck.
+
+After the network is trained, that layer becomes a compressed representation of the input.
+
+![](e935de30948c46dfabccb5d24b5e1a5e.png)
+
+The blue layer is the latent representation of the input. If the autoencoder works well, expect to see similar images clustered together.
+
+To find a direction in latent space that we can use to make someone smile, we label instances as smiling and nonsmiling, and draw a vector between their respective means. That's called the smiling vector (god I can't take this shit seriously)
+
+### Turning an autoencoder into a generator
+
+How:
+
+- train an autoencoder
+- encode the data to latent variables Z
+- fit an MVN to Z
+- sample from the MVN
+- "decode" the sample
+
+But we're training for reconstruction error, and then turning the result into a generator. Can we train for maximum likelihood directly?
+
+## Variational autoencoders
+
+Force the decoder to also decode points near z correctly, and force the latent distribution of the data towards N(0,1). Can be derived from first principles.
+
+Approximate P(z \| x,θ) with a neural network, and make that the q function.
+
+We want to choose the parameters θ (weights of the neural network) to maximise the log likelihood of the data.
+
+$\ln{P(x|\theta)} = L(q, \theta) + KL(q,p)$ with $p = P(z|x,\theta)$.
+
+- q(z\|x): any approximation to P(z\|x)
+- KL(q, p): Kullback-Leibler divergence
+- $L(q, \theta) = E_{q} \ln{\frac{P(x,z|\theta)}{q(z|x)}}$
+
+We can't marginalize out the hidden variable z, or compute the probability of z given x. Instead, we use an approximation q of the probability of z given x, and optimise both the probability of x given z and the probability of z given x.
+
+![](650e25dac37b4b4db20998694f3f6146.png)
+
+This solves mode collapse: because we map the input to latent space and back to data space, we know which instance the generated output should look like.
+
+Sorry guys this lecture was hard to follow, I'll finish this part up when I revise for exams.
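The "fit an MVN to Z, sample from the MVN" steps of "Turning an autoencoder into a generator" can be sketched with NumPy as below. This example contains no trained autoencoder. The latent codes `Z` are synthetic stand-ins for encoder outputs. `decoder` is a hypothetical placeholder for the decoder half of the network, so the final decode step appears only as a comment.

```python
import numpy as np


def fit_mvn(Z):
    """Fit a multivariate normal to latent codes Z (one row per instance)."""
    mu = Z.mean(axis=0)
    sigma = np.cov(Z, rowvar=False)  # d x d sample covariance
    return mu, sigma


rng = np.random.default_rng(0)

# Synthetic stand-in for the encoder's outputs: 1000 correlated 2-D latent codes.
A = np.array([[2.0, 0.0], [1.0, 0.5]])
Z = rng.normal(size=(1000, 2)) @ A + np.array([3.0, -1.0])

mu, sigma = fit_mvn(Z)
z_new = rng.multivariate_normal(mu, sigma, size=5)  # five new latent points
print(mu, sigma, z_new.shape, sep="\n")

# x_new = decoder(z_new)  # hypothetical: the decoder half of a trained autoencoder
```

A full covariance is cheap for a small latent space. For a large latent space, a diagonal Σ (as in option 1 under Generators) is a common simplification.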
diff --git a/content/ml-notes/Introduction.html b/content/ml-notes/Introduction.html @@ -1,84 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - - <title>Introduction</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="introduction">Introduction</h1> -<nav class="table-of-contents"><ul><li><a href="#introduction">Introduction</a><ul><li><a href="#what-is-ml">What is ML?</a></li><li><a href="#supervised-ml">Supervised ML</a><ul><li><a href="#classification">Classification</a></li><li><a href="#regression">Regression</a></li></ul></li><li><a href="#unsupervised-ml">Unsupervised ML</a></li><li><a href="#what-isnt-ml">What isn't ML?</a></li></ul></li></ul></nav><h2 id="what-is-ml">What is ML?</h2> -<p>Deductive vs inductive reasoning:</p> -<ul> -<li>Deductive (conclusion by logic): discrete, unambiguous, provable, known rules</li> -<li>Inductive (conclusion from experience): fuzzy, ambiguous, experimental, unknown rules</li> -</ul> -<p>ML lets systems learn and improve from experience without being explicitly programmed (for a specific situation).</p> -<p>Used in software, analytics, data mining, data science, statistics.</p> -<p>Problem is suitable for ML <em>if we can't solve it explicitly</em>.</p> -<ul> -<li>when approximate solutions are ok</li> -<li>when reliability is not the biggest focus</li> -</ul> -<p>Why don't we have explicit solutions? 
Sometimes could be expensive, or could change over time, or other reasons.</p> -<p><img src="_resources/6610df2f6a4a4d21ad34c09c3468f115.png" alt="overview-diagram.png"></p> -<p>An intelligent agent:</p> -<ul> -<li>online learning: acting + learning simultaneously</li> -<li>reinforcement learning: online learning in a world based on delayed feedback</li> -</ul> -<p>Offline learning: separate learning and acting</p> -<ul> -<li>take fixed dataset of examples</li> -<li>train model on that dataset</li> -<li>test the model, and if it works, use it in prod</li> -</ul> -<h2 id="supervised-ml">Supervised ML</h2> -<p>Supervised: explicit examples of input and output. Learn to predict output for unseen input.</p> -<p>learning tasks:</p> -<ul> -<li>classification: assign class to each example</li> -<li>regression: assign number to each example</li> -</ul> -<h3 id="classification">Classification</h3> -<p>how do you reduce a problem to classification? e.g. every pixel in a grayscale image is a feature, label each feature</p> -<p>classification: output labels are classes (categorical data)</p> -<p>linear classifier: just draw a line, plane, or hyperplane</p> -<ul> -<li>feature space: contains features</li> -<li>model space: contains models. the bright spots have low loss.</li> -<li>loss function: performance of model on data, the lower the better</li> -</ul> -<p>decision tree classifier: every node is a condition for a feature, go down branch based on condition. would look like a step function in a graph.</p> -<p>k-nearest-neighbors: lazy, doesn't do anything, just remembers the data (?? have to look this up in more detail)<br> -features: numerical or categorical</p> -<p>binary classification: only have two classes</p> -<p>multiclass classification: more than two classes</p> -<h3 id="regression">Regression</h3> -<p>regression: output labels are numbers. 
the model we're trying to learn is a function from feature space to ℜ</p> -<p>loss function: maps model to number that expresses how well it fits the data</p> -<p>common example: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">(</mo><mi>p</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mo>∑</mo><mi>i</mi></msub><mo stretchy="false">(</mo><msub><mi>f</mi><mi>p</mi></msub><mo stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy="false">)</mo><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">loss(p) = \frac{1}{n} \sum_i (f_p (x_i) - y_i)^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" 
style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></p> -<p>takes difference between model prediction and target value (residual), then square and sum all residuals</p> -<p>overfitting: the model is too specific to 
the data, it's memorizing the data instead of generalizing</p> -<p>split test and training data. don't judge performance on training data, the aim is to minimise loss on <em>test</em> data.</p> -<h2 id="unsupervised-ml">Unsupervised ML</h2> -<p>Unsupervised: only inputs provided, find <em>any</em> pattern that explains something about data.</p> -<p>learning tasks:</p> -<ul> -<li>clustering: classification, except no target column, so model outputs cluster id</li> -<li>density estimation: model outputs a number (probability density), should be high for instances of data that are likely. e.g. fitting prob distribution to data</li> -<li>generative modeling: build a model from which you can sample new examples</li> -</ul> -<h2 id="what-isnt-ml">What isn't ML?</h2> -<p>ML is a subdomain of AI.</p> -<ul> -<li>AI, but not ML: automated reasoning, planning</li> -<li>Data Science, not ML: gathering, harmonising, and interpreting data</li> -<li>Data mining is more closely related, but e.g. finding fraud in transaction networks is closer to data mining</li> -<li>Stats wants to figure out the truth, whereas with ML it just has to work well enough, but doesn't necessarily have to be true</li> -</ul> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/6610df2f6a4a4d21ad34c09c3468f115.png b/content/ml-notes/Introduction/6610df2f6a4a4d21ad34c09c3468f115.png Binary files differ. diff --git a/content/ml-notes/Introduction/index.md b/content/ml-notes/Introduction/index.md @@ -0,0 +1,94 @@ ++++ +title = 'Introduction' +template = 'page-math.html' ++++ +# Introduction +## What is ML? +Deductive vs inductive reasoning: + + * Deductive (conclusion by logic): discrete, unambiguous, provable, known rules + * Inductive (conclusion from experience): fuzzy, ambiguous, experimental, unknown rules + +ML lets systems learn and improve from experience without being explicitly programmed (for a specific situation). 
+
+Used in software, analytics, data mining, data science, statistics.
+
+A problem is suitable for ML _if we can't solve it explicitly_.
+
+ * when approximate solutions are ok
+ * when reliability is not the biggest focus
+
+Why don't we have explicit solutions? Sometimes they would be too expensive, or the problem changes over time, or there are other reasons.
+
+![overview-diagram.png](6610df2f6a4a4d21ad34c09c3468f115.png)
+
+An intelligent agent:
+
+ * online learning: acting + learning simultaneously
+ * reinforcement learning: online learning in a world based on delayed feedback
+
+Offline learning: separate learning and acting
+
+ * take a fixed dataset of examples
+ * train a model on that dataset
+ * test the model, and if it works, use it in production
+
+## Supervised ML
+Supervised: explicit examples of input and output. Learn to predict the output for unseen input.
+
+learning tasks:
+
+ * classification: assign a class to each example
+ * regression: assign a number to each example
+
+### Classification
+how do you reduce a problem to classification? e.g. for images, every pixel in a grayscale image is a feature, and each image gets a class label
+
+classification: output labels are classes (categorical data)
+
+linear classifier: just draw a line, plane, or hyperplane that separates the classes
+
+ * feature space: contains features
+ * model space: contains models. the bright spots have low loss.
+ * loss function: performance of a model on the data, the lower the better
+
+decision tree classifier: every node is a condition on a feature, go down a branch based on the condition. would look like a step function in a graph.
+
+k-nearest-neighbors: lazy, does nothing at training time, just remembers the data; to classify a new point, take the most common class among its k nearest training points
+features: numerical or categorical
+
+binary classification: only have two classes
+
+multiclass classification: more than two classes
+
+### Regression
+regression: output labels are numbers. the model we're trying to learn is a function from feature space to ℝ
+
+loss function: maps a model to a number that expresses how well it fits the data
+
+common example: $loss(p) = \frac{1}{n} \sum_i (f_p (x_i) - y_i)^2$
+
+takes the difference between model prediction and target value (residual), then squares the residuals and averages them over all n instances
+
+overfitting: the model is too specific to the data, it's memorizing the data instead of generalizing
+
+split the data into training and test sets. don't judge performance on the training data, the aim is to minimise loss on _test_ data.
+
+## Unsupervised ML
+Unsupervised: only inputs provided, find _any_ pattern that explains something about the data.
+
+learning tasks:
+
+* clustering: classification, except there's no target column, so the model outputs a cluster id
+* density estimation: the model outputs a number (probability density), which should be high for instances of data that are likely. e.g. fitting a probability distribution to data
+* generative modeling: build a model from which you can sample new examples
+
+## What isn't ML?
+ML is a subdomain of AI.
+
+* AI, but not ML: automated reasoning, planning
+* Data Science, not ML: gathering, harmonising, and interpreting data
+* Data mining is more closely related, but e.g.
finding fraud in transaction networks is closer to data mining +* Stats wants to figure out the truth, whereas with ML it just has to work well enough, but doesn't necessarily have to be true + + diff --git a/content/ml-notes/Linear models.html b/content/ml-notes/Linear models.html @@ -1,279 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - <link rel="stylesheet" href="pluginAssets/highlight.js/atom-one-light.css"> - <title>Linear models</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="linear-models">Linear models</h1> -<nav class="table-of-contents"><ul><li><a href="#linear-models">Linear models</a><ul><li><a href="#defining-a-model">Defining a model</a></li><li><a href="#but-which-model-fits-best">But which model fits best?</a><ul><li><a href="#mean-squared-error-loss">Mean squared error loss</a></li><li><a href="#optimization-searching">Optimization & searching</a><ul><li><a href="#black-box-optimisation">Black box optimisation</a><ul><li><a href="#random-search">Random search</a></li><li><a href="#simulated-annealing">Simulated annealing</a></li><li><a href="#parallel-search">Parallel search</a></li><li><a href="#branching-search">Branching search</a></li></ul></li><li><a href="#gradient-descent">Gradient descent</a></li><li><a href="#classification-losses">Classification losses</a><ul><li><a href="#least-squares-loss">Least-squares loss</a></li></ul></li></ul></li></ul></li><li><a href="#neural-networks-feedforward">Neural networks (feedforward)</a><ul><li><a href="#overview">Overview</a></li><li><a href="#classification">Classification</a></li><li><a href="#dealing-with-loss-gradient-descent-backpropagation">Dealing with loss - gradient descent & backpropagation</a></li></ul></li><li><a href="#support-vector-machines-svms">Support vector machines (SVMs)</a></li><li><a 
href="#summary-of-classification-loss-functions">Summary of classification loss functions</a></li></ul></li></ul></nav><h2 id="defining-a-model">Defining a model</h2> -<p>1 feature x: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>f</mi><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></msub><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>w</mi><mi>x</mi><mo>+</mo><mi>b</mi></mrow><annotation encoding="application/x-tex">f_{w,b}(x) = wx + b</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord mathdefault">x</span><span 
class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathdefault">b</span></span></span></span></p> -<p>2 features x<sub>1</sub>, x<sub>2</sub>: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>f</mi><mrow><msub><mi>w</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>w</mi><mn>2</mn></msub><mo separator="true">,</mo><mi>b</mi></mrow></msub><mo stretchy="false">(</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>x</mi><mn>2</mn></msub><mo stretchy="false">)</mo><mo>=</mo><msub><mi>w</mi><mn>1</mn></msub><msub><mi>x</mi><mn>1</mn></msub><mo>+</mo><msub><mi>w</mi><mn>2</mn></msub><msub><mi>x</mi><mn>2</mn></msub><mo>+</mo><mi>b</mi></mrow><annotation encoding="application/x-tex">f_{w_1,w_2, b}(x_1, x_2) = w_1 x_1 + w_2 x_2 + b</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span 
style="top:-2.357em;margin-left:-0.02691em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mpunct mtight">,</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-left:-0.02691em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.73333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.73333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathdefault">b</span></span></span></span></p> -<p>Generally,</p> -<p><span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><msub><mi>f</mi><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></msub><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msub><mi>w</mi><mn>1</mn></msub><msub><mi>x</mi><mn>1</mn></msub><mo>+</mo><msub><mi>w</mi><mn>2</mn></msub><msub><mi>x</mi><mn>2</mn></msub><mo>+</mo><msub><mi>w</mi><mn>3</mn></msub><msub><mi>x</mi><mn>3</mn></msub><mo>+</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo>+</mo><mi>b</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msup><mi>w</mi><mi>T</mi></msup><mi>x</mi><mo>+</mo><mi>b</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo>∑</mo><mi>i</mi></munder><msub><mi>w</mi><mi>i</mi></msub><msub><mi>x</mi><mi>i</mi></msub><mo>+</mo><mi>b</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mi mathvariant="normal">∥</mi><mi>w</mi><mi mathvariant="normal">∥</mi><mi mathvariant="normal">∥</mi><mi>x</mi><mi mathvariant="normal">∥</mi><mi>cos</mi><mo></mo><mi>α</mi><mo>+</mo><mi>b</mi></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex"> -\begin{aligned} -f_{w, b}(x) &= w_1 x_1 + w_2 x_2 + w_3 x_3 + ... 
+ b \\ - &= w^T x + b \\ - &= \sum_{i} w_i x_i + b \\ - &= \|w\| \|x\| \cos{\alpha} + b -\end{aligned} -</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:7.179005000000002em;vertical-align:-3.3395025000000014em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.8395025em;"><span style="top:-6.049507500000001em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-4.4981765000000005em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"></span></span><span style="top:-2.7881715em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"></span></span><span style="top:-0.3705024999999993em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:3.3395025000000014em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.8395025em;"><span style="top:-6.049507500000001em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord">.</span><span class="mord">.</span><span class="mord">.</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span></span></span><span style="top:-4.4981765000000005em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span></span></span><span style="top:-2.7881715em;"><span 
class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span></span></span><span style="top:-0.3705024999999993em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord">∥</span><span class="mord">∥</span><span class="mord mathdefault">x</span><span class="mord">∥</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">cos</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.3395025000000014em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>where w is the weight vector (w<sub>1</sub> to w<sub>n</sub>), x is the feature vector (x<sub>1</sub> to x<sub>n</sub>), and α is the angle between w and x:</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mo>=</mo><mrow><mo 
fence="true">(</mo><mtable rowspacing="0.15999999999999992em" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><msub><mi>w</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mo lspace="0em" rspace="0em">…</mo></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><msub><mi>w</mi><mi>n</mi></msub></mstyle></mtd></mtr></mtable><mo fence="true">)</mo></mrow></mrow><annotation 
encoding="application/x-tex">w = \begin{pmatrix} w_1 \\ \dots \\ w_n \end{pmatrix}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:3.60004em;vertical-align:-1.55002em;"></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎝</span></span></span><span style="top:-2.8100000000000005em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎜</span></span></span><span style="top:-4.05002em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎛</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.55002em;"><span></span></span></span></span></span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05em;"><span style="top:-4.21em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span 
style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.0099999999999993em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="minner">…</span></span></span><span style="top:-1.8099999999999994em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.5500000000000007em;"><span></span></span></span></span></span></span></span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎠</span></span></span><span style="top:-2.8100000000000005em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎟</span></span></span><span style="top:-4.05002em;"><span 
class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎞</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.55002em;"><span></span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>=</mo><mrow><mo fence="true">(</mo><mtable rowspacing="0.15999999999999992em" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><msub><mi>x</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mo lspace="0em" rspace="0em">…</mo></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><msub><mi>x</mi><mi>n</mi></msub></mstyle></mtd></mtr></mtable><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">x = \begin{pmatrix} x_1 \\ \dots \\ x_n \end{pmatrix}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:3.60004em;vertical-align:-1.55002em;"></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎝</span></span></span><span style="top:-2.8100000000000005em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner 
delim-size4"><span>⎜</span></span></span><span style="top:-4.05002em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎛</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.55002em;"><span></span></span></span></span></span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05em;"><span style="top:-4.21em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.0099999999999993em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="minner">…</span></span></span><span style="top:-1.8099999999999994em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.5500000000000007em;"><span></span></span></span></span></span></span></span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.05002em;"><span style="top:-2.2500000000000004em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎠</span></span></span><span style="top:-2.8100000000000005em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎟</span></span></span><span style="top:-4.05002em;"><span class="pstrut" style="height:3.1550000000000002em;"></span><span class="delimsizinginner delim-size4"><span>⎞</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.55002em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<h2 id="but-which-model-fits-best">But which model fits best?</h2> -<p>Define a loss function, then search for the model with the lowest loss, i.e. the one that best fits<br> -the data according to that function.</p> -<h3 id="mean-squared-error-loss">Mean squared error loss</h3> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mtext>loss</mtext><mrow><mi>x</mi><mo separator="true">,</mo><mi>y</mi></mrow></msub><mo stretchy="false">(</mo><mi>p</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mo>∑</mo><mi>j</mi></msub><mo stretchy="false">(</mo><msub><mi>f</mi><mi>p</mi></msub><mo stretchy="false">(</mo><msup><mi>x</mi><mi>j</mi></msup><mo stretchy="false">)</mo><mo>−</mo><msup><mi>y</mi><mi>j</mi></msup><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><annotation 
encoding="application/x-tex">\text{loss}_{x,y}(p) = \frac{1}{n} \sum_j (f_p (x^j) - y^j)^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord text"><span class="mord">loss</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.280926em;vertical-align:-0.43581800000000004em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" 
style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.43581800000000004em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span 
style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.0746639999999998em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></p> -<p>The differences f<sub>p</sub>(x<sup>j</sup>) − y<sup>j</sup> are the residuals: they show how far each prediction is from its target value.</p> -<p>Why square?
Make everything positive, but also penalize outliers</p> -<h3 id="optimization-searching">Optimization & searching</h3> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>p</mi><mo>^</mo></mover><mo>=</mo><msub><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mi>p</mi></msub><msub><mtext>loss</mtext><mrow><mi>x</mi><mo separator="true">,</mo><mi>y</mi></mrow></msub><mo stretchy="false">(</mo><mi>p</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\hat p = \argmin_p \text{loss}_{x,y}(p)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord accent"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathdefault">p</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.16666em;"><span class="mord">^</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.19444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.130248em;vertical-align:-0.380248em;"></span><span class="mop"><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord 
mathrm">n</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.057252000000000025em;"><span style="top:-2.4558600000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.380248em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord text"><span class="mord">loss</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose">)</span></span></span></span></p> -<p>To escape local minima: add randomness, or use multiple models.</p> -<p>To converge faster: combine known good models (breeding), or inspect the<br> -local neighbourhood.</p> -<h4 id="black-box-optimisation">Black box optimisation</h4> -<p>Simple: you only need to be able to compute the loss function, plus a few other<br> -things (TODO).</p> -<h5 id="random-search">Random search</h5> -<p>Start with a random point p in model space.</p> -<pre class="hljs"><code><span class="hljs-attr">loop</span>:<span class="hljs-string"></span> - <span
class="hljs-attr">pick</span> <span class="hljs-string">random point p' close to p</span> - <span class="hljs-attr">if</span> <span class="hljs-string">loss(p') < loss(p):</span> - <span class="hljs-attr">p</span> <span class="hljs-string"><- p'</span> -</code></pre> -<p>You need to define what 'close to' means, though.</p> -<p>Convexity: a function is convex if, for any two points, the line segment<br> -between those points lies on or above the function. Every local minimum of a<br> -convex function is also a global minimum.</p> -<p>The issue with random search is that it can get stuck in a local<br> -minimum. In many situations local minima are fine; we don't<br> -<em>always</em> need an algorithm that guarantees a global minimum.</p> -<p>In discrete model spaces (which have a more graph-like<br> -structure), you need to define a transition function that determines which points are 'close'.</p> -<h5 id="simulated-annealing">Simulated annealing</h5> -<p>'Improved' random search: it sometimes accepts a worse point, which lets it escape local minima.</p> -<pre class="hljs"><code>pick random point p in model space -loop: -  pick random point p' close to p -  if loss(p') < loss(p): -    p <- p' -  else: -    with some small probability: p <- p' -</code></pre> -<h5 id="parallel-search">Parallel search</h5> -<p>These searches can also be run in parallel, optionally with some<br> -communication between the searches.</p> -<p>Population methods, e.g.
evolutionary algorithms:</p> -<pre class="hljs"><code><span class="hljs-keyword">start</span> <span class="hljs-keyword">with</span> population <span class="hljs-keyword">of</span> k models -<span class="hljs-keyword">loop</span>: - <span class="hljs-keyword">rank</span> population <span class="hljs-keyword">by</span> loss - remove the half <span class="hljs-keyword">with</span> worst loss - <span class="hljs-string">"breed"</span> <span class="hljs-keyword">new</span> population <span class="hljs-keyword">of</span> k models - <span class="hljs-keyword">optionally</span>, <span class="hljs-keyword">add</span> a <span class="hljs-keyword">little</span> noise <span class="hljs-keyword">to</span> <span class="hljs-keyword">each</span> <span class="hljs-keyword">child</span> -</code></pre> -<h5 id="branching-search">Branching search</h5> -<p>Coming closer to gradient descent:</p> -<pre class="hljs"><code>pick random point p in model space -loop: -  pick k random points {p_i} close to p -  p' <- argmin_p_i loss(p_i) -  if loss(p') < loss(p): -    p <- p' -</code></pre> -<h4 id="gradient-descent">Gradient descent</h4> -<p>Good, but it doesn't help with local minima: it can still get stuck in one.</p> -<p>In 2D space, the slope of the tangent line is the derivative.
In higher dimensions, the tangent line becomes a tangent (hyper)plane, and<br> -the gradient (the vector of partial derivatives) plays the role of the slope.</p> -<p>Gradient:<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∇</mi><mi>f</mi><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>=</mo><mo stretchy="false">(</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>x</mi></mrow></mfrac><mo separator="true">,</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant="normal">∂</mi><mi>y</mi></mrow></mfrac><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\nabla f(x, y) = (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.4133239999999998em;vertical-align:-0.481108em;"></span><span class="mopen">(</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.481108em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose">)</span></span></span></span></p> -<p>Tangent hyperplane: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>g</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi mathvariant="normal">∇</mi><mi>f</mi><mo stretchy="false">(</mo><mi>p</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mi>x</mi><mo>+</mo><mi>c</mi></mrow><annotation encoding="application/x-tex">g(x) = \nabla f(p)^T x + c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span 
class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">c</span></span></span></span></p> -<p>This gives the best linear approximation of f near the point p.</p> -<p>The direction of steepest ascent: with w = ∇f(p) (the constant c can be ignored), find the unit vector x that maximises g(x):</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mi>g</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msup><mi>w</mi><mi>T</mi></msup><mi>x</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mi mathvariant="normal">∥</mi><mi>w</mi><mi mathvariant="normal">∥</mi><mi mathvariant="normal">∥</mi><mi>x</mi><mi mathvariant="normal">∥</mi><mi>cos</mi><mo></mo><mi>α</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mi mathvariant="normal">∥</mi><mi>x</mi><mi mathvariant="normal">∥</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mn>1</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>→</mo><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi><mi>w</mi><mi mathvariant="normal">∣</mi><mi mathvariant="normal">∣</mi><mi>cos</mi><mo></mo><mi>α</mi></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex"> -\begin{aligned} -g(x) &= w^T x \\ - &= \|w\| \|x\| \cos{\alpha} \\ - \|x\| &= 1 \\ - &\rightarrow ||w|| \cos{\alpha} -\end{aligned}
-</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:6.051330999999999em;vertical-align:-2.7756654999999992em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.2756655000000006em;"><span style="top:-5.3843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-3.8843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"></span></span><span style="top:-2.3843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">∥</span><span class="mord mathdefault">x</span><span class="mord">∥</span></span></span><span style="top:-0.8843345000000009em;"><span class="pstrut" style="height:3em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.7756654999999992em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.2756655000000006em;"><span style="top:-5.3843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span 
class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span></span></span><span style="top:-3.8843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord">∥</span><span class="mord">∥</span><span class="mord mathdefault">x</span><span class="mord">∥</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">cos</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span></span></span></span><span style="top:-2.3843345000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">1</span></span></span><span style="top:-0.8843345000000009em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">→</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">∣</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord">∣</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span 
class="mop">cos</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.7756654999999992em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>g(x) is maximised when cos(α) = 1, i.e. α = 0: when the unit vector x points in the<br> -same direction as w = ∇f(p). So the gradient is the direction of steepest ascent,<br> -and the negative gradient is the direction of steepest descent.</p> -<pre class="hljs"><code>pick a random point p in model space -loop: -  p <- p - η ∇loss(p) -</code></pre> -<p>Usually η (the step size, or learning rate) is set between 0.0001 and 0.1.</p> -<p>To apply it, work out the partial derivatives of the loss function with respect to<br> -each parameter, then evaluate them at the current point p.</p> -<p>Cons:</p> -<ul> -<li>only works for continuous model spaces with smooth loss<br> -functions, for which we can work out the gradient</li> -<li>does not escape local minima</li> -</ul> -<p>Pros:</p> -<ul> -<li>very fast, low memory</li> -<li>very accurate</li> -</ul> -<p>If the model is linear (with a least-squares loss), you don't actually need to search: you<br> -can set the partial derivatives equal to zero and solve analytically.</p> -<p>Sometimes the loss function should differ from the evaluation<br> -function (e.g. accuracy), because the evaluation function may not be smooth, and gradient descent needs a smooth function.</p> -<h4 id="classification-losses">Classification losses</h4> -<h5 id="least-squares-loss">Least-squares loss</h5> -<p>Applying the least-squares loss to the classifier's output gives a smooth function,<br> -which can then be minimised with gradient descent.</p> -<h2 id="neural-networks-feedforward">Neural networks (feedforward)</h2> -<h3 id="overview">Overview</h3> -<p>Learns a feature extractor together with the classifier.</p> -<p>A neuron has inputs (dendrites) and one output (axon). The simplified<br> -version for
computers is the 'perceptron':</p> -<ul> -<li>inputs are features (x)</li> -<li>multiply each input with a weight (w)</li> -<li>add a bias node (b)</li> -<li>y = w<sub>1</sub>x<sub>1</sub> + w<sub>2</sub>x<sub>2</sub> + b</li> -<li>output class A if y > 0, otherwise class B</li> -</ul> -<p>Nonlinearity:</p> -<ul> -<li> -<p>sigmoid function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>σ</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation encoding="application/x-tex">\sigma(x) = \frac{1}{1+e^{-x}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2484389999999999em;vertical-align:-0.403331em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mbin mtight">+</span><span class="mord mtight"><span class="mord mathdefault mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7026642857142857em;"><span 
style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mathdefault mtight">x</span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.403331em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p> -<p><img src="_resources/b149c9058e4548719393205f11b3fd74.png" alt=""></p> -</li> -<li> -<p>ReLU<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>r</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mrow><mo fence="true">{</mo><mtable rowspacing="0.3599999999999999em" columnalign="left left" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mi>x</mi></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mtext>if </mtext><mi>x</mi><mo>></mo><mn>0</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mtext>otherwise</mtext></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding="application/x-tex">r(x) = \begin{cases} x &\text{if } x > 0 \\ 0 &\text{otherwise} \end{cases}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:3.0000299999999998em;vertical-align:-1.25003em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size4">{</span></span><span class="mord"><span class="mtable"><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.69em;"><span style="top:-3.69em;"><span class="pstrut" style="height:3.008em;"></span><span class="mord"><span class="mord mathdefault">x</span></span></span><span style="top:-2.25em;"><span class="pstrut" style="height:3.008em;"></span><span class="mord"><span class="mord">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.19em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:1em;"></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.69em;"><span style="top:-3.69em;"><span class="pstrut" style="height:3.008em;"></span><span class="mord"><span class="mord text"><span class="mord">if </span></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">0</span></span></span><span style="top:-2.25em;"><span class="pstrut" style="height:3.008em;"></span><span class="mord"><span class="mord text"><span class="mord">otherwise</span></span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.19em;"><span></span></span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p> -<p><img src="_resources/f9abb2f4be3640919753fd9709e0e764.png" alt=""></p> -</li> -</ul> -<p>Feedforward network: a multilayer perceptron, with one or more hidden layers between<br> -the input and output layers.</p> -<p>Every edge has a weight, and the network learns by adapting the weights.</p> -<p>It trains both the feature extractor and the linear model at once.</p> -<h3 id="classification">Classification</h3> -<p>Binary:</p> -<ul> -<li>add a sigmoid to the output layer</li> -<li>the output is then the probability that the input belongs to the positive<br> -class</li> -</ul> -<p>Multiclass:</p> -<ul> -<li>softmax activation:</li> -<li>for each output node i: o<sub>i</sub> = w<sub>i</sub><sup>T</sup>h + b<sub>i</sub>, where h is the last hidden layer</li> -<li>then the output is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>y</mi><mi>i</mi></msub><mo>=</mo><mfrac><mrow><mi>e</mi><mi>x</mi><mi>p</mi><mo stretchy="false">(</mo><msub><mi>o</mi><mi>i</mi></msub><mo stretchy="false">)</mo></mrow><mrow><msub><mo>∑</mo><mi>j</mi></msub><mi>e</mi><mi>x</mi><mi>p</mi><mo stretchy="false">(</mo><msub><mi>o</mi><mi>j</mi></msub><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">y_i = \frac{exp(o_i)}{\sum_{j}exp(o_{j})}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.677227em;vertical-align:-0.667227em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mop op-symbol small-op mtight" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.14964714285714287em;"><span style="top:-2.1785614285714283em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.46032428571428574em;"><span></span></span></span></span></span></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight">x</span><span class="mord mathdefault mtight">p</span><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathdefault mtight">o</span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2818857142857143em;"><span></span></span></span></span></span></span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight">x</span><span class="mord mathdefault mtight">p</span><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathdefault mtight">o</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.667227em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li> -</ul> -<h3 
id="dealing-with-loss-gradient-descent-backpropagation">Dealing with loss - gradient descent & backpropagation</h3> -<p>Stochastic gradient descent:</p> -<ol> -<li>Pick random weights w for the whole model</li> -<li>loop: -<ul> -<li>for x in X: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mo>←</mo><mi>w</mi><mo>−</mo><mi>η</mi><mi mathvariant="normal">∇</mi><mi>l</mi><mi>o</mi><mi>s</mi><msub><mi>s</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>w</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">w \leftarrow w - \eta\nabla loss_{x}(w)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">η</span><span class="mord">∇</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord"><span class="mord mathdefault">s</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span 
style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mclose">)</span></span></span></span></li> -</ul> -</li> -</ol> -<p>For complex models, symbolic and numeric methods for computing the<br> -gradient are expensive. Use backpropagation:</p> -<ul> -<li>break the computation down into a chain of modules</li> -<li>work out the local derivative of each module symbolically (as you<br> -would on paper)</li> -<li>do a forward pass for a given input x: compute f(x) and remember the<br> -intermediate values</li> -<li>evaluate the local derivatives at x and multiply them to get the global<br> -derivative (by the chain rule)</li> -</ul> -<p>For a feedforward network, this means taking the derivative of the loss function with<br> -respect to the weights.</p> -<h2 id="support-vector-machines-svms">Support vector machines (SVMs)</h2> -<p>Uses a kernel to expand the feature space.</p> -<p>Margin: the band between the decision line and the nearest positive and negative<br> -points; the SVM picks the line for which this band is as wide as possible.</p> -<p>Support vectors: the points that the margin just touches</p> -<p><img src="_resources/610c2acbde354f9fb8e54b0a9efb4b1f.png" alt=""></p> -<p>The support vector machine tries to find this line.</p> -<p>Objective of SVM:</p> -<ul> -<li>maximize the width of the margin, 2/∥w∥ (twice the distance from the line to the nearest points)</li> -<li>such that w<sup>T</sup>x + b ≥ 1 for all positive points and w<sup>T</sup>x + b ≤ −1 for all<br> -negative points</li> -<li>hard margin SVM: -<ul> -<li>minimize <span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>1</mn><mn>2</mn></mfrac><mi mathvariant="normal">∥</mi><mi>w</mi><mi mathvariant="normal">∥</mi></mrow><annotation encoding="application/x-tex">\frac{1}{2} \|w\|</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord">∥</span></span></span></span></li> -<li>such that y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b) ≥ 1 for all x<sup>i</sup></li> -<li>but if the data is not linearly separable, this constraint cannot be<br> -satisfied</li> -</ul> -</li> -<li>soft margin SVM: -<ul> -<li>minimize <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>1</mn><mn>2</mn></mfrac><mi mathvariant="normal">∥</mi><mi>w</mi><mi 
mathvariant="normal">∥</mi><mo>+</mo><mi>C</mi><msub><mo>∑</mo><mi>i</mi></msub><msub><mi>p</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">\frac{1}{2} \|w\| + C \sum_{i}p_{i}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mord">∥</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.0497100000000001em;vertical-align:-0.29971000000000003em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" 
style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>, p<sup>i</sup> ≥ 0</li> -<li>such that y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b) ≥ 1 − p<sup>i</sup> for all x<sup>i</sup></li> -</ul> -</li> -</ul> -<p><img src="_resources/36c6819f0f3a4a7381214b8baf48b2f1.png" alt=""></p> -<p>For the loss, there are two options:</p> -<ul> -<li>express everything in terms of w and get rid of the constraints: -<ul> -<li>allows gradient descent</li> -<li>good for neural networks</li> -<li>gives the SVM (hinge) loss: -<ul> -<li>p<sup>i</sup> = max(0, 1 − y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b))</li> -<li>½∥w∥ + C ∑<sub>i</sub> max(0, 1 − y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b))</li> -<li>no constraints</li> -</ul> -</li> -</ul> -</li> -<li>express everything in terms of the support vectors and get rid of w -<ul> -<li>doesn't allow error backpropagation</li> -<li>allows the kernel trick: -<ul> -<li>if an algorithm operates only on dot products<br> -of instances, you can replace each dot product with a<br> -kernel function.</li> -<li>the kernel function k(x<sup>i</sup>, x<sup>j</sup>) computes the dot product of x<sup>i</sup><br> -and x<sup>j</sup> in a high-dimensional feature space, without<br> -explicitly computing the features themselves</li> -<li>polynomial kernel: k(a,b) = (a<sup>T</sup> b + 1)<sup>d</sup> -<ul> -<li>feature space for d=2: all squares, all cross products,<br> -all single features</li> -<li>feature space for d=3: all cubes and squares, all<br> -two-way and three-way cross products, all single<br>
-features</li> -</ul> -</li> -<li>RBF kernel: k(a,b) = exp(−γ∥a − b∥<sup>2</sup>), feature
space<br> -is infinite dimensional</li> -</ul> -</li> -<li>have to optimise under constraints: Lagrange multipliers -<ul> -<li>minimize f(a) such that g<sub>i</sub>(a) ≥ 0 for i ∈ [1, <em>n</em>]</li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo stretchy="false">(</mo><mi>a</mi><mo separator="true">,</mo><msub><mi>α</mi><mn>1</mn></msub><mo separator="true">,</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo separator="true">,</mo><msub><mi>α</mi><mi>n</mi></msub><mo stretchy="false">)</mo><mo>=</mo><mi>f</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo><mo>−</mo><msub><mo>∑</mo><mi>i</mi></msub><msub><mi>α</mi><mi>i</mi></msub><msub><mi>g</mi><mi>i</mi></msub><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L(a, \alpha_{1}, ..., \alpha_n) = f(a) - \sum_{i} \alpha_{i}g_{i}(a)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.0037em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">.</span><span class="mord">.</span><span class="mord">.</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.0037em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.0497100000000001em;vertical-align:-0.29971000000000003em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span 
style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.0037em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span></span></span></span></li> -<li>solve <span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∇</mi><mi>L</mi><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">\nabla L = 0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord">∇</span><span class="mord mathdefault">L</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">0</span></span></span></span> such that α<sub>i</sub> ≥ 0 for i ∈ [1, <em>n</em>]</li> -</ul> -</li> -<li>result (the dual problem): -<ul> -<li>maximize <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><msub><mo>∑</mo><mi>i</mi></msub><msub><mo>∑</mo><mi>j</mi></msub><msup><mi>α</mi><mi>i</mi></msup><msup><mi>α</mi><mi>j</mi></msup><msup><mi>y</mi><mi>i</mi></msup><msup><mi>y</mi><mi>j</mi></msup><mi>k</mi><mo stretchy="false">(</mo><msup><mi>x</mi><mi>i</mi></msup><mo separator="true">,</mo><msup><mi>x</mi><mi>j</mi></msup><mo stretchy="false">)</mo><mo>+</mo><msub><mo>∑</mo><mi>i</mi></msub><msup><mi>α</mi><mi>i</mi></msup></mrow><annotation encoding="application/x-tex">-\frac{1}{2} \sum_i \sum_j \alpha^i \alpha^j y^i y^j k(x^i, x^j) + \sum_i \alpha^i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.280926em;vertical-align:-0.43581800000000004em;"></span><span class="mord">−</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span 
style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault 
mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.43581800000000004em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.03148em;">k</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.124374em;vertical-align:-0.29971000000000003em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span></span></span></span></li> -<li>such that 0 ≤ α<sup>i</sup> ≤ C; <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>∑</mo><mi>i</mi></msub><msup><mi>α</mi><mi>i</mi></msup><msup><mi>y</mi><mi>i</mi></msup><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">\sum_i \alpha^i y^i = 0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.124374em;vertical-align:-0.29971000000000003em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord 
mathdefault" style="margin-right:0.0037em;">α</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">0</span></span></span></span></li> -</ul> -</li> -</ul> -</li> -</ul> -<h2 id="summary-of-classification-loss-functions">Summary of classification loss functions</h2> -<p><img src="_resources/b08d80ef6b5241578c3d432a466db7ea.png" alt=""></p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/36c6819f0f3a4a7381214b8baf48b2f1.png b/content/ml-notes/Linear models/36c6819f0f3a4a7381214b8baf48b2f1.png Binary files differ. diff --git a/content/ml-notes/_resources/610c2acbde354f9fb8e54b0a9efb4b1f.png b/content/ml-notes/Linear models/610c2acbde354f9fb8e54b0a9efb4b1f.png Binary files differ. diff --git a/content/ml-notes/_resources/b08d80ef6b5241578c3d432a466db7ea.png b/content/ml-notes/Linear models/b08d80ef6b5241578c3d432a466db7ea.png Binary files differ. 
diff --git a/content/ml-notes/_resources/b149c9058e4548719393205f11b3fd74.png b/content/ml-notes/Linear models/b149c9058e4548719393205f11b3fd74.png Binary files differ. diff --git a/content/ml-notes/_resources/f9abb2f4be3640919753fd9709e0e764.png b/content/ml-notes/Linear models/f9abb2f4be3640919753fd9709e0e764.png Binary files differ. diff --git a/content/ml-notes/Linear models/index.md b/content/ml-notes/Linear models/index.md @@ -0,0 +1,310 @@ ++++ +title = 'Linear models' +template = 'page-math.html' ++++ +# Linear models +## Defining a model + +1 feature x: $f_{w,b}(x) = wx + b$ + +2 features x<sub>1</sub>, x<sub>2</sub>: $f_{w_1,w_2, b}(x_1, x_2) = w_1 x_1 + w_2 x_2 + b$ + +Generally, + +$ +\begin{aligned} +f_{w, b}(x) &= w_1 x_1 + w_2 x_2 + w_3 x_3 + ... + b \\ + &= w^T x + b \\ + &= \sum_{i} w_i x_i + b \\ + &= \|w\| \|x\| \cos{\alpha} + b +\end{aligned} +$ + +where w is the vector w<sub>1</sub> to w<sub>n</sub> and x is the vector x<sub>1</sub> to x<sub>n</sub>, + +i.e. $w = \begin{pmatrix} w_1 \\ \dots \\ w_n \end{pmatrix}$ and $x = \begin{pmatrix} x_1 \\ \dots \\ x_n \end{pmatrix}$ +## But which model fits best? + +Define a loss function, then search for the model that minimises +that loss function. + +### Mean squared error loss +$\text{loss}_{x,y}(p) = \frac{1}{n} \sum_j (f_p (x^j) - y^j)^2$ + +The differences $f_p(x^j) - y^j$ are the residuals: how far each prediction is from its target value. + +Why square? It makes every term positive, and it also penalizes outliers (large residuals) more heavily. + +### Optimization & searching + +$\hat p = \argmin_p \text{loss}_{x,y}(p)$ + +To escape local minima: add randomness, add multiple models + +To converge faster: combine known good models (breeding), inspect the +local neighbourhood + +#### Black box optimisation + +Simple: you only need to be able to compute the loss function, and a few more TODO +things + +##### Random search + +Start with a random point p in model space. 
+ +``` {.example} +loop: + pick random point p' close to p + if loss(p') < loss(p): + p <- p' +``` + +You need to define what 'close to' means, though.
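The random search loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's implementation: it assumes the one-feature linear model $f_{w,b}(x) = wx + b$ with the mean squared error loss, toy data generated by y = 2x + 1, and takes "close to p" to mean adding Gaussian noise with standard deviation `step` to every parameter. The names `mse_loss` and `random_search` are hypothetical.

```python
import random

def mse_loss(params, xs, ys):
    """Mean squared error of the linear model f(x) = w*x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def random_search(loss, p, step=0.1, iterations=5000, seed=0):
    """Random search: move to a nearby point only if it lowers the loss."""
    rng = random.Random(seed)
    current = loss(p)
    for _ in range(iterations):
        # "close to p" (assumption): perturb each parameter with Gaussian noise
        candidate = [v + rng.gauss(0.0, step) for v in p]
        cand_loss = loss(candidate)
        if cand_loss < current:
            p, current = candidate, cand_loss
    return p, current

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated by y = 2x + 1
(w, b), final = random_search(lambda p: mse_loss(p, xs, ys), [0.0, 0.0])
```

Because `random_search` only ever calls `loss`, it works unchanged for any model whose parameters can be perturbed this way; that is what makes it a black-box method.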
+ +Convexity: for a convex function, every local minimum is the global minimum (for any +two points on the function, the line segment between them lies on or above the function) + +The issue with random search is that it can get stuck in a local +minimum. In many situations local minima are fine; we don't +*always* need an algorithm that guarantees a global minimum. + +In discrete model spaces (which have a more graph-like +structure), you need to define a transition function instead. + +##### Simulated annealing + +'Improved' random search: it occasionally accepts a worse point, which helps it escape local minima. + +``` {.example} +pick random point p in model space +loop: + pick random point p' close to p + if loss(p') < loss(p): p <- p' + else: with some small probability, p <- p' +``` + +##### Parallel search + +Can also do these searches in parallel, or even in parallel with +some communication between searches. + +Population methods, e.g. evolutionary algorithms: + +``` {.example} +start with population of k models +loop: + rank population by loss + remove the half with worst loss + "breed" new population of k models + optionally, add a little noise to each child +``` + +##### Branching search + +Coming closer to gradient descent: + +``` {.example} +pick random point p in model space +loop: + pick k random points {p_i} close to p + p' <- argmin_p_i loss(p_i) + if loss(p') < loss(p): + p <- p' +``` + +#### Gradient descent + +Good, but it does not help escape local minima. + +In 2D, the slope of the tangent line gives the derivative. In higher dimensions, the +tangent plane/hyperplane is determined by the gradient (the analogue of the slope). + +Gradient: +$\nabla f(x, y) = (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})$ + +Tangent hyperplane: $g(x) = \nabla f(p)^T x + c$ + +This gives the best linear approximation of f at point p. 
+ +The direction of steepest ascent, writing $w = \nabla f(p)$ and taking x to be a unit vector: + +$ +\begin{aligned} +g(x) &= w^T x \\ + &= \|w\| \|x\| \cos{\alpha} \\ + &= \|w\| \cos{\alpha} \quad \text{since } \|x\| = 1 +\end{aligned} +$ + +g(x) is maximised when cos(α) = 1, i.e. when α = 0.
So the gradient +is the direction of steepest ascent; to minimise the loss, gradient descent steps in the opposite direction: + +``` {.example} +pick a random point p in model space +loop: + p <- p - \eta \nabla loss(p) +``` + +Usually set η (step size, learning rate) between 0.0001 and 0.1. + +Work out the partial derivatives of the loss function, then evaluate them at the current point p. + +Cons: + +- only works for continuous model spaces, with smooth loss + functions, for which we can work out the gradient +- does not escape local minima + +Pros: + +- very fast, low memory +- very accurate + +If the model is linear (with MSE loss), you don't actually need to search: you +can set the partial derivatives equal to zero and solve. + +Sometimes the loss function shouldn't be the same as the evaluation +function (e.g. accuracy), because the evaluation function might not be smooth, so gradient descent can't be applied to it. + +#### Classification losses +##### Least-squares loss +Applying the least-squares calculation to classification gives a smooth function, +so you can then use gradient descent. + +## Neural networks (feedforward) + +### Overview + +Learns a feature extractor together with the classifier. + +A neuron has inputs (dendrites) and one output (axon). The simplified +version for computers is the 'perceptron': + +- inputs are features (x) +- multiply each input with a weight (w) +- add a bias node (b) +- y = w<sub>1</sub>x<sub>1</sub> + w<sub>2</sub>x<sub>2</sub> + b +- output class A if y > 0, otherwise class B + +Nonlinearity: + +- sigmoid function $\sigma(x) = \frac{1}{1+e^{-x}}$ + + ![](b149c9058e4548719393205f11b3fd74.png) + +- ReLU + $r(x) = \begin{cases} x &\text{if } x > 0 \\ 0 &\text{otherwise} \end{cases}$ + + ![](f9abb2f4be3640919753fd9709e0e764.png) + +Feedforward network: a multilayer perceptron, with hidden layer(s) between +input and output layers + +Every edge has a weight, and the network learns by adapting the weights. 
+ +It trains both the feature extractor and the linear model at once.
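A sketch of the perceptron and a small feedforward network described above, in plain Python so every step is visible. The weights are hand-picked (hypothetical, not learned) so that a network with one hidden ReLU layer and a sigmoid output computes XOR, which no single perceptron can do because XOR is not linearly separable. All function and variable names are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: x if x > 0, otherwise 0."""
    return x if x > 0 else 0.0

def perceptron(x, w, b):
    """Single perceptron: True (class A) if w.x + b > 0, else False (class B)."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y > 0

def feedforward(x, W1, b1, w2, b2):
    """One hidden ReLU layer, then a single sigmoid output unit.

    W1 has one row of weights per hidden unit; the output is read as the
    probability that x belongs to the positive class.
    """
    hidden = [relu(sum(wij * xj for wij, xj in zip(row, x)) + bj)
              for row, bj in zip(W1, b1)]
    out = sum(w2j * hj for w2j, hj in zip(w2, hidden)) + b2
    return sigmoid(out)

# Hand-picked (not learned) weights that make the network compute XOR.
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
w2, b2 = [10.0, -20.0], -5.0
xor_probs = {x: feedforward(x, W1, b1, w2, b2)
             for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

The first hidden unit fires for "at least one input on" and the second only for "both on"; the output layer subtracts the second from the first, which is the feature-extractor-plus-linear-model split the notes describe.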
+ +### Classification + +Binary: + +- add a sigmoid to the output layer +- the output is then the probability that the instance is positive, given + the input + +Multiclass: + +- softmax activation: +- for output nodes o: o<sub>i</sub> = w<sub>i</sub><sup>T</sup>h + b<sub>i</sub>, where h is the last hidden layer +- then result $y_i = \frac{\exp(o_i)}{\sum_{j}\exp(o_{j})}$ + +### Dealing with loss - gradient descent & backpropagation + +Stochastic gradient descent (one update per training instance): + +1. Pick random weights w for the whole model +2. loop: + - for x in X: $w \leftarrow w - \eta\nabla loss_{x}(w)$ + +For complex models, symbolic and numeric methods for computing the +gradient are expensive. Use backpropagation: + +- break the computation down into a chain of modules +- work out the local derivative of each module symbolically (like you + would on paper) +- do a forward pass for a given input x: compute f(x), remembering + intermediate values +- evaluate the local derivatives at these values, and multiply them to compute the global + derivative (by the chain rule) + +For a feedforward network, you compute the derivative of the loss function with +respect to the weights. + +## Support vector machines (SVMs) + +Uses a kernel to expand the feature space. + +Margin: the space between the separating line and the nearest positive and negative +points; we want the line for which this space is as big as possible. + +Support vectors: the points that the margin just touches. + +![](610c2acbde354f9fb8e54b0a9efb4b1f.png) + +The support vector machine tries to find this line.
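The stochastic gradient descent loop under "Dealing with loss" above can be sketched for the simplest case: the one-feature linear model $f_{w,b}(x) = wx + b$ from the start of these notes, with the squared error of a single instance as the loss. The learning rate, number of passes, and toy data (generated by y = 2x + 1) are assumptions chosen for illustration, and `sgd_linear` is a hypothetical name. The partial derivatives are worked out by hand here; backpropagation computes them automatically for larger models by the chain rule.

```python
import random

def sgd_linear(xs, ys, eta=0.01, epochs=500, seed=0):
    """Stochastic gradient descent for f(x) = w*x + b.

    Uses the per-instance squared-error loss (w*x + b - y)^2 and updates
    the weights after every single instance.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # 1. random weights
    data = list(zip(xs, ys))
    for _ in range(epochs):                  # 2. loop
        rng.shuffle(data)                    # visit instances in random order
        for x, y in data:
            err = w * x + b - y              # residual for this instance
            # partial derivatives of err^2: d/dw = 2*err*x, d/db = 2*err
            w -= eta * 2 * err * x
            b -= eta * 2 * err
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated by y = 2x + 1
w, b = sgd_linear(xs, ys)
```

Since this model is linear with a squared-error loss, the same answer could be found analytically by setting the partial derivatives to zero; SGD is shown only because the same loop carries over to models where no closed form exists.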
+ +Objective of SVM: + +- maximize the width of the margin, $2 / \|w\|$ (twice the distance from the line to the nearest points) +- such that all positive points have w<sup>T</sup>x + b ≥ 1 and all negative + points have w<sup>T</sup>x + b ≤ -1 +- hard margin SVM: + - minimize $\frac{1}{2} \|w\|$ + - such that y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b) ≥ 1 for all x<sup>i</sup> + - but if the data is not linearly separable, this constraint cannot be + satisfied +- soft margin SVM: + - minimize $\frac{1}{2} \|w\| + C \sum_{i}p^{i}$, with p<sup>i</sup> ≥ 0 + - such that y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b) ≥ 1 - p<sup>i</sup> for all x<sup>i</sup> + +![](36c6819f0f3a4a7381214b8baf48b2f1.png) + +For loss, two options: + +- express everything in terms of w, get rid of constraints: + - allows gradient descent + - good for neural networks + - get SVM loss: + - p<sup>i</sup> = max(0, 1 - y<sup>i</sup>(w<sup>T</sup>x<sup>i</sup> + b)) + - $\frac{1}{2} \|w\| + C\sum_{i}\max(0, 1 - y^{i}(w^{T}x^{i}+b))$ + - no constraints +- express everything in terms of support vectors, get rid of w + - doesn't allow error backpropagation + - allows the kernel trick: + - if you have an algorithm which operates only on dot products + of instances, you can substitute a kernel function for the + dot product.
+ - kernel function k(x<sup>i</sup>, x<sup>j</sup>) computes the dot product of x<sup>i</sup> + and x<sup>j</sup> in a high-dimensional feature space, without + explicitly computing the features themselves + - polynomial kernel: k(a,b) = (a<sup>T</sup> b + 1)<sup>d</sup> + - feature space for d=2: all squares, all cross products, + all single features + - feature space for d=3: all cubes and squares, all + two-way and three-way cross products, all single + features + - RBF kernel: $k(a,b) = \exp(-\gamma \|a-b\|^2)$, feature space + is infinite-dimensional + - have to optimise under constraints: Lagrange multipliers + - minimize f(a) such that g<sub>i</sub>(a) ≥ 0 for i ∈ \[1, *n*\] + - $L(a, \alpha_{1}, ..., \alpha_n) = f(a) - \sum_{i} \alpha_{i}g_{i}(a)$ + - solve $\nabla L = 0$ such that α<sub>i</sub> ≥ 0 for i ∈ \[1, *n*\] + - result: + - maximize $-\frac{1}{2} \sum_i \sum_j \alpha^i \alpha^j y^i y^j k(x^i, x^j) + \sum_i \alpha^i$ + - such that 0 ≤ α<sup>i</sup> ≤ C; $\sum_i \alpha^i y^i = 0$ + +## Summary of classification loss functions + +![](b08d80ef6b5241578c3d432a466db7ea.png) diff --git a/content/ml-notes/Matrix models.html b/content/ml-notes/Matrix models.html @@ -1,130 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - - <title>Matrix models</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="matrix-models">Matrix models</h1> -<nav class="table-of-contents"><ul><li><a href="#matrix-models">Matrix models</a><ul><li><a href="#recommender-systems">Recommender systems</a></li><li><a href="#matrix-factorization">Matrix factorization</a><ul><li><a href="#bias-control">Bias control</a></li><li><a href="#the-cold-start-problem">The 'cold start' problem</a></li></ul></li><li><a href="#graph-models">Graph models</a></li><li><a href="#validating-embedding-
models</a></li></ul></li></ul></nav><h2 id="recommender-systems">Recommender systems</h2> -<p>using the example of movie recommendations for users.</p> -<p>Recommendation using only explicit info:</p> -<ul> -<li>we have no representation for users or movies, only 'atomic' objects that we want to compare for similarity</li> -<li>we saw this with word embedding, represented each word by its own vector and optimised values of vectors</li> -</ul> -<p>embedding models:</p> -<ul> -<li>train length k embedding for each user, and one for each movie, and arrange into matrices U and M.</li> -<li>the matrices are parameters of the model</li> -</ul> -<p><img src="_resources/caf202f13ca04b06a4a3d1e9e8a3c702.png" alt="9277af676d9256fb745e41d8cf59dcd1.png"></p> -<p>to make a prediction, define that dot product of user vector and movie vector should be high if user would like the movie.<br> -this is a choice, but it's a logical one.</p> -<p>multiplying U<sup>T</sup> with M gives matrix of rating predictions for every user/movie pair.<br> -so we want to take rating matrix R, and decompose it as product of two factors ("matrix factorization/decomposition")</p> -<h2 id="matrix-factorization">Matrix factorization</h2> -<p>You get an optimisation problem: choose U and M st U<sup>T</sup> M ≈ R.</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mrow><mi>U</mi><mo separator="true">,</mo><mi>M</mi></mrow></munder></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mi mathvariant="normal">∥</mi><mi>R</mi><mo>−</mo><msup><mi>U</mi><mi>T</mi></msup><mi>M</mi><msubsup><mi mathvariant="normal">∥</mi><mi>F</mi><mn>2</mn></msubsup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle 
scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo>∑</mo><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></munder><mo stretchy="false">(</mo><msub><mi>R</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>−</mo><mo stretchy="false">[</mo><msup><mi>U</mi><mi>T</mi></msup><mi>M</mi><msub><mo stretchy="false">]</mo><mrow><mi>i</mi><mi>j</mi></mrow></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo>∑</mo><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></munder><mo stretchy="false">(</mo><msub><mi>R</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>−</mo><msubsup><mi>u</mi><mi>i</mi><mi>T</mi></msubsup><msub><mi>m</mi><mi>j</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} -\argmin_{U,M} &= \| R - U^T M \|_{F}^{2} \\ -&= \sum_{i,j} (R_{ij} - \lbrack U^T M \rbrack_{ij})^2 \\ -&= \sum_{i,j} (R_{ij} - u_{i}^{T} m_j)^2 -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:7.793774000000001em;vertical-align:-3.6468870000000004em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:4.146887em;"><span style="top:-6.305561000000001em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6678600000000001em;"><span style="top:-2.161229em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">U</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.10903em;">M</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.074879em;"><span></span></span></span></span></span></span></span><span style="top:-3.8806770000000004em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span><span style="top:-1.1168949999999997em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.6468870000000004em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:4.146887em;"><span style="top:-6.305561000000001em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">∥</span><span class="mord mathdefault" style="margin-right:0.00773em;">R</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span 
class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">∥</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">F</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.8806770000000004em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8723309999999997em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 
size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.413777em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.00773em;">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.00773em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mopen">[</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault" 
style="margin-right:0.10903em;">M</span><span class="mclose"><span class="mclose">]</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-1.1168949999999997em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8723309999999997em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" 
style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.413777em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.00773em;">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.00773em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">m</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.6468870000000004em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>but, R is not complete, for most user/movie pairs we don't know the rating.<br> -so, sometimes it's better to only optimise for known ratings:</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right" columnspacing=""><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mrow><mi>U</mi><mo separator="true">,</mo><mi>M</mi></mrow></munder><munder><mo>∑</mo><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi><mo>∈</mo><msub><mi>R</mi><mtext>known</mtext></msub></mrow></munder><mo 
stretchy="false">(</mo><msub><mi>R</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>−</mo><msubsup><mi>u</mi><mi>i</mi><mi>T</mi></msubsup><msub><mi>m</mi><mi>j</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} \argmin_{U,M} \sum_{i,j \in R_{\text{known}}} (R_{ij} - u_{i}^{T} m_j)^2 \end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.780449em;vertical-align:-1.1402245000000002em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6402244999999998em;"><span style="top:-3.6402245em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6678600000000001em;"><span style="top:-2.161229em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">U</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.10903em;">M</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.074879em;"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.00773em;">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3487714285714287em;margin-left:-0.00773em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">known</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15122857142857138em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.430444em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.00773em;">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:-0.00773em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing 
reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">m</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.1402245000000002em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>Alternating least squares - alternative to gradient descent</p> -<ul> -<li>idea: if we know M, computing U is easy, and vice versa</li> -<li>so, starting with random U and m: -<ul> -<li>fix M, compute new U</li> -<li>fix U, compute new M</li> -</ul> -</li> -</ul> -<p>Stochastic gradient descent is usually more practical.</p> -<p>If we only have positive ratings, we have two options:</p> -<ul> -<li>ensure that U<sup>T</sup> M always represent probabilities; maximise probability of data.</li> -<li>sample random movie user pairs as negative training samples (i.e., assume that users don't really know)</li> -</ul> -<h3 id="bias-control">Bias control</h3> -<ul> -<li>control for user bias -<ul> -<li>ratings depend between users, they're subjective (no shit)</li> -<li>if we can explicitly model bias of a user, the matrix factorisation only needs to predict how much a user would deviate from their average rating for a particular movie</li> -</ul> -</li> -<li>control for movie bias -<ul> -<li>some movies are universally hated, some are universally loved</li> -<li>unpopular opinion: The Rise of Skywalker wasn't really that bad</li> -</ul> -</li> -<li>control for temporal bias -<ul> -<li>data is not stable over time</li> -<li>e.g. 
meaning of specific ratings can change</li> -<li></li> -</ul> -</li> -</ul> -<p>For user/movie biases, use an additive scalar, which is learned along with embeddings.<br> -One for each user, one for each movie, and one general bias over all ratings.</p> -<p>Make the biases and embeddings time dependent.<br> -e.g. cut time into a small number of chunks, learn separate embedding for each chunk.</p> -<h3 id="the-cold-start-problem">The 'cold start' problem</h3> -<p>When a user/movie is added to the database, we have no ratings, so matrix factorization can't build an embedding.<br> -Have to rely on implicit feedback and side information.</p> -<p>Using implicit "likes" (e.g. movies watched but not rated, movies browsed, movies hovered over...)</p> -<ul> -<li>add separate movie embedding x, compute second user embedding -<ul> -<li>this is sum of x-embeddings of all movies user x has liked</li> -</ul> -</li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mi>u</mi><mi>i</mi><mrow><mi>i</mi><mi>m</mi><mi>p</mi></mrow></msubsup><mo>=</mo><msub><mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><mi>N</mi><mo stretchy="false">(</mo><mi>i</mi><mo stretchy="false">)</mo></mrow></msub><msub><mi>x</mi><mi>j</mi></msub></mrow><annotation encoding="application/x-tex">u_{i}^{imp} = \sum_{j \in N(i)} x_j</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.219436em;vertical-align:-0.276864em;"></span><span class="mord"><span class="mord mathdefault">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.942572em;"><span style="top:-2.4231360000000004em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault 
mtight">i</span></span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight">m</span><span class="mord mathdefault mtight">p</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.276864em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.22471em;vertical-align:-0.47471em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.22528999999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight" style="margin-right:0.10903em;">N</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">i</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.47471em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.311664em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></li> -<li>add this implicit feedback embedding to existing one before computing dot product</li> -</ul> -<p>Using side info (age, login, browser resolution...)</p> -<ul> -<li>for simplification, assume all features are binary</li> -<li>assign each feature an embedding, sum over all features that apply to user, creating third user embedding</li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mi>u</mi><mi>i</mi><mrow><mi>s</mi><mi>i</mi><mi>d</mi><mi>e</mi></mrow></msubsup><mo>=</mo><msub><mo>∑</mo><mrow><mi>f</mi><mo>∈</mo><mi>A</mi><mo stretchy="false">(</mo><mi>i</mi><mo stretchy="false">)</mo></mrow></msub><msub><mi>y</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">u_{i}^{side} = \sum_{f \in A(i)} y_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.107772em;vertical-align:-0.258664em;"></span><span class="mord"><span class="mord mathdefault">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-2.441336em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">s</span><span class="mord mathdefault mtight">i</span><span class="mord mathdefault mtight">d</span><span class="mord mathdefault mtight">e</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.258664em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.22471em;vertical-align:-0.47471em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.22528999999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight">A</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">i</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.47471em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span 
style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span></span></span></span></li> -<li>then you add this before computing dot product</li> -</ul> -<h2 id="graph-models">Graph models</h2> -<p>graph convolutional network:</p> -<ul> -<li>start with node embeddings N₀</li> -<li>compute new embedding for each node: average of neighbor embeddings and its own - AN₀</li> -<li>multiply by weight matrix W, add activation - N₁ = σ(AN₀W)</li> -<li>if you apply this multiple times, you get a multi-layered structure</li> -</ul> -<p>Link prediction: assume graph is incomplete, try to predict which nodes should be linked.<br> -We can treat this like a matrix factorization problem.<br> -GCN: perform convolutions on embeddings, multiply them out to generate predictions, compare to training data, and backpropagate loss</p> -<p>Node classification: for each node, try to predict a label.<br> -With node embeddings, we can use a regular classifier, but how do we get good embeddings?<br> -GCN for classification: ensure embedding size of last layer is equal to number of classes, apply softmax activation to embeddings, interpret them as probabilities over classes</p> -<p>GCN issues:</p> -<ul> -<li>depth is a problem, high connectivity diffuses info</li> -<li>usually full-batch, can't easily break up graph into minibatches</li> -<li>pooling not selective, all neighbors mixed equally before weights are applied</li> -</ul> -<h2 id="validating-embedding-models">Validating embedding models</h2> -<p>inductive vs transductive learning: in transduction, learning is allowed to see features of all data, but labels of only training 
data.<br> -embedding models only support transductive learning; if we don't know objects until after training, won't have embedding vectors.<br> -like with graph models, we need to know the <em>whole</em> graph.</p> -<p>to evaluate matrix factorization, give training alg all users and movies, but withhold some ratings.<br> -same for links in a graph.<br> -for node classification, give the alg the whole graph, and table linking node ids to labels, but withhold some labels.</p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/caf202f13ca04b06a4a3d1e9e8a3c702.png b/content/ml-notes/Matrix models/caf202f13ca04b06a4a3d1e9e8a3c702.png Binary files differ. diff --git a/content/ml-notes/Matrix models/index.md b/content/ml-notes/Matrix models/index.md @@ -0,0 +1,112 @@ ++++ +title = 'Matrix models' +template = 'page-math.html' ++++ +# Matrix models + +## Recommender systems +using the example of movie recommendations for users. + +Recommendation using only explicit info: +- we have no representation for users or movies, only 'atomic' objects that we want to compare for similarity +- we saw this with word embeddings: each word was represented by its own vector, and the values of those vectors were optimised + +embedding models: +- train a length-k embedding for each user and one for each movie, and arrange them into matrices U and M. +- the matrices are parameters of the model + +![9277af676d9256fb745e41d8cf59dcd1.png](caf202f13ca04b06a4a3d1e9e8a3c702.png) + +to make a prediction, define that the dot product of a user vector and a movie vector should be high if the user would like the movie. +this is a choice, but it's a logical one. + +multiplying U<sup>T</sup> with M gives a matrix of rating predictions for every user/movie pair. +so we want to take the rating matrix R and decompose it as a product of two factors ("matrix factorization/decomposition") + +## Matrix factorization +You get an optimisation problem: choose U and M such that U<sup>T</sup> M ≈ R.
+ +$\begin{aligned} +&\argmin_{U,M} \| R - U^T M \|_{F}^{2} \\ += &\argmin_{U,M} \sum_{i,j} (R_{ij} - \lbrack U^T M \rbrack_{ij})^2 \\ += &\argmin_{U,M} \sum_{i,j} (R_{ij} - u_{i}^{T} m_j)^2 +\end{aligned}$ + +but R is not complete: for most user/movie pairs, we don't know the rating. +so it's often better to only optimise over the known ratings: + +$\begin{aligned} \argmin_{U,M} \sum_{i,j \in R_{\text{known}}} (R_{ij} - u_{i}^{T} m_j)^2 \end{aligned}$ + +Alternating least squares (ALS) - an alternative to gradient descent +- idea: if we know M, computing U is easy (each u_i is a linear least-squares solution), and vice versa +- so, starting with random U and M, repeat: + - fix M, compute new U + - fix U, compute new M + +Stochastic gradient descent is usually more practical. + +If we only have positive ratings, we have two options: +- ensure that U<sup>T</sup> M always represents probabilities, and maximise the probability of the data. +- sample random user/movie pairs as negative training samples (i.e., assume that a user probably wouldn't like a randomly chosen movie) + +### Bias control +- control for user bias + - ratings vary between users; they're subjective (no shit) + - if we can explicitly model the bias of a user, the matrix factorisation only needs to predict how much a user would deviate from their average rating for a particular movie +- control for movie bias + - some movies are universally hated, some are universally loved + - unpopular opinion: The Rise of Skywalker wasn't really that bad +- control for temporal bias + - data is not stable over time + - e.g. the meaning of specific ratings can change + +For user/movie biases, add a scalar bias term, learned along with the embeddings. +One for each user, one for each movie, and one general bias over all ratings. + +For temporal bias, make the biases and embeddings time dependent. +e.g. cut time into a small number of chunks and learn a separate embedding for each chunk. + +### The 'cold start' problem +When a user/movie is added to the database, we have no ratings, so matrix factorization can't build an embedding.
+We have to rely on implicit feedback and side information. + +Using implicit "likes" (e.g. movies watched but not rated, movies browsed, movies hovered over...) +- add a separate movie embedding x, and compute a second user embedding + - this is the sum of the x-embeddings of all movies N(i) that user i has liked +- $u_{i}^{imp} = \sum_{j \in N(i)} x_j$ +- add this implicit feedback embedding to the existing one before computing the dot product + +Using side info (age, login, browser resolution...) +- for simplification, assume all features are binary +- assign each feature an embedding and sum over all features A(i) that apply to user i, creating a third user embedding +- $u_{i}^{side} = \sum_{f \in A(i)} y_f$ +- then add this to the user embedding as well before computing the dot product + +## Graph models +graph convolutional network: +- start with node embeddings N₀ +- compute a new embedding for each node: the average of its neighbors' embeddings and its own, i.e. AN₀, with A the adjacency matrix plus self-loops, row-normalised +- multiply by a weight matrix W and apply an activation - N₁ = σ(AN₀W) +- applying this multiple times gives a multi-layer network + +Link prediction: assume the graph is incomplete and try to predict which nodes should be linked. +We can treat this like a matrix factorization problem. +GCN: perform convolutions on the embeddings, multiply them out to generate predictions, compare to the training data, and backpropagate the loss + +Node classification: for each node, try to predict a label. +With node embeddings, we can use a regular classifier, but how do we get good embeddings?
+GCN for classification: make the embedding size of the last layer equal to the number of classes, apply a softmax activation to the embeddings, and interpret them as probabilities over the classes + +GCN issues: +- depth is a problem: in highly connected graphs, each extra layer diffuses info further across the graph +- usually full-batch: the graph can't easily be broken up into minibatches +- pooling is not selective: all neighbors are mixed equally before the weights are applied + +## Validating embedding models +inductive vs transductive learning: in transduction, learning is allowed to see the features of all data, but the labels of only the training data. +embedding models only support transductive learning; if an object only appears after training, it won't have an embedding vector. +for graph models, this means we need to know the _whole_ graph. + +to evaluate matrix factorization, give the training alg all users and movies, but withhold some ratings. +same for links in a graph. +for node classification, give the alg the whole graph and a table linking node ids to labels, but withhold some labels.
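The SGD route mentioned in the matrix factorization notes, combined with the additive user, movie and global biases from "Bias control", can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: ratings are assumed to be `(user, movie, rating)` triples of the known entries only, embeddings are stored as rows (so `U[i]` plays the role of u_i), there is no regularisation term, and the names `train_mf`, `predict`, `rmse` and the hyperparameters `k`, `lr`, `epochs` are hypothetical choices.

```python
import random


def predict(params, i, j):
    """Predicted rating: global bias + user bias + movie bias + u_i . m_j."""
    dot = sum(a * b for a, b in zip(params["U"][i], params["M"][j]))
    return params["b"] + params["b_user"][i] + params["b_movie"][j] + dot


def rmse(params, ratings):
    """Root mean squared error over (user, movie, rating) triples."""
    se = sum((r - predict(params, i, j)) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5


def train_mf(ratings, n_users, n_movies, k=2, lr=0.05, epochs=500, seed=0):
    """SGD on the squared error of the known ratings only (unknown entries never enter the loss)."""
    rng = random.Random(seed)
    params = {
        # One length-k embedding per user (rows of U) and per movie (rows of M).
        "U": [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)],
        "M": [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)],
        # Additive biases: one per user, one per movie, one global (started at the mean rating).
        "b_user": [0.0] * n_users,
        "b_movie": [0.0] * n_movies,
        "b": sum(r for _, _, r in ratings) / len(ratings),
    }
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, r in order:
            # err is the negative gradient of 0.5 * (r - prediction)**2 w.r.t. the prediction.
            err = r - predict(params, i, j)
            u, m = params["U"][i], params["M"][j]
            for f in range(k):
                # Simultaneous update: the right-hand side uses the old u[f] and m[f].
                u[f], m[f] = u[f] + lr * err * m[f], m[f] + lr * err * u[f]
            params["b_user"][i] += lr * err
            params["b_movie"][j] += lr * err
            params["b"] += lr * err
    return params


# Known entries of a 3x3 rating matrix; (0, 2), (1, 1) and (2, 0) are unknown.
known = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
params = train_mf(known, n_users=3, n_movies=3)
print(rmse(params, known))    # training error on the known ratings
print(predict(params, 0, 2))  # a prediction for an unknown user/movie pair
```

Because the unknown entries never appear in the loop, this optimises the "known ratings only" objective; predictions for unknown pairs such as `(0, 2)` fall out of the learned embeddings and biases.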
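For alternating least squares, "computing U is easy" because, with M fixed, each user vector is the solution of a small linear least-squares problem (and symmetrically for movies). The sketch below is one possible version under assumptions not in the notes: the same `(user, movie, rating)` triple format, a small ridge term `lam` so every k×k system is solvable even for a user with no ratings, and a hand-written Gaussian-elimination solver to stay dependency-free. `als`, `solve_rows`, `solve` and `objective` are hypothetical names.

```python
import random


def solve(A, b):
    """Solve the small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    aug = [list(row) + [b[r]] for r, row in enumerate(A)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def solve_rows(fixed, ratings_by_row, n_rows, k, lam):
    """With the other factor fixed, each row v solves (sum_o f_o f_o^T + lam I) v = sum_o r_o f_o."""
    rows = []
    for row in range(n_rows):
        A = [[lam if a == c else 0.0 for c in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for other, r in ratings_by_row.get(row, []):
            f = fixed[other]
            for a in range(k):
                rhs[a] += r * f[a]
                for c in range(k):
                    A[a][c] += f[a] * f[c]
        rows.append(solve(A, rhs))
    return rows


def als(ratings, n_users, n_movies, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares on the known ratings only."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_users)]
    M = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_movies)]
    by_user, by_movie = {}, {}
    for i, j, r in ratings:
        by_user.setdefault(i, []).append((j, r))
        by_movie.setdefault(j, []).append((i, r))
    for _ in range(iters):
        U = solve_rows(M, by_user, n_users, k, lam)    # fix M, compute new U
        M = solve_rows(U, by_movie, n_movies, k, lam)  # fix U, compute new M
    return U, M


def objective(U, M, ratings, lam):
    """Squared error on the known ratings plus the ridge penalty that this ALS minimises."""
    fit = sum((r - sum(a * b for a, b in zip(U[i], M[j]))) ** 2 for i, j, r in ratings)
    reg = lam * (sum(x * x for u in U for x in u) + sum(x * x for m in M for x in m))
    return fit + reg


known = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U1, M1 = als(known, 3, 3, iters=1)
U, M = als(known, 3, 3, iters=20)
print(objective(U1, M1, known, 0.1), objective(U, M, known, 0.1))
```

Each half-step minimises the regularised objective exactly over one factor, so `objective` cannot increase from one sweep to the next; unlike SGD, there is also no learning rate to tune.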
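In the cold-start notes, the implicit-feedback embedding u_i^imp and the side-info embedding u_i^side are simply added to the user embedding before the dot product. A minimal sketch, assuming embeddings are stored in dicts keyed by movie id (`x`) and by binary feature name (`y`); all names and values are hypothetical, a brand-new user's own embedding u_i is taken to be the zero vector because it has no ratings yet, and biases are left out for brevity.

```python
def user_vector(u, liked, x, features, y):
    """u_i + u_i^imp + u_i^side, with u_i^imp = sum of x_j over N(i) and u_i^side = sum of y_f over A(i)."""
    total = list(u)
    for vec in [x[j] for j in liked] + [y[f] for f in features]:
        for d in range(len(total)):
            total[d] += vec[d]
    return total


def score(user_vec, movie_vec):
    """Dot product between the combined user embedding and a movie embedding m_j."""
    return sum(a * b for a, b in zip(user_vec, movie_vec))


# Hypothetical embeddings: x for implicitly liked movies, y for binary side features.
x = {"m1": [0.5, 0.0], "m2": [0.0, 1.0]}
y = {"age_18_25": [0.2, 0.2], "mobile": [-0.1, 0.3]}
new_user = user_vector([0.0, 0.0], liked=["m1", "m2"], x=x, features=["age_18_25"], y=y)
print(new_user, score(new_user, [1.0, 2.0]))
```

Even with no ratings at all, the new user gets a non-zero vector, so the dot product with any movie embedding yields a usable score.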
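The GCN layer from the graph models notes, N₁ = σ(AN₀W), and the softmax output used for node classification can be sketched with plain lists. Assumptions not taken from the notes: A is built from an undirected edge list with self-loops and row-normalised so that multiplying by it averages each node with its neighbours, the starting embeddings N₀ are one-hot vectors, and the weights are random and untrained. Training would compare the softmax rows of the labelled nodes with their labels and backpropagate, which needs an autodiff library and is not shown; all function names are hypothetical.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Adjacency with self-loops, row-normalised: row i of A N averages node i and its neighbours."""
    A = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i, j in edges:  # undirected edges
        A[i][j] = A[j][i] = 1.0
    normalised = []
    for row in A:
        degree = sum(row)  # number of neighbours + 1 for the self-loop
        normalised.append([v / degree for v in row])
    return normalised


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def relu(X):
    return [[max(0.0, v) for v in row] for row in X]


def softmax_rows(X):
    """Turn each node's last-layer embedding into a probability distribution over classes."""
    out = []
    for row in X:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_layer(A, N, W, activation):
    """One GCN layer: N_next = activation(A N W)."""
    return activation(matmul(matmul(A, N), W))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_nodes, hidden, n_classes = 4, 3, 2
A = normalized_adjacency(n_nodes, [(0, 1), (1, 2), (2, 3)])  # a path graph 0-1-2-3
N0 = [[1.0 if r == c else 0.0 for c in range(n_nodes)] for r in range(n_nodes)]  # one-hot start
N1 = gcn_layer(A, N0, random_matrix(n_nodes, hidden, rng), relu)
probs = gcn_layer(A, N1, random_matrix(hidden, n_classes, rng), softmax_rows)
print(probs)  # one row per node, one column per class
```

The last layer has `n_classes` columns, matching the notes' requirement that the final embedding size equals the number of classes, so each softmax row can be read as class probabilities for one node.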
diff --git a/content/ml-notes/Methodology.html b/content/ml-notes/Methodology.html @@ -1,298 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - - <title>Methodology</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="ml-methodology">ML: Methodology</h1> -<nav class="table-of-contents"><ul><li><a href="#ml-methodology">ML: Methodology</a><ul><li><a href="#performing-an-experiment">Performing an experiment</a><ul><li><a href="#what-if-you-need-to-test-many-models">What if you need to test many models?</a></li><li><a href="#the-modern-recipe">The modern recipe</a></li><li><a href="#cross-validation">Cross-validation</a></li></ul></li><li><a href="#what-to-report">What to report</a><ul><li><a href="#classification">Classification</a><ul><li><a href="#whats-a-good-error-5">What's a good error (5%)?</a></li><li><a href="#performance-metrics">Performance metrics</a><ul><li><a href="#confusion-matrix-contingency-table">Confusion matrix (contingency table)</a></li><li><a href="#precision-and-recall">Precision and recall</a></li></ul></li></ul></li><li><a href="#regression">Regression</a></li><li><a href="#errors-confidence-intervals">Errors & confidence intervals</a></li></ul></li><li><a href="#the-no-free-lunch-theorem-and-principle">The no-free-lunch theorem and principle</a></li><li><a href="#cleaning-your-data">Cleaning your data</a><ul><li><a href="#missing-data">Missing data</a></li><li><a href="#outliers">Outliers</a></li><li><a href="#class-imbalance">Class imbalance</a></li></ul></li><li><a href="#choosing-features">Choosing features</a></li><li><a href="#normalisation-standardisation">Normalisation & standardisation</a><ul><li><a href="#normalisation">Normalisation</a></li><li><a href="#standardisation">Standardisation</a></li><li><a href="#whitening">Whitening</a></li></ul></li><li><a 
href="#dimensionality-reduction">Dimensionality reduction</a></li></ul></li></ul></nav><h2 id="performing-an-experiment">Performing an experiment</h2> -<p>Never judge your performance on the training data (or you'll fail the<br> -course and life).</p> -<p>The proportion of training to test data is not important, the <em>absolute<br> -size</em> of the test data is. Aim to have min 500 examples in<br> -test data (ideal 10 000 or more).</p> -<h3 id="what-if-you-need-to-test-many-models">What if you need to test many models?</h3> -<p>e.g. k-nearest neighbours, which classifies a point based on the<br> -classification of its k nearest neighbours.</p> -<h3 id="the-modern-recipe">The modern recipe</h3> -<ol> -<li>Split data into train, validation, and test data. Sample randomly,<br> -at least 500 examples in test set.</li> -<li>Choose model, hyperparameters, etc. only based on the training set.<br> -Test on validation. Don't use test set for anything.</li> -<li>State the hypothesis.</li> -<li>During the final run, train on training + validation data.</li> -<li>Test hypothesis <em>once</em> on the test data. Usually at the<br> -very end of the project, when you write report/paper.</li> -</ol> -<p>Don't re-use test data:</p> -<ul> -<li>you'd pick the wrong model</li> -<li>it would inflate your performance estimate</li> -</ul> -<p>For temporal data, you'll probably want to keep the data ordered by<br> -time.</p> -<p>Which hyperparameters to try?</p> -<ul> -<li>trial-and-error (via intuition)</li> -<li>grid search: define finite set of values for each hyperparam, try<br> -all combinations</li> -<li>random search</li> -</ul> -<h3 id="cross-validation">Cross-validation</h3> -<p>You still split your data, but every run, a different slice becomes the<br> -validation data. 
Then you average the results for the final result.</p> -<p>If it's temporal data, you might want to do walk-forward validation,<br> -where you always expand your data slices forward in time.</p> -<h2 id="what-to-report">What to report</h2> -<h3 id="classification">Classification</h3> -<h4 id="whats-a-good-error-5">What's a good error (5%)?</h4> -<p>It depends, just like in every class so far:</p> -<ul> -<li>Class imbalance: how much more likely is a positive example than a<br> -negative example?</li> -<li>Cost imbalance: how much worse is mislabeled positive than<br> -mislabeled negative? e.g. how bad is it to mark a real email as spam<br> -vs letting a spam message into your inbox?</li> -</ul> -<h4 id="performance-metrics">Performance metrics</h4> -<h5 id="confusion-matrix-contingency-table">Confusion matrix (contingency table)</h5> -<p>Metrics for a single classifier.</p> -<p>The margins give four totals: actual number of each class present in<br> -data, number of each class predicted by the classifier.</p> -<p><img src="_resources/6fa59a4013a0431a9561c4b00b29e8b9.png" alt=""></p> -<h5 id="precision-and-recall">Precision and recall</h5> -<p>Also for a single classifier.</p> -<ul> -<li>Precision: proportion of returned positives that are<br> -<em>actually</em> positive</li> -<li>Recall: proportion of existing positives that the classifier found</li> -</ul> -<p><img src="_resources/e395f5797bc3479090c5c7128b77f074.png" alt=""></p> -<p>You can then calculate rates:</p> -<ul> -<li>True positive rate (TPR): proportion of actual positives that we<br> -classified correctly</li> -<li>False positive rate (FPR): proportion of actual negatives that we<br> -misclassified as positive</li> -</ul> -<p><img src="_resources/4a756d1ddc51411cbb13957c08b20a8f.png" alt=""></p> -<p>ROC (receiver-operating characteristics) space: plot true positives<br> -against false positives. 
the best classifier is in the top left corner.</p> -<p><img src="_resources/aba7a57e16944be7b654c26df0acae65.png" alt=""></p> -<p>Ranking classifier: also gives a score of how negative/positive a point<br> -is.</p> -<ul> -<li>turning a classifier into a ranking classifier: -<ul> -<li>for a linear classifier, measure the distance from the decision boundary;<br> -you can then scale the classifier from timid to bold by moving<br> -the decision boundary</li> -<li>for a tree classifier: sort by class proportion in each segment</li> -</ul> -</li> -<li>ranking errors: one for every pair of instances that's ranked<br> -wrongly (a negative point is ranked more positively than a positive<br> -point)</li> -</ul> -<p>Coverage matrix: shows what happens to TPR and FPR if we move the threshold<br> -from right to left (more or less identical to ROC space)</p> -<p>If we draw a line between two classifiers, we can create a classifier for<br> -every point on that line by picking the output of one of the two classifiers at<br> -random. E.g. choosing each with 50/50 probability puts us halfway between the two. The<br> -area under the convex hull of the classifiers we can create this way is a<br> -good indication of classifier quality -- the bigger this area, the<br> -more useful classifiers we can achieve. This is a good way to compare classifiers<br> -under class or cost imbalance, if we're unsure of our preferences.</p> -<h3 id="regression">Regression</h3> -<p>Loss function: mean squared error (MSE):<br> -\(\frac{1}{n} \sum_i (\hat{y}_i - y_i)^2\)</p> -<p>Evaluation function: root mean squared error (RMSE):<br> -\(\sqrt{\frac{1}{n} \sum_i (\hat{y}_i - y_i)^2}\)</p> -<ul> -<li>you may want to report this: it is minimised at the same places as the MSE,<br> -but has the same units as the original output value, so it's easier to<br> -interpret</li> -</ul> -<p>Bias: distance from the true MSE (which is unknown) to the optimal MSE.</p> -<ul> -<li>high bias: the model doesn't fit the generating distribution.<br> -"underfitting"</li> -<li>reduce by increasing model capacity or adding features</li> -</ul> -<p>Variance: spread of different experiments' MSE around the true MSE</p> -<ul> -<li>high variance: high model capacity, sensitivity to random<br> -fluctuations. 
"overfitting"</li> -<li>reduce by reducing model capacity, adding regularization, or reducing<br> -tree depth</li> -</ul> -<p>Specifically for k-NN regression: increasing k increases bias and<br> -decreases variance.</p> -<p>Dartboard example:</p> -<p><img src="_resources/748a8a36136244d9bc6e3c5c8e4060cb.png" alt=""></p> -<h3 id="errors-confidence-intervals">Errors & confidence intervals</h3> -<p>Statistics tries to answer: can observed results be attributed to <em>real<br> -characteristics</em> of the models, or are they observed <em>by<br> -chance</em>?</p> -<p>If you see error bars, the author has to indicate what they mean --<br> -there's no convention.</p> -<p>Standard deviation: measure of spread (the square root of the variance)</p> -<p>Standard error, confidence interval: measure of confidence</p> -<p>The standard error of the mean is \(\frac{\sigma}{\sqrt{n}}\). Since σ is usually unknown, it<br> -is estimated by the sample standard deviation; if the population distribution is normal, the<br> -standardised sample mean then follows a t distribution.</p> -<p>Re confidence intervals: for a 95% interval, the correct phrasing is "if we repeat the<br> -experiment many times, computing the confidence interval each time, the<br> -true mean would be inside the interval in 95% of those experiments"</p> -<p>Use statistics in ML to show confidence and spread.</p> -<h2 id="the-no-free-lunch-theorem-and-principle">The no-free-lunch theorem and
principle</h2> -<p>Answer to the question "what is the best ML method/model in general?"</p> -<p>Theorem: "any two optimization algorithms are equivalent when their<br> -performance is averaged across all possible problems"</p> -<p>i.e. you can't say shit in general.</p> -<p>A few outs:</p> -<ul> -<li>universal distribution: the datasets on which our methods work are<br> -the likely ones</li> -<li>Occam's razor: the simplest solution/explanation is often the best</li> -</ul> -<p>Principle: there is no single best learning method; whether an algorithm<br> -is good depends on the domain</p> -<p>Inductive bias: the aspects of a learning algorithm that implicitly or<br> -explicitly make it suitable for certain problems also make it unsuitable for<br> -others</p> -<h2 id="cleaning-your-data">Cleaning your data</h2> -<h3 id="missing-data">Missing data</h3> -<p>Simplest way: remove the features that have missing values. Hopefully they're not important.</p> -<p>Or remove instances (rows) with missing data. The problem: if the data wasn't corrupted uniformly, removing rows with missing values changes the data distribution. 
For example, some people may refuse to answer certain questions.</p> -<p>Generally, think about the real-world use case -- can you also expect missing data there?</p> -<ul> -<li>if yes: keep them in the test set, and make a model that can consume them</li> -<li>if no: try to get a test set without missing values, and test methods for completing data only on the training set</li> -</ul> -<p>Guessing the missing data ("imputation"):</p> -<ul> -<li>categorical: use the <dfn title="the value that occurs most often">mode</dfn></li> -<li>numerical: use the mean</li> -<li>or, make the feature a target value and train a model to predict it</li> -</ul> -<h3 id="outliers">Outliers</h3> -<p>Are they mistakes?</p> -<ul> -<li>Yes: deal with them.</li> -<li>No: leave them alone, and check whether the model makes strong assumptions of normally distributed data</li> -</ul> -<p>Can we expect them in production?</p> -<ul> -<li>Yes: make sure the model can deal with them</li> -<li>No: remove them, and get a test dataset representing production</li> -</ul> -<p>Watch out for MSE: it's based on the assumption of normally distributed noise. If you get data with big outliers, it fucks up.</p> -<h3 id="class-imbalance">Class imbalance</h3> -<p><def title="i.e. how much more likely is a positive example than a negative example?">Class imbalance</def> is a problem, but how do you improve training?</p> -<ul> -<li>Use a big test set</li> -<li>Don't rely on accuracy -- try ROC plots, precision-recall plots, AUC, look at the confusion matrix...</li> -<li>Resample training data -<ul> -<li>oversample: sample with replacement. Leads to more data, but creates duplicates and increases the likelihood of overfitting.</li> -<li>undersample: doesn't lead to duplicates, but you throw away data. 
This might be useful for multiple-pass algorithms</li> -</ul> -</li> -<li>Use data augmentation for the minority class -<ul> -<li>oversample the minority class with new data derived from existing data</li> -<li>example: SMOTE, which takes a minority-class point and one of its nearest minority-class neighbours, and generates a new minority-class point at a random position on the line segment between them</li> -</ul> -</li> -</ul> -<h2 id="choosing-features">Choosing features</h2> -<p>Even if data is a table, you shouldn't just use the columns as features.<br> -Some algorithms work only on numeric features, some only on categorical, some on both.</p> -<p>Converting between categoric/numeric:</p> -<ul> -<li>numeric to categoric - you're bound to lose information, but it might be tolerable</li> -<li>categoric to numeric -<ul> -<li>integer coding: map each category to an integer - imposes a false ordering on unordered data. Generally not a good idea.</li> -<li>one-hot coding: one categorical feature becomes several numeric features, one per possible value. For each instance, each of these says whether or not it has that value (0 or 1).</li> -</ul> -</li> -</ul> -<p>Expanding features: adding extra features derived from existing features (can improve performance).<br> -For example, when you have results that don't fit on a line, but <em>do</em> fit on a curve, you can add a derived feature x².<br> -If we don't have any intuition for extra features to add, just add all cross products, or use functions like sin/log.</p> -<h2 id="normalisation-standardisation">Normalisation & standardisation</h2> -<p>Create a uniform scale.</p> -<h3 id="normalisation">Normalisation</h3> -<p>Fit to [0,1].<br> -Scale the data linearly, so the smallest point becomes 0 and the largest point becomes 1:<br> -\(\chi \leftarrow \frac{\chi - \chi_{min}}{\chi_{max} - \chi_{min}}\)</p> -<h3 id="standardisation">Standardisation</h3> -<p>Fit to 1D standard normal distribution.<br> -Rescale the data so the mean becomes 0 and the standard deviation becomes 1. Make it look like the data came from a standard normal distribution.<br> -\(\chi \leftarrow \frac{\chi - \mu}{\sigma}\)</p> -<h3 id="whitening">Whitening</h3> -<p>Fit to multivariate standard normal distribution.<br> -If the data is correlated, you don't end up with a spherical shape after normalising/standardising. So you have to choose a different basis (coordinate system) for the points.</p> -<p>Back to linear algebra - choose a basis</p> -<p>\[B = \begin{bmatrix} c & d \end{bmatrix} = \begin{bmatrix} 1.26 & -0.3 \\ 0.9 & 0.5 \end{bmatrix}\]</p> -<p>The columns of \(B\) are the basis vectors \(c\) and \(d\), expressed in standard coordinates. So if a point has coordinates \(x\) in this basis, multiply \(Bx\) to get its standard coordinates. If you want to convert from standard coordinates to this basis, multiply \(B^{-1} x\).</p> -<p>Since computing the inverse of a matrix is expensive, prefer orthonormal bases (the basis vectors are <def title="perpendicular to each other">orthogonal</def> and <def title="have length 1">normal</def>). 
Because then \(B^{-1} = B^T\), and the transpose is much easier to compute.</p> -<p>Steps:</p> -<ol> -<li>Compute the sample mean \(m = \frac{1}{n} \sum_i x_i\) and sample covariance <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mo>=</mo><mfrac><mn>1</mn><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></mfrac><mi>X</mi><msup><mi>X</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">S = \frac{1}{n-1} X X^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">S</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2484389999999999em;vertical-align:-0.403331em;"></span><span class="mord"><span 
class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.403331em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span> (where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false">[</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>x</mi><mi>n</mi></msub><mo stretchy="false">]</mo><mo>−</mo><mi>m</mi></mrow><annotation encoding="application/x-tex">X = [x_1, \dots, x_n] 
-m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span 
class="mclose">]</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">m</span></span></span></span>).</li> -<li>Find some A st <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mo>=</mo><mi>A</mi><msup><mi>A</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">S = AA^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">S</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord mathdefault">A</span><span class="mord"><span class="mord mathdefault">A</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span>:<br> -* Cholesky decomposition<br> -* Singular value decomposition<br> -* Matrix square root</li> -<li>White the data: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>←</mo><msup><mi>A</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo 
stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>m</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x \leftarrow A^{-1} (x-m)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">A</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">m</span><span class="mclose">)</span></span></span></span></li> -</ol> -<p>So whitening means we choose new basis vectors for a coordinate system where the features are not correlated, and variance is 1 in every direction.</p> -<h2 id="dimensionality-reduction">Dimensionality reduction</h2> -<p>Opposite of feature expansion - reducing number of features in data by deriving new features from old ones, hopefully without losing essential information.</p> -<p>Good for efficiency, reducing variance of model performance, 
and visualisation.</p> -<p>Principal component analysis (PCA): whitening with some extra properties. Afte applying, you throw away all but first k dimensions, and get very good projection of data down to k dimensions.</p> -<ol> -<li>Mean-center the data</li> -<li>Compute sample covariance S</li> -<li>Compute singular value decomposition: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>U</mi><mi>Z</mi><msup><mi>U</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">UZU^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="mord mathdefault" style="margin-right:0.07153em;">Z</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><br> -* SVD is usually computed from X or A (set equal to X or A)<br> -* Z is diagonal, whose diagonal values sorted from largest to smallest are the eigenvalues.<br> -* U is an orthonormal basis, whose columns are the eigenvectors of A. -<ul> -<li>Eigenvectors: a matrix transforms vectors, with some getting stretched and rotated. If a vector only gets stretched/flipped, but its direction doesn't change, it's an eigenvector. 
Translating to math, if Au = λu, u is an eigenvector, and λ is its corresponding scalar eigenvalue.</li> -</ul> -</li> -<li>Transform data: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>←</mo><msup><mi>U</mi><mi>T</mi></msup><mi>x</mi></mrow><annotation encoding="application/x-tex">x \leftarrow U^T x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">←</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8413309999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault">x</span></span></span></span>. To whiten, also divide by diag(Z)</li> -<li>Discard all but first k features (keep only features corresponding to biggest eigenvectors)</li> -</ol> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/4a756d1ddc51411cbb13957c08b20a8f.png b/content/ml-notes/Methodology/4a756d1ddc51411cbb13957c08b20a8f.png Binary files differ. diff --git a/content/ml-notes/_resources/6fa59a4013a0431a9561c4b00b29e8b9.png b/content/ml-notes/Methodology/6fa59a4013a0431a9561c4b00b29e8b9.png Binary files differ. 
diff --git a/content/ml-notes/_resources/748a8a36136244d9bc6e3c5c8e4060cb.png b/content/ml-notes/Methodology/748a8a36136244d9bc6e3c5c8e4060cb.png
Binary files differ.
diff --git a/content/ml-notes/_resources/aba7a57e16944be7b654c26df0acae65.png b/content/ml-notes/Methodology/aba7a57e16944be7b654c26df0acae65.png
Binary files differ.
diff --git a/content/ml-notes/_resources/e395f5797bc3479090c5c7128b77f074.png b/content/ml-notes/Methodology/e395f5797bc3479090c5c7128b77f074.png
Binary files differ.
diff --git a/content/ml-notes/Methodology/index.md b/content/ml-notes/Methodology/index.md
@@ -0,0 +1,302 @@
++++
+title = 'Methodology'
+template = 'page-math.html'
++++
+# ML: Methodology
+
+## Performing an experiment
+
+Never judge your performance on the training data (or you'll fail the
+course and life).
+
+The proportion of training to test data is not important; the _absolute
+size_ of the test data is. Aim for at least 500 examples in the
+test data (ideally 10 000 or more).
+
+### What if you need to test many models?
+
+e.g. k-nearest neighbours with different values of k (k-NN classifies a
+point based on the classification of its k nearest neighbours).
+
+### The modern recipe
+
+1. Split data into train, validation, and test data. Sample randomly,
+   with at least 500 examples in the test set.
+2. Choose model, hyperparameters, etc. based only on the training set.
+   Test on validation. Don't use the test set for anything.
+3. State the hypothesis.
+4. During the final run, train on training + validation data.
+5. Test the hypothesis _once_ on the test data, usually at the
+   very end of the project, when you write the report/paper.
+
+Don't re-use test data:
+
+- you'd pick the wrong model
+- it would inflate your performance estimate
+
+For temporal data, you'll probably want to keep the data ordered by
+time.
+
+Which hyperparameters to try?
+
+- trial-and-error (via intuition)
+- grid search: define a finite set of values for each hyperparameter, try
+  all combinations
+- random search
+
+### Cross-validation
+
+You still split your data, but in every run a different slice becomes the
+validation data. Then you average the results of all runs.
+
+If it's temporal data, you might want to do walk-forward validation,
+where you always expand your data slices forward in time.
+
+## What to report
+
+### Classification
+
+#### What's a good error (5%)?
+
+It depends, just like in every class so far:
+
+- Class imbalance: how much more likely is a positive example than a
+  negative example?
+- Cost imbalance: how much worse is a mislabeled positive than a
+  mislabeled negative? e.g. how bad is it to mark a real email as spam
+  vs letting a spam message into your inbox?
+
+#### Performance metrics
+
+##### Confusion matrix (contingency table)
+
+Metrics for a single classifier.
+
+The margins give four totals: the actual number of each class present in
+the data, and the number of each class predicted by the classifier.
+
+![](6fa59a4013a0431a9561c4b00b29e8b9.png)
+
+##### Precision and recall
+
+Also for a single classifier.
+
+- Precision: proportion of returned positives that are
+  _actually_ positive
+- Recall: proportion of existing positives that the classifier found
+
+![](e395f5797bc3479090c5c7128b77f074.png)
+
+You can then calculate rates:
+
+- True positive rate (TPR): proportion of actual positives that we
+  classified correctly
+- False positive rate (FPR): proportion of actual negatives that we
+  misclassified as positive
+
+![](4a756d1ddc51411cbb13957c08b20a8f.png)
+
+ROC (receiver operating characteristic) space: plot the true positive rate
+against the false positive rate. The best classifier is in the top left corner.
+
+![](aba7a57e16944be7b654c26df0acae65.png)
+
+Ranking classifier: also gives a score of how negative/positive a point
+is.
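The precision/recall and TPR/FPR definitions above can be illustrated with a small plain-Python sketch. It also shows how thresholding a ranking classifier's scores at different points gives different points in ROC space. The sketch rests on assumptions made only for illustration:

- Labels are coded as 1 (positive) and 0 (negative).
- A point counts as positive when its score is at least the threshold.
- The labels and scores are made up.
- The function names (`confusion_counts`, `precision_recall`, `roc_points`) are hypothetical, not from any library.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, TN, FN) for binary labels/predictions (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall(labels, predictions):
    tp, fp, _, fn = confusion_counts(labels, predictions)
    precision = tp / (tp + fp)  # returned positives that are actually positive
    recall = tp / (tp + fn)     # existing positives that were found (same as TPR)
    return precision, recall


def roc_points(labels, scores):
    """Turn a ranking classifier's scores into (FPR, TPR) points in ROC space.

    Each threshold t labels every point with score >= t as positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing labelled positive
    for t in sorted(set(scores), reverse=True):
        predictions = [1 if s >= t else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(labels, predictions)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical true labels and scores from some ranking classifier.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(precision_recall(labels, [1 if s >= 0.5 else 0 for s in scores]))  # (0.75, 1.0)
print(roc_points(labels, scores))
```

Lowering the threshold step by step moves the classifier from the bottom-left corner of ROC space (nothing labelled positive) towards the top-right corner (everything labelled positive).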
+
+- turning a classifier into a ranking classifier:
+  - for a linear classifier, measure the distance from the decision boundary;
+    now you can scale the classifier from timid to bold by moving
+    the decision boundary
+  - for a tree classifier: sort by class proportion in each segment
+- ranking errors: one for every pair of instances that's ranked
+  wrongly (a negative point is ranked more positively than a positive
+  point)
+
+Coverage matrix: shows what happens to TPR and FPR if we move the threshold
+from right to left (more or less identical to ROC space)
+
+If we draw a line between two classifiers, we can create a classifier for
+every point on that line by picking the output of one of the two classifiers at
+random. E.g. with 50/50 probability, we end up halfway between the two. The
+area under the curve of classifiers we can create (the "convex hull") is a
+good indication of the quality of a classifier -- the bigger this area, the
+more useful classifiers we can achieve. It's a good way to compare classifiers
+under class or cost imbalance when we're unsure of our preferences.
+
+### Regression
+
+Loss function: mean squared error
+($\frac{1}{n} \sum_i (\hat{y}_i - y_i)^2$)
+
+Evaluation function: root mean squared error
+($\sqrt{\frac{1}{n} \sum_i (\hat{y}_i - y_i)^2}$)
+
+- you may want to report this, because it's minimised at the same places as MSE,
+  but has the same units as the original output value, so it's easier to
+  interpret
+
+Bias: distance from true MSE (which is unknown) to the optimum MSE.
+
+- high bias: model doesn't fit the generating distribution.
+  "underfitting"
+- reduce by increasing model capacity or features
+
+Variance: spread of different experiments' MSE around the true MSE
+
+- high variance: high model capacity, sensitivity to random
+  fluctuations. "overfitting"
+- reduce by reducing model capacity, adding regularization, reducing
+  tree depth
+
+Specifically for k-NN regression: increasing k increases bias and
+decreases variance.
+
+Dartboard example:
+
+![](748a8a36136244d9bc6e3c5c8e4060cb.png)
+
+### Errors & confidence intervals
+
+Statistics tries to answer: can observed results be attributed to _real
+characteristics_ of the models, or are they observed _by
+chance_?
+
+If you see error bars, the author has to indicate what they mean --
+there's no convention.
+
+Standard deviation: measure of spread, variance
+
+Standard error, confidence interval: measure of confidence
+
+The standard error of the mean is $\frac{\sigma}{\sqrt{n}}$. If the
+population distribution is normal and $\sigma$ is estimated from the
+sample, the standardised sample mean follows a t distribution.
+
+Re confidence intervals: the correct phrasing (for a 95% interval) is "if we repeat the
+experiment many times, computing the confidence interval each time, the
+true mean would be inside the interval in 95% of those experiments"
+
+Use statistics in ML to show confidence and spread.
+
+## The no-free-lunch theorem and principle
+
+Answer to the question "what is the best ML method/model in general?"
+
+Theorem: "any two optimization algorithms are equivalent when their
+performance is averaged across all possible problems"
+
+i.e. you can't say shit in general.
+
+A few outs:
+
+- universal distribution: the datasets for which our methods work are
+  the likely ones
+- Occam's razor: the simplest solution/explanation is often the best
+
+Principle: there is no single best learning method; whether an algorithm
+is good depends on the domain
+
+Inductive bias: the aspects of a learning algorithm which implicitly or
+explicitly make it suitable for certain problems and unsuitable for
+others
+
+## Cleaning your data
+### Missing data
+Simplest way - remove features for which values are missing. Maybe they're not important, probably, hopefully.
+
+Or remove instances (rows) with missing data. The problem: if the data wasn't corrupted uniformly, removing rows with missing values changes the data distribution. An example is if some people refuse to answer questions.
+
+Generally, think about the real-world use case -- can you also expect missing data there?
+* if yes: keep them in the test set, make a model that can consume them
+* if no: try to get a test set without missing values, and test methods for completing data only in the training set
+
+Guessing the missing data ("imputation"):
+* categorical: use the <dfn title="the value that occurs most often">mode</dfn>
+* numerical: use the mean
+* or make the feature a target value and train a model to predict it
+
+### Outliers
+Are they mistakes?
+* Yes: deal with them.
+* No: leave them alone, check the model for strong assumptions of normally distributed data
+
+Can we expect them in production?
+* Yes: make sure the model can deal with them
+* No: remove them, get a test dataset representing production
+
+Watch out for MSE: it's based on the assumption of normally distributed randomness. If you get data with big outliers, it fucks up.
+
+### Class imbalance
+<def title="i.e. how much more likely is a positive example than a negative example?">Class imbalance</def> is a problem, but how do you improve training?
+* Use a big test set
+* Don't rely on accuracy -- try ROC plots, precision-recall plots, AUC, look at the confusion matrix...
+* Resample training data
+  * oversample: sample (the minority class) with replacement. Leads to more data, but creates duplicates and increases the likelihood of overfitting.
+  * undersample: doesn't lead to duplicates, but you throw away data. Might be useful for multiple-pass algorithms.
+* Use data augmentation for the minority class
+  * oversample the minority class with new data derived from existing data
+  * example: SMOTE, which generates new minority class points by interpolating between a minority point and one of its nearest minority-class neighbours
+
+## Choosing features
+Even if data is a table, you shouldn't just use the columns as features.
+Some algorithms work only on numeric features, some only on categorical, some on both.
+
+Converting between categoric/numeric:
+* numeric to categoric - you're bound to lose information, but it might be tolerable
+* categoric to numeric
+  * integer coding: make every category an integer - imposes a false ordering on unordered data. Generally not a good idea.
+  * one-hot coding: one categorical feature becomes several numeric features, one per possible value. For each instance, the feature for its value is 1 and the others are 0.
+
+Expanding features: adding extra features derived from existing features (can improve performance).
+For example, when you have results that don't fit on a line, but _do_ fit on a curve, you can add a derived feature x².
+If we don't have any intuition for extra features to add, just add all cross products, or use functions like sin/log.
+
+## Normalisation & standardisation
+Create a uniform scale.
+
+### Normalisation
+Fit to [0,1].
+Scales the data linearly: the smallest point becomes 0, the largest point becomes 1:
+$\chi \leftarrow \frac{\chi - \chi_{\min}}{\chi_{\max} - \chi_{\min}}$
+
+### Standardisation
+Fit to 1D standard normal distribution.
+Rescale the data so the mean becomes 0 and the standard deviation becomes 1. Make it look like the data came from a standard normal distribution.
+$\chi \leftarrow \frac{\chi - \mu}{\sigma}$
+
+### Whitening
+Fit to multivariate standard normal distribution.
+If the data is correlated, you don't end up with a spherical shape after normalising/standardising. So you have to choose a different basis (coordinate system) for the points.
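A small plain-Python sketch can check the normalisation and standardisation formulas above, and the claim that they leave correlations in place. It also previews the whitening recipe that follows:

- It takes a Cholesky factor $A$ of the 2×2 sample covariance, so that $S = AA^T$.
- It applies $x \leftarrow A^{-1}(x - m)$.

The assumptions are:

- The data is synthetic and hypothetical: the second feature is a noisy multiple of the first.
- Restricting to two dimensions is a simplification that lets the Cholesky factor be written out by hand. With NumPy you would typically use `numpy.linalg.cholesky` instead.
- All function names here are made up for illustration.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Sample covariance, using the same 1/(n-1) factor as the notes."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


def normalise(values):
    """Min-max normalisation: smallest value -> 0, largest value -> 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Shift to mean 0 and scale to (sample) standard deviation 1."""
    mu, sigma = mean(values), math.sqrt(cov(values, values))
    return [(v - mu) / sigma for v in values]


def whiten_2d(xs, ys):
    """Whiten 2-D data with x <- A^{-1}(x - m), where S = A A^T (Cholesky)."""
    s11, s12, s22 = cov(xs, xs), cov(xs, ys), cov(ys, ys)
    # Lower-triangular Cholesky factor A = [[a, 0], [b, c]] of S.
    a = math.sqrt(s11)
    b = s12 / a
    c = math.sqrt(s22 - b * b)
    mx, my = mean(xs), mean(ys)
    wx, wy = [], []
    for x, y in zip(xs, ys):
        # Solve A [u, v]^T = [x - mx, y - my]^T by forward substitution.
        u = (x - mx) / a
        v = (y - my - b * u) / c
        wx.append(u)
        wy.append(v)
    return wx, wy


# Hypothetical correlated data: feature 2 is a noisy multiple of feature 1.
rng = random.Random(0)
f1 = [rng.gauss(10, 3) for _ in range(1000)]
f2 = [0.8 * x + rng.gauss(0, 1) for x in f1]

n1, n2 = normalise(f1), normalise(f2)
s1, s2 = standardise(f1), standardise(f2)
w1, w2 = whiten_2d(f1, f2)

print("correlation, raw:         ", round(corr(f1, f2), 3))
print("correlation, normalised:  ", round(corr(n1, n2), 3))  # matches the raw value
print("correlation, standardised:", round(corr(s1, s2), 3))  # matches the raw value
print("covariance, whitened:     ", round(cov(w1, w2), 3))   # approximately 0
```

Per-feature rescaling cannot remove correlation, because correlation is unchanged by shifting or positively scaling a single feature. Only the change of basis in the whitening step makes the covariance the identity matrix.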
+
+Back to linear algebra - choose a basis
+$$B = \begin{bmatrix} c & d \end{bmatrix} = \begin{bmatrix} 1.26 & -0.3 \\ 0.9 & 0.5 \end{bmatrix}$$
+
+If $x$ holds coordinates with respect to this basis, multiply $Bx$ to get the same point in the standard basis. If you want to convert a point from the standard basis to this basis, multiply $B^{-1} x$.
+
+Since the inverse of a matrix is computationally expensive, prefer orthonormal bases (the basis vectors are <def title="perpendicular to each other">orthogonal</def> and <def title="have length 1">normal</def>), because then $B^{-1} = B^T$, and the transpose is much easier to compute.
+
+Steps:
+ 1. Compute the sample mean $m = \frac{1}{n} \sum_i x_i$ and the sample covariance $S = \frac{1}{n-1} X X^T$ (where $X = [x_1 - m, \dots, x_n - m]$ holds the mean-centred points as columns).
+ 2. Find some $A$ such that $S = AA^T$, using e.g.:
+    * Cholesky decomposition
+    * Singular value decomposition
+    * Matrix square root
+ 3. Whiten the data: $x \leftarrow A^{-1} (x-m)$
+
+So whitening means we choose new basis vectors for a coordinate system where the features are not correlated, and variance is 1 in every direction.
+
+## Dimensionality reduction
+Opposite of feature expansion - reducing the number of features in data by deriving new features from old ones, hopefully without losing essential information.
+
+Good for efficiency, reducing variance of model performance, and visualisation.
+
+Principal component analysis (PCA): whitening with some extra properties. After applying it, you can throw away all but the first k dimensions and still get a very good projection of the data down to k dimensions.
+ 1. Mean-center the data
+ 2. Compute the sample covariance S
+ 3. Compute the singular value decomposition $S = UZU^T$
+    * In practice, the SVD is usually computed directly from X or A instead of from S
+    * Z is diagonal; its diagonal values, sorted from largest to smallest, are the eigenvalues of S.
+    * U is an orthonormal basis whose columns are the eigenvectors of S.
+        * Eigenvectors: a matrix transforms vectors, with some getting stretched and rotated. If a vector only gets stretched or flipped, but stays on the same line through the origin, it's an eigenvector. Translating to math, if Au = λu, u is an eigenvector, and λ is its corresponding scalar eigenvalue.
+ 4. Transform the data: $x \leftarrow U^T x$. To whiten, also divide each new coordinate by the square root of the corresponding diagonal entry of Z (its standard deviation)
+ 5. Discard all but the first k features (keep only the features corresponding to the largest eigenvalues)
diff --git a/content/ml-notes/Models for sequential data.html b/content/ml-notes/Models for sequential data.html @@ -1,111 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - - <title>Models for sequential data</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="models-for-sequential-data">Models for sequential data</h1> -<nav class="table-of-contents"><ul><li><a href="#models-for-sequential-data">Models for sequential data</a><ul><li><a href="#sequences">Sequences</a></li><li><a href="#markov-models">Markov models</a></li><li><a href="#embedding-models">Embedding models</a></li><li><a href="#recurrent-neural-networks">Recurrent neural networks</a></li><li><a href="#lstms">LSTMs</a></li></ul></li></ul></nav><h2 id="sequences">Sequences</h2> -<p>They consists of numbers or symbols:<br> -- numeric 1 dimensional, e.g. stock price over time. can be n-dimensional<br> -- symbolic (categorical) 1-dimensional, like english sequence of words/characters.
can be n-dimensional, with multiple categorical features per timestamp (like sheet music)</p> -<p>we could have one sequence per instance, and try to classify the sequences (like email spam/not spam)<br> -or the whole dataset is a sequence, and instances are ordered.</p> -<p>single sequence feature extraction:</p> -<ul> -<li>make it a regression problem, each point is represented by the m values before it</li> -<li>gives us a table with a target label (value at time t) and m features (the m preceding values)</li> -<li>you could also use mean/variance statistics</li> -<li>but shit: if the data is shuffled, the classifier is trained on data that comes from the future (relative to the test data)</li> -</ul> -<p>major key: think about the real-world use case. e.g. if we want to predict future values, the training data shouldn't contain things that happen later than test data.</p> -<p>you can do walk-forward validation, if target labels have meaningful ordering in time:</p> -<p><img src="_resources/9e5ba66c8e834fc383006a31ff012558.png" alt="519e53c90f0dae96942ac72ed59aacc0.png"></p> -<p>When modelling probability, break the sequence into its tokens, like words in a sentence. 
Each token is modeled as a random variable (<em>not</em> independent).<br>
-So you end up with a joint distribution P(W₄, W₃, W₂, W₁) (with some arbitrary number of parameters).</p>
-<p>We can apply the chain rule of probability:</p>
-<p>$$\begin{aligned}
-P(W_4, W_3, W_2, W_1) &= P(W_4, W_3, W_2 | W_1) P(W_1) \\
- &= P(W_4, W_3 | W_2, W_1) P(W_2 | W_1) P(W_1) \\
- &= P(W_4 | W_3, W_2, W_1) P(W_3 | W_2, W_1) P(W_2 | W_1) P(W_1)
-\end{aligned}$$</p>
-<p>i.e. we can rewrite the probability of a sentence as the product of the probabilities of its words, each conditioned on its history.<br>
-With log probabilities, you get a sum: $\log P(\text{sentence}) = \sum_{\text{word}} \log P(\text{word} | \text{words before it})$</p>
-<h2 id="markov-models">Markov models</h2>
-<p>Markov assumption: limit the amount of memory for previous tokens. e.g.
retain a max of 2 words.<br>
-The "order" is the number of words retained in the conditional.</p>
-<p>For example, if the conditional is $P(x | \text{i, will, graduate, in, a, decade})$ and it's a third-order model, the Markov assumption keeps only the three most recent words: $P(x | \text{in, a, decade})$.</p>
-<p>With the Markov assumption and the chain rule, we can model a sequence as a product of limited-memory conditional probabilities.
These can be estimated from a corpus (a huge body of text).</p>
-<p>For example, to estimate the probability of the word 'prize' given "won a", count how often "won a prize" occurs in the text as a proportion of the total occurrences of "won a":</p>
-<p>$$P(\text{prize} | \text{a, won}) \approx \frac{\text{\# won a prize}}{\text{\# won a}}$$</p>
-<p>The word snippets are "n-grams". Three words is a trigram, two words is a bigram, and one word is a unigram. And maybe 1000 words would be a kilogram.</p>
-<p>Sequential sampling: start with a small seed of words, then repeatedly sample the next word according to its probability given the previous words.</p>
-<h2 id="embedding-models">Embedding models</h2>
-<p>Model each object x by an embedding vector e<sub>x</sub>. The similarities between these vectors represent similarities between the objects (here, words).<br>
-This creates embedding vectors for words, where distances and directions reflect semantic meaning.</p>
-<p>Distributional hypothesis: words that occur in the same contexts often have similar meanings.</p>
-<p>1-hot vector: represent each word as an atomic object, i.e. a vector as long as the vocabulary, with a 1 at the word's index and 0 everywhere else.</p>
-<p>Word2Vec:</p>
-<ul>
-<li>slide a context window over the sequence, trying to predict the distribution P(y|x): which words y are likely to occur in the context window given the middle word x</li>
-<li>create a dataset of word pairs (middle word, context word) from the text</li>
-<li>feed this dataset to a two-layer network, which predicts the context</li>
-<li>a softmax activation over 10k outputs (one per word in the vocabulary) is expensive, so some tricks are needed to make it feasible</li>
-<li>after training, discard the second layer (softmax) and only use the embeddings produced by the first layer</li>
-</ul>
-<h2 id="recurrent-neural-networks">Recurrent neural networks</h2>
-<p>A neural network with cycles in it (used for sequences).</p>
-<p>Can be used for:</p>
-<ul>
-<li>sequence to sequence, e.g.
translating English to French</li>
-<li>sequence to label, e.g. sequence classification</li>
-<li>label to sequence, e.g. sentence generation</li>
-</ul>
-<p>Example: a fully connected network whose input x is extended by three nodes, to which the hidden layer is copied:</p>
-<p><img src="_resources/761f27ac323e4070bf7df80a540a6831.png" alt="128d6b585d9c1bdaf5a568bc022d8763.png"></p>
-<p>Visual shorthand:</p>
-<ul>
-<li>a rectangle is a vector of nodes</li>
-<li>an arrow feeding into a rectangle, annotated with a weight matrix, means a fully connected transformation</li>
-<li>if a line doesn't have a weight, it's a copy of the input vector</li>
-<li>if two lines flow into each other, their vectors are concatenated</li>
-</ul>
-<p><img src="_resources/44c66910d5c84b08a5a833c3633f9595.png" alt="d32563c1b461ded32d46894a7de2198d.png"> <img src="_resources/92011008806443b784e5f7bc2d3b0a77.png" alt="afb898aeecf247a602f622999280ed61.png"></p>
-<p>Training RNNs:</p>
-<ul>
-<li>provide an input sequence x and a target sequence t</li>
-<li>backpropagation through time:
-<ul>
-<li>
-<p>unroll:</p>
-<ul>
-<li>every step in the sequence is applied, in parallel, to a copy of the network</li>
-<li>the recurrent connection flows from the previous copy to the next</li>
-<li>the whole thing is a feedforward net (a network without cycles)</li>
-<li>the hidden layer of the first copy is initialized to the zero vector</li>
-</ul>
-<p><img src="_resources/82d2db6b48d4409582ac544450821b12.png" alt="5a22d7340c41f70a48e9815f7576ec3a.png"></p>
-</li>
-</ul>
-</li>
-</ul>
-<p>Basic RNNs work well, but they don't learn to remember information over long time spans.<br>
-You can't keep a long-term memory of everything, so you need to be selective. In order to remember things long term, you need to forget a lot of other stuff (such is life).</p>
-<h2 id="lstms">LSTMs</h2>
-<p>"Long short-term memory".<br>
-Selective forgetting and remembering, controlled by learnable "gates".
Side note, from now on I'm not "studying", but I'm "selectively forgetting and remembering".</p> -<p>The gating mechanism takes two input vectors, which are combined with sigmoid and <abbr title="sigmoid rescaled so its outputs are between -1 and 1">tanh</abbr> activations.<br> -It produces an additive value -- want to figure out how much of input to add to some other vectors.<br> -The <abbr title="sigmoid rescaled so its outputs are between -1 and 1">tanh</abbr> is like a mapping of input to range(-1, 1) -- limits the effect of the addition vector.<br> -The sigmoid is like a selection vector.</p> -<p><img src="_resources/e8795f123d904061a5f2ae90dfdc2c4e.png" alt="c60c95b58a2975c98df7c548b0585b9b.png"></p> -<p>Basic operation of LSTM is a "cell". There are two recurrent connections between cells: the current output y, and the cell state C.</p> -<p>I don't yet know how much detail we need to know about this, so I'll fill it in later based on exam questions.</p> -<p>The prof's summary: "incredibly powerful language models. Tricky to train, very opaque." Yep, opaque and complicated, indeed.</p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/44c66910d5c84b08a5a833c3633f9595.png b/content/ml-notes/Models for sequential data/44c66910d5c84b08a5a833c3633f9595.png Binary files differ. diff --git a/content/ml-notes/_resources/761f27ac323e4070bf7df80a540a6831.png b/content/ml-notes/Models for sequential data/761f27ac323e4070bf7df80a540a6831.png Binary files differ. diff --git a/content/ml-notes/_resources/82d2db6b48d4409582ac544450821b12.png b/content/ml-notes/Models for sequential data/82d2db6b48d4409582ac544450821b12.png Binary files differ. diff --git a/content/ml-notes/_resources/92011008806443b784e5f7bc2d3b0a77.png b/content/ml-notes/Models for sequential data/92011008806443b784e5f7bc2d3b0a77.png Binary files differ. 
diff --git a/content/ml-notes/_resources/9e5ba66c8e834fc383006a31ff012558.png b/content/ml-notes/Models for sequential data/9e5ba66c8e834fc383006a31ff012558.png
Binary files differ.
diff --git a/content/ml-notes/_resources/e8795f123d904061a5f2ae90dfdc2c4e.png b/content/ml-notes/Models for sequential data/e8795f123d904061a5f2ae90dfdc2c4e.png
Binary files differ.
diff --git a/content/ml-notes/Models for sequential data/index.md b/content/ml-notes/Models for sequential data/index.md
@@ -0,0 +1,123 @@
++++
+title = 'Models for sequential data'
+template = 'page-math.html'
++++
+# Models for sequential data
+
+## Sequences
+They consist of numbers or symbols:
+ - numeric, 1-dimensional, e.g. stock price over time; can also be n-dimensional
+ - symbolic (categorical), 1-dimensional, like an English sequence of words/characters; can also be n-dimensional, with multiple categorical features per timestamp (like sheet music)
+
+we could have one sequence per instance and try to classify the sequences (like email spam/not spam),
+or the whole dataset is one sequence, and the instances are ordered.
+
+single sequence feature extraction:
+- make it a regression problem: each point is represented by the m values before it
+- gives us a table with a target label (value at time t) and m features (the m preceding values)
+- you could also add summary statistics like the mean/variance of those values
+- but shit: if the data is shuffled, the classifier is trained on data that comes from the future (relative to the test data)
+
+major key: think about the real-world use case. e.g. if we want to predict future values, the training data shouldn't contain things that happen later than the test data.
+
+you can do walk-forward validation if the target labels have a meaningful ordering in time (train on everything up to some point, validate on the data right after it, then move that point forward and repeat):
+
+![519e53c90f0dae96942ac72ed59aacc0.png](9e5ba66c8e834fc383006a31ff012558.png)
+
+When modelling probability, break the sequence into its tokens, like words in a sentence. Each token is modeled as a random variable (_not_ independent).
+So you end up with a joint distribution P(W₄, W₃, W₂, W₁) (with some arbitrary number of parameters).
+
+Can apply the chain rule of probability:
+
+$\begin{aligned}
+P(W_4, W_3, W_2, W_1) &= P(W_4, W_3, W_2 | W_1) P(W_1) \\
+  &= P(W_4, W_3 | W_2, W_1) P(W_2 | W_1) P(W_1) \\
+  &= P(W_4 | W_3, W_2, W_1) P(W_3 | W_2, W_1) P(W_2 | W_1) P(W_1)
+\end{aligned}$
+
+i.e. the probability of a sentence can be rewritten as the product of the probabilities of each word, conditioned on its history.
+With log probabilities, the product becomes a sum: $\log{P(\text{sentence})} = \sum_{word} \log{P(\text{word} | \text{words before it})}$
+
+## Markov models
+Markov assumption: limit the amount of memory for previous tokens, e.g. retain a max of 2 words.
+The "order" is the number of words retained in the conditional.
+
+For example, if the conditional is $P(x | \text{i, will, graduate, in, a, decade})$ and it's a third-order model, the Markov assumption is $P(x | \text{in, a, decade})$: only the 3 most recent words are kept.
+
+With the Markov assumption and the chain rule, a sequence can be modelled with limited-memory conditional probabilities. These can be estimated from a corpus (huge piece of text).
+
+For example, to estimate the probability of the word 'prize' given "won a", count how often "won a prize" occurs in the text as a proportion of the total occurrences of "won a":
+
+$P(\text{prize} | \text{a, won}) \approx \frac{\text{\# won a prize}}{\text{\# won a}}$
+
+The word snippets are "n-grams". Three words is a trigram, two words is a bigram. I guess one word is just a gram. And maybe 1000 words would be a kilogram.
+
+Sequential sampling: start with a small seed of words, then repeatedly sample the next word according to its probability given the previous words, and append it to the sequence.
+
+## Embedding models
+Model each object x (here, a word) by an embedding vector e<sub>x</sub>. The similarities of these vectors represent similarities between words.
+The goal is embedding vectors for words where distances and directions reflect semantic meaning.
+
+Distributional hypothesis: words that occur in the same contexts often have similar meanings.
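To make "similarity of embedding vectors" concrete: a common choice is cosine similarity, the cosine of the angle between two vectors, which compares directions rather than lengths. A minimal sketch, assuming made-up 3-d embeddings purely for illustration (real ones are learned, e.g. by Word2Vec, and have far more dimensions); `cosine_similarity` and `most_similar` are hypothetical helper names, not from any library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings):
    """Return the other word whose embedding points in the most similar direction."""
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

# Hypothetical toy embeddings, made up for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.35],
    "car": [0.1, 0.9, 0.0],
}

print(most_similar("cat", embeddings))  # dog

# Two distinct 1-hot vectors are always orthogonal, so they carry no similarity info.
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # 0.0
```

The last line shows why learned embeddings are needed at all: with 1-hot vectors every pair of distinct words has similarity 0.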
+
+1-hot vector: represent each word as an atomic object, i.e. a vector as long as the vocabulary with a 1 at the word's index and 0 everywhere else (so all distinct words are equally far apart)
+
+Word2Vec:
+- slide a context window over the sequence, trying to predict the distribution P(y|x): which words are likely to occur in the context window given the middle word
+- create a dataset of (middle word, context word) pairs from the text
+- feed this dataset to a two-layer network, which predicts the context
+- a softmax activation over 10k outputs (one per vocabulary word) is expensive, so some tricks are needed to make it feasible
+- after training, discard the second layer (softmax) and only use the embeddings produced by the first layer
+
+## Recurrent neural networks
+Neural network with cycles in it (used for sequences).
+
+Can be used for:
+- sequence to sequence, e.g. translating English to French
+- sequence to label, e.g. sequence classification
+- label to sequence, e.g. sentence generation
+
+Example: a fully connected network whose input x is extended by three nodes, to which the hidden layer is copied:
+
+![128d6b585d9c1bdaf5a568bc022d8763.png](761f27ac323e4070bf7df80a540a6831.png)
+
+Visual shorthand:
+- a rectangle is a vector of nodes
+- an arrow feeding into the rectangle, annotated with a weight matrix, means a fully connected transformation
+- if a line doesn't have a weight, it's a copy of the input vector
+- if two lines flow into each other, concatenate their vectors
+
+![d32563c1b461ded32d46894a7de2198d.png](44c66910d5c84b08a5a833c3633f9595.png) ![afb898aeecf247a602f622999280ed61.png](92011008806443b784e5f7bc2d3b0a77.png)
+
+Training RNNs:
+- provide input seq x, target seq t
+- backpropagation through time:
+  - unroll:
+    - every step in the seq is applied in parallel to its own copy of the network
+    - the recurrent connection flows from the previous copy to the next
+    - the whole thing is a feedforward net (network without cycles)
+    - the hidden layer inits to the zero vector
+
+    ![5a22d7340c41f70a48e9815f7576ec3a.png](82d2db6b48d4409582ac544450821b12.png)
+
+Basic RNNs work well, but don't learn to remember information for a long time.
+Can't have a long term mem for everything, need to be selective.
In order to remember things long term, you need to forget a lot of other stuff (such is life).
+
+## LSTMs
+"Long short-term memory".
+Selective forgetting and remembering, controlled by learnable "gates". Side note, from now on I'm not "studying", but I'm "selectively forgetting and remembering".
+
+*[tanh]: sigmoid rescaled so its outputs are between -1 and 1
+
+The gating mechanism takes two input vectors, which are combined with sigmoid and tanh activations.
+It produces an additive value: the idea is to figure out how much of the input to add to some other vector.
+The tanh maps the input to the range (-1, 1), which limits the effect of the addition vector.
+The sigmoid acts like a selection vector: its values lie between 0 and 1, so multiplying by it picks how much of each component gets through.
+
+![c60c95b58a2975c98df7c548b0585b9b.png](e8795f123d904061a5f2ae90dfdc2c4e.png)
+
+The basic unit of an LSTM is a "cell". There are two recurrent connections between cells: the current output y, and the cell state C.
+
+I don't yet know how much detail we need to know about this, so I'll fill it in later based on exam questions.
+
+The prof's summary: "incredibly powerful language models. Tricky to train, very opaque." Yep, opaque and complicated, indeed.
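To tie the gates, the two recurrent connections (output y and cell state C) and the zero-initialised, unrolled hidden state together, here is a minimal NumPy sketch of one LSTM cell step and of unrolling it over a sequence. It assumes the widely used formulation with forget, input and output gates, and a single weight matrix applied to the concatenated [x, y_prev] (like the "two lines flow together, concatenate" shorthand); the lecture's exact variant may differ. `lstm_step` and `lstm_forward` are hypothetical names, and the weights are random, so only the shapes and data flow are meaningful.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, y_prev, c_prev, W, b):
    """One LSTM cell step (common forget/input/output-gate formulation).

    x:      input at this time step, shape (n_in,)
    y_prev: previous output,          shape (n_hidden,)
    c_prev: previous cell state,      shape (n_hidden,)
    W:      weights, shape (4 * n_hidden, n_in + n_hidden)
    b:      biases,  shape (4 * n_hidden,)
    """
    n = y_prev.shape[0]
    # Concatenate the two inputs, then one linear map gives all four gate blocks.
    z = W @ np.concatenate([x, y_prev]) + b
    f = sigmoid(z[0:n])          # forget gate: how much of each c_prev component to keep
    i = sigmoid(z[n:2 * n])      # input gate: selection vector for the addition
    g = np.tanh(z[2 * n:3 * n])  # candidate addition, squashed into (-1, 1)
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c = f * c_prev + i * g       # additive update of the cell state C
    y = o * np.tanh(c)           # new output y
    return y, c

def lstm_forward(xs, W, b, n_hidden):
    """Unroll the cell over a sequence; y and C start as zero vectors."""
    y = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x in xs:                 # one copy of the cell per time step
        y, c = lstm_step(x, y, c, W, b)
        outputs.append(y)
    return np.stack(outputs), c

# Random weights, purely to show the shapes and data flow.
rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 4, 5
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
xs = rng.normal(size=(T, n_in))
ys, c_last = lstm_forward(xs, W, b, n_hidden)
print(ys.shape)  # (5, 4)
```

Stacking all four gates into one matrix is just a convenience: it is the same as four separate fully connected transformations of the concatenated input.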
diff --git a/content/ml-notes/Probability.html b/content/ml-notes/Probability.html @@ -1,288 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - <link rel="stylesheet" href="pluginAssets/katex/katex.css"> - <title>Probability</title> - <link href="./style.css" rel="stylesheet"> - </head> - <body> - -<div id="rendered-md"><h1 id="probability">Probability</h1> -<nav class="table-of-contents"><ul><li><a href="#probability">Probability</a><ul><li><a href="#probability-basics">Probability basics</a><ul><li><a href="#probability-theory">Probability theory</a></li></ul></li><li><a href="#naive-bayesian-classifiers">(Naive) Bayesian classifiers</a></li><li><a href="#logistic-regression-classifier">Logistic "regression" (classifier)</a></li><li><a href="#information-theory">Information theory</a><ul><li><a href="#maximum-likelihood">Maximum likelihood</a></li><li><a href="#normal-distributions-gaussians">Normal distributions (Gaussians)</a><ul><li><a href="#1d-normal-distribution-gaussian">1D normal distribution (Gaussian)</a></li><li><a href="#regression-with-gaussian-errors">Regression with Gaussian errors</a></li><li><a href="#n-d-normal-distribution-multivariate-gaussian">n-D normal distribution (multivariate Gaussian)</a></li><li><a href="#gaussian-mixture-model">Gaussian mixture model</a></li></ul></li><li><a href="#expectation-maximisation">Expectation-maximisation</a></li></ul></li></ul></li></ul></nav><h2 id="probability-basics">Probability basics</h2> -<p>What even is probability?</p> -<ul> -<li>Frequentism: probability is only property of repeated experiments</li> -<li>Bayesianism: probability is expression of our uncertainty and of our<br> -beliefs</li> -</ul> -<h3 id="probability-theory">Probability theory</h3> -<p>Definitions:</p> -<ul> -<li>sample space: the possible outcomes, can be discrete or continuous<br> -(like real numbers)</li> -<li>event space: set of the things that have probability (subsets of<br> -sample space)</li> 
-<li>random variable: a way to describe events, takes values with some<br> -probability -<ul> -<li>notation P(X = x) = 0.2 means that X takes the value x with<br> -probability 0.2</li> -</ul> -</li> -<li>for random variables X and Y: -<ul> -<li>joint probability P(X, Y): gives probability of each atomic<br> -event (specified single value for each random variable)</li> -<li>marginal probability: if you sum a row/column of the joint<br> -distribution (also called "marginalizing out" a variable)</li> -<li>conditional probability P(X | Y): probability of X given Y,<br> -i.e. the probability over one variable if another variable is<br> -known</li> -</ul> -</li> -<li>independence: X and Y independent if P(X, Y) = P(X) P(Y)</li> -<li>conditional independence: -<ul> -<li>X and Y conditionally independent if P(X | Y) = P(X)</li> -<li>X and Y conditionally independent given Z if P(X, Y | Z) =<br> -P(X|Z) P(Y|Z)</li> -</ul> -</li> -</ul> -<p>Identities:</p> -<ul> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>y</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo>∩</mo><mi>y</mi><mo stretchy="false">)</mo></mrow><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo></mrow></mfrac><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><mi>X</mi><mo separator="true">,</mo><mi>Y</mi><mo stretchy="false">)</mo><mi>P</mi><mo stretchy="false">(</mo><mi>Y</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo stretchy="false">)</mo><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">P(x | y) = \frac{P(x 
\cap y)}{P(y)} = P(X,Y) P(Y) = \frac{P(y | x) P(x)}{P(y)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.53em;vertical-align:-0.52em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">x</span><span class="mbin mtight">∩</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mclose mtight">)</span></span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.52em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.53em;vertical-align:-0.52em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" 
style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mord mtight">∣</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.52em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo>∪</mo><mi>y</mi><mo stretchy="false">)</mo><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo><mo>−</mo><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo>∩</mo><mi>y</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(x \cup y) = P(x) + P(y) - P(x \cap y)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">∪</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">∩</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span></span></span></span></li> -</ul> -<p>Maximum likelihood estimation:<br> -<span 
class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>θ</mi><mo>^</mo></mover><mo>=</mo><msub><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>θ</mi></msub><mi>P</mi><mo stretchy="false">(</mo><mi>X</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\hat{\theta} = \argmax_{\theta} P(X | \theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9578799999999998em;vertical-align:0em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.16666em;"><span class="mord">^</span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mop"><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.24196799999999993em;"><span style="top:-2.4558600000000004em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.24414em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span></p> -<p>Fitting a normal distribution:</p> -<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mover accent="true"><mi>μ</mi><mo>^</mo></mover><mo separator="true">,</mo><mover accent="true"><mi>σ</mi><mo>^</mo></mover></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><msup><mi>X</mi><mn>1</mn></msup><mo separator="true">,</mo><msup><mi>X</mi><mn>2</mn></msup><mo separator="true">,</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">∣</mi><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi 
mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi></mrow></munder><munder><mo>∏</mo><mi>i</mi></munder><mi>N</mi><mo stretchy="false">(</mo><msup><mi>X</mi><mi>i</mi></msup><mi mathvariant="normal">∣</mi><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} - \hat{\mu}, \hat{\sigma} &= \argmax_{\mu, \sigma} P(X^1, X^2, ... | \mu, \sigma) \\ - &= \argmax_{\mu, \sigma} \prod_i N(X^i | \mu, \sigma) \\ -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:4.822330000000001em;vertical-align:-2.1611650000000004em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.6611650000000004em;"><span style="top:-4.847062000000001em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault">μ</span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.22222em;"><span class="mord">^</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.19444em;"><span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord 
mathdefault" style="margin-right:0.03588em;">σ</span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span></span></span></span></span></span><span style="top:-2.4665090000000003em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.1611650000000004em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.6611650000000004em;"><span style="top:-4.847062000000001em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">μ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">.</span><span class="mord">.</span><span class="mord">.</span><span class="mord">∣</span><span class="mord mathdefault">μ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span></span></span><span style="top:-2.4665090000000003em;"><span class="pstrut" style="height:3.0500050000000005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">μ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span 
class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8746639999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord mathdefault">μ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.1611650000000004em;"><span></span></span></span></span></span></span></span></span></span></span></span></p> -<p>Probabilistic classifiers return a probability distribution over all classes, given<br> -the features.</p> -<h2 id="naive-bayesian-classifiers">(Naive) Bayesian classifiers</h2> -<p>This is a generative classifier: learn P(X|Y) and P(Y), then apply Bayes'<br> -rule to obtain P(Y|X).</p> -<p>Choose the class y that maximises P(y|x), the probability of the class given<br> -the data. Then expand using Bayes' rule.
Denominator doesn't affect which<br> -class gets highest probability, so just fit models to P(x|y) and P(y)<br> -to maximise quantity P(x|y)P(y).</p> -<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mi>c</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>y</mi><mo>∈</mo><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mo separator="true">,</mo><mi>n</mi><mi>e</mi><mi>g</mi></mrow></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>y</mi><mo>∈</mo><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mo separator="true">,</mo><mi>n</mi><mi>e</mi><mi>g</mi></mrow></mrow></munder><mfrac><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>y</mi><mo stretchy="false">)</mo><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo></mrow><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>y</mi><mo>∈</mo><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mo 
separator="true">,</mo><mi>n</mi><mi>e</mi><mi>g</mi></mrow></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>y</mi><mo stretchy="false">)</mo><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} -c(x) &= \argmax_{y \in {pos,neg}}P(y|x) \\ - &= \argmax_{y \in {pos,neg}}\frac{P(x|y) P(y)}{P(x)} \\ - &= \argmax_{y \in {pos,neg}}P(x|y)P(y) -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:7.098644em;vertical-align:-3.299322em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.799322em;"><span style="top:-6.386322000000001em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"><span class="mord mathdefault">c</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-3.628774em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"></span></span><span style="top:-1.458226em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.299322em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.799322em;"><span style="top:-6.386322000000001em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight">p</span><span class="mord mathdefault mtight">o</span><span class="mord mathdefault mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">n</span><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">g</span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mord">∣</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-3.628774em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span 
class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight">p</span><span class="mord mathdefault mtight">o</span><span class="mord mathdefault mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">n</span><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">g</span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span 
class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.936em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-1.458226em;"><span class="pstrut" style="height:3.427em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">y</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight">p</span><span class="mord mathdefault mtight">o</span><span class="mord mathdefault mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">n</span><span class="mord mathdefault mtight">e</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">g</span></span></span></span></span><span style="top:-3em;"><span 
class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.299322em;"><span></span></span></span></span></span></span></span></span></span></span></span></p> -<p>Bayes classifier:</p> -<ul> -<li>choose probability distribution M (e.g. 
multivariate normal)</li> -<li>fit M<sub>pos</sub> to all positive points: P(X=x | pos) = M<sub>pos</sub>(x)</li> -<li>fit M<sub>neg</sub> to all negative points: P(X=x | neg) = M<sub>neg</sub>(x)</li> -<li>estimate P(Y) from class frequencies in the training data, or<br> -domain-specific information</li> -</ul> -<p>Naive Bayes:</p> -<ul> -<li>assume independence between all features, conditional on the class:<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><msub><mi>X</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>X</mi><mn>2</mn></msub><mi mathvariant="normal">∣</mi><mi>Y</mi><mo stretchy="false">)</mo><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><msub><mi>X</mi><mn>1</mn></msub><mi mathvariant="normal">∣</mi><mi>Y</mi><mo stretchy="false">)</mo><mi>P</mi><mo stretchy="false">(</mo><msub><mi>X</mi><mn>2</mn></msub><mi mathvariant="normal">∣</mi><mi>Y</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(X_1, X_2 | Y) = P(X_1 | Y) P(X_2 | Y)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span 
class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mclose">)</span><span class="mord mathdefault" 
style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mclose">)</span></span></span></span></li> -<li>but if a particular feature value never occurs with a class in the<br> -training data, its probability is estimated as 0; since the whole estimate<br> -is a long product, a single zero factor makes the entire product zero.</li> -</ul> -<p>Laplace smoothing:</p> -<ul> -<li>for each possible value, add an instance where all features have<br> -that value (e.g.
one row with all trues and one row with all falses)</li> -<li>avoids collapses due to zero values</li> -</ul> -<h2 id="logistic-regression-classifier">Logistic "regression" (classifier)</h2> -<p>A discriminative classifier: learn function for P(Y|X) directly.</p> -<p>The logistic sigmoid:<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>σ</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>t</mi></mrow></msup></mrow></mfrac><mo>=</mo><mfrac><msup><mi>e</mi><mi>t</mi></msup><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mi>t</mi></msup></mrow></mfrac></mrow><annotation encoding="application/x-tex">\sigma(t) = \frac{1}{1+e^{-t}} = \frac{e^t}{1+e^t}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2484389999999999em;vertical-align:-0.403331em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mbin mtight">+</span><span class="mord mtight"><span class="mord mathdefault mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span 
class="vlist" style="height:0.7253428571428571em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mathdefault mtight">t</span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.403331em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.406571em;vertical-align:-0.403331em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.00324em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mbin mtight">+</span><span class="mord mtight"><span class="mord mathdefault mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7253428571428571em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord 
mathdefault mtight">t</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8703428571428571em;"><span style="top:-2.931em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">t</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.403331em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></p> -<ul> -<li>also, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1</mn><mo>−</mo><mi>σ</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mi>σ</mi><mo stretchy="false">(</mo><mo>−</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">1-\sigma(t) = \sigma(-t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" 
style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord">−</span><span class="mord mathdefault">t</span><span class="mclose">)</span></span></span></span></li> -<li>maps any real input into the interval (0, 1)</li> -</ul> -<p>Classifier: compute a linear function of the features, then apply the logistic sigmoid to the result<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>c</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>σ</mi><mo stretchy="false">(</mo><mi>w</mi><mo>⋅</mo><mi>x</mi><mo>+</mo><mi>b</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">c(x) = \sigma(w \cdot x + b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">c</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">⋅</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">b</span><span class="mclose">)</span></span></span></span></p> -<p>Loss function: log loss (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>−</mo><mi>log</mi><mo></mo><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>c</mi><mi>l</mi><mi>a</mi><mi>s</mi><mi>s</mi><mi mathvariant="normal">∣</mi><mi>f</mi><mi>e</mi><mi>a</mi><mi>t</mi><mi>u</mi><mi>r</mi><mi>e</mi><mi>s</mi><mo stretchy="false">)</mo></mrow></mrow><annotation encoding="application/x-tex">-\log{P(class |features)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">c</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">a</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mord mathdefault">e</span><span class="mord mathdefault">a</span><span class="mord 
mathdefault">t</span><span class="mord mathdefault">u</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mord mathdefault">e</span><span class="mord mathdefault">s</span><span class="mclose">)</span></span></span></span></span>)</p> -<ul> -<li>maximum likelihood objective: find classifier q that maximises<br> -probability of true classes</li> -<li>points near decision boundary get more influence than points far<br> -away (least squares does the opposite)</li> -<li>also sometimes called "cross-entropy loss"</li> -</ul> -<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>q</mi></munder><munder><mo>∏</mo><mrow><mi>C</mi><mo separator="true">,</mo><mi>x</mi></mrow></munder><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>C</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>q</mi></munder><mi>log</mi><mo></mo><mrow><munder><mo>∏</mo><mrow><mi>C</mi><mo separator="true">,</mo><mi>x</mi></mrow></munder><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>C</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mi>q</mi></munder><mo>−</mo><mi>log</mi><mo></mo><mrow><munder><mo>∏</mo><mrow><mi>C</mi><mo separator="true">,</mo><mi>x</mi></mrow></munder><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>C</mi><mo 
stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mi>q</mi></munder><munder><mo>∑</mo><mrow><mi>C</mi><mo separator="true">,</mo><mi>x</mi></mrow></munder><mo>−</mo><mi>log</mi><mo></mo><mrow><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>C</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mi>q</mi></munder><mo>−</mo><munder><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><msub><mi>X</mi><mi>p</mi></msub></mrow></munder><mi>log</mi><mo></mo><mrow><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>P</mi><mo stretchy="false">)</mo></mrow><mo>−</mo><munder><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><msub><mi>X</mi><mi>N</mi></msub></mrow></munder><mi>log</mi><mo></mo><mrow><msub><mi>q</mi><mi>x</mi></msub><mo stretchy="false">(</mo><mi>N</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} - \argmax_q \prod_{C,x}q_x(C) &= \argmax_{q}\log{\prod_{C,x}q_x(C)} \\ - &= \argmin_{q}-\log{\prod_{C,x} q_x (C)} \\ - &= \argmin_q \sum_{C,x} - \log{q_x (C)} \\ - &= \argmin_q - \sum_{x \in X_p} \log{q_x (P)} - \sum_{x \in X_N} \log{q_x (N)} -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:11.183008000000001em;vertical-align:-5.341504000000001em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.841504em;"><span 
style="top:-7.8415040000000005em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07153em;">C</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.430444em;"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">)</span></span></span><span style="top:-5.0610550000000005em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span><span style="top:-2.2806059999999997em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span><span style="top:0.49984300000000115em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:5.341504000000001em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.841504em;"><span style="top:-7.8415040000000005em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" 
style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07153em;">C</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.430444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">)</span></span></span></span><span style="top:-5.0610550000000005em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.66786em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07153em;">C</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.430444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">)</span></span></span></span><span style="top:-2.2806059999999997em;"><span 
class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.66786em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07153em;">C</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span 
class="vlist" style="height:1.430444em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">)</span></span></span></span><span style="top:0.49984300000000115em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.66786em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord 
mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16454285714285716em;"><span style="top:-2.357em;margin-left:-0.07847em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2818857142857143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.491656em;"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mclose">)</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.855664em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3567071428571427em;margin-left:-0.07847em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight" 
style="margin-right:0.10903em;">N</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.14329285714285717em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.394641em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mclose">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:5.341504000000001em;"><span></span></span></span></span></span></span></span></span></span></span></span></p> -<p>where:</p> -<ul> -<li>x: some data point</li> -<li>q<sub>x</sub>: our classifier q<sub>x</sub>(C) = q(C|x)</li> -</ul> -<p>Problem: if the classes are linearly separable, there are many<br> -suitable classifiers, and logistic
regression has no reason to prefer<br> -one over the other.</p> -<h2 id="information-theory">Information theory</h2> -<p>The relation between encoding information and probability theory.</p> -<p>Prefix-free trees assign a prefix-free code to a set of outcomes. The benefit is<br> -that no delimiters are necessary in the bit/codeword string.</p> -<p>Arithmetic coding: if we allow L(x) (the length of the code for x) to take<br> -non-integer values, we can equate codes with probability distributions via L(x) = −log P(x).</p> -<p>Entropy of a distribution: the expected codelength of an element sampled from<br> -that distribution.</p> -<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mi>H</mi><mo stretchy="false">(</mo><mi>p</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msub><mi>E</mi><mi>p</mi></msub><mi>L</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><mi>X</mi></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>L</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mo>−</mo><munder><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><mi>X</mi></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>log</mi><mo>⁡</mo><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mo 
stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} - H(p) &= E_p L(x) \\ - &= \sum_{x \in X} P(x)L(x) \\ - &= - \sum_{x \in X} P(x) \log{P(x)} -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:6.843422000000001em;vertical-align:-3.171711000000001em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.671711em;"><span style="top:-5.881716000000001em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose">)</span></span></span><span style="top:-4.171710999999999em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span><span style="top:-1.4999999999999991em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.171711000000001em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:3.671711em;"><span style="top:-5.881716000000001em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05764em;">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span 
style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span><span style="top:-4.171710999999999em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.321706em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span 
class="mclose">)</span></span></span><span style="top:-1.4999999999999991em;"><span class="pstrut" style="height:3.050005em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8556639999999998em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.321706em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:3.171711000000001em;"><span></span></span></span></span></span></span></span></span></span></span></span></p> -<p>Cross entropy: expected codelength if we use q, but data comes from p.</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo stretchy="false">(</mo><mi>p</mi><mo separator="true">,</mo><mi>q</mi><mo stretchy="false">)</mo><mo>=</mo><msub><mi>E</mi><mi>p</mi></msub><msup><mi>L</mi><mi>q</mi></msup><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mo>−</mo><msub><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><mi>X</mi></mrow></msub><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>log</mi><mo></mo><mrow><mi>q</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mrow><annotation encoding="application/x-tex">H(p, q) = E_p L^q(x) = - \sum_{x \in X} p(x) \log{q(x)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05764em;">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span 
style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">p</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault">L</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.07708em;vertical-align:-0.32708000000000004em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.17862099999999992em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span></span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height:0.32708000000000004em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span></span></span></p> -<p>Kullback-Leibler divergence: expected difference in codelength between using q and using p<br> -when the data comes from p. In other words, the difference in expected codelength.</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mi>L</mi><mo stretchy="false">(</mo><mi>p</mi><mo separator="true">,</mo><mi>q</mi><mo stretchy="false">)</mo><mo>=</mo><mi>H</mi><mo stretchy="false">(</mo><mi>p</mi><mo separator="true">,</mo><mi>q</mi><mo stretchy="false">)</mo><mo>−</mo><mi>H</mi><mo stretchy="false">(</mo><mi>p</mi><mo stretchy="false">)</mo><mo>=</mo><mo>−</mo><msub><mo>∑</mo><mrow><mi>x</mi><mo>∈</mo><mi>X</mi></mrow></msub><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>log</mi><mo>⁡</mo><mfrac><mrow><mi>q</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><mrow><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">KL(p,q) = H(p,q) - H(p) = - \sum_{x \in X} p(x) \log{\frac{q(x)}{p(x)}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mord mathdefault">L</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">q</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">p</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.53em;vertical-align:-0.52em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.17862099999999992em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight" style="margin-right:0.07847em;">X</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.32708000000000004em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">p</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">q</span><span class="mopen mtight">(</span><span 
class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.52em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></p> -<h3 id="maximum-likelihood">Maximum likelihood</h3> -<p>Maximum likelihood selects the model whose parameters θ make the observed data x most probable, i.e. the θ that maximises the likelihood p(x | θ).</p> -<p>(Log) likelihood: what we maximise to fit a probability model</p> -<p>Loss: what we minimise to find a machine learning model</p> -<h3 id="normal-distributions-gaussians">Normal distributions (Gaussians)</h3> -<h4 id="1d-normal-distribution-gaussian">1D normal distribution (Gaussian)</h4> -<p>It has a mean μ and a standard deviation σ.</p> -<p>It is not a probability function but a probability <em>density</em> function: individual points have zero probability, and only intervals have a probability, which you find by integrating the density over the interval.</p> -<p>Definition: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><msqrt><mrow><mn>2</mn><mi>π</mi><msup><mi>σ</mi><mn>2</mn></msup></mrow></msqrt></mfrac><mi>exp</mi><mo>⁡</mo><mrow><mo stretchy="false">[</mo><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mn>2</mn></msup><mo stretchy="false">]</mo></mrow></mrow><annotation encoding="application/x-tex">N(x | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp{[ -\frac{1}{2\sigma^2} (x-\mu)^2 ]}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span 
class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault">μ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.383108em;vertical-align:-0.5379999999999999em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.5153525em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9637821428571429em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mtight" style="padding-left:0.833em;"><span class="mord mtight">2</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">π</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7463142857142857em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord 
mtight">2</span></span></span></span></span></span></span></span></span></span></span><span style="top:-2.923782142857143em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.07621785714285711em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.5379999999999999em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">exp</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen">[</span><span class="mord">−</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7463142857142857em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span 
class="mclose">]</span></span></span></span></span></p> -<p>Maximum likelihood for the mean:</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left right left" columnspacing="0em 1em 0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>θ</mi></munder><mi>log</mi><mo></mo><mrow><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>θ</mi></munder><mi>ln</mi><mo></mo><mrow><munder><mo>∏</mo><mrow><mi>x</mi><mo>∈</mo><mi>x</mi></mrow></munder><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mi>θ</mi></munder><munder><mo>∑</mo><mi>x</mi></munder><mrow><mi>ln</mi><mo></mo><mrow><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>θ</mi></mrow><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(because product in log is sum outside of log)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>μ</mi><mo 
separator="true">,</mo><mi>σ</mi></mrow></munder><munder><mo>∑</mo><mi>x</mi></munder><mi>ln</mi><mo></mo><mfrac><mn>1</mn><msqrt><mrow><mn>2</mn><mi>π</mi><msup><mi>σ</mi><mn>2</mn></msup></mrow></msqrt></mfrac><mi>exp</mi><mo></mo><mo fence="false">⌊</mo><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mn>2</mn></msup><mo fence="false">⌋</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(fill in the formula)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>μ</mi><mo separator="true">,</mo><mi>σ</mi></mrow></munder><munder><mo>∑</mo><mi>x</mi></munder><mi>ln</mi><mo></mo><mfrac><mn>1</mn><msqrt><mrow><mn>2</mn><mi>π</mi><msup><mi>σ</mi><mn>2</mn></msup></mrow></msqrt></mfrac><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>ln</mi><mo></mo><mi>P</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow><mrow><mi mathvariant="normal">∂</mi><mi>μ</mi></mrow></mfrac></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo>∑</mo><mi>x</mi></munder><mfrac><mrow><mi mathvariant="normal">∂</mi><mo 
fence="false">[</mo><mi>ln</mi><mo></mo><mfrac><mn>1</mn><msqrt><mrow><mn>2</mn><mi>π</mi><msup><mi>σ</mi><mn>2</mn></msup></mrow></msqrt></mfrac><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mn>2</mn></msup><mo fence="false">]</mo></mrow><mrow><mi mathvariant="normal">∂</mi><mi>μ</mi></mrow></mfrac></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(because we want to maximise it)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><munder><mo>∑</mo><mi>x</mi></munder><mfrac><mrow><mi mathvariant="normal">∂</mi><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><mrow><mi mathvariant="normal">∂</mi><mi>μ</mi></mrow></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mo>−</mo><mfrac><mn>1</mn><msup><mi>σ</mi><mn>2</mn></msup></mfrac><munder><mo>∑</mo><mi>x</mi></munder><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mo>−</mo><mfrac><mn>1</mn><msup><mi>σ</mi><mn>2</mn></msup></mfrac><munder><mo>∑</mo><mi>x</mi></munder><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mn>0</mn></mrow></mstyle></mtd><mtd><mstyle 
scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(because the max/min is where the derivative is 0)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><munder><mo>∑</mo><mi>x</mi></munder><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mn>0</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mo>−</mo><mi>μ</mi><mi>n</mi><mo>+</mo><munder><mo>∑</mo><mi>x</mi></munder><mi>x</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mn>0</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mi>μ</mi></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munder><mo>∑</mo><mi>x</mi></munder><mi>x</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(i.e. 
the arithmetic mean)</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} - \argmax_{\theta} \log{p(x | \theta)} &= \argmax_{\theta} \ln{\prod_{x \in x} p(x|\theta)} \\ - &= \argmax_{\theta} \sum_{x}{\ln{p(x|\theta})} &&\text{(because product in log is sum outside of log)}\\ - &= \argmax_{\mu, \sigma} \sum_{x}\ln{\frac{1}{\sqrt{2\pi\sigma^2}}} \exp \big\lfloor -\frac{1}{2\sigma^2} (x-\mu)^2 \big\rfloor &&\text{(fill in the formula)}\\ - &= \argmax_{\mu, \sigma} \sum_{x}\ln{\frac{1}{\sqrt{2\pi\sigma^2}}} - \frac{1}{2\sigma^2} (x-\mu)^2 \\ -\frac{\partial \ln P(x|\theta)}{\partial \mu} &= \sum_{x} \frac{\partial \big[ \ln{\frac{1}{\sqrt{2\pi\sigma^2}}} - \frac{1}{2\sigma^2} (x-\mu)^2 \big]}{\partial \mu} &&\text{(because we want to maximise it)}\\ - &= -\frac{1}{2\sigma^2} \sum_{x} \frac{\partial (x-\mu)^2}{\partial \mu} \\ - &= -\frac{1}{\sigma^2} \sum_{x} (x-\mu) \\ - -\frac{1}{\sigma^2} \sum_{x} (x-\mu) &= 0 &&\text{(because the max/min is where the derivative is 0)} \\ - \sum_{x} (x-\mu) &= 0 \\ - -\mu n + \sum_{x} x &= 0 \\ - \mu &= \frac{1}{n} \sum_{x} x &&\text{(i.e. 
the arithmetic mean)} -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:31.15375299999999em;vertical-align:-15.326876499999996em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:15.826876499999996em;"><span style="top:-18.554871499999994em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.946548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span 
class="mclose">)</span></span></span></span><span style="top:-15.927491499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:-13.056046499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:-10.184601499999998em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:-6.856596499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault">μ</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.8804400000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.8154834999999947em;"><span class="pstrut" style="height:3.778em;"></span><span 
class="mord"></span></span><span style="top:-0.9440384999999951em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:1.9274065000000027em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord">−</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span 
class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span></span></span><span style="top:4.527416499999999em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span></span></span><span style="top:7.127426499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord">−</span><span class="mord mathdefault">μ</span><span class="mord mathdefault">n</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span 
class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">x</span></span></span><span style="top:9.998871499999991em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord mathdefault">μ</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:15.326876499999992em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:15.826876499999996em;"><span style="top:-18.554871499999994em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord 
mathdefault mtight" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.946548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999997em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span><span class="mrel mtight">∈</span><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.2773750000000001em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mclose">)</span></span></span></span><span style="top:-15.927491499999995em;"><span 
class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.946548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span></span><span class="mclose">)</span></span></span></span><span style="top:-13.056046499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">μ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop 
op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.154946em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9550540000000001em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord">2</span><span class="mord mathdefault" style="margin-right:0.03588em;">π</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-2.915054em;"><span 
class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.08494599999999997em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">exp</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="delimsizing size1">[</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span 
style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mord"><span class="delimsizing size1">]</span></span></span></span><span style="top:-10.184601499999998em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999994em;"><span style="top:-2.20556em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">μ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.030548em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.154946em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9550540000000001em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord">2</span><span class="mord mathdefault" style="margin-right:0.03588em;">π</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-2.915054em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z 
-M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.08494599999999997em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span 
class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-6.856596499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.778em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault">μ</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.928em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="delimsizing size1">[</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.5153525em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9637821428571429em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mtight" style="padding-left:0.833em;"><span class="mord mtight">2</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">π</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7463142857142857em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord 
mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-2.923782142857143em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.07621785714285711em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.5379999999999999em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span 
style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7463142857142857em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mord"><span class="delimsizing size1">]</span></span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.8804400000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.8154834999999947em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">−</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span 
style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.491108em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault">μ</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord 
mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.8804400000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-0.9440384999999951em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span></span></span><span style="top:1.9274065000000027em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">0</span></span></span><span style="top:4.527416499999999em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">0</span></span></span><span style="top:7.127426499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">0</span></span></span><span 
style="top:9.998871499999991em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathdefault">n</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.050005em;"><span style="top:-1.8999949999999999em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.0500049999999996em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.250005em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord 
mathdefault">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:15.326876499999992em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:1em;"></span><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:13.199496499999995em;"><span style="top:-15.927491499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:-13.056046499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:-6.856596499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:1.9274065000000036em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span><span style="top:9.998871499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:15.326876499999996em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:13.199496499999995em;"><span style="top:-15.927491499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(the log of a product is the sum of the logs)</span></span></span></span><span style="top:-13.056046499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(substitute the normal density)</span></span></span></span><span style="top:-6.856596499999997em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(to maximise, differentiate with respect to 
μ)</span></span></span></span><span style="top:1.9274065000000036em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(the maximum is a point where the derivative is 0)</span></span></span></span><span style="top:9.998871499999995em;"><span class="pstrut" style="height:3.778em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(i.e. the arithmetic mean)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:15.326876499999996em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>So the maximum likelihood estimator for the mean of a normal distribution is the arithmetic mean of the data.</p> -<h4 id="regression-with-gaussian-errors">Regression with Gaussian errors</h4> -<p>For a regression <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi><mo>=</mo><msup><mi>x</mi><mi>T</mi></msup><mi>w</mi><mo>+</mo><mi>b</mi><mo>+</mo><mi>E</mi></mrow><annotation encoding="application/x-tex">y = x^{T} w + b + E</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.924661em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span 
style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.77777em;vertical-align:-0.08333em;"></span><span class="mord mathdefault">b</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">E</span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>∼</mo><mi>N</mi><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>σ</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">E \sim N(0, \sigma)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.05764em;">E</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∼</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span 
class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span></span></span></span></p> -<p>If we want to maximise the likelihood of the parameters of the line, given some data:</p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable rowspacing="0.24999999999999992em" columnalign="right left right left" columnspacing="0em 1em 0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><mi>P</mi><mo stretchy="false">(</mo><mi>Y</mi><mi mathvariant="normal">∣</mi><mi>X</mi><mo separator="true">,</mo><mi>w</mi><mo separator="true">,</mo><mi>b</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><mi>ln</mi><mo></mo><mrow><munder><mo>∏</mo><mi>i</mi></munder><mi>N</mi><mo stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mi mathvariant="normal">∣</mi><msubsup><mi>x</mi><mi>i</mi><mi>T</mi></msubsup><mi>w</mi><mo>+</mo><mi>b</mi><mo separator="true">,</mo><mi>σ</mi><mo stretchy="false">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><munder><mo>∑</mo><mi>i</mi></munder><mi>ln</mi><mo></mo><mfrac><mn>1</mn><msqrt><mrow><mn>2</mn><mi>π</mi><msup><mi>σ</mi><mn>2</mn></msup></mrow></msqrt></mfrac><mi>exp</mi><mo></mo><mo 
fence="false">[</mo><mo>−</mo><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><msubsup><mi>x</mi><mi>i</mi><mi>T</mi></msubsup><mi>w</mi><mo>+</mo><mi>b</mi><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup><mo fence="false">]</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(just fill in the formula)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><mo>−</mo><munder><mo>∑</mo><mi>i</mi></munder><mfrac><mn>1</mn><mrow><mn>2</mn><msup><mi>σ</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy="false">(</mo><msubsup><mi>x</mi><mi>i</mi><mi>T</mi></msubsup><mi>w</mi><mo>+</mo><mi>b</mi><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(because the ln doesn’t matter for argmax)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg max</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><munder><mo>∑</mo><mi>i</mi></munder><mo stretchy="false">(</mo><msubsup><mi>x</mi><mi>i</mi><mi>T</mi></msubsup><mi>w</mi><mo>+</mo><mi>b</mi><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo 
stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(because the stdev doesn’t impact the result)</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><munder><mo><mi mathvariant="normal">arg min</mi><mo></mo></mo><mrow><mi>w</mi><mo separator="true">,</mo><mi>b</mi></mrow></munder><mfrac><mn>1</mn><mn>2</mn></mfrac><munder><mo>∑</mo><mi>i</mi></munder><mo stretchy="false">(</mo><msubsup><mi>x</mi><mi>i</mi><mi>T</mi></msubsup><mi>w</mi><mo>+</mo><mi>b</mi><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>(which is the least squares function)</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned} -\argmax_{w,b} P(Y|X,w,b) &= \argmax_{w,b} \ln{\prod_{i} N(y_i | x_{i}^{T} w + b, \sigma)} \\ - &= \argmax_{w,b} \sum_{i} \ln{\frac{1}{\sqrt{2\pi\sigma^2}}} \exp \Big[ -\frac{1}{2\sigma^2} (x_{i}^{T} w + b - y_i)^2 \Big] &&\text{(just fill in the formula)}\\ - &= \argmax_{w,b} -\sum_{i} \frac{1}{2\sigma^2} (x_{i}^{T} w + b - y_i)^2 &&\text{(because the ln doesn't matter for argmax)}\\ - &= \argmax_{w,b} -\frac{1}{2} \sum_{i} (x_{i}^{T} w + b - y_i)^2 &&\text{(because the stdev doesn't impact the result)}\\ - &= \argmin_{w,b} \frac{1}{2} \sum_{i} (x_{i}^{T} w + b - y_i)^2 &&\text{(which is the least squares function)}\\ -\end{aligned}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:14.22411em;vertical-align:-6.862055em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:7.362055em;"><span style="top:-9.633489999999998em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999983em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.082656em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mord">∣</span><span class="mord mathdefault" style="margin-right:0.07847em;">X</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mpunct">,</span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">b</span><span class="mclose">)</span></span></span><span style="top:-6.734381em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span style="top:-3.835272em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span style="top:-0.9361629999999994em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span style="top:1.9629459999999999em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:6.862055em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:7.362055em;"><span style="top:-9.633489999999998em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999983em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" 
style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.082656em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∏</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span 
class="mord">∣</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mclose">)</span></span></span></span><span style="top:-6.734381em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999983em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span 
class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.082656em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.154946em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span 
class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9550540000000001em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord">2</span><span class="mord mathdefault" style="margin-right:0.03588em;">π</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-2.915054em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.08494599999999997em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" 
style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">exp</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="delimsizing size2">[</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord"><span 
class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span 
class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mord"><span class="delimsizing size2">]</span></span></span></span><span style="top:-3.835272em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999983em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.082656em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.740108em;"><span style="top:-2.9890000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span 
class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span 
class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:-0.9361629999999994em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.43055999999999983em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.082656em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing 
reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span><span style="top:1.9629459999999999em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6678600000000001em;"><span style="top:-2.153452em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">b</span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.0826559999999998em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0500050000000003em;"><span style="top:-1.872331em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.050005em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.277669em;"><span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span><span class="mord mathdefault">b</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.03588em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:6.862055em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:1em;"></span><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:4.734381em;"><span style="top:-6.734381em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span style="top:-3.835272em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span style="top:-0.9361629999999994em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span><span 
style="top:1.9629459999999999em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:6.862055em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:4.734381em;"><span style="top:-6.734381em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(just fill in the formula)</span></span></span></span><span style="top:-3.835272em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(because the ln doesn’t matter for argmax)</span></span></span></span><span style="top:-0.9361629999999994em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(because the stdev doesn’t impact the result)</span></span></span></span><span style="top:1.9629459999999999em;"><span class="pstrut" style="height:3.32144em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">(which is the least squares function)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:6.862055em;"><span></span></span></span></span></span></span></span></span></span></span></p> -<p>This is why least squares implicitly assumes normally distributed errors: minimising the squared error is the same as maximising the likelihood under Gaussian noise.</p> -<h4 id="n-d-normal-distribution-multivariate-gaussian">n-D normal distribution (multivariate Gaussian)</h4> -<p>The formula: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><mi>μ</mi><mo separator="true">,</mo><mi
mathvariant="normal">Σ</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><msqrt><mrow><mo stretchy="false">(</mo><mn>2</mn><mi>π</mi><msup><mo stretchy="false">)</mo><mi>d</mi></msup><mi mathvariant="normal">∣</mi><mi mathvariant="normal">Σ</mi><mi mathvariant="normal">∣</mi></mrow></msqrt></mfrac><mi>exp</mi><mo></mo><mo fence="false">[</mo><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><msup><mi mathvariant="normal">Σ</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><mi>μ</mi><mo stretchy="false">)</mo><mo fence="false">]</mo></mrow><annotation encoding="application/x-tex">N(x | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma |}} \exp \Big[ -\frac{1}{2} (x-\mu)^{T} \Sigma^{-1} (x-\mu) \Big]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord mathdefault">μ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">Σ</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.9796em;vertical-align:-0.8296000000000001em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.4529525em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord 
mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.052925em;"><span class="svg-align" style="top:-3.428571428571429em;"><span class="pstrut" style="height:3.428571428571429em;"></span><span class="mord mtight" style="padding-left:1.19em;"><span class="mopen mtight">(</span><span class="mord mtight">2</span><span class="mord mathdefault mtight" style="margin-right:0.03588em;">π</span><span class="mclose mtight"><span class="mclose mtight">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7820285714285713em;"><span style="top:-2.786em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">d</span></span></span></span></span></span></span></span><span class="mord mtight">∣</span><span class="mord mtight">Σ</span><span class="mord mtight">∣</span></span></span><span style="top:-3.0249250000000005em;"><span class="pstrut" style="height:3.428571428571429em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.5428571428571431em;"><svg width='400em' height='1.5428571428571431em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702 -c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 -c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 -c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 -s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 -c69,-144,104.5,-217.7,106.5,-221 -l0 -0 -c5.3,-9.3,12,-14,20,-14 -H400000v40H845.2724 -s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 -c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z -M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.4036464285714285em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.8296000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">exp</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="delimsizing size2">[</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.190108em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span 
class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mord mathdefault">μ</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.80002em;vertical-align:-0.65002em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span><span class="mord"><span class="delimsizing size2">]</span></span></span></span></span></p> -<h4 id="gaussian-mixture-model">Gaussian 
mixture model</h4> -<p>Basically, combine several weighted Gaussians to model distributions whose shapes are too complex for a single Gaussian.</p> -<p>Example with three components:</p> -<ul> -<li>three components: N(μ₁, Σ₁), N(μ₂, Σ₂), N(μ₃, Σ₃)</li> -<li>three weights: w₁, w₂, w₃ with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>∑</mo><msub><mi>w</mi><mi>i</mi></msub><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\sum w_{i} = 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.00001em;vertical-align:-0.25001em;"></span><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span></li> -</ul> -<p>Maximum likelihood:<br> -<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo><mi mathvariant="normal">arg
max</mi><mo></mo></mo><mrow><msub><mi>w</mi><mi>i</mi></msub><mo separator="true">,</mo><msub><mi>μ</mi><mi>i</mi></msub><mo separator="true">,</mo><msub><mi mathvariant="normal">Σ</mi><mi>i</mi></msub></mrow></msub><msub><mo>∑</mo><mi>x</mi></msub><mrow><mi>ln</mi><mo></mo><msub><mo>∑</mo><mi>i</mi></msub><mrow><msub><mi>w</mi><mi>i</mi></msub><mi>N</mi><mo stretchy="false">(</mo><mi>x</mi><mi mathvariant="normal">∣</mi><msub><mi>μ</mi><mi>i</mi></msub><mo separator="true">,</mo><msub><mi mathvariant="normal">Σ</mi><mi>i</mi></msub><mo stretchy="false">)</mo></mrow></mrow></mrow><annotation encoding="application/x-tex">\argmax_{w_{i}, \mu_{i}, \Sigma_{i}} \sum_{x} {\ln \sum_{i}{ w_{i} N(x | \mu_{i}, \Sigma_{i})}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.130248em;vertical-align:-0.380248em;"></span><span class="mop"><span class="mop"><span class="mord mathrm">a</span><span class="mord mathrm">r</span><span class="mord mathrm" style="margin-right:0.01389em;">g</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathrm">m</span><span class="mord mathrm">a</span><span class="mord mathrm">x</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.23419099999999998em;"><span style="top:-2.4558600000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:-0.02691em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord
mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mpunct mtight">,</span><span class="mord mtight"><span class="mord mathdefault mtight">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mpunct mtight">,</span><span class="mord mtight"><span class="mord mtight">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.380248em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist"
style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.16195399999999993em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mord">∣</span><span class="mord"><span class="mord mathdefault">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span
class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></span></p> -<h3 id="expectation-maximisation">Expectation-maximisation</h3> -<p>Finding the maximum-likelihood parameters is hard when there are hidden (unobserved) variables that affect the variables observed in the dataset. In a mixture model, for example, the hidden variable is which component generated each point, and that is never observed. Expectation-maximisation (EM) can be used to fit <em>any</em> hidden-variable model.</p> -<p>Key insight: we can't optimise θ and z jointly, but given some θ we can compute P(z|x), and given z we can optimise θ.</p> -<p>Intuition:</p> -<ol> -<li>Initialise the components randomly</li> -<li>loop: -<ul> -<li>expectation: assign soft responsibilities to each point.
i.e., points "belong" to each Gaussian "to some degree"; each Gaussian takes a certain <em>responsibility</em> for each point.</li> -<li>maximisation: fit components to the data, weighted by responsibility.</li> -</ul> -</li> -</ol> -<p>Definition of "responsibility": <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mi>r</mi><mi>x</mi><mi>i</mi></msubsup><mo>=</mo><mfrac><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>z</mi><mo>=</mo><mi>i</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo stretchy="false">)</mo></mrow><mrow><msub><mo>∑</mo><mi>j</mi></msub><mi>P</mi><mo stretchy="false">(</mo><mi>z</mi><mo>=</mo><mi>j</mi><mi mathvariant="normal">∣</mi><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">r_{x}^{i} = \frac{P(z=i | x)}{\sum_{j} P(z=j | x)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.071664em;vertical-align:-0.247em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-2.4530000000000003em;margin-left:-0.02778em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.677227em;vertical-align:-0.667227em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.01em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mop op-symbol small-op mtight" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.14964714285714287em;"><span style="top:-2.1785614285714283em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.46032428571428574em;"><span></span></span></span></span></span></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.04398em;">z</span><span class="mrel mtight">=</span><span class="mord mathdefault mtight" style="margin-right:0.05724em;">j</span><span class="mord mtight">∣</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" 
style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">P</span><span class="mopen mtight">(</span><span class="mord mathdefault mtight" style="margin-right:0.04398em;">z</span><span class="mrel mtight">=</span><span class="mord mathdefault mtight">i</span><span class="mord mtight">∣</span><span class="mord mathdefault mtight">x</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.667227em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span><br> -Model parameters, given responsibilities:</p> -<ul> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>n</mi><mi>i</mi></msub><mo>=</mo><msub><mo>∑</mo><mi>x</mi></msub><msubsup><mi>r</mi><mi>x</mi><mi>i</mi></msubsup></mrow><annotation encoding="application/x-tex">n_i = \sum_{x} r_{x}^{i}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.124374em;vertical-align:-0.29971000000000003em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-2.4530000000000003em;margin-left:-0.02778em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>μ</mi><mi>i</mi></msub><mo>=</mo><mfrac><mn>1</mn><msub><mi>n</mi><mi>i</mi></msub></mfrac><msub><mo>∑</mo><mi>x</mi></msub><msubsup><mi>r</mi><mi>x</mi><mi>i</mi></msubsup><mi>x</mi></mrow><annotation encoding="application/x-tex">\mu_i = \frac{1}{n_i} \sum_{x} r_{x}^{i} x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2902079999999998em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span 
style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.824664em;"><span style="top:-2.4530000000000003em;margin-left:-0.02778em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mord mathdefault">x</span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="normal">Σ</mi><mi>i</mi></msub><mo>=</mo><mfrac><mn>1</mn><msub><mi>n</mi><mi>i</mi></msub></mfrac><msub><mo>∑</mo><mi>x</mi></msub><msubsup><mi>r</mi><mi>x</mi><mi>i</mi></msubsup><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><msub><mi>μ</mi><mi>i</mi></msub><mo stretchy="false">)</mo><mo stretchy="false">(</mo><mi>x</mi><mo>−</mo><msub><mi>μ</mi><mi>i</mi></msub><msup><mo stretchy="false">)</mo><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">\Sigma_i = \frac{1}{n_i} \sum_{x} r_{x}^{i} (x-\mu_i) (x-\mu_i)^{T}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault 
mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2902079999999998em;vertical-align:-0.44509999999999994em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.845108em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.44509999999999994em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span 
class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.824664em;"><span style="top:-2.4530000000000003em;margin-left:-0.02778em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">i</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" 
style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span></span></li> -<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>w</mi><mi>i</mi></msub><mo>=</mo><mfrac><msub><mi>n</mi><mi>i</mi></msub><mi>n</mi></mfrac></mrow><annotation encoding="application/x-tex">w_i = \frac{n_i}{n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.02691em;">w</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.056492em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7114919999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" 
style="border-bottom-width:0.04em;"></span></span><span style="top:-3.4101em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathdefault mtight">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight">i</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></li> -</ul> -<p><img src="_resources/7ba36b211f204ba187d79d53fd4e6a97.png" alt="Expectation and maximization"></p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/7ba36b211f204ba187d79d53fd4e6a97.png b/content/ml-notes/Probability/7ba36b211f204ba187d79d53fd4e6a97.png Binary files differ. diff --git a/content/ml-notes/Probability/index.md b/content/ml-notes/Probability/index.md @@ -0,0 +1,242 @@ ++++ +title = 'Probability' +template = 'page-math.html' ++++ +# Probability + +## Probability basics + +What even is probability? 
+
+- Frequentism: probability is only a property of repeated experiments
+- Bayesianism: probability is an expression of our uncertainty and of our
+  beliefs
+
+### Probability theory
+
+Definitions:
+
+- sample space: the possible outcomes, can be discrete or continuous
+  (like real numbers)
+- event space: set of the things that have probability (subsets of the
+  sample space)
+- random variable: a way to describe events, takes values with some
+  probability
+    - notation P(X = x) = 0.2 means that X takes the value x with
+      probability 0.2
+- for random variables X and Y:
+    - joint probability P(X, Y): gives the probability of each atomic
+      event (a specified single value for each random variable)
+    - marginal probability: the probability you get by summing a row/column
+      of the joint distribution (also called "marginalizing out" a variable)
+    - conditional probability P(X | Y): probability of X given Y,
+      i.e. the probability over one variable if another variable is
+      known
+- independence: X and Y independent if P(X, Y) = P(X) P(Y)
+  (equivalently, P(X | Y) = P(X))
+- conditional independence: X and Y conditionally independent given Z
+  if P(X, Y | Z) = P(X|Z) P(Y|Z)
+
+Identities:
+
+- $P(x | y) = \frac{P(x \cap y)}{P(y)} = \frac{P(x, y)}{P(y)} = \frac{P(y | x) P(x)}{P(y)}$
+- $P(x \cup y) = P(x) + P(y) - P(x \cap y)$
+
+Maximum likelihood estimation:
+$\hat{\theta} = \argmax_{\theta} P(X | \theta)$
+
+Fitting a normal distribution:
+
+$$\begin{aligned}
+  \hat{\mu}, \hat{\sigma} &= \argmax_{\mu, \sigma} P(X^1, X^2, \ldots | \mu, \sigma) \\
+  &= \argmax_{\mu, \sigma} \prod_i N(X^i | \mu, \sigma) \\
+\end{aligned}$$
+
+Probabilistic classifiers return a probability distribution over all classes, given
+features.
+
+## (Naive) Bayesian classifiers
+
+This is a generative classifier: learn P(X|Y) and P(Y), then apply Bayes'
+rule.
+
+Choose the class y that maximises P(y|x), the probability of the class given the
+data. Then expand using Bayes' rule.
The denominator doesn't affect which
+class gets the highest probability, so just fit models to P(x|y) and P(y)
+to maximise the quantity P(x|y)P(y).
+
+$$\begin{aligned}
+c(x) &= \argmax_{y \in \lbrace pos,neg \rbrace}P(y|x) \\
+     &= \argmax_{y \in \lbrace pos,neg \rbrace}\frac{P(x|y) P(y)}{P(x)} \\
+     &= \argmax_{y \in \lbrace pos,neg \rbrace}P(x|y)P(y)
+\end{aligned}$$
+
+
+Bayes classifier:
+
+- choose a probability distribution M (e.g. multivariate normal)
+- fit M<sub>pos</sub> to all positive points: P(X=x | pos) = M<sub>pos</sub>(x)
+- fit M<sub>neg</sub> to all negative points: P(X=x | neg) = M<sub>neg</sub>(x)
+- estimate P(Y) from class frequencies in the training data, or
+  domain-specific information
+
+Naive Bayes:
+
+- assume independence between all features, conditional on the class:
+  $P(X_1, X_2 | Y) = P(X_1 | Y) P(X_2 | Y)$
+- but if a particular feature value never occurs with a class in the
+  training data, we estimate its probability to be 0, and since the whole
+  estimate is a long product, a single zero factor makes everything zero.
+
+Laplace smoothing:
+
+- for each possible value, add an instance where all features have
+  that value (e.g. one row with all trues and one row with all falses)
+- avoids collapses due to zero values
+
+## Logistic "regression" (classifier)
+
+A discriminative classifier: learn a function for P(Y|X) directly.
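To contrast with the generative approach, here is a minimal NumPy sketch of a logistic regression classifier c(x) = σ(w · x + b). It is fitted by plain batch gradient descent on the mean log loss. The optimiser, the learning rate `lr`, the epoch count, the helper names and the two-blob toy data are all assumptions for illustration; none of them come from these notes. The sketch relies on the fact that the gradient of the log loss with respect to the linear output w · x + b is simply σ(w · x + b) - y.

```python
import numpy as np

def sigmoid(t):
    # logistic sigmoid: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(X, y, w, b):
    # mean of -log q_x(true class), where q_x(1) = sigmoid(w.x + b)
    p = sigmoid(X @ w + b)
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Gradient descent on the mean log loss (hypothetical helper).

    X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)   # current P(y = 1 | x) for every point
        err = p - y              # d(log loss)/d(w.x + b), per point
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b

# Toy data (assumption): two Gaussian blobs labelled 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print("training accuracy:", accuracy)
```

The `eps` term only prevents `log(0)` in floating point. It is not part of the log loss itself.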
+
+The logistic sigmoid:
+$\sigma(t) = \frac{1}{1+e^{-t}} = \frac{e^t}{1+e^t}$
+
+- also, $1-\sigma(t) = \sigma(-t)$
+- squashes results into the interval (0, 1)
+
+Classifier: compute a linear function, then apply the logistic sigmoid to the result:
+$c(x) = \sigma(w \cdot x + b)$
+
+Loss function: log loss ($-\log{P(class | features)}$)
+
+- maximum likelihood objective: find the classifier q that maximises the
+  probability of the true classes
+- points near the decision boundary get more influence than points far
+  away (least squares does the opposite)
+- also sometimes called "cross-entropy loss"
+
+$$\begin{aligned}
+  \argmax_q \prod_{C,x}q_x(C) &= \argmax_{q}\log{\prod_{C,x}q_x(C)} \\
+  &= \argmin_{q}-\log{\prod_{C,x} q_x (C)} \\
+  &= \argmin_q \sum_{C,x} - \log{q_x (C)} \\
+  &= \argmin_q - \sum_{x \in X_P} \log{q_x (P)} - \sum_{x \in X_N} \log{q_x (N)}
+\end{aligned}$$
+
+where:
+
+- x: some data point
+- q<sub>x</sub>: our classifier q<sub>x</sub>(C) = q(C|x)
+- C: the true class of x, P (positive) or N (negative)
+- X<sub>P</sub>, X<sub>N</sub>: the sets of positive and negative training points
+
+Problem: if the classes are well separable linearly, there are many
+suitable classifiers, and logistic regression has no reason to prefer
+one over the other.
+
+## Information theory
+
+The relation between encoding information and probability theory.
+
+Prefix-free trees assign a prefix-free code to a set of outcomes. The benefit is
+that no delimiters are necessary in the bit/codeword string.
+
+Arithmetic coding: if we allow L(x) (the length of the code for x) to take
+non-integer values, we can equate codes with probability distributions,
+with L(x) = -log P(x).
+
+Entropy of a distribution: expected codelength of an element sampled from
+that distribution.
+
+$$\begin{aligned}
+  H(p) &= E_p L(x) \\
+  &= \sum_{x \in X} P(x)L(x) \\
+  &= - \sum_{x \in X} P(x) \log{P(x)}
+  \end{aligned}$$
+
+Cross entropy: expected codelength if we use q, but the data comes from p.
+
+$H(p, q) = E_p L^q(x) = - \sum_{x \in X} p(x) \log{q(x)}$
+
+Kullback-Leibler divergence: expected difference in codelength between p
+and q; in other words, the difference in expected codelength.
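To make the codelength reading concrete, here is a small sketch that computes entropy, cross-entropy and the KL divergence for discrete distributions given as probability vectors. KL is computed as H(p, q) - H(p). The sketch uses base-2 logarithms so that results are in bits. The notes leave the base unspecified, so that choice and the helper names are assumptions.

```python
import numpy as np

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x): expected codelength in bits
    p = np.asarray(p, dtype=float)
    nz = p > 0  # outcomes with p(x) = 0 contribute nothing
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log2 q(x): expected codelength using q's code on data from p
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

def kl_divergence(p, q):
    # KL(p, q) = H(p, q) - H(p): expected extra bits paid for using the wrong code
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy(p))           # 1.5
print(cross_entropy(p, q))  # log2(3), about 1.585
print(kl_divergence(p, q))  # about 0.085
```

If q assigns zero probability to an outcome that p can produce, the cross-entropy (and so the KL divergence) becomes infinite. In coding terms, that outcome has no codeword.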
+
+$KL(p,q) = H(p,q) - H(p) = - \sum_{x \in X} p(x) \log{\frac{q(x)}{p(x)}}$
+
+### Maximum likelihood
+Maximum likelihood selects the model under which the observed data has the highest probability, i.e. the model that is most suitable given the observed data.
+
+(Log) likelihood: what we maximise to fit a probability model
+
+Loss: what we minimise to find a machine learning model
+
+
+### Normal distributions (Gaussians)
+#### 1D normal distribution (Gaussian)
+Has a mean μ and standard deviation σ.
+
+Not a probability function, but a probability _density_ function. Only intervals have a probability, so to find the probability of an interval, you integrate the probability density function over it.
+
+Definition: $N(x | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp{[ -\frac{1}{2\sigma^2} (x-\mu)^2 ]}$
+
+Maximum likelihood for the mean:
+
+$\begin{aligned}
+  \argmax_{\theta} \ln{p(X | \theta)} &= \argmax_{\theta} \ln{\prod_{x \in X} p(x|\theta)} \\
+  &= \argmax_{\theta} \sum_{x}{\ln{p(x|\theta)}} &&\text{(because the log of a product is the sum of the logs)}\\
+  &= \argmax_{\mu, \sigma} \sum_{x}\ln \Big[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp \big[ -\frac{1}{2\sigma^2} (x-\mu)^2 \big] \Big] &&\text{(fill in the formula)}\\
+  &= \argmax_{\mu, \sigma} \sum_{x} \Big[ \ln{\frac{1}{\sqrt{2\pi\sigma^2}}} - \frac{1}{2\sigma^2} (x-\mu)^2 \Big] \\
+\frac{\partial \ln p(X|\theta)}{\partial \mu} &= \sum_{x} \frac{\partial \big[ \ln{\frac{1}{\sqrt{2\pi\sigma^2}}} - \frac{1}{2\sigma^2} (x-\mu)^2 \big]}{\partial \mu} &&\text{(differentiate to find the maximum)}\\
+  &= -\frac{1}{2\sigma^2} \sum_{x} \frac{\partial (x-\mu)^2}{\partial \mu} \\
+  &= \frac{1}{\sigma^2} \sum_{x} (x-\mu) &&\text{(the derivative of } (x-\mu)^2 \text{ is } -2(x-\mu)\text{)} \\
+  \frac{1}{\sigma^2} \sum_{x} (x-\mu) &= 0 &&\text{(because the max/min is where the derivative is 0)} \\
+  \sum_{x} (x-\mu) &= 0 \\
+  -\mu n + \sum_{x} x &= 0 \\
+  \mu &= \frac{1}{n} \sum_{x} x &&\text{(i.e.
the arithmetic mean)}
+\end{aligned}$
+
+The implication is that the maximum likelihood estimator for the mean of a normal distribution is the mean of the data.
+
+#### Regression with Gaussian errors
+For a regression $y = x^{T} w + b + E$, where $E \sim N(0, \sigma)$
+
+If we want to maximise the likelihood of the parameters of the line, given some data:
+
+$\begin{aligned}
+\argmax_{w,b} P(Y|X,w,b) &= \argmax_{w,b} \ln{\prod_{i} N(y_i | x_{i}^{T} w + b, \sigma)} \\
+  &= \argmax_{w,b} \sum_{i} \ln \Big( \frac{1}{\sqrt{2\pi\sigma^2}} \exp \Big[ -\frac{1}{2\sigma^2} (x_{i}^{T} w + b - y_i)^2 \Big] \Big) &&\text{(just fill in the formula)}\\
+  &= \argmax_{w,b} -\sum_{i} \frac{1}{2\sigma^2} (x_{i}^{T} w + b - y_i)^2 &&\text{(the constant ln term does not depend on w and b)}\\
+  &= \argmax_{w,b} -\frac{1}{2} \sum_{i} (x_{i}^{T} w + b - y_i)^2 &&\text{(the positive factor } 1/\sigma^2 \text{ does not change the argmax)}\\
+  &= \argmin_{w,b} \frac{1}{2} \sum_{i} (x_{i}^{T} w + b - y_i)^2 &&\text{(which is the least squares function)}\\
+\end{aligned}$
+
+This is why least squares implicitly assumes normally distributed errors.
+#### n-D normal distribution (multivariate Gaussian)
+The formula: $N(x | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma |}} \exp \Big[ -\frac{1}{2} (x-\mu)^{T} \Sigma^{-1} (x-\mu) \Big]$
+
+#### Gaussian mixture model
+Basically, combine Gaussians to represent more complex shapes.
+
+Example with three components:
+- three components: N(μ₁, Σ₁), N(μ₂, Σ₂), N(μ₃, Σ₃)
+- three weights: w₁, w₂, w₃ with $\sum w_{i} = 1$
+
+Maximum likelihood:
+$\argmax_{w_{i}, \mu_{i}, \Sigma_{i}} \sum_{x} {\ln \sum_{i}{ w_{i} N(x | \mu_{i}, \Sigma_{i})}}$
+### Expectation-maximisation
+Finding the maximum-likelihood solution is hard when there are hidden (unobserved) variables that affect the variables in the dataset. In a mixture model, for example, the hidden variable is which component generated each point. Expectation-maximisation can be used to fit _any_ hidden variable model.
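The expectation and maximisation steps for a Gaussian mixture can be sketched in NumPy for the 1D case, where each covariance Σ_i reduces to a variance σ_i². The E-step computes each responsibility as the posterior P(z = i | x) = w_i N(x | μ_i, σ_i²) / Σ_j w_j N(x | μ_j, σ_j²). The M-step applies the n_i, μ_i, Σ_i and w_i updates given in this section. Several details are assumptions for illustration: the farthest-point initialisation (the notes only say to initialise randomly), the fixed iteration count, the helper names and the toy data. There is also no guard against a component collapsing onto a single point.

```python
import numpy as np

def normal_pdf(x, mu, var):
    # 1D Gaussian density N(x | mu, var), broadcast over arrays
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k, iters=100, seed=0):
    """Fit a k-component 1D Gaussian mixture with EM (hypothetical helper).

    Returns weights w, means mu and variances var, each of shape (k,).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialisation (assumption): the first mean is a random data point, each
    # further mean is the data point farthest from the means chosen so far.
    mu = [rng.choice(x)]
    for _ in range(k - 1):
        dist = np.min(np.abs(x[:, None] - np.array(mu)), axis=1)
        mu.append(x[np.argmax(dist)])
    mu = np.array(mu)
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)

    for _ in range(iters):
        # E-step: r[j, i] = responsibility of component i for point x[j]
        dens = w * normal_pdf(x[:, None], mu, var)   # shape (n, k)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: refit each component, weighting points by responsibility
        n_i = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n_i
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_i
        w = n_i / n
    return w, mu, var

# Toy data (assumption): 300 points around -3 and 200 points around 4
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(4, 0.5, 200)])
w, mu, var = em_gmm_1d(x, k=2)
order = np.argsort(mu)
print("weights:", w[order], "means:", mu[order], "variances:", var[order])
```

Sorting by `mu` only makes the printed order predictable. EM itself assigns no particular order to its components.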
+ +Key insight: can't optimise both θ and z, but given some θ, can compute P(z|x), and given z, can optimise θ. + +Intuition: +1. Initialize components randomly +2. loop: + - expectation: assign soft responsibilities to each point. i.e., points "belong" to each Gaussian "to some degree"; each Gaussian takes a certain _responsibility_ for each point. + - maximisation: fit components to the data, weighted by responsibility. + +Definition of "responsibility": $r_{x}^{i} = \frac{P(z=i | x)}{\sum_{j} P(z=j | x)}$ +Model parameters, given responsibilities: +- $n_i = \sum_{x} r_{x}^{i}$ +- $\mu_i = \frac{1}{n_i} \sum_{x} r_{x}^{i} x$ +- $\Sigma_i = \frac{1}{n_i} \sum_{x} r_{x}^{i} (x-\mu_i) (x-\mu_i)^{T}$ +- $w_i = \frac{n_i}{n}$ + +![Expectation and maximization](7ba36b211f204ba187d79d53fd4e6a97.png) + diff --git a/content/ml-notes/Programming reference.html b/content/ml-notes/Programming reference.html @@ -1,185 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - <link rel="stylesheet" href="pluginAssets/highlight.js/atom-one-light.css"> - <title>Programming reference</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="numpy-matplotlib">Numpy & matplotlib</h1> -<p>Load external file:</p> -<pre class="hljs"><code>data = numpy.loadtxt(<span class="hljs-string">'./filepath.csv'</span>, delimiter=<span class="hljs-string">','</span>) -</code></pre> -<p>Print information about data:</p> -<pre class="hljs"><code>data.shape -</code></pre> -<p>Graph two columns of data:</p> -<pre class="hljs"><code><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt -%matplotlib inline -x = data[:,<span class="hljs-number">0</span>] -y = data[:,<span class="hljs-number">1</span>] -<span class="hljs-comment"># includes size and transparency setting, specifies third column to use for color</span> -plt.scatter(x, y, 
s=<span class="hljs-number">3</span>, alpha=<span class="hljs-number">0.2</span>, c=data[:,<span class="hljs-number">2</span>], cmap=<span class="hljs-string">'RdYlBu_r'</span>) -plt.xlabel(<span class="hljs-string">'x axis'</span>) -plt.ylabel(<span class="hljs-string">'y axis'</span>); -</code></pre> -<p>Histogram plotting:</p> -<pre class="hljs"><code><span class="hljs-comment"># bins determines width of bars</span> -plt.hist(data, bins=<span class="hljs-number">100</span>, range=[start, end] -</code></pre> -<p>The identity matrix:</p> -<pre class="hljs"><code>np.eye(<span class="hljs-number">2</span>) <span class="hljs-comment"># for a 2x2 matrix</span> -</code></pre> -<p>Matrix multiplication:</p> -<pre class="hljs"><code>a * b <span class="hljs-comment"># element-wise</span> -a.dot(b) <span class="hljs-comment"># dot product</span> -</code></pre> -<p>Useful references:</p> -<ul> -<li><a data-from-md title='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' href='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' type=''>The official numpy quickstart guide</a></li> -<li><a data-from-md title='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' href='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' type=''>A more in-depth tutorial, with in-browser samples</a></li> -<li><a data-from-md title='http://cs231n.github.io/python-numpy-tutorial/' href='http://cs231n.github.io/python-numpy-tutorial/' type=''>A very good walk through the most important functions and features</a>. From the famous <a data-from-md title='http://cs231n.github.io/' href='http://cs231n.github.io/' type=''>CS231n course</a>, from Stanford.</li> -<li><a data-from-md title='https://matplotlib.org/users/pyplot_tutorial.html' href='https://matplotlib.org/users/pyplot_tutorial.html' type=''>The official pyplot tutorial</a>. 
Note that pyplot can accept basic python lists as well as numpy data.</li> -<li><a data-from-md title='https://matplotlib.org/gallery.html' href='https://matplotlib.org/gallery.html' type=''>A gallery of example MPL plots</a>. Most of these do not use the pyplot state-machine interface, but the more low level objects like <a data-from-md title='https://matplotlib.org/api/axes_api.html' href='https://matplotlib.org/api/axes_api.html' type=''>Axes</a>.</li> -<li><a data-from-md title='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' href='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' type=''>In-depth walk through the main features and plot types</a></li> -</ul> -<h1 id="sklearn">Sklearn</h1> -<p>Split data into train and test, on features <code class="inline-code">x</code> and target <code class="inline-code">y</code>:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span class="hljs-number">0.5</span>) -</code></pre> -<p>An estimator implements method <code class="inline-code">fit(x,y)</code> that learns from data, and <code class="inline-code">predict(T)</code> which takes new instance and predicts target value.</p> -<p>Linear classifier, using SVC model with linear kernel:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.svm <span class="hljs-keyword">import</span> SVC -linear = SVC(kernel=<span class="hljs-string">'linear'</span>) -linear.fit(x_train, y_train) -</code></pre> -<p>Decision tree classifier:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier -tree = DecisionTreeClassifier() -tree.fit(x_train, y_train) -</code></pre> -<p>k-Nearest Neighbors:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.neighbors <span 
class="hljs-keyword">import</span> KNeighborsClassifier -knn = KNeighborsClassifier(<span class="hljs-number">15</span>) <span class="hljs-comment"># We set the number of neighbors to 15</span> -knn.fit(x_train, y_train) -</code></pre> -<p>Try to classify new data:</p> -<pre class="hljs"><code>linear.predict(some_data) -</code></pre> -<p>Compute accuracy on testing data:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score -y_predicted = linear.predict(x_test) -accuracy_score(y_test, y_predicted) -</code></pre> -<p>Make a plot of classification, with colors showing classifier's decision:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> mlxtend.plotting <span class="hljs-keyword">import</span> plot_decision_regions -plot_decision_regions(x_test[:<span class="hljs-number">500</span>], y_test.astype(np.integer)[:<span class="hljs-number">500</span>], clf=linear, res=<span class="hljs-number">0.1</span>); -</code></pre> -<p>Compare classifiers via ROC curve:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve, auc - -<span class="hljs-comment"># The linear classifier doesn't produce class probabilities by default. 
We'll retrain it for probabilities.</span> -linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>) -linear.fit(x_train, y_train) - -<span class="hljs-comment"># We'll need class probabilities from each of the classifiers</span> -y_linear = linear.predict_proba(x_test) -y_tree = tree.predict_proba(x_test) -y_knn = knn.predict_proba(x_test) - -<span class="hljs-comment"># Compute the points on the curve</span> -<span class="hljs-comment"># We pass the probability of the second class (KIA) as the y_score</span> -curve_linear = sklearn.metrics.roc_curve(y_test, y_linear[:, <span class="hljs-number">1</span>]) -curve_tree = sklearn.metrics.roc_curve(y_test, y_tree[:, <span class="hljs-number">1</span>]) -curve_knn = sklearn.metrics.roc_curve(y_test, y_knn[:, <span class="hljs-number">1</span>]) - -<span class="hljs-comment"># Compute Area Under the Curve</span> -auc_linear = auc(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>]) -auc_tree = auc(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>]) -auc_knn = auc(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>]) - -plt.plot(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>], label=<span class="hljs-string">'linear (area = %0.2f)'</span> % auc_linear) -plt.plot(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>], label=<span class="hljs-string">'tree (area = %0.2f)'</span> % auc_tree) -plt.plot(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>], label=<span class="hljs-string">'knn (area = %0.2f)'</span>% auc_knn) - -plt.xlim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>]) -plt.ylim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>]) 
-plt.xlabel(<span class="hljs-string">'False Positive Rate'</span>) -plt.ylabel(<span class="hljs-string">'True Positive Rate'</span>) -plt.title(<span class="hljs-string">'ROC curve'</span>); - -plt.legend(); -</code></pre> -<p>Cross-validation:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score -<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_auc_score, make_scorer - -<span class="hljs-comment"># The cross_val_score function does all the training for us. We simply pass</span> -<span class="hljs-comment"># it the complete data, the model, and the metric.</span> - -linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>) - -<span class="hljs-comment"># Train for 5 folds, returing ROC AUC. You can also try 'accuracy' as a scorer</span> -scores = cross_val_score(linear, x, y, cv=<span class="hljs-number">3</span>, scoring=<span class="hljs-string">'roc_auc'</span>) - -print(<span class="hljs-string">'scores per fold '</span>, scores) -</code></pre> -<p>Regression:</p> -<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn <span class="hljs-keyword">import</span> datasets -<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error, r2_score - -<span class="hljs-comment"># Load the diabetes dataset, and select one feature (Body Mass Index)</span> -x, y = datasets.load_diabetes(<span class="hljs-literal">True</span>) -x = x[:, <span class="hljs-number">2</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>) - -<span class="hljs-comment"># -- the reshape operation ensures that x still has two dimensions</span> -<span class="hljs-comment"># (that is, we need it to be an n by 1 matrix, not a vector)</span> - -x_train, x_test, y_train, y_test = train_test_split(x, 
y, test_size=<span class="hljs-number">0.5</span>) - -<span class="hljs-comment"># feature space on horizontal axis, output space on vertical axis</span> -plt.scatter(x_train[:, <span class="hljs-number">0</span>], y_train) -plt.xlabel(<span class="hljs-string">'BMI'</span>) -plt.ylabel(<span class="hljs-string">'disease progression'</span>); - -<span class="hljs-comment"># Train three models: linear regression, tree regression, knn regression</span> -<span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression -linear = LinearRegression() -linear.fit(x_train, y_train) - -<span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor -tree = DecisionTreeRegressor() -tree.fit(x_train, y_train) - -<span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> KNeighborsRegressor -knn = KNeighborsRegressor(<span class="hljs-number">10</span>) -knn.fit(x_train, y_train); - -<span class="hljs-comment"># Plot the models</span> -<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error - -plt.scatter(x_train, y_train, alpha=<span class="hljs-number">0.1</span>) - -xlin = np.linspace(<span class="hljs-number">-0.10</span>, <span class="hljs-number">0.2</span>, <span class="hljs-number">500</span>).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>) -plt.plot(xlin, linear.predict(xlin), label=<span class="hljs-string">'linear'</span>) -plt.plot(xlin, tree.predict(xlin), label=<span class="hljs-string">'tree '</span>) -plt.plot(xlin, knn.predict(xlin), label=<span class="hljs-string">'knn '</span>) - -print(<span class="hljs-string">'MSE linear '</span>, mean_squared_error(y_test, linear.predict(x_test))) -print(<span class="hljs-string">'MSE tree '</span>, mean_squared_error(y_test, tree.predict(x_test))) -print(<span class="hljs-string">'MSE 
knn'</span>, mean_squared_error(y_test, knn.predict(x_test))) -plt.legend(); -</code></pre> -<p>Useful references:</p> -<ul> -<li><a data-from-md title='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' href='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' type=''>The official quickstart guide</a></li> -<li><a data-from-md title='https://www.datacamp.com/community/tutorials/machine-learning-python' href='https://www.datacamp.com/community/tutorials/machine-learning-python' type=''>A DataCamp tutorial with interactive exercises</a></li> -<li><a data-from-md title='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' href='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' type=''>Analyzing text data with SKLearn</a></li> -</ul> -</div></div> - </body> - </html> diff --git a/content/ml-notes/Programming reference.md b/content/ml-notes/Programming reference.md @@ -0,0 +1,223 @@ ++++ +title = 'Programming reference' ++++ +# Numpy & matplotlib +Load external file: +```python +data = numpy.loadtxt('./filepath.csv', delimiter=',') +``` + +Print the shape of the data: + +```python +data.shape +``` + +Graph two columns of data: + +```python +import matplotlib.pyplot as plt +%matplotlib inline +x = data[:,0] +y = data[:,1] +# s sets marker size, alpha sets transparency, c colors points by the third column +plt.scatter(x, y, s=3, alpha=0.2, c=data[:,2], cmap='RdYlBu_r') +plt.xlabel('x axis') +plt.ylabel('y axis'); +``` + +Histogram plotting: + +```python +# bins sets the number of bars (and so their width); range limits the plotted interval +plt.hist(data, bins=100, range=[start, end]) +``` + +The identity matrix: + +```python +np.eye(2) # for a 2x2 matrix +``` + +Matrix multiplication: + +```python +a * b # element-wise +a.dot(b) # matrix product +``` + +Useful references: +* [The official numpy quickstart guide](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) +* [A more in-depth tutorial, with in-browser
samples](https://www.datacamp.com/community/tutorials/python-numpy-tutorial) +* [A very good walk through the most important functions and features](http://cs231n.github.io/python-numpy-tutorial/). From the famous [CS231n course](http://cs231n.github.io/), from Stanford. +* [The official pyplot tutorial](https://matplotlib.org/users/pyplot_tutorial.html). Note that pyplot can accept basic Python lists as well as NumPy arrays. +* [A gallery of example MPL plots](https://matplotlib.org/gallery.html). Most of these do not use the pyplot state-machine interface, but the lower-level objects like [Axes](https://matplotlib.org/api/axes_api.html). +* [In-depth walk through the main features and plot types](http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html) + + +# Sklearn +Split data into train and test sets, with features `x` and target `y`: + +```python +from sklearn.model_selection import train_test_split +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5) +``` + +An estimator implements a `fit(x, y)` method that learns from data, and a `predict(T)` method that takes new instances and predicts their target values.
+ +Linear classifier, using SVC model with linear kernel: + +```python +from sklearn.svm import SVC +linear = SVC(kernel='linear') +linear.fit(x_train, y_train) +``` + +Decision tree classifier: + +```python +from sklearn.tree import DecisionTreeClassifier +tree = DecisionTreeClassifier() +tree.fit(x_train, y_train) +``` + +k-Nearest Neighbors: + +```python +from sklearn.neighbors import KNeighborsClassifier +knn = KNeighborsClassifier(15) # We set the number of neighbors to 15 +knn.fit(x_train, y_train) +``` + +Try to classify new data: + +```python +linear.predict(some_data) +``` + +Compute accuracy on testing data: + +```python +from sklearn.metrics import accuracy_score +y_predicted = linear.predict(x_test) +accuracy_score(y_test, y_predicted) +``` + +Plot the decision regions, with colors showing the classifier's decision: + +```python +from mlxtend.plotting import plot_decision_regions +plot_decision_regions(x_test[:500], y_test.astype(int)[:500], clf=linear, res=0.1); +``` + +Compare classifiers via ROC curve: + + +```python +from sklearn.metrics import roc_curve, auc + +# The linear classifier doesn't produce class probabilities by default. We'll retrain it for probabilities.
+linear = SVC(kernel='linear', probability=True) +linear.fit(x_train, y_train) + +# We'll need class probabilities from each of the classifiers +y_linear = linear.predict_proba(x_test) +y_tree = tree.predict_proba(x_test) +y_knn = knn.predict_proba(x_test) + +# Compute the points on the curve +# We pass the probability of the second class (KIA) as the y_score +curve_linear = roc_curve(y_test, y_linear[:, 1]) +curve_tree = roc_curve(y_test, y_tree[:, 1]) +curve_knn = roc_curve(y_test, y_knn[:, 1]) + +# Compute Area Under the Curve +auc_linear = auc(curve_linear[0], curve_linear[1]) +auc_tree = auc(curve_tree[0], curve_tree[1]) +auc_knn = auc(curve_knn[0], curve_knn[1]) + +plt.plot(curve_linear[0], curve_linear[1], label='linear (area = %0.2f)' % auc_linear) +plt.plot(curve_tree[0], curve_tree[1], label='tree (area = %0.2f)' % auc_tree) +plt.plot(curve_knn[0], curve_knn[1], label='knn (area = %0.2f)'% auc_knn) + +plt.xlim([0.0, 1.0]) +plt.ylim([0.0, 1.0]) +plt.xlabel('False Positive Rate') +plt.ylabel('True Positive Rate') +plt.title('ROC curve'); + +plt.legend(); +``` + +Cross-validation: + + +```python +from sklearn.model_selection import cross_val_score +from sklearn.metrics import roc_auc_score, make_scorer + +# The cross_val_score function does all the training for us. We simply pass +# it the complete data, the model, and the metric. + +linear = SVC(kernel='linear', probability=True) + +# Train for 3 folds, returning ROC AUC.
You can also try 'accuracy' as a scorer +scores = cross_val_score(linear, x, y, cv=3, scoring='roc_auc') + +print('scores per fold ', scores) +``` + +Regression: + +```python +from sklearn import datasets +from sklearn.metrics import mean_squared_error, r2_score + +# Load the diabetes dataset, and select one feature (Body Mass Index) +x, y = datasets.load_diabetes(return_X_y=True) +x = x[:, 2].reshape(-1, 1) + +# -- the reshape operation ensures that x still has two dimensions +# (that is, we need it to be an n by 1 matrix, not a vector) + +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5) + +# feature space on horizontal axis, output space on vertical axis +plt.scatter(x_train[:, 0], y_train) +plt.xlabel('BMI') +plt.ylabel('disease progression'); + +# Train three models: linear regression, tree regression, knn regression +from sklearn.linear_model import LinearRegression +linear = LinearRegression() +linear.fit(x_train, y_train) + +from sklearn.tree import DecisionTreeRegressor +tree = DecisionTreeRegressor() +tree.fit(x_train, y_train) + +from sklearn.neighbors import KNeighborsRegressor +knn = KNeighborsRegressor(10) +knn.fit(x_train, y_train); + +# Plot the models +from sklearn.metrics import mean_squared_error + +plt.scatter(x_train, y_train, alpha=0.1) + +xlin = np.linspace(-0.10, 0.2, 500).reshape(-1, 1) +plt.plot(xlin, linear.predict(xlin), label='linear') +plt.plot(xlin, tree.predict(xlin), label='tree ') +plt.plot(xlin, knn.predict(xlin), label='knn ') + +print('MSE linear ', mean_squared_error(y_test, linear.predict(x_test))) +print('MSE tree ', mean_squared_error(y_test, tree.predict(x_test))) +print('MSE knn ', mean_squared_error(y_test, knn.predict(x_test))) + +plt.legend(); +``` + +Useful references: +* [The official quickstart guide](http://scikit-learn.org/stable/tutorial/basic/tutorial.html) +* [A DataCamp tutorial with interactive exercises](https://www.datacamp.com/community/tutorials/machine-learning-python) +* [Analyzing text data
with SKLearn](http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html) diff --git a/content/ml-notes/Reinforcement learning.html b/content/ml-notes/Reinforcement learning.html @@ -1,87 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - <link rel="stylesheet" href="pluginAssets/highlight.js/atom-one-light.css"> - <title>Reinforcement learning</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="reinforcement-learning">Reinforcement learning</h1> -<nav class="table-of-contents"><ul><li><a href="#reinforcement-learning">Reinforcement learning</a><ul><li><a href="#what-is-reinforcement-learning">What is reinforcement learning?</a></li><li><a href="#approaches">Approaches</a><ul><li><a href="#random-search">Random search</a></li><li><a href="#policy-gradient">Policy gradient</a></li><li><a href="#q-learning">Q-learning</a></li></ul></li><li><a href="#alpha-stuff">Alpha-stuff</a><ul><li><a href="#alphago">AlphaGo</a></li><li><a href="#alphazero">AlphaZero</a></li><li><a href="#alphastar">AlphaStar</a></li></ul></li></ul></li></ul></nav><h2 id="what-is-reinforcement-learning">What is reinforcement learning?</h2> -<p>Agent is in a state, takes an action.<br> -Action is selected by policy - function from states to actions.<br> -The environment tells the agent its new state, and provides a reward (number, higher is better).<br> -The learner adapts the policy to maximise expectation of future rewards.</p> -<p>Markov decision process: optimal policy may not depend on previous state, only info in current state counts.</p> -<p><img src="_resources/e78427ef0d0845d0ae21e1c7857d2740.png" alt="90955f3da8fb0d61c2fa9f3033c65098.png"></p> -<p>Sparse loss:</p> -<ul> -<li>start with imitation learning - supervised learning, copying human action</li> -<li>reward shaping - guessing reward for intermediate states, or states close to 
good states</li> -<li>auxiliary goals - curiosity, max distance traveled</li> -</ul> -<p>policy network: NN with input of state, output of action, and a softmax output layer to produce prob distribution.</p> -<p>three problems of RL:</p> -<ul> -<li>non differentiable loss</li> -<li>balance exploration and exploitation -<ul> -<li>this is a classic trade-off in online learning</li> -<li>for example, an agent in a maze may train to reach a reward of 1 that's close by and exploit that reward, and so it might never explore further and reach the 100 reward at the end of the maze</li> -</ul> -</li> -<li>delayed reward/sparse loss -<ul> -<li>you might take an action that causes a negative result, but the result won't show up until some time later</li> -<li>for example, if you start studying before an exam, that's a good thing.<br> -the issue is that you started one day before, and didn't do jack shit during the preceding two weeks.</li> -<li>credit assignment problem: how do you know which action takes the credit for the bad result?</li> -</ul> -</li> -</ul> -<p>deterministic policy - every state followed by same action.<br> -probabilistic policy - all actions possible, certain actions higher probability.</p> -<h2 id="approaches">Approaches</h2> -<p>how do you choose the weights (how do you learn)?<br> -simple backpropagation doesn't work - we don't have labeled examples to tell us which move to take for given state.</p> -<h3 id="random-search">Random search</h3> -<p>pick random point m in model space.</p> -<pre class="hljs"><code><span class="hljs-attr">loop</span>:<span class="hljs-string"></span> - <span class="hljs-attr">pick</span> <span class="hljs-string">random point m' close to m</span> - <span class="hljs-attr">if</span> <span class="hljs-string">loss(m') < loss(m):</span> - <span class="hljs-attr">m</span> <span class="hljs-string"><- m'</span> -</code></pre> -<p>"close to" is sampled uniformly among all points with some pre-chosen distance r from w.</p> -<h3 
id="policy-gradient">Policy gradient</h3> -<p>follow some semi-random policy, wait until reach reward state, then label all previous state-action pairs with final outcome.<br> -i.e. if some actions were bad, on average will occur more often in sequences ending with negative reward, and on average will be more often labeled as bad.</p> -<p><img src="_resources/c484829362004f90be2b33a92acf7fd9.png" alt="442f7f9bc5e14ffbbcfd54f6ea6b72df.png"></p> -<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∇</mi><msub><mi>𝔼</mi><mi>a</mi></msub><mi>r</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo><mo>=</mo><mi mathvariant="normal">∇</mi><msub><mo>∑</mo><mi>a</mi></msub><mi>p</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo><mi>r</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo><mo>=</mo><msub><mi>𝔼</mi><mi>a</mi></msub><mi>r</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo><mi mathvariant="normal">∇</mi><mi>ln</mi><mo></mo><mrow><mi>p</mi><mo stretchy="false">(</mo><mi>a</mi><mo stretchy="false">)</mo></mrow></mrow><annotation encoding="application/x-tex">\nabla 𝔼_a r(a) = \nabla \sum_{a} p(a) r(a) = 𝔼_{a} r(a) \nabla \ln{p(a)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∇</span><span class="mord"><span class="mord mathbb">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">a</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0497100000000001em;vertical-align:-0.29971000000000003em;"></span><span class="mord">∇</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0016819999999999613em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">a</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.29971000000000003em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord 
mathbb">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">a</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span><span class="mord">∇</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">ln</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathdefault">p</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mclose">)</span></span></span></span></span>, r is the ultimate reward at the end of the trajectory.</p> -<h3 id="q-learning">Q-learning</h3> -<p>If I need this, I'll make better notes, can't really understand it from the slides.</p> -<h2 id="alpha-stuff">Alpha-stuff</h2> -<h3 id="alphago">AlphaGo</h3> -<p>starts with imitation learning.<br> -improve by playing against previous iterations and self. 
trained by reinforcement learning using policy gradient descent to update weights.<br> -during play, use Monte Carlo Tree Search, with node values being the prob that black will win from that state.</p> -<h3 id="alphazero">AlphaZero</h3> -<p>learns from scratch, there's no imitation learning or reward shaping.<br> -also applicable to other games like chess.</p> -<p>Improves AlphaGo by:</p> -<ul> -<li>combining policy and value nets</li> -<li>viewing MCTS as policy improvement operator</li> -<li>adding residual connections, batch normalization</li> -</ul> -<h3 id="alphastar">AlphaStar</h3> -<p>This shit can play starcraft.</p> -<p>Real time, imperfect information, large diverse action space, and no single best strategy.<br> -Its behaviour is generated by a deep NN that gets input from game interface, and outputs instructions that are an action in the game.</p> -<p>it has a transformer torso for units<br> -deep LSTM core with autoregressive policy head, and pointer network.<br> -makes use of multi-agent learning.</p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/_resources/c484829362004f90be2b33a92acf7fd9.png b/content/ml-notes/Reinforcement learning/c484829362004f90be2b33a92acf7fd9.png Binary files differ. diff --git a/content/ml-notes/_resources/e78427ef0d0845d0ae21e1c7857d2740.png b/content/ml-notes/Reinforcement learning/e78427ef0d0845d0ae21e1c7857d2740.png Binary files differ. diff --git a/content/ml-notes/Reinforcement learning/index.md b/content/ml-notes/Reinforcement learning/index.md @@ -0,0 +1,87 @@ ++++ +title = 'Reinforcement learning' +template = 'page-math.html' ++++ +# Reinforcement learning + +## What is reinforcement learning? +Agent is in a state, takes an action. +Action is selected by policy - function from states to actions. +The environment tells the agent its new state, and provides a reward (number, higher is better). +The learner adapts the policy to maximise expectation of future rewards. 
+ +Markov decision process: the optimal policy does not depend on previous states; only the information in the current state counts. + +![90955f3da8fb0d61c2fa9f3033c65098.png](e78427ef0d0845d0ae21e1c7857d2740.png) + +Dealing with sparse loss: +- start with imitation learning - supervised learning, copying human actions +- reward shaping - guessing rewards for intermediate states, or for states close to good states +- auxiliary goals - curiosity, max distance traveled + +policy network: NN that takes the state as input and outputs an action, with a softmax output layer to produce a probability distribution over actions. + +three problems of RL: +- non-differentiable loss +- balance exploration and exploitation + - this is a classic trade-off in online learning + - for example, an agent in a maze may learn to reach a nearby reward of 1 and keep exploiting it, so it might never explore further and reach the reward of 100 at the end of the maze +- delayed reward/sparse loss + - you might take an action that causes a negative result, but the result won't show up until some time later + - for example, if you start studying before an exam, that's a good thing. + the issue is that you started one day before, and didn't do jack shit during the preceding two weeks. + - credit assignment problem: how do you know which action takes the credit for the bad result? + +deterministic policy - every state is always followed by the same action. +probabilistic policy - all actions are possible, certain actions have higher probability. + +## Approaches +how do you choose the weights (how do you learn)? +simple backpropagation doesn't work - we don't have labeled examples to tell us which move to take for a given state. + +### Random search +pick random point m in model space. + +``` +loop: + pick random point m' close to m + if loss(m') < loss(m): + m <- m' +``` + +"close to" is sampled uniformly among all points at some pre-chosen distance r from m.
+### Policy gradient +follow some semi-random policy, wait until a reward state is reached, then label all previous state-action pairs with the final outcome. +i.e. if some actions were bad, they will on average occur more often in sequences ending with a negative reward, and so will on average be labeled as bad more often. + +![442f7f9bc5e14ffbbcfd54f6ea6b72df.png](c484829362004f90be2b33a92acf7fd9.png) + +$\nabla 𝔼_a r(a) = \nabla \sum_{a} p(a) r(a) = \sum_{a} r(a) \nabla p(a) = 𝔼_{a} r(a) \nabla \ln{p(a)}$ (using $\nabla p(a) = p(a) \nabla \ln{p(a)}$), where r is the ultimate reward at the end of the trajectory and $\nabla$ is taken with respect to the policy parameters. + +### Q-learning +If I need this, I'll make better notes, can't really understand it from the slides. + +## Alpha-stuff +### AlphaGo +starts with imitation learning. +improves by playing against itself and its previous iterations; trained by reinforcement learning, using policy gradient updates for the weights. +during play, uses Monte Carlo Tree Search, with node values being the probability that black will win from that state. + +### AlphaZero +learns from scratch, there's no imitation learning or reward shaping. +also applicable to other games like chess. + +Improves AlphaGo by: +- combining policy and value nets +- viewing MCTS as a policy improvement operator +- adding residual connections, batch normalization + +### AlphaStar +This shit can play StarCraft. + +Real time, imperfect information, large diverse action space, and no single best strategy. +Its behaviour is generated by a deep NN that gets input from the game interface and outputs instructions that correspond to actions in the game. + +it has a transformer torso for units +deep LSTM core with autoregressive policy head, and pointer network. +makes use of multi-agent learning.
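The random-search loop in the reinforcement learning notes can be turned into a small runnable sketch. Assumptions not in the notes: the model is a NumPy parameter vector, a toy quadratic `loss` stands in for a real model's loss, and the names `random_search`, `r`, and `steps` are illustrative. Each candidate m' is drawn uniformly from the sphere of radius r around m, which matches the notes' description of "close to".

```python
import numpy as np


def random_search(loss, m0, r=0.1, steps=1000, seed=0):
    """Hill-climbing random search: accept m' only if loss(m') < loss(m)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    best = loss(m)
    for _ in range(steps):
        # A normalised Gaussian vector is uniform on the unit sphere;
        # scaling it by r gives a point at exactly distance r from m.
        direction = rng.normal(size=m.shape)
        direction /= np.linalg.norm(direction)
        candidate = m + r * direction
        candidate_loss = loss(candidate)
        if candidate_loss < best:  # keep the move only if it improves the loss
            m, best = candidate, candidate_loss
    return m, best


def quadratic(m):
    """Toy loss (hypothetical stand-in) with its minimum at (1, -2)."""
    return float(np.sum((m - np.array([1.0, -2.0])) ** 2))


m, l = random_search(quadratic, [0.0, 0.0])
print(m, l)
```

Because the step length r is fixed, on this toy quadratic the search cannot get closer than r/2 to the minimum. Shrinking r over time is a common variant, but the notes don't describe it.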
diff --git a/content/ml-notes/Tree models and ensembles.html b/content/ml-notes/Tree models and ensembles.html
@@ -1,120 +0,0 @@
-
- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="UTF-8">
-
- <title>Tree models and ensembles</title>
- <link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head>
- <body>
-
-<div id="rendered-md"><h1 id="tree-models-ensembles">Tree models & ensembles</h1>
-<nav class="table-of-contents"><ul><li><a href="#tree-models-ensembles">Tree models & ensembles</a><ul><li><a href="#tree-models">Tree models</a><ul><li><a href="#decision-trees-categorical">Decision trees (categorical)</a></li><li><a href="#regression-trees-numeric">Regression trees (numeric)</a></li><li><a href="#generalization-hierarchy">Generalization hierarchy</a></li></ul></li><li><a href="#ensembling-methods">Ensembling methods</a><ul><li><a href="#bagging">Bagging</a></li><li><a href="#boosting">Boosting</a><ul><li><a href="#adaboost-binary-classifiers">AdaBoost (binary classifiers)</a></li><li><a href="#gradient-boosting">Gradient boosting</a></li></ul></li><li><a href="#stacking">Stacking</a></li></ul></li></ul></li></ul></nav><h2 id="tree-models">Tree models</h2>
-<h3 id="decision-trees-categorical">Decision trees (categorical)</h3>
-<p>Work on numerical and categorical features</p>
-<p>Standard decision tree learning algorithm (ID3, C45):</p>
-<ul>
-<li>start with empty tree</li>
-<li>extend step by step
-<ul>
-<li>stop when: all inputs are same (no more features left), or all outputs are same (all instances same class)</li>
-</ul>
-</li>
-<li>greedy (no backtracking)</li>
-<li>choose the split that creates least uniform distribution over class labels in the resulting segmentation
-<ul>
-<li>entropy is measure of uniformity of distribution</li>
-<li>recall, entropy $H(p) = - \sum_{x \in X} P(x) \log{P(x)}$</li>
-<li>conditional entropy: $H(X|Y) = \sum_{y} P(y) H(X | Y = y)$</li>
-<li>information gain of Y: $I_{X}(Y) = H(X) - H(X | Y)$</li>
-<li>so, pick the one with the highest information gain</li>
-</ul>
-</li>
-</ul>
-<p>The algorithm in steps:</p>
-<ol>
-<li>start with single unlabeled leaf</li>
-<li>loop until no unlabeled leaves:
-<ul>
-<li>for each unlabeled leaf l with segment s:
-<ul>
-<li>if stop condition, label majority class of S
-<ul>
-<li>stop when: all inputs are same (no more features left), or all outputs are same (all instances same class)</li>
-</ul>
-</li>
-<li>else split L on feature F with highest gain Is(F)</li>
-</ul>
-</li>
-</ul>
-</li>
-</ol>
-<p>With categoric features, it doesn't make sense to split on the same feature twice.</p>
-<h3 id="regression-trees-numeric">Regression trees (numeric)</h3>
-<p>For numeric features, split at a numeric threshold t.</p>
-<p>Of course there's a trade-off, complicated decision trees lead to overfitting.</p>
-<p>Pruning - for every split, ask whether the tree classifies better with or without the split (on validation data)</p>
-<p>Using validation data: test is only for final testing, validation for hyperparameter selection. If you want to control search, split training data and use a part of it for 'validation'.</p>
-<p>Label the leaves with the one element, or take the mean.</p>
-<p>Instead of entropy, use $I_{S}(V) = Var(S) - \sum_{i} \frac{| S_i |}{|S|} Var(S_i)$</p>
-<h3 id="generalization-hierarchy">Generalization hierarchy</h3>
-<p><img src="_resources/dc5bb005d60e400e9026666b704139da.png" alt="154f1d15c7808fb3db6bf33d60c61bbb.png"></p>
-<h2 id="ensembling-methods">Ensembling methods</h2>
-<p>Bias and variance:</p>
-<p><img src="_resources/2b81999a96204ddeb7dc539b580b8dbb.png" alt="ba23c24af56d5203ba58cdc8b1b3f8d8.png"></p>
-<p>Real-life example:</p>
-<ul>
-<li>grading by rubric: high bias, low variance</li>
-<li>grading by TA: low bias, high variance</li>
-</ul>
-<p>Bootstrapping (from methodology 2):</p>
-<ul>
-<li>sample with replacement a dataset of same size as whole dataset</li>
-<li>each bootstrapped sample lets you repeat your experiment</li>
-<li>better than cross validation for small datasets</li>
-<li>but some classifiers don't like duplicates</li>
-</ul>
-<p>Ensembling is:</p>
-<ul>
-<li>used in production to get better performance from model</li>
-<li>never used in research, we can improve any model by boosting</li>
-<li>can be expensive for big models</li>
-</ul>
-<p>Bagging reduces variance, boosting reduces bias.</p>
-<h3 id="bagging">Bagging</h3>
-<p>Bagging: <strong>b</strong>ootstrap <strong>agg</strong>regating</p>
-<ul>
-<li>resample k datasets, train k models. this is the ensemble</li>
-<li>ensemble classifies by majority vote
-<ul>
-<li>for class probabilities, use relative freq among votes</li>
-</ul>
-</li>
-</ul>
-<p>Random forest: bagging with decision trees</p>
-<ul>
-<li>subsample data and features for each model in ensemble</li>
-<li>pro: reduces variance, few hyperparameters, easy to parallelize</li>
-<li>con: no reduction of bias</li>
-</ul>
-<h3 id="boosting">Boosting</h3>
-<p>train some classifier m0, then iteratively train more classifiers.<br>
-increase weights for instances misclassified by a classifier.<br>
-train the next iteration on reweighted data.</p>
-<p>weighted loss function: $loss(\theta) = \sum_{i} w_{i} (f_{\theta}(x_i) - t_i)^2$</p>
-<p>or resample data by the weights. the weight determines how likely an instance is to end up in the resampled data.</p>
-<p>boosting works even if the model only classifies slightly better than random.</p>
-<h4 id="adaboost-binary-classifiers">AdaBoost (binary classifiers)</h4>
-<p>TODO: think of a better way to explain this for the exam</p>
-<p>each model fits a reweighted dataset, each model defines its own reweighted loss.</p>
-<h4 id="gradient-boosting">Gradient boosting</h4>
-<p>boosting for regression models</p>
-<p>fit the next model to the residuals of the current ensemble</p>
-<p>each model fits (pseudo) residuals of previous model, ensemble optimises a global loss -- even if individual models don't optimise a well-defined loss.</p>
-<h3 id="stacking">Stacking</h3>
-<p>When you want to combine some number of existing models into a single model.<br>
-Simply compute outputs of the models for every point in the dataset, and add them to the dataset as a column.<br>
-Then, train a new model ('combiner') on this extended data (usually a logistic regression). If NNs are used for the ensemble, the whole thing turns into a big NN.</p>
-</div></div>
- </body>
- </html>
diff --git a/content/ml-notes/_resources/2b81999a96204ddeb7dc539b580b8dbb.png b/content/ml-notes/Tree models and ensembles/2b81999a96204ddeb7dc539b580b8dbb.png
Binary files differ.
diff --git a/content/ml-notes/_resources/dc5bb005d60e400e9026666b704139da.png b/content/ml-notes/Tree models and ensembles/dc5bb005d60e400e9026666b704139da.png
Binary files differ.
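The two split criteria in these notes, information gain for categorical decision trees and variance reduction for regression trees, can be sketched in plain Python as below. This is a minimal illustration, not how ID3 or C4.5 are actually implemented. It assumes base-2 logarithms (the notes leave the base unspecified), probabilities estimated from label counts, and a single binary threshold split for the numeric case. All function names and the toy `outlook`/`coin` data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum_x P(x) log2 P(x), with P estimated from the label frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature):
    """I_X(Y) = H(X) - H(X | Y), where X is the class label and Y a categorical feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        segment = [x for x, y in zip(labels, feature) if y == value]
        conditional += len(segment) / n * entropy(segment)  # P(y) * H(X | Y = y)
    return entropy(labels) - conditional


def best_feature(labels, features):
    """Greedy step: the feature (name -> column of values) with the highest information gain."""
    return max(features, key=lambda name: information_gain(labels, features[name]))


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(targets, feature, threshold):
    """I_S(V) = Var(S) - sum_i |S_i|/|S| Var(S_i) for the two-way split feature <= threshold."""
    n = len(targets)
    left = [t for t, v in zip(targets, feature) if v <= threshold]
    right = [t for t, v in zip(targets, feature) if v > threshold]
    return variance(targets) - sum(len(s) / n * variance(s) for s in (left, right) if s)


labels = ["yes", "yes", "no", "no"]
features = {
    "outlook": ["sunny", "sunny", "rain", "rain"],  # perfectly separates the classes
    "coin": ["heads", "tails", "heads", "tails"],   # independent of the class
}
print(information_gain(labels, features["outlook"]))  # 1.0
print(information_gain(labels, features["coin"]))     # 0.0
print(best_feature(labels, features))                 # outlook
print(variance_reduction([1, 1, 5, 5], [1, 2, 3, 4], threshold=2.5))  # 4.0
```

`best_feature` is the greedy choice from "the algorithm in steps"; a full tree learner would apply it recursively to each resulting segment until a stop condition holds.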
diff --git a/content/ml-notes/Tree models and ensembles/index.md b/content/ml-notes/Tree models and ensembles/index.md
@@ -0,0 +1,109 @@
++++
+title = 'Tree models and ensembles'
+template = 'page-math.html'
++++
+# Tree models & ensembles
+
+## Tree models
+### Decision trees (categorical)
+Work on numerical and categorical features
+
+Standard decision tree learning algorithm (ID3, C4.5):
+- start with an empty tree
+- extend it step by step
+    - stop when all inputs are the same (no more features left), or all outputs are the same (all instances have the same class)
+- greedy (no backtracking)
+- choose the split that creates the least uniform distribution over class labels in the resulting segments
+    - entropy is a measure of the uniformity of a distribution
+    - recall, entropy $H(p) = - \sum_{x \in X} P(x) \log{P(x)}$
+    - conditional entropy: $H(X|Y) = \sum_{y} P(y) H(X | Y = y)$
+    - information gain of Y: $I_{X}(Y) = H(X) - H(X | Y)$
+    - so, pick the feature with the highest information gain
+
+The algorithm in steps:
+1. start with a single unlabeled leaf
+2. loop until there are no unlabeled leaves:
+    - for each unlabeled leaf l with segment S:
+        - if the stop condition holds, label l with the majority class of S
+            - stop when all inputs are the same (no more features left), or all outputs are the same (all instances have the same class)
+        - else split l on the feature F with the highest gain $I_S(F)$
+
+With categorical features, it doesn't make sense to split on the same feature twice: after the first split, all instances in a branch share that feature's value.
+
+### Regression trees (numeric)
+For numeric features, split at a numeric threshold t.
+
+Of course there's a trade-off: complicated decision trees lead to overfitting.
+
+Pruning: for every split, ask whether the tree classifies better with or without the split (on validation data).
+
+Using validation data: the test set is only for final testing, the validation set is for hyperparameter selection. If you want to control the search, split the training data and use part of it for 'validation'.
+
+Label each leaf with its single element's target value or, if the segment has several elements, with their mean.
+
+Instead of entropy, use the variance reduction $I_{S}(V) = Var(S) - \sum_{i} \frac{| S_i |}{|S|} Var(S_i)$
+
+### Generalization hierarchy
+![154f1d15c7808fb3db6bf33d60c61bbb.png](dc5bb005d60e400e9026666b704139da.png)
+
+
+## Ensembling methods
+Bias and variance:
+
+![ba23c24af56d5203ba58cdc8b1b3f8d8.png](2b81999a96204ddeb7dc539b580b8dbb.png)
+
+Real-life example:
+- grading by rubric: high bias, low variance
+- grading by TA: low bias, high variance
+
+Bootstrapping (from methodology 2):
+- sample, with replacement, a dataset of the same size as the whole dataset
+- each bootstrapped sample lets you repeat your experiment
+- better than cross-validation for small datasets
+- but some classifiers don't like duplicates
+
+Ensembling is:
+- used in production to get better performance from a model
+- not used in research comparisons: we can improve any model by boosting, so it says nothing about the model itself
+- can be expensive for big models
+
+Bagging reduces variance, boosting reduces bias.
+
+### Bagging
+Bagging: **b**ootstrap **agg**regating
+- resample k datasets and train k models; this is the ensemble
+- the ensemble classifies by majority vote
+    - for class probabilities, use the relative frequency among the votes
+
+Random forest: bagging with decision trees
+- subsample data and features for each model in the ensemble
+- pro: reduces variance, few hyperparameters, easy to parallelize
+- con: no reduction of bias
+
+### Boosting
+train some classifier $m_0$, then iteratively train more classifiers.
+increase the weights of instances misclassified by the previous classifier.
+train the next iteration on the reweighted data.
+
+weighted loss function: $loss(\theta) = \sum_{i} w_{i} (f_{\theta}(x_i) - t_i)^2$
+
+or resample the data by the weights: an instance's weight determines how likely it is to end up in the resampled data.
+
+boosting works even if each model only classifies slightly better than random.
+
+#### AdaBoost (binary classifiers)
+TODO: think of a better way to explain this for the exam
+
+each model fits a reweighted dataset, so each model defines its own reweighted loss.
+
+#### Gradient boosting
+boosting for regression models
+
+fit the next model to the residuals of the current ensemble
+
+each model fits the (pseudo-)residuals of the current ensemble, and the ensemble optimises a global loss, even if the individual models don't optimise a well-defined loss.
+
+### Stacking
+Used when you want to combine some number of existing models into a single model.
+Simply compute the outputs of the models for every point in the dataset, and add them to the dataset as extra columns (one per model).
+Then, train a new model (the 'combiner') on this extended data (usually a logistic regression). If NNs are used for the ensemble, the whole thing turns into one big NN.
diff --git a/content/ml-notes/_config.yaml b/content/ml-notes/_config.yaml
@@ -1 +0,0 @@
-include: [_resources]
diff --git a/content/ml-notes/_index.md b/content/ml-notes/_index.md
@@ -0,0 +1,135 @@
++++
+title = 'Machine Learning'
++++
+# Machine learning
+Exam - [cheat sheet](formula-cheat-sheet.pdf) available for formulas!
+
+* [Introduction](introduction)
+ * [What is ML?](introduction#what-is-ml)
+ * [Supervised ML](introduction#supervised-ml)
+ * [Classification](introduction#classification)
+ * [Regression](introduction#regression)
+ * [Unsupervised ML](introduction#unsupervised-ml)
+ * [What isn't ML?](introduction#what-isn-t-ml)
+* [Methodology](methodology)
+ * [Performing an experiment](methodology#performing-an-experiment)
+ * [What if you need to test many models?](methodology#what-if-you-need-to-test-many-models)
+ * [The modern recipe](methodology#the-modern-recipe)
+ * [Cross-validation](methodology#cross-validation)
+ * [What to report](methodology#what-to-report)
+ * [Classification](methodology#classification)
+ * [What's a good error (5%)?](methodology#what-s-a-good-error-5)
+ * [Performance metrics](methodology#performance-metrics)
+ * [Confusion matrix (contingency table)](methodology#confusion-matrix-contingency-table)
+ * [Precision and recall](methodology#precision-and-recall)
+ * [Regression](methodology#regression)
+ * [Errors & confidence intervals](methodology#errors-confidence-intervals)
+ * [The no-free-lunch theorem and principle](methodology#the-no-free-lunch-theorem-and-principle)
+ * [Cleaning your data](methodology#cleaning-your-data)
+ * [Missing data](methodology#missing-data)
+ * [Outliers](methodology#outliers)
+ * [Class imbalance](methodology#class-imbalance)
+ * [Choosing features](methodology#choosing-features)
+ * [Normalisation & standardisation](methodology#normalisation-standardisation)
+ * [Normalisation](methodology#normalisation)
+ * [Standardisation](methodology#standardisation)
+ * [Whitening](methodology#whitening)
+ * [Dimensionality reduction](methodology#dimensionality-reduction)
+* [Linear models](linear-models)
+ * [Defining a model](linear-models#defining-a-model)
+ * [But which model fits best?](linear-models#but-which-model-fits-best)
+ * [Mean squared error loss](linear-models#mean-squared-error-loss)
+ * [Optimization & searching](linear-models#optimization-searching)
+ * [Black box optimisation](linear-models#black-box-optimisation)
+ * [Random search](linear-models#random-search)
+ * [Simulated annealing](linear-models#simulated-annealing)
+ * [Parallel search](linear-models#parallel-search)
+ * [Branching search](linear-models#branching-search)
+ * [Gradient descent](linear-models#gradient-descent)
+ * [Classification losses](linear-models#classification-losses)
+ * [Least-squares loss](linear-models#least-squares-loss)
+ * [Neural networks (feedforward)](linear-models#neural-networks-feedforward)
+ * [Overview](linear-models#overview)
+ * [Classification](linear-models#classification)
+ * [Dealing with loss - gradient descent & backpropagation](linear-models#dealing-with-loss-gradient-descent-backpropagation)
+ * [Support vector machines (SVMs)](linear-models#support-vector-machines-svms)
+ * [Summary of classification loss functions](linear-models#summary-of-classification-loss-functions)
+* [Probability](probability)
+ * [Probability basics](probability#probability-basics)
+ * [Probability theory](probability#probability-theory)
+ * [(Naive) Bayesian classifiers](probability#naive-bayesian-classifiers)
+ * [Logistic "regression" (classifier)](probability#logistic-regression-classifier)
+ * [Information theory](probability#information-theory)
+ * [Maximum likelihood](probability#maximum-likelihood)
+ * [Normal distributions (Gaussians)](probability#normal-distributions-gaussians)
+ * [1D normal distribution (Gaussian)](probability#1d-normal-distribution-gaussian)
+ * [Regression with Gaussian errors](probability#regression-with-gaussian-errors)
+ * [n-D normal distribution (multivariate Gaussian)](probability#n-d-normal-distribution-multivariate-gaussian)
+ * [Gaussian mixture model](probability#gaussian-mixture-model)
+ * [Expectation-maximisation](probability#expectation-maximisation)
+* [Deep learning](deep-learning)
+ * [Deep learning systems (autodiff engines)](deep-learning#deep-learning-systems-autodiff-engines)
+ * [Tensors](deep-learning#tensors)
+ * [Functions on tensors](deep-learning#functions-on-tensors)
+ * [Backpropagation revisited](deep-learning#backpropagation-revisited)
+ * [Multivariate chain rule](deep-learning#multivariate-chain-rule)
+ * [Backpropagation with tensors - matrix calculus](deep-learning#backpropagation-with-tensors-matrix-calculus)
+ * [Making deep neural nets work](deep-learning#making-deep-neural-nets-work)
+ * [Overcoming vanishing gradients](deep-learning#overcoming-vanishing-gradients)
+ * [Minibatch gradient descent](deep-learning#minibatch-gradient-descent)
+ * [Optimizers](deep-learning#optimizers)
+ * [Momentum](deep-learning#momentum)
+ * [Nesterov momentum](deep-learning#nesterov-momentum)
+ * [Adam](deep-learning#adam)
+ * [Regularizers](deep-learning#regularizers)
+ * [L2 regularizer](deep-learning#l2-regularizer)
+ * [L1 regulariser](deep-learning#l1-regulariser)
+ * [Dropout regularisation](deep-learning#dropout-regularisation)
+ * [Convolutional neural networks](deep-learning#convolutional-neural-networks)
+ * [Deep learning vs machine learning](deep-learning#deep-learning-vs-machine-learning)
+ * [Generators](deep-learning#generators)
+ * [Generative adversarial networks](deep-learning#generative-adversarial-networks)
+ * [Vanilla GANs](deep-learning#vanilla-gans)
+ * [Conditional GANs](deep-learning#conditional-gans)
+ * [CycleGAN](deep-learning#cyclegan)
+ * [StyleGAN](deep-learning#stylegan)
+ * [What can we do with a generator?](deep-learning#what-can-we-do-with-a-generator)
+ * [Autoencoders](deep-learning#autoencoders)
+ * [Turning an autoencoder into a generator](deep-learning#turning-an-autoencoder-into-a-generator)
+ * [Variational autoencoders](deep-learning#variational-autoencoders)
+* [Tree models and ensembles](tree-models-and-ensembles)
+ * [Tree models](tree-models-and-ensembles#tree-models)
+ * [Decision trees (categorical)](tree-models-and-ensembles#decision-trees-categorical)
+ * [Regression trees (numeric)](tree-models-and-ensembles#regression-trees-numeric)
+ * [Generalization hierarchy](tree-models-and-ensembles#generalization-hierarchy)
+ * [Ensembling methods](tree-models-and-ensembles#ensembling-methods)
+ * [Bagging](tree-models-and-ensembles#bagging)
+ * [Boosting](tree-models-and-ensembles#boosting)
+ * [AdaBoost (binary classifiers)](tree-models-and-ensembles#adaboost-binary-classifiers)
+ * [Gradient boosting](tree-models-and-ensembles#gradient-boosting)
+ * [Stacking](tree-models-and-ensembles#stacking)
+* [Sequences, models for sequential data](models-for-sequential-data)
+ * [Sequences](models-for-sequential-data#sequences)
+ * [Markov models](models-for-sequential-data#markov-models)
+ * [Embedding models](models-for-sequential-data#embedding-models)
+ * [Recurrent neural networks](models-for-sequential-data#recurrent-neural-networks)
+ * [LSTMs](models-for-sequential-data#lstms)
+* [Matrix
models](matrix-models) + * [Recommender systems](matrix-models#recommender-systems) + * [Matrix factorization](matrix-models#matrix-factorization) + * [Bias control](matrix-models#bias-control) + * [The 'cold start' problem](matrix-models#the-cold-start-problem) + * [Graph models](matrix-models#graph-models) + * [Validating embedding models](matrix-models#validating-embedding-models) +* [Reinforcement learning](reinforcement-learning) + * [What is reinforcement learning?](reinforcement-learning#what-is-reinforcement-learning) + * [Approaches](reinforcement-learning#approaches) + * [Random search](reinforcement-learning#random-search) + * [Policy gradient](reinforcement-learning#policy-gradient) + * [Q-learning](reinforcement-learning#q-learning) + * [Alpha-stuff](reinforcement-learning#alpha-stuff) + * [AlphaGo](reinforcement-learning#alphago) + * [AlphaZero](reinforcement-learning#alphazero) + * [AlphaStar](reinforcement-learning#alphastar) + +[Programming reference](programming-reference) diff --git a/content/ml-notes/_resources/04f80f36e94a4435b21fe321f7f58ecd.pdf b/content/ml-notes/formula-cheat-sheet.pdf Binary files differ. 
diff --git a/content/ml-notes/index.html b/content/ml-notes/index.html @@ -1,28 +0,0 @@ - - <!DOCTYPE html> - <html> - <head> - <meta charset="UTF-8"> - - <title>TOC: Machine Learning</title> - <link rel="stylesheet" href="pluginAssets/katex/katex.css" /></head> - <link rel="stylesheet" href="./style.css" /></head> - <body> - -<div id="rendered-md"><h1 id="toc-machine-learning">TOC: Machine learning</h1> -<p>Exam - <a data-from-md title='_resources/04f80f36e94a4435b21fe321f7f58ecd.pdf' href='_resources/04f80f36e94a4435b21fe321f7f58ecd.pdf' type=''>cheat sheet</a> available for formulas!</p> -<ul> -<li><a href="Introduction.html">Introduction</a></li> -<li><a href="Methodology.html">Methodology</a></li> -<li><a href="Linear models.html">Linear models</a></li> -<li><a href="Probability.html">Probability</a></li> -<li><a href="Deep learning.html">Deep learning</a></li> -<li><a href="Tree models and ensembles.html">Tree models and ensembles</a></li> -<li><a href="Models for sequential data.html">Models for sequential data</a></li> -<li><a href="Matrix models.html">Matrix models</a></li> -<li><a href="Reinforcement learning.html">Reinforcement learning</a></li> -</ul> -<p><a href="Programming reference.html">Programming reference</a></p> -</div></div> - </body> - </html> diff --git a/content/ml-notes/pluginAssets/highlight.js/atom-one-light.css b/content/ml-notes/pluginAssets/highlight.js/atom-one-light.css @@ -1,96 +0,0 @@ -/* - -Atom One Light by Daniel Gamage -Original One Light Syntax theme from https://github.com/atom/one-light-syntax - -base: #fafafa -mono-1: #383a42 -mono-2: #686b77 -mono-3: #a0a1a7 -hue-1: #0184bb -hue-2: #4078f2 -hue-3: #a626a4 -hue-4: #50a14f -hue-5: #e45649 -hue-5-2: #c91243 -hue-6: #986801 -hue-6-2: #c18401 - -*/ - -.hljs { - display: block; - overflow-x: auto; - padding: 0.5em; - color: #383a42; - background: #fafafa; -} - -.hljs-comment, -.hljs-quote { - color: #a0a1a7; - font-style: italic; -} - -.hljs-doctag, -.hljs-keyword, 
-.hljs-formula { - color: #a626a4; -} - -.hljs-section, -.hljs-name, -.hljs-selector-tag, -.hljs-deletion, -.hljs-subst { - color: #e45649; -} - -.hljs-literal { - color: #0184bb; -} - -.hljs-string, -.hljs-regexp, -.hljs-addition, -.hljs-attribute, -.hljs-meta-string { - color: #50a14f; -} - -.hljs-built_in, -.hljs-class .hljs-title { - color: #c18401; -} - -.hljs-attr, -.hljs-variable, -.hljs-template-variable, -.hljs-type, -.hljs-selector-class, -.hljs-selector-attr, -.hljs-selector-pseudo, -.hljs-number { - color: #986801; -} - -.hljs-symbol, -.hljs-bullet, -.hljs-link, -.hljs-meta, -.hljs-selector-id, -.hljs-title { - color: #4078f2; -} - -.hljs-emphasis { - font-style: italic; -} - -.hljs-strong { - font-weight: bold; -} - -.hljs-link { - text-decoration: underline; -} diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_AMS-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_AMS-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Caligraphic-Bold.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Caligraphic-Bold.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Caligraphic-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Caligraphic-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Fraktur-Bold.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Fraktur-Bold.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Fraktur-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Fraktur-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Bold.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Bold.woff2 Binary files differ. 
diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-BoldItalic.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-BoldItalic.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Italic.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Italic.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Main-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Math-BoldItalic.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Math-BoldItalic.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Math-Italic.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Math-Italic.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Bold.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Bold.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Italic.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Italic.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_SansSerif-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Script-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Script-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size1-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size1-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size2-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size2-Regular.woff2 Binary files differ. 
diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size3-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size3-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size4-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Size4-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Typewriter-Regular.woff2 b/content/ml-notes/pluginAssets/katex/fonts/KaTeX_Typewriter-Regular.woff2 Binary files differ. diff --git a/content/ml-notes/pluginAssets/katex/katex.css b/content/ml-notes/pluginAssets/katex/katex.css @@ -1 +0,0 @@ -@font-face{font-family:KaTeX_AMS;src:url(fonts/KaTeX_AMS-Regular.woff2) format("woff2"),url(fonts/KaTeX_AMS-Regular.woff) format("woff"),url(fonts/KaTeX_AMS-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Caligraphic;src:url(fonts/KaTeX_Caligraphic-Bold.woff2) format("woff2"),url(fonts/KaTeX_Caligraphic-Bold.woff) format("woff"),url(fonts/KaTeX_Caligraphic-Bold.ttf) format("truetype");font-weight:700;font-style:normal}@font-face{font-family:KaTeX_Caligraphic;src:url(fonts/KaTeX_Caligraphic-Regular.woff2) format("woff2"),url(fonts/KaTeX_Caligraphic-Regular.woff) format("woff"),url(fonts/KaTeX_Caligraphic-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Fraktur;src:url(fonts/KaTeX_Fraktur-Bold.woff2) format("woff2"),url(fonts/KaTeX_Fraktur-Bold.woff) format("woff"),url(fonts/KaTeX_Fraktur-Bold.ttf) format("truetype");font-weight:700;font-style:normal}@font-face{font-family:KaTeX_Fraktur;src:url(fonts/KaTeX_Fraktur-Regular.woff2) format("woff2"),url(fonts/KaTeX_Fraktur-Regular.woff) format("woff"),url(fonts/KaTeX_Fraktur-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Main;src:url(fonts/KaTeX_Main-Bold.woff2) format("woff2"),url(fonts/KaTeX_Main-Bold.woff) 
format("woff"),url(fonts/KaTeX_Main-Bold.ttf) format("truetype");font-weight:700;font-style:normal}@font-face{font-family:KaTeX_Main;src:url(fonts/KaTeX_Main-BoldItalic.woff2) format("woff2"),url(fonts/KaTeX_Main-BoldItalic.woff) format("woff"),url(fonts/KaTeX_Main-BoldItalic.ttf) format("truetype");font-weight:700;font-style:italic}@font-face{font-family:KaTeX_Main;src:url(fonts/KaTeX_Main-Italic.woff2) format("woff2"),url(fonts/KaTeX_Main-Italic.woff) format("woff"),url(fonts/KaTeX_Main-Italic.ttf) format("truetype");font-weight:400;font-style:italic}@font-face{font-family:KaTeX_Main;src:url(fonts/KaTeX_Main-Regular.woff2) format("woff2"),url(fonts/KaTeX_Main-Regular.woff) format("woff"),url(fonts/KaTeX_Main-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Math;src:url(fonts/KaTeX_Math-BoldItalic.woff2) format("woff2"),url(fonts/KaTeX_Math-BoldItalic.woff) format("woff"),url(fonts/KaTeX_Math-BoldItalic.ttf) format("truetype");font-weight:700;font-style:italic}@font-face{font-family:KaTeX_Math;src:url(fonts/KaTeX_Math-Italic.woff2) format("woff2"),url(fonts/KaTeX_Math-Italic.woff) format("woff"),url(fonts/KaTeX_Math-Italic.ttf) format("truetype");font-weight:400;font-style:italic}@font-face{font-family:"KaTeX_SansSerif";src:url(fonts/KaTeX_SansSerif-Bold.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Bold.woff) format("woff"),url(fonts/KaTeX_SansSerif-Bold.ttf) format("truetype");font-weight:700;font-style:normal}@font-face{font-family:"KaTeX_SansSerif";src:url(fonts/KaTeX_SansSerif-Italic.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Italic.woff) format("woff"),url(fonts/KaTeX_SansSerif-Italic.ttf) format("truetype");font-weight:400;font-style:italic}@font-face{font-family:"KaTeX_SansSerif";src:url(fonts/KaTeX_SansSerif-Regular.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Regular.woff) format("woff"),url(fonts/KaTeX_SansSerif-Regular.ttf) 
format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Script;src:url(fonts/KaTeX_Script-Regular.woff2) format("woff2"),url(fonts/KaTeX_Script-Regular.woff) format("woff"),url(fonts/KaTeX_Script-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Size1;src:url(fonts/KaTeX_Size1-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size1-Regular.woff) format("woff"),url(fonts/KaTeX_Size1-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Size2;src:url(fonts/KaTeX_Size2-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size2-Regular.woff) format("woff"),url(fonts/KaTeX_Size2-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Size3;src:url(fonts/KaTeX_Size3-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size3-Regular.woff) format("woff"),url(fonts/KaTeX_Size3-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Size4;src:url(fonts/KaTeX_Size4-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size4-Regular.woff) format("woff"),url(fonts/KaTeX_Size4-Regular.ttf) format("truetype");font-weight:400;font-style:normal}@font-face{font-family:KaTeX_Typewriter;src:url(fonts/KaTeX_Typewriter-Regular.woff2) format("woff2"),url(fonts/KaTeX_Typewriter-Regular.woff) format("woff"),url(fonts/KaTeX_Typewriter-Regular.ttf) format("truetype");font-weight:400;font-style:normal}.katex{font:normal 1.21em KaTeX_Main,Times New Roman,serif;line-height:1.2;text-indent:0;text-rendering:auto}.katex *{-ms-high-contrast-adjust:none!important}.katex .katex-version:after{content:"0.11.1"}.katex .katex-mathml{position:absolute;clip:rect(1px,1px,1px,1px);padding:0;border:0;height:1px;width:1px;overflow:hidden}.katex .katex-html>.newline{display:block}.katex .base{position:relative;white-space:nowrap;width:min-content}.katex .base,.katex .strut{display:inline-block}.katex .textbf{font-weight:700}.katex 
.textit{font-style:italic}.katex .textrm{font-family:KaTeX_Main}.katex .textsf{font-family:KaTeX_SansSerif}.katex .texttt{font-family:KaTeX_Typewriter}.katex .mathdefault{font-family:KaTeX_Math;font-style:italic}.katex .mathit{font-family:KaTeX_Main;font-style:italic}.katex .mathrm{font-style:normal}.katex .mathbf{font-family:KaTeX_Main;font-weight:700}.katex .boldsymbol{font-family:KaTeX_Math;font-weight:700;font-style:italic}.katex .amsrm,.katex .mathbb,.katex .textbb{font-family:KaTeX_AMS}.katex .mathcal{font-family:KaTeX_Caligraphic}.katex .mathfrak,.katex .textfrak{font-family:KaTeX_Fraktur}.katex .mathtt{font-family:KaTeX_Typewriter}.katex .mathscr,.katex .textscr{font-family:KaTeX_Script}.katex .mathsf,.katex .textsf{font-family:KaTeX_SansSerif}.katex .mathboldsf,.katex .textboldsf{font-family:KaTeX_SansSerif;font-weight:700}.katex .mathitsf,.katex .textitsf{font-family:KaTeX_SansSerif;font-style:italic}.katex .mainrm{font-family:KaTeX_Main;font-style:normal}.katex .vlist-t{display:inline-table;table-layout:fixed}.katex .vlist-r{display:table-row}.katex .vlist{display:table-cell;vertical-align:bottom;position:relative}.katex .vlist>span{display:block;height:0;position:relative}.katex .vlist>span>span{display:inline-block}.katex .vlist>span>.pstrut{overflow:hidden;width:0}.katex .vlist-t2{margin-right:-2px}.katex .vlist-s{display:table-cell;vertical-align:bottom;font-size:1px;width:2px;min-width:2px}.katex .msupsub{text-align:left}.katex .mfrac>span>span{text-align:center}.katex .mfrac .frac-line{display:inline-block;width:100%;border-bottom-style:solid}.katex .hdashline,.katex .hline,.katex .mfrac .frac-line,.katex .overline .overline-line,.katex .rule,.katex .underline .underline-line{min-height:1px}.katex .mspace{display:inline-block}.katex .clap,.katex .llap,.katex .rlap{width:0;position:relative}.katex .clap>.inner,.katex .llap>.inner,.katex .rlap>.inner{position:absolute}.katex .clap>.fix,.katex .llap>.fix,.katex .rlap>.fix{display:inline-block}.katex 
.llap>.inner{right:0}.katex .clap>.inner,.katex .rlap>.inner{left:0}.katex .clap>.inner>span{margin-left:-50%;margin-right:50%}.katex .rule{display:inline-block;border:0 solid;position:relative}.katex .hline,.katex .overline .overline-line,.katex .underline .underline-line{display:inline-block;width:100%;border-bottom-style:solid}.katex .hdashline{display:inline-block;width:100%;border-bottom-style:dashed}.katex .sqrt>.root{margin-left:.27777778em;margin-right:-.55555556em}.katex .fontsize-ensurer.reset-size1.size1,.katex .sizing.reset-size1.size1{font-size:1em}.katex .fontsize-ensurer.reset-size1.size2,.katex .sizing.reset-size1.size2{font-size:1.2em}.katex .fontsize-ensurer.reset-size1.size3,.katex .sizing.reset-size1.size3{font-size:1.4em}.katex .fontsize-ensurer.reset-size1.size4,.katex .sizing.reset-size1.size4{font-size:1.6em}.katex .fontsize-ensurer.reset-size1.size5,.katex .sizing.reset-size1.size5{font-size:1.8em}.katex .fontsize-ensurer.reset-size1.size6,.katex .sizing.reset-size1.size6{font-size:2em}.katex .fontsize-ensurer.reset-size1.size7,.katex .sizing.reset-size1.size7{font-size:2.4em}.katex .fontsize-ensurer.reset-size1.size8,.katex .sizing.reset-size1.size8{font-size:2.88em}.katex .fontsize-ensurer.reset-size1.size9,.katex .sizing.reset-size1.size9{font-size:3.456em}.katex .fontsize-ensurer.reset-size1.size10,.katex .sizing.reset-size1.size10{font-size:4.148em}.katex .fontsize-ensurer.reset-size1.size11,.katex .sizing.reset-size1.size11{font-size:4.976em}.katex .fontsize-ensurer.reset-size2.size1,.katex .sizing.reset-size2.size1{font-size:.83333333em}.katex .fontsize-ensurer.reset-size2.size2,.katex .sizing.reset-size2.size2{font-size:1em}.katex .fontsize-ensurer.reset-size2.size3,.katex .sizing.reset-size2.size3{font-size:1.16666667em}.katex .fontsize-ensurer.reset-size2.size4,.katex .sizing.reset-size2.size4{font-size:1.33333333em}.katex .fontsize-ensurer.reset-size2.size5,.katex .sizing.reset-size2.size5{font-size:1.5em}.katex 
.fontsize-ensurer.reset-size2.size6,.katex .sizing.reset-size2.size6{font-size:1.66666667em}.katex .fontsize-ensurer.reset-size2.size7,.katex .sizing.reset-size2.size7{font-size:2em}.katex .fontsize-ensurer.reset-size2.size8,.katex .sizing.reset-size2.size8{font-size:2.4em}.katex .fontsize-ensurer.reset-size2.size9,.katex .sizing.reset-size2.size9{font-size:2.88em}.katex .fontsize-ensurer.reset-size2.size10,.katex .sizing.reset-size2.size10{font-size:3.45666667em}.katex .fontsize-ensurer.reset-size2.size11,.katex .sizing.reset-size2.size11{font-size:4.14666667em}.katex .fontsize-ensurer.reset-size3.size1,.katex .sizing.reset-size3.size1{font-size:.71428571em}.katex .fontsize-ensurer.reset-size3.size2,.katex .sizing.reset-size3.size2{font-size:.85714286em}.katex .fontsize-ensurer.reset-size3.size3,.katex .sizing.reset-size3.size3{font-size:1em}.katex .fontsize-ensurer.reset-size3.size4,.katex .sizing.reset-size3.size4{font-size:1.14285714em}.katex .fontsize-ensurer.reset-size3.size5,.katex .sizing.reset-size3.size5{font-size:1.28571429em}.katex .fontsize-ensurer.reset-size3.size6,.katex .sizing.reset-size3.size6{font-size:1.42857143em}.katex .fontsize-ensurer.reset-size3.size7,.katex .sizing.reset-size3.size7{font-size:1.71428571em}.katex .fontsize-ensurer.reset-size3.size8,.katex .sizing.reset-size3.size8{font-size:2.05714286em}.katex .fontsize-ensurer.reset-size3.size9,.katex .sizing.reset-size3.size9{font-size:2.46857143em}.katex .fontsize-ensurer.reset-size3.size10,.katex .sizing.reset-size3.size10{font-size:2.96285714em}.katex .fontsize-ensurer.reset-size3.size11,.katex .sizing.reset-size3.size11{font-size:3.55428571em}.katex .fontsize-ensurer.reset-size4.size1,.katex .sizing.reset-size4.size1{font-size:.625em}.katex .fontsize-ensurer.reset-size4.size2,.katex .sizing.reset-size4.size2{font-size:.75em}.katex .fontsize-ensurer.reset-size4.size3,.katex .sizing.reset-size4.size3{font-size:.875em}.katex .fontsize-ensurer.reset-size4.size4,.katex 
.sizing.reset-size4.size4{font-size:1em}.katex .fontsize-ensurer.reset-size4.size5,.katex .sizing.reset-size4.size5{font-size:1.125em}.katex .fontsize-ensurer.reset-size4.size6,.katex .sizing.reset-size4.size6{font-size:1.25em}.katex .fontsize-ensurer.reset-size4.size7,.katex .sizing.reset-size4.size7{font-size:1.5em}.katex .fontsize-ensurer.reset-size4.size8,.katex .sizing.reset-size4.size8{font-size:1.8em}.katex .fontsize-ensurer.reset-size4.size9,.katex .sizing.reset-size4.size9{font-size:2.16em}.katex .fontsize-ensurer.reset-size4.size10,.katex .sizing.reset-size4.size10{font-size:2.5925em}.katex .fontsize-ensurer.reset-size4.size11,.katex .sizing.reset-size4.size11{font-size:3.11em}.katex .fontsize-ensurer.reset-size5.size1,.katex .sizing.reset-size5.size1{font-size:.55555556em}.katex .fontsize-ensurer.reset-size5.size2,.katex .sizing.reset-size5.size2{font-size:.66666667em}.katex .fontsize-ensurer.reset-size5.size3,.katex .sizing.reset-size5.size3{font-size:.77777778em}.katex .fontsize-ensurer.reset-size5.size4,.katex .sizing.reset-size5.size4{font-size:.88888889em}.katex .fontsize-ensurer.reset-size5.size5,.katex .sizing.reset-size5.size5{font-size:1em}.katex .fontsize-ensurer.reset-size5.size6,.katex .sizing.reset-size5.size6{font-size:1.11111111em}.katex .fontsize-ensurer.reset-size5.size7,.katex .sizing.reset-size5.size7{font-size:1.33333333em}.katex .fontsize-ensurer.reset-size5.size8,.katex .sizing.reset-size5.size8{font-size:1.6em}.katex .fontsize-ensurer.reset-size5.size9,.katex .sizing.reset-size5.size9{font-size:1.92em}.katex .fontsize-ensurer.reset-size5.size10,.katex .sizing.reset-size5.size10{font-size:2.30444444em}.katex .fontsize-ensurer.reset-size5.size11,.katex .sizing.reset-size5.size11{font-size:2.76444444em}.katex .fontsize-ensurer.reset-size6.size1,.katex .sizing.reset-size6.size1{font-size:.5em}.katex .fontsize-ensurer.reset-size6.size2,.katex .sizing.reset-size6.size2{font-size:.6em}.katex .fontsize-ensurer.reset-size6.size3,.katex 
.sizing.reset-size6.size3{font-size:.7em}.katex .fontsize-ensurer.reset-size6.size4,.katex .sizing.reset-size6.size4{font-size:.8em}.katex .fontsize-ensurer.reset-size6.size5,.katex .sizing.reset-size6.size5{font-size:.9em}.katex .fontsize-ensurer.reset-size6.size6,.katex .sizing.reset-size6.size6{font-size:1em}.katex .fontsize-ensurer.reset-size6.size7,.katex .sizing.reset-size6.size7{font-size:1.2em}.katex .fontsize-ensurer.reset-size6.size8,.katex .sizing.reset-size6.size8{font-size:1.44em}.katex .fontsize-ensurer.reset-size6.size9,.katex .sizing.reset-size6.size9{font-size:1.728em}.katex .fontsize-ensurer.reset-size6.size10,.katex .sizing.reset-size6.size10{font-size:2.074em}.katex .fontsize-ensurer.reset-size6.size11,.katex .sizing.reset-size6.size11{font-size:2.488em}.katex .fontsize-ensurer.reset-size7.size1,.katex .sizing.reset-size7.size1{font-size:.41666667em}.katex .fontsize-ensurer.reset-size7.size2,.katex .sizing.reset-size7.size2{font-size:.5em}.katex .fontsize-ensurer.reset-size7.size3,.katex .sizing.reset-size7.size3{font-size:.58333333em}.katex .fontsize-ensurer.reset-size7.size4,.katex .sizing.reset-size7.size4{font-size:.66666667em}.katex .fontsize-ensurer.reset-size7.size5,.katex .sizing.reset-size7.size5{font-size:.75em}.katex .fontsize-ensurer.reset-size7.size6,.katex .sizing.reset-size7.size6{font-size:.83333333em}.katex .fontsize-ensurer.reset-size7.size7,.katex .sizing.reset-size7.size7{font-size:1em}.katex .fontsize-ensurer.reset-size7.size8,.katex .sizing.reset-size7.size8{font-size:1.2em}.katex .fontsize-ensurer.reset-size7.size9,.katex .sizing.reset-size7.size9{font-size:1.44em}.katex .fontsize-ensurer.reset-size7.size10,.katex .sizing.reset-size7.size10{font-size:1.72833333em}.katex .fontsize-ensurer.reset-size7.size11,.katex .sizing.reset-size7.size11{font-size:2.07333333em}.katex .fontsize-ensurer.reset-size8.size1,.katex .sizing.reset-size8.size1{font-size:.34722222em}.katex .fontsize-ensurer.reset-size8.size2,.katex 
.sizing.reset-size8.size2{font-size:.41666667em}.katex .fontsize-ensurer.reset-size8.size3,.katex .sizing.reset-size8.size3{font-size:.48611111em}.katex .fontsize-ensurer.reset-size8.size4,.katex .sizing.reset-size8.size4{font-size:.55555556em}.katex .fontsize-ensurer.reset-size8.size5,.katex .sizing.reset-size8.size5{font-size:.625em}.katex .fontsize-ensurer.reset-size8.size6,.katex .sizing.reset-size8.size6{font-size:.69444444em}.katex .fontsize-ensurer.reset-size8.size7,.katex .sizing.reset-size8.size7{font-size:.83333333em}.katex .fontsize-ensurer.reset-size8.size8,.katex .sizing.reset-size8.size8{font-size:1em}.katex .fontsize-ensurer.reset-size8.size9,.katex .sizing.reset-size8.size9{font-size:1.2em}.katex .fontsize-ensurer.reset-size8.size10,.katex .sizing.reset-size8.size10{font-size:1.44027778em}.katex .fontsize-ensurer.reset-size8.size11,.katex .sizing.reset-size8.size11{font-size:1.72777778em}.katex .fontsize-ensurer.reset-size9.size1,.katex .sizing.reset-size9.size1{font-size:.28935185em}.katex .fontsize-ensurer.reset-size9.size2,.katex .sizing.reset-size9.size2{font-size:.34722222em}.katex .fontsize-ensurer.reset-size9.size3,.katex .sizing.reset-size9.size3{font-size:.40509259em}.katex .fontsize-ensurer.reset-size9.size4,.katex .sizing.reset-size9.size4{font-size:.46296296em}.katex .fontsize-ensurer.reset-size9.size5,.katex .sizing.reset-size9.size5{font-size:.52083333em}.katex .fontsize-ensurer.reset-size9.size6,.katex .sizing.reset-size9.size6{font-size:.5787037em}.katex .fontsize-ensurer.reset-size9.size7,.katex .sizing.reset-size9.size7{font-size:.69444444em}.katex .fontsize-ensurer.reset-size9.size8,.katex .sizing.reset-size9.size8{font-size:.83333333em}.katex .fontsize-ensurer.reset-size9.size9,.katex .sizing.reset-size9.size9{font-size:1em}.katex .fontsize-ensurer.reset-size9.size10,.katex .sizing.reset-size9.size10{font-size:1.20023148em}.katex .fontsize-ensurer.reset-size9.size11,.katex .sizing.reset-size9.size11{font-size:1.43981481em}.katex 
.fontsize-ensurer.reset-size10.size1,.katex .sizing.reset-size10.size1{font-size:.24108004em}.katex .fontsize-ensurer.reset-size10.size2,.katex .sizing.reset-size10.size2{font-size:.28929605em}.katex .fontsize-ensurer.reset-size10.size3,.katex .sizing.reset-size10.size3{font-size:.33751205em}.katex .fontsize-ensurer.reset-size10.size4,.katex .sizing.reset-size10.size4{font-size:.38572806em}.katex .fontsize-ensurer.reset-size10.size5,.katex .sizing.reset-size10.size5{font-size:.43394407em}.katex .fontsize-ensurer.reset-size10.size6,.katex .sizing.reset-size10.size6{font-size:.48216008em}.katex .fontsize-ensurer.reset-size10.size7,.katex .sizing.reset-size10.size7{font-size:.57859209em}.katex .fontsize-ensurer.reset-size10.size8,.katex .sizing.reset-size10.size8{font-size:.69431051em}.katex .fontsize-ensurer.reset-size10.size9,.katex .sizing.reset-size10.size9{font-size:.83317261em}.katex .fontsize-ensurer.reset-size10.size10,.katex .sizing.reset-size10.size10{font-size:1em}.katex .fontsize-ensurer.reset-size10.size11,.katex .sizing.reset-size10.size11{font-size:1.19961427em}.katex .fontsize-ensurer.reset-size11.size1,.katex .sizing.reset-size11.size1{font-size:.20096463em}.katex .fontsize-ensurer.reset-size11.size2,.katex .sizing.reset-size11.size2{font-size:.24115756em}.katex .fontsize-ensurer.reset-size11.size3,.katex .sizing.reset-size11.size3{font-size:.28135048em}.katex .fontsize-ensurer.reset-size11.size4,.katex .sizing.reset-size11.size4{font-size:.32154341em}.katex .fontsize-ensurer.reset-size11.size5,.katex .sizing.reset-size11.size5{font-size:.36173633em}.katex .fontsize-ensurer.reset-size11.size6,.katex .sizing.reset-size11.size6{font-size:.40192926em}.katex .fontsize-ensurer.reset-size11.size7,.katex .sizing.reset-size11.size7{font-size:.48231511em}.katex .fontsize-ensurer.reset-size11.size8,.katex .sizing.reset-size11.size8{font-size:.57877814em}.katex .fontsize-ensurer.reset-size11.size9,.katex .sizing.reset-size11.size9{font-size:.69453376em}.katex 
.fontsize-ensurer.reset-size11.size10,.katex .sizing.reset-size11.size10{font-size:.83360129em}.katex .fontsize-ensurer.reset-size11.size11,.katex .sizing.reset-size11.size11{font-size:1em}.katex .delimsizing.size1{font-family:KaTeX_Size1}.katex .delimsizing.size2{font-family:KaTeX_Size2}.katex .delimsizing.size3{font-family:KaTeX_Size3}.katex .delimsizing.size4{font-family:KaTeX_Size4}.katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1}.katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4}.katex .nulldelimiter{display:inline-block;width:.12em}.katex .delimcenter,.katex .op-symbol{position:relative}.katex .op-symbol.small-op{font-family:KaTeX_Size1}.katex .op-symbol.large-op{font-family:KaTeX_Size2}.katex .op-limits>.vlist-t{text-align:center}.katex .accent>.vlist-t{text-align:center}.katex .accent .accent-body{position:relative}.katex .accent .accent-body:not(.accent-full){width:0}.katex .overlay{display:block}.katex .mtable .vertical-separator{display:inline-block;min-width:1px}.katex .mtable .arraycolsep{display:inline-block}.katex .mtable .col-align-c>.vlist-t{text-align:center}.katex .mtable .col-align-l>.vlist-t{text-align:left}.katex .mtable .col-align-r>.vlist-t{text-align:right}.katex .svg-align{text-align:left}.katex svg{display:block;position:absolute;width:100%;height:inherit;fill:currentColor;stroke:currentColor;fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1}.katex svg path{stroke:none}.katex img{border-style:none;min-width:0;min-height:0;max-width:none;max-height:none}.katex .stretchy{width:100%;display:block;position:relative;overflow:hidden}.katex .stretchy:after,.katex .stretchy:before{content:""}.katex .hide-tail{width:100%;position:relative;overflow:hidden}.katex .halfarrow-left{position:absolute;left:0;width:50.2%;overflow:hidden}.katex 
.halfarrow-right{position:absolute;right:0;width:50.2%;overflow:hidden}.katex .brace-left{position:absolute;left:0;width:25.1%;overflow:hidden}.katex .brace-center{position:absolute;left:25%;width:50%;overflow:hidden}.katex .brace-right{position:absolute;right:0;width:25.1%;overflow:hidden}.katex .x-arrow-pad{padding:0 .5em}.katex .mover,.katex .munder,.katex .x-arrow{text-align:center}.katex .boxpad{padding:0 .3em}.katex .fbox,.katex .fcolorbox{box-sizing:border-box;border:.04em solid}.katex .cancel-pad{padding:0 .2em}.katex .cancel-lap{margin-left:-.2em;margin-right:-.2em}.katex .sout{border-bottom-style:solid;border-bottom-width:.08em}.katex-display{display:block;margin:1em 0;text-align:center}.katex-display>.katex{display:block;text-align:center;white-space:nowrap}.katex-display>.katex>.katex-html{display:block;position:relative}.katex-display>.katex>.katex-html>.tag{position:absolute;right:0}.katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto}.katex-display.fleqn>.katex{text-align:left} diff --git a/content/ml-notes/style.css b/content/ml-notes/style.css @@ -1,38 +0,0 @@ -@charset 'UTF-8'; - -body { - margin: 0px; - padding: 1em; - background: #f3f2ed; - font-family: 'Lato', sans-serif; - font-size: 12pt; - font-weight: 300; - color: #8A8A8A; - padding-left: 50px; - line-height: 1.5; -} -h1 { - margin: 0px; - padding: 0px; - font-weight: 300; - text-align: center; -} -ul.toc li { - margin: 8px 0; -} -h3.name { - font-style: italic; - text-align: center; - font-weight: 300; - font-size: 20px; -} -a { - color: #D1551F; - } -a:hover { - color: #AF440F; -} - strong { - font-weight: 700; - color: #2A2A2A; - }
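The Stacking note in the tree-models diff above describes adding each existing model's output to the dataset as a new column and training a combiner (usually a logistic regression) on the extended data. The sketch below assumes two base logistic regressions that each see only one of two features, plus a logistic-regression combiner trained by plain gradient descent. The data, the helper names (`fit_logistic_regression`, `extend_with_outputs`, `accuracy`) and all hyperparameters are hypothetical. Like the notes, it computes the base outputs on the same points the combiner trains on. A common refinement is to use held-out (e.g. cross-validated) base predictions instead, so the combiner does not learn to over-trust base models that have memorised the training set.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])


def fit_logistic_regression(X, y, lr=0.5, n_steps=2000):
    """Logistic regression by batch gradient descent on the mean log loss.

    Returns a function mapping a feature matrix to predicted probabilities.
    """
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)    # gradient of the mean log loss
    return lambda X_new: sigmoid(add_bias(X_new) @ w)


def extend_with_outputs(X, models):
    """Add each model's output on X to the dataset as one extra column."""
    return np.hstack([X, np.column_stack([m(X) for m in models])])


def accuracy(predict, X, y):
    return np.mean((predict(X) > 0.5) == y)


# Toy data (assumption): the label depends on both features, but each base model sees only one.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

model_a = fit_logistic_regression(X[:, [0]], y)
model_b = fit_logistic_regression(X[:, [1]], y)
base_models = [lambda X: model_a(X[:, [0]]), lambda X: model_b(X[:, [1]])]

# The combiner is trained on the original features plus one column per base model.
combiner = fit_logistic_regression(extend_with_outputs(X, base_models), y)


def stacked(X):
    return combiner(extend_with_outputs(X, base_models))


print("base A:", accuracy(base_models[0], X, y))
print("base B:", accuracy(base_models[1], X, y))
print("stacked:", accuracy(stacked, X, y))
```

Because the combiner sees the original features as well as the base outputs, it can recover the decision boundary that neither single-feature model can express on its own.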