+++
title = 'Introduction'
template = 'page-math.html'
+++

# Introduction

## What is ML?
Deductive vs. inductive reasoning:

* Deductive (conclusion by logic): discrete, unambiguous, provable, known rules
* Inductive (conclusion from experience): fuzzy, ambiguous, experimental, unknown rules

ML lets systems learn and improve from experience without being explicitly programmed for a specific situation.

Used in software, analytics, data mining, data science, and statistics.

A problem is suitable for ML _if we can't solve it explicitly_:

* when approximate solutions are OK
* when reliability is not the biggest focus

Why don't we have explicit solutions? An explicit solution may be too expensive to build, the problem may change over time, or there may be other reasons.

![overview-diagram.png](6610df2f6a4a4d21ad34c09c3468f115.png)

An intelligent agent:

* online learning: acting and learning simultaneously
* reinforcement learning: online learning in a world with delayed feedback

Offline learning: separate learning and acting

* take a fixed dataset of examples
* train a model on that dataset
* test the model, and if it works, use it in production

## Supervised ML
Supervised: explicit examples of input and output. Learn to predict the output for unseen inputs.

Learning tasks:

* classification: assign a class to each example
* regression: assign a number to each example

### Classification
How do you reduce a problem to classification? e.g. treat every pixel in a grayscale image as a feature, and assign a class label to each image.

classification: output labels are classes (categorical data)

linear classifier: just draw a line, plane, or hyperplane between the classes

* feature space: contains the data; one axis per feature
* model space: contains the models; in a loss-surface plot, the bright spots have low loss
* loss function: maps a model to a number expressing its performance on the data, the lower the better

decision tree classifier: every node is a condition on a feature; follow the branch whose condition holds. Plotted over the feature space, its decision boundary looks like a step function.

k-nearest-neighbors: a lazy learner. Training does nothing except remember the data; to classify a new point, find the k closest training examples and take a majority vote over their labels.
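A minimal sketch of this idea, assuming a small made-up dataset, k = 3, and Euclidean distance (none of which are specified in the notes):

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among the k nearest stored examples."""
    # "Training" already happened: it was just storing train_x and train_y.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    dists.sort(key=lambda pair: pair[0])              # nearest first
    votes = Counter(label for _, label in dists[:k])  # labels of the k nearest
    return votes.most_common(1)[0][0]

# Hypothetical toy dataset: two numerical features, two classes.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, (2, 2)))  # "a": nearest points are class a
print(knn_predict(train_x, train_y, (9, 9)))  # "b"
```

All the work happens at prediction time: classifying one query means computing a distance to every stored example.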
features: numerical or categorical

binary classification: only two classes

multiclass classification: more than two classes

### Regression
regression: output labels are numbers. The model we're trying to learn is a function from the feature space to ℝ.

loss function: maps a model to a number that expresses how well it fits the data

common example, the mean squared error: $loss(p) = \frac{1}{n} \sum_i (f_p(x_i) - y_i)^2$

Take the difference between the model's prediction and the target value (the residual), then square the residuals and average them.

overfitting: the model is too specific to the data; it memorizes the data instead of generalizing

Split the data into training and test sets. Don't judge performance on the training data: the aim is to minimise loss on the _test_ data.
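A minimal sketch of this loss, assuming a one-parameter linear model $f_p(x) = p \cdot x$ and made-up data (both are illustrative choices, not from the notes):

```python
def f(p, x):
    """A one-parameter linear model: f_p(x) = p * x."""
    return p * x

def loss(p, xs, ys):
    """Mean squared error: average the squared residuals f_p(x_i) - y_i."""
    return sum((f(p, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]        # roughly y = 2x

# Each candidate p is one model, i.e. one point in model space.
for p in [1.0, 1.5, 2.0, 2.5]:
    print(p, loss(p, xs, ys))    # loss is lowest near p = 2
```

Every candidate p is a point in model space, and the loop scans it for the "bright spot" where the loss is lowest (here, near p = 2).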
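And a minimal sketch of the split itself; the 80/20 ratio, the random seed, and the grid of candidate slopes are all arbitrary assumptions:

```python
import random

random.seed(0)
data = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(20)]  # noisy y = 2x
random.shuffle(data)

split = int(0.8 * len(data))          # 80% train, 20% test
train, test = data[:split], data[split:]

def mse(p, pairs):
    return sum((p * x - y) ** 2 for x, y in pairs) / len(pairs)

# Choose the model (here: a slope from a small grid) on the training data...
best_p = min([1.8, 1.9, 2.0, 2.1, 2.2], key=lambda p: mse(p, train))

# ...but judge it on the test data it has never seen.
print(best_p, mse(best_p, train), mse(best_p, test))
```

If the training loss is much lower than the test loss, the model is memorizing rather than generalizing: exactly the overfitting described above.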
## Unsupervised ML
Unsupervised: only inputs are provided; find _any_ pattern that explains something about the data.

Learning tasks:

* clustering: like classification, except there is no target column, so the model outputs a cluster id
* density estimation: the model outputs a number (a probability density) that should be high for likely instances of the data, e.g. fitting a probability distribution to the data
* generative modeling: build a model from which you can sample new examples

## What isn't ML?
ML is a subdomain of AI.

* AI, but not ML: automated reasoning, planning
* Data science, but not ML: gathering, harmonising, and interpreting data
* Data mining is more closely related to ML, but some tasks, e.g. finding fraud in transaction networks, are closer to data mining
* Statistics wants to figure out the truth, whereas an ML model just has to work well enough; it doesn't necessarily have to be true