+++
title = 'Supervised learning'
+++

# Supervised learning
Learn a functional relationship (the "unknown target function") from an observation to a target.

Assume an unknown conditional target distribution p(y|x).
We can think of the target as f(x) plus noise from a noise distribution (Bernoulli/categorical for a discrete target, normal for a continuous one).

1. Separate the dataset into training, validation, and test sets.
2. Learn a function that fits the observed data in the training set.
    - Stratify the training set if it is unbalanced (e.g. by oversampling).
3. Evaluate the generalizability of the function on the test set.
4. Stop the learning process based on the validation set.
5. If the dataset is small, use cross-validation.

Error measure:
- Assume a hypothesis h for the target function. How far is h from f overall (risk), and what is its value per point (loss)?
- We approximate the risk using the data we have:
    - in-sample error: error made on the training set
    - out-of-sample error: error made on all the other possible inputs
- We try to minimize the in-sample error.

Model selection:
- Select the hypothesis with the lowest error on the validation set.
- Watch out for overfitting; don't use too many features.
- PAC ("probably approximately correct") learnable: a formal definition of an "almost perfect" model.
- VC dimension: the maximum number of input vectors (points) that can be shattered (i.e. the model can represent every possible labelling of them).
- All hypothesis sets with finite VC dimension are PAC learnable.

## Predictive modeling without notion of time
1. Think about the learning setup (what do you want to learn?).
2. Don't overfit: select features with forward and backward selection, and consider regularization (punishing more complex models).
    - forward selection: iteratively add the most predictive feature
    - backward selection: iteratively remove the least predictive feature
    - regularization: add a term to the error function to punish more complex models
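The split-then-evaluate workflow and the in-sample vs. out-of-sample distinction can be sketched with a least-squares model on synthetic data (a minimal sketch using numpy; the data, split sizes, and variable names are illustrative, and the validation set is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + noise (an "unknown target function" plus noise).
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 1. Separate the dataset into training and test sets.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# 2. Learn a hypothesis h that fits the training data (least squares with intercept).
A = np.hstack([X_train, np.ones((len(X_train), 1))])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def h(X):
    return X[:, 0] * w[0] + w[1]

# 3. In-sample error (training set) vs. an estimate of the
#    out-of-sample error (held-out test set), both as mean squared error.
e_in = np.mean((h(X_train) - y_train) ** 2)
e_out = np.mean((h(X_test) - y_test) ** 2)
print(e_in, e_out)  # both should land near the noise variance (~0.01)
```

Because the model class matches the data-generating process here, both errors stay close to the irreducible noise; with an overly flexible model, e_in would drop while e_out grows, which is exactly what the test set is there to detect.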
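Forward selection as described above can be sketched as a greedy loop over candidate features (a sketch with numpy least squares; the synthetic data, where only features 0 and 2 carry signal, is an assumption made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: features 0 and 2 are informative, feature 1 is pure noise.
X = rng.normal(size=(300, 3))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=300)
X_train, X_val = X[:200], X[200:]
y_train, y_val = y[:200], y[200:]

def val_error(features):
    """Fit least squares on the chosen feature columns; return validation MSE."""
    if not features:
        return np.mean((y_val - y_train.mean()) ** 2)
    w, *_ = np.linalg.lstsq(X_train[:, features], y_train, rcond=None)
    return np.mean((X_val[:, features] @ w - y_val) ** 2)

# Forward selection: iteratively add the feature that lowers validation error most;
# stop when no remaining feature improves it.
selected, remaining = [], [0, 1, 2]
while remaining:
    best = min(remaining, key=lambda f: val_error(selected + [f]))
    if val_error(selected + [best]) >= val_error(selected):
        break
    selected.append(best)
    remaining.remove(best)

print(selected)  # the informative features 0 and 2 should be picked first
```

Backward selection is the mirror image: start from all features and iteratively drop the one whose removal hurts validation error the least.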
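The regularization idea, adding a term to the error function that punishes complex models, can be sketched with ridge regression, which penalizes the squared weight norm (a sketch with numpy; the data and the penalty strength `lam` are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Many features, few samples: a setting prone to overfitting.
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=30)
X_train, y_train = X[:20], y[:20]

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (closed-form ridge regression)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

w_plain = ridge_fit(X_train, y_train, lam=0.0)  # ordinary least squares
w_ridge = ridge_fit(X_train, y_train, lam=5.0)  # penalized

# The penalty shrinks the weights, trading a little in-sample error
# for a simpler hypothesis that tends to generalize better.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_plain))  # True
```

The choice of `lam` is itself a model-selection problem, typically tuned on the validation set like any other hyperparameter.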