+++
title = 'Supervised learning'
+++

# Supervised learning
Learn a functional relationship (the "unknown target function") from an observation to a target.

Assume an unknown conditional target distribution p(y|x): the target is produced by computing f(x) and then adding noise from a noise distribution (Bernoulli/categorical for a discrete target, normal for a continuous one).
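
A minimal sketch of this generative view, with an arbitrary f and noise scale chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)

# Continuous target: y = f(x) + Gaussian noise (f(x) = 3x + 1 is an arbitrary choice)
y_continuous = 3 * x + 1 + rng.normal(0, 0.5, size=x.shape)

# Discrete target: y ~ Bernoulli(p(x)), with p(x) a squashed version of f(x)
p = 1 / (1 + np.exp(-(3 * x + 1)))
y_discrete = rng.binomial(1, p)
```
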
1. Separate the dataset into training, validation, and test sets.
2. Learn a function that fits the observed data in the training set.
    - stratify the training set if it is unbalanced (e.g. by oversampling the minority class)
3. Evaluate the generalizability of the function on the test set.
4. Stop the learning process based on the validation set.
5. If the dataset is small, use cross-validation (see the sketch after this list).
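
One way this could look with scikit-learn (the dataset and model are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 60/20/20 train/validation/test split, stratified so class proportions are preserved
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
print("test accuracy:", model.score(X_test, y_test))

# With little data, estimate generalization with k-fold cross-validation instead
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```
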
Error measure:
- assume a hypothesis h for the target function f. How far h is from f overall is the risk; its error on a single point is the loss (formal definitions after this list).
- approximate the risk using the data we have
- in-sample error: error made on the training set
- out-of-sample error: error made on all other possible elements (the true risk)
- we minimize the in-sample error, hoping it is a good proxy for the out-of-sample error
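
Written out (standard definitions; the loss L and the names E_in / E_out are just the notation used here):

```latex
% In-sample error (empirical risk): average loss over the N training points
E_{\mathrm{in}}(h) = \frac{1}{N} \sum_{i=1}^{N} L\big(h(x_i), y_i\big)

% Out-of-sample error (risk): expected loss over the whole distribution p(x, y)
E_{\mathrm{out}}(h) = \mathbb{E}_{(x,y) \sim p}\big[ L\big(h(x), y\big) \big]
```
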
Model selection:
- select the hypothesis with the lowest error on the validation set
- watch out for overfitting: don't use too many features
- PAC ("probably approximately correct") learnable -- a formal definition of an "almost perfect" model
- VC dimension: the maximum number of input vectors (points) that can be shattered, i.e. for which the model can represent every possible labelling (see the sketch after this list)
- all hypothesis sets with finite VC dimension are PAC learnable
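
For example, a linear classifier in the plane can shatter 3 points in general position (its VC dimension is 3). A brute-force way to check shattering, using a (roughly hard-margin) linear SVM as the linear model and arbitrarily chosen points:

```python
from itertools import product

import numpy as np
from sklearn.svm import SVC

# Three non-collinear points in the plane
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def realizable(X, y):
    """Can a linear classifier reproduce labelling y on X exactly?"""
    if len(set(y)) == 1:               # a constant classifier handles this labelling
        return True
    clf = SVC(kernel="linear", C=1e9)  # very large C ~ hard margin
    clf.fit(X, y)
    return (clf.predict(X) == y).all()

# The set is shattered if every one of the 2^3 labellings is realizable
shattered = all(realizable(points, np.array(lab)) for lab in product([0, 1], repeat=3))
print("3 points shattered by a linear classifier:", shattered)  # expected: True
```
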
## Predictive modeling without a notion of time
1. Think about the learning setup (what do you want to learn?).
2. Don't overfit: select features with forward or backward selection, and consider regularization (punishing more complex models); see the sketch after this list.
    - forward selection: iteratively add the most predictive feature
    - backward selection: iteratively remove the least predictive feature
    - regularization: add a term to the error function that punishes more complex models
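
A sketch of both ideas with scikit-learn (dataset, estimator, and number of features to keep are arbitrary choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, Ridge

X, y = load_diabetes(return_X_y=True)

# Forward selection: start with no features, repeatedly add the one that helps most
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward"
).fit(X, y)

# Backward selection: start with all features, repeatedly drop the least useful one
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward"
).fit(X, y)

print("forward keeps features:", forward.get_support(indices=True))
print("backward keeps features:", backward.get_support(indices=True))

# Regularization: Ridge adds an L2 penalty term (alpha * ||w||^2) to the squared error
ridge = Ridge(alpha=1.0).fit(X, y)
```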