+++
title = 'Handling sensory noise'
template = 'page-math.html'
+++

Removing noise: outliers, imputing missing values, transforming data.

Outlier: an observation point that's distant from other observations
- may be caused by measurement error, or by natural variability

Remove outliers with domain knowledge, or without.
But be careful: you don't want to remove valuable information.

Outlier detection:
- distribution based -- assume a certain distribution of the data
- distance based -- only look at the distance between data points

## Distribution based
### Chauvenet's criterion
Assume a normal distribution for a single attribute $j$.

Take the mean and standard deviation of attribute $j$ in the data set:

$\mu = \frac{\sum_{n = 1}^{N} x_{n}^{j}}{N}$

$\sigma = \sqrt{\frac{\sum_{n=1}^{N} \left(x_{n}^{j} - \mu\right)^{2}}{N}}$

Take those values as the parameters of the normal distribution.

For each instance $i$ of attribute $j$, compute the probability of the observation:

$P(X \leq x_{i}^{j}) = \int_{-\infty}^{x_{i}^{j}}{\frac{1}{\sqrt{2 \sigma^2 \pi}} e^{-\frac{(u - \mu)^{2}}{2 \sigma^2}}} du$

Define an instance as an outlier when either:
- $(1 - P(X \leq x_{i}^{j})) < \frac{1}{c \cdot N}$, or
- $P(X \leq x_{i}^{j}) < \frac{1}{c \cdot N}$

A typical value for $c$ is 2.
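
A rough sketch of the criterion in Python, following the formulas above (the sensor column `acc_x` in the usage comment is just a made-up example):

```python
import pandas as pd
from scipy.stats import norm

def chauvenet_outliers(values: pd.Series, c: float = 2) -> pd.Series:
    """Return a boolean mask marking values rejected by Chauvenet's criterion."""
    n = len(values)
    mu = values.mean()
    sigma = values.std(ddof=0)  # population standard deviation, as in the formula above
    # Probability of observing a value at most this large under N(mu, sigma)
    p = pd.Series(norm.cdf(values, loc=mu, scale=sigma), index=values.index)
    # Outlier if either tail probability is smaller than 1 / (c * N)
    return (p < 1 / (c * n)) | ((1 - p) < 1 / (c * n))

# Example usage on a hypothetical sensor column:
# df['acc_x_outlier'] = chauvenet_outliers(df['acc_x'])
```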

### Mixture models
Assuming the data follows a single distribution might be too simple.
So, assume it can be described with $K$ normal distributions:

$p(x) = \sum_{k=1}^{K} \pi_{k} \mathscr{N} (x | \mu_{k}, \sigma_{k})$ with $\sum_{k=1}^{K} \pi_{k} = 1 \quad \forall k: 0 < \pi_{k} \leq 1$

Find the best values for the parameters by maximizing the likelihood: $L = \prod_{n=1}^{N} p(x_{n}^{j})$

This can be done, for example, with the expectation maximization algorithm.
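
A sketch using scikit-learn's `GaussianMixture`, which fits the mixture with EM; treating the lowest-density 1% of points as outliers (and $K = 3$) are arbitrary example choices, not part of the lecture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_outliers(values: np.ndarray, k: int = 3, quantile: float = 0.01) -> np.ndarray:
    """Fit K normal distributions with EM and mark the lowest-density points."""
    x = values.reshape(-1, 1)
    gmm = GaussianMixture(n_components=k).fit(x)
    log_density = gmm.score_samples(x)   # log p(x) under the fitted mixture
    return log_density < np.quantile(log_density, quantile)
```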

## Distance-based
Use $d(x_{i}^{j}, x_{k}^{j})$ to represent the distance between two values of attribute $j$.

Points are "close" if they are within distance $d_{min}$ of each other.
A point is an outlier when more than a fraction $f_{min}$ of the other points are further than $d_{min}$ away from it.
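
A brute-force sketch of this simple distance-based criterion; the thresholds `d_min` and `f_min` are assumptions you would tune per attribute:

```python
import numpy as np
from scipy.spatial.distance import cdist

def distance_outliers(values: np.ndarray, d_min: float, f_min: float) -> np.ndarray:
    """Outlier if more than a fraction f_min of the other points
    lie further than d_min away."""
    x = values.reshape(-1, 1)
    distances = cdist(x, x)                              # N x N pairwise distances
    fraction_far = (distances > d_min).sum(axis=1) / (len(x) - 1)
    return fraction_far > f_min
```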

### Local outlier factor
Takes density into account.

Define $k_{dist}$ for a point $x_{i}^{j}$ as the largest distance to one of its $k$ closest neighbors.

The set of neighbors of $x_{i}^{j}$ within $k_{dist}$ is the k-distance neighborhood.

Reachability distance of $x_{i}^{j}$ to $x$: $k_{reach\,dist} (x_{i}^{j}, x) = \max (k_{dist}(x), d(x, x_{i}^{j}))$

Define the local reachability density of the point $x_{i}^{j}$ and compare it to that of its neighbors; points with a much lower density than their neighbors are outliers.
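
scikit-learn implements LOF directly; a minimal sketch (the neighborhood size of 20 is just that library's default, not a value from the lecture):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_outliers(values: np.ndarray, n_neighbors: int = 20) -> np.ndarray:
    """Mark points whose local density is much lower than that of their neighbors."""
    x = values.reshape(-1, 1)
    labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(x)
    return labels == -1                  # fit_predict returns -1 for outliers
```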

## Missing values
Replace missing values with a substituted value (imputation).
Can use the mean, mode, or median.
Or use other attribute values in the same instance, or values of the same attribute from other instances.
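
A small imputation sketch with pandas (the column name `heart_rate` in the comment is hypothetical):

```python
import pandas as pd

def impute(df: pd.DataFrame, column: str, how: str = "mean") -> pd.DataFrame:
    """Fill missing values in one column with its mean, median, or mode."""
    if how == "mean":
        value = df[column].mean()
    elif how == "median":
        value = df[column].median()
    else:
        value = df[column].mode()[0]
    df[column] = df[column].fillna(value)
    return df

# For time series, interpolating between neighboring instances is often better:
# df['heart_rate'] = df['heart_rate'].interpolate()
```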

## Combining outlier detection & imputation
Kalman filter:
- estimates expected values based on historical data
- if the observed value is an outlier, impute it with the expected value

Assume some latent state $s_{t}$, which can have multiple components.
We make measurements $x_{t}$ of that state.

The next value of the state is: $s_{t} = F_{t} s_{t-1} + B_{t} u_{t} + w_{t}$
- $u_{t}$ is the control input (like sending a message)
- $w_{t}$ is white noise
- $F_{t}$ and $B_{t}$ are matrices

The measurement associated with $s_{t}$ is $x_{t} = H_{t} s_{t} + v_{t}$
- $v_{t}$ is white noise

For the white noise, assume a normal distribution.
Try to predict the next state, and estimate the prediction error (a matrix of variances and covariances).
Based on the prediction, look at the error, and update the prediction of the state.
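
A minimal sketch of the predict/update cycle for a scalar state with no control input (so $F = H = 1$); the noise variances `q` and `r` are assumed example values:

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.1):
    """Scalar Kalman filter: predict the next state, then correct it with the
    measurement, weighted by the Kalman gain."""
    s_est, p_est = measurements[0], 1.0          # initial state and error variance
    filtered = []
    for x in measurements:
        # Predict (F = 1, no control input): state stays the same, uncertainty grows
        s_pred, p_pred = s_est, p_est + q
        # Update: the gain trades off prediction uncertainty vs measurement noise
        gain = p_pred / (p_pred + r)
        s_est = s_pred + gain * (x - s_pred)
        p_est = (1 - gain) * p_pred
        filtered.append(s_est)
    return np.array(filtered)
```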

## Transforming data
Filter out more subtle noise.

Lowpass filter: some data has periodicity; decompose the series of values into different periodic signals and keep only the most interesting frequencies.
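
A sketch of a Butterworth lowpass filter with SciPy; the cutoff frequency and sampling rate are made-up example values:

```python
from scipy.signal import butter, filtfilt

def lowpass(values, cutoff_hz=1.5, sample_rate_hz=10, order=4):
    """Keep only frequencies below cutoff_hz (filtfilt avoids a phase shift)."""
    nyquist = 0.5 * sample_rate_hz
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    return filtfilt(b, a, values)
```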

Principal component analysis: find new features that explain most of the variability in the data; select the number of components based on the explained variance.
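
A sketch with scikit-learn's PCA; the 0.95 explained-variance threshold is an arbitrary example:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_transform(X, explained_variance=0.95):
    """Standardize the data, then keep as many components as needed to
    explain the requested fraction of the variance."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=explained_variance)   # a float in (0, 1) selects by variance
    return pca.fit_transform(X_scaled)
```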