2 Chapter 01
2.1 Lecture
2.2 Pattern Recognition
- A pattern is the opposite of randomness.
- Conversely, there is “randomness” between two events when they are independent.
For example, musical preference is independent of the occurrence of heart disease.
2.3 Machine Learning
Pattern Recognition and Machine Learning have substantial overlap with each other.
- Statistical pattern recognition
- Syntactic pattern recognition
- unlike the statistical approach, it does not rely on statistical reasoning.
2.4 Basic Mathematical Setting
- A vector of measurements
- \(X\in R^d\)
- known as a feature vector
- a target \(Y\in R\) to be predicted
- Feature vector
- \(X\)
- Target
- \(Y\)
- The relationship between \(X\) and \(Y\) (Figure 2.1)
- rarely deterministic
- There is no function \(f\) such that \(Y=f(X)\)
- but is instead described by a joint probability distribution \(P_{X,Y}\)
- Sources of uncertainty
- Latent factors
- \(Y\) depends on factors that are not available.
- Measurement noise
- The values of the predictor \(X\) itself
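A small simulation sketch (my own illustration; the linear model and its coefficients are assumptions) of why the relationship is not deterministic:

```python
import numpy as np

# Minimal sketch (my own illustration): Y depends on X, on a latent factor Z
# that the predictor never sees, and on measurement noise, so Y is not a
# deterministic function of X.
rng = np.random.default_rng(0)

n, d = 1000, 2
X = rng.normal(size=(n, d))          # observed feature vectors in R^d
Z = rng.normal(size=n)               # latent factor, unavailable to the predictor
noise = 0.1 * rng.normal(size=n)     # measurement noise

Y = X[:, 0] - 0.5 * X[:, 1] + Z + noise

# Two points with (nearly) identical X can still have different Y because of Z
# and the noise; only the joint distribution P_{X,Y} describes the relationship.
```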

2.5 Prediction
- A predictor
- \(\psi: R^d \to R\)
- \(\psi(X)\) is the prediction of \(Y\)
- Predictor (\(\psi\)) uses information about the joint feature-label distribution \(P_{X,Y}\)
- Direct knowledge about \(P_{X,Y}\)
- Distribution information
- Indirect knowledge about \(P_{X,Y}\)
- I.I.D. sample \(S_n =\{(X_1,Y_1),...,(X_n, Y_n)\}\)
- training data
A purely data-driven method, with no knowledge of \(P_{X,Y}\) at all, will ultimately fail (see the no-free-lunch theorem below).
- Probabilistic method
- Classical
- Bayesian method
Why isn’t everyone using Bayesian methods?
Bayesian methods are complicated, especially Bayesian inference (computing the posterior is often intractable).
- Optimal predictor
- \(\psi^{*}(X)\), obtained with complete knowledge of \(P_{X,Y}\)
- Obstacles
- Knowledge of \(P_{X,Y}\) is unavailable
- Data-driven prediction rule must rely solely on \(S_{n}\)
- Certain data-driven predictors can approach the optimal predictor as \(n\to \infty\)
- However, the convergence can be arbitrarily slow in the worst case.
- No-free-lunch theorem
- for finite \(n\), which is the practical case, some knowledge about \(P_{X,Y}\) is necessary to guarantee good performance.
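To make the last two points concrete, here is a small illustrative sketch (my own, not the lecture’s example); the data model \(Y=\sin(X)+\text{noise}\), the k-NN rule, and every parameter below are assumptions. Under quadratic loss the optimal predictor is \(\psi^{*}(x)=E[Y\mid X=x]=\sin(x)\) for this model, and the purely data-driven k-NN predictor built from \(S_n\) approaches its error as \(n\) grows:

```python
import numpy as np

# Illustrative sketch (my own, not the lecture's example).
rng = np.random.default_rng(1)

def sample(n):
    X = rng.uniform(-3, 3, size=n)
    Y = np.sin(X) + 0.3 * rng.normal(size=n)
    return X, Y

def knn_predict(x_train, y_train, x_test, k=15):
    # average the targets of the k nearest training points (1-D features)
    dist = np.abs(x_test[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

X_test, Y_test = sample(2000)
mse_opt = np.mean((Y_test - np.sin(X_test)) ** 2)   # error of the optimal predictor
for n in (50, 500, 5000):
    X_tr, Y_tr = sample(n)
    mse_knn = np.mean((Y_test - knn_predict(X_tr, Y_tr, X_test)) ** 2)
    print(f"n={n:5d}  k-NN MSE={mse_knn:.3f}  optimal MSE={mse_opt:.3f}")
```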
2.6 Prediction error

- Quadratic loss
- \(\ell(\psi(X),Y) = (Y-\psi(X))^2\)
- Absolute difference loss
- \(\ell(\psi(X),Y) = |Y-\psi(X)|\)
- Misclassification loss
- \(\ell(\psi(X),Y) = I_{Y\neq \psi(X)}\), i.e., 1 if \(Y\neq \psi(X)\) and 0 if \(Y=\psi(X)\)
When the target is a label, we are talking about classification.
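As a quick reference, here is a minimal sketch of the three losses in code (my own illustration; the function names are not from the lecture):

```python
import numpy as np

def quadratic_loss(y, y_hat):
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return np.abs(y - y_hat)

def misclassification_loss(y, y_hat):
    # the indicator I_{Y != psi(X)}: 1 on an error, 0 on a correct prediction
    return (y != y_hat).astype(float)

# regression-style targets for the first two losses
y, y_hat = np.array([2.0, -1.0]), np.array([1.5, 0.0])
print(quadratic_loss(y, y_hat))   # [0.25 1.  ]
print(absolute_loss(y, y_hat))    # [0.5 1. ]

# labels for the misclassification loss
labels, preds = np.array([0, 1, 1]), np.array([0, 0, 1])
print(misclassification_loss(labels, preds))  # [0. 1. 0.]
```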
2.7 Supervised vs. Unsupervised learning
- \(Y\in \{0,1,\cdots, c-1\}\)
- Variable \(Y\) is called a label to emphasize that it has no numerical meaning.
- Binary classification
- The expectation of an indicator random variable equals the probability of the corresponding event.
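This can be checked directly: for an event \(A\), the indicator \(I_A\) is 1 when \(A\) occurs and 0 otherwise, so \(E[I_A] = 1\cdot P(A) + 0\cdot P(A^c) = P(A)\). The error computation in Section 2.8 below uses exactly this step.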
2.7.1 Categories
- Regression
- Unsupervised learning
- Measuring the error of the operation is not straightforward, since there is no target to compare against
- Ex: PCA and clustering
- Semi-supervised learning
- Target \(Y\) is available for only a subpopulation of the feature vectors \(X\)
- Some \(X\) do not have a corresponding \(Y\)
- Reinforcement learning
- decisions are made in continuous interaction with an environment.
- The objective is to minimize a cost over the long run.
- Dynamic programming
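A tiny sketch (my own illustration, not from the lecture) of what a semi-supervised sample looks like: every point has a feature vector, but only a fraction carries a target.

```python
import numpy as np

# Minimal sketch (my own illustration): in the semi-supervised setting only
# part of the sample carries a target; NaN marks a missing label.
rng = np.random.default_rng(2)

X = rng.normal(size=(10, 3))            # feature vectors, all observed
Y = (X[:, 0] > 0).astype(float)         # true labels (hypothetical labeling rule)
has_label = rng.random(10) < 0.3        # only ~30% of the points are labeled
Y_observed = np.where(has_label, Y, np.nan)

print(Y_observed)   # e.g. [nan 1. nan nan 0. nan ...]
```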
2.8 Classification error
\(\begin{align} \epsilon[\psi] = E[I_{Y\neq \psi(X)}] &= 0\cdot P(Y=\psi(X)) + 1\cdot P(Y\neq \psi(X))\\ &= P(Y\neq \psi(X)) \end{align}\)
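A quick Monte Carlo check of this identity (my own example; the Bernoulli label model and the trivial classifier are assumptions):

```python
import numpy as np

# Minimal sketch (my own example): the expectation of the error indicator
# I_{Y != psi(X)} equals the misclassification probability P(Y != psi(X)).
rng = np.random.default_rng(3)

p = 0.2
n = 200_000
Y = (rng.random(n) < p).astype(int)   # Y ~ Bernoulli(0.2), independent of X here
psi_X = np.zeros(n, dtype=int)        # a deliberately trivial classifier: always predict 0

indicator = (Y != psi_X).astype(float)                        # I_{Y != psi(X)}
print("Monte Carlo E[I_{Y != psi(X)}] ~", indicator.mean())   # approximately 0.2
print("Exact P(Y != psi(X)) =", p)                            # 0.2
```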
2.9 Scissor effect
