3 Chapter 2: Optimal Classification
3.1 Stoachastic analysis
3.2 Classification without predictors
\[ \hat{Y}= \begin{cases} 0, & P(Y=0) \geq P(Y=1)\\ 1, & P(Y=1) > P(Y=0)\\ \end{cases} \]
- \(E[Y] = P(Y=1)\)
Posterior-probability function
\[\eta(x) = E[Y|X=x] = P(Y=1|X=x), \quad x\in R^d\]
3.3 Classification error
\[\epsilon[\psi] = p(\psi(X)\neq Y) = p(\{(x,y)|\psi(x)\neq y\})\]
The classification error is determined by the feature-label distribution \(P_{X,Y}\)
3.4 Class-specific errors
3.5 Bayes Classifier
\[\psi^{*} = \arg\min_{\psi\in\mathcal{C}} P(\psi(X) \neq Y) \]

3.6 Feature transformation
\[\epsilon^{*}(x,y) \geq \epsilon^{*}(X', y)\quad \text{with } X = t^{-1}(X')\]