
See also: slides 13, slides 14, slides 15

expected error minimisation

Think of it as a bias-variance tradeoff.

Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$

The solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $y^{*}(x) = E[Y \mid X=x]$.

Instead we have a training set $Z = \{(x^i, y^i)\}_{i=1}^{n}$.
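
A quick numerical sanity check of the claim above, as a minimal sketch: the data-generating process ($Y = \sin(X)$ plus Gaussian noise) is an illustrative assumption, not part of the notes, and the conditional mean $E[Y \mid X = x] = \sin(x)$ achieves a lower expected squared error than another fixed predictor.

```python
import numpy as np

# Minimal sketch: for squared loss, the conditional mean E[Y | X = x]
# beats any other fixed predictor of Y from X in expected error.
# The data-generating process (Y = sin(X) + Gaussian noise) is an
# illustrative assumption.
rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(-3, 3, size=n)
Y = np.sin(X) + rng.normal(scale=0.5, size=n)   # so E[Y | X = x] = sin(x)

bayes = np.sin(X)   # the optimal predictor y*(x)
other = 0.8 * X     # some other predictor y_hat(x)

print("E[(Y - y*(X))^2] ~", np.mean((Y - bayes) ** 2))   # ~ 0.25, the pure noise
print("E[(Y - y_hat(X))^2] ~", np.mean((Y - other) ** 2))  # strictly larger
```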

error decomposition

$$\begin{aligned} E_{x,y}(y-\hat{y}_Z(x))^2 &= E_{x,y}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y}_Z(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned}$$

bias-variance decompositions

For a linear estimator:

$$\begin{aligned} E_Z&E_{x,y}\big(y-(\hat{y}_Z(x)\coloneqq W^T_Z x)\big)^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x\big(y^{*}(x) - E_Z(\hat{y}_Z(x))\big)^2 \quad \text{bias} \\ &+ E_x E_Z\big(\hat{y}_Z(x) - E_Z(\hat{y}_Z(x))\big)^2 \quad \text{variance} \end{aligned}$$
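
A small resampling sketch of this decomposition, assuming an illustrative quadratic ground truth $y^{*}(x)$ and a deliberately misspecified least-squares linear fit; the noise level, sample sizes, and number of resampled training sets are arbitrary choices.

```python
import numpy as np

# Sketch of the bias/variance decomposition for a linear estimator
# y_hat_Z(x) = W_Z^T x fit by least squares on resampled training sets Z.
rng = np.random.default_rng(1)

def sample_set(n):
    x = rng.uniform(-1, 1, size=n)
    y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.3, size=n)  # y*(x) + noise
    return x, y

def fit_linear(x, y):
    # least-squares fit of y ~ w0 + w1 * x (misspecified: ignores the x^2 term)
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

x_test = rng.uniform(-1, 1, size=2000)
y_star = 1.0 + 2.0 * x_test + 1.5 * x_test**2          # y*(x) = E[Y | X = x]

preds = np.array([
    np.column_stack([np.ones_like(x_test), x_test]) @ fit_linear(*sample_set(50))
    for _ in range(500)                                  # E_Z via 500 training sets
])

noise = 0.3**2
bias2 = np.mean((y_star - preds.mean(axis=0)) ** 2)
var = np.mean(preds.var(axis=0))
total = np.mean((y_star + rng.normal(scale=0.3, size=preds.shape) - preds) ** 2)

print(f"noise={noise:.3f}  bias^2={bias2:.3f}  variance={var:.3f}")
print(f"sum={noise + bias2 + var:.3f}  vs  E_Z E_xy error={total:.3f}")
```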

accuracy

zero-one loss:

$$l^{0-1}(y, \hat{y}) = 1_{y \neq \hat{y}} = \begin{cases} 1 & y \neq \hat{y} \\ 0 & y = \hat{y} \end{cases}$$
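
A tiny sketch of the zero-one loss and the corresponding accuracy, on made-up labels:

```python
import numpy as np

# Zero-one loss per example and the resulting empirical risk / accuracy.
y     = np.array([ 1, -1,  1,  1, -1])
y_hat = np.array([ 1,  1,  1, -1, -1])

zero_one = (y != y_hat).astype(int)              # l^{0-1}(y, y_hat) per example
print("per-example loss:", zero_one)             # [0 1 0 1 0]
print("empirical risk  :", zero_one.mean())      # 0.4
print("accuracy        :", 1 - zero_one.mean())  # 0.6
```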

linear classifier

$$\begin{aligned} \hat{y}_W(x) &= \text{sign}(W^T x) = 1_{W^T x \geq 0} \\ \hat{W} &= \argmin_{W} L_{Z}^{0-1} (\hat{y}_W) \end{aligned}$$

surrogate loss functions

Assume the classifier returns a discrete value $\hat{y}_W = \text{sign}(W^T x) \in \{0,1\}$.

What if the classifier's output is continuous?

$\hat{y}$ will then also capture the "confidence" of the classifier.

Think of continuous surrogate loss functions: margin loss, cross-entropy/negative log-likelihood, etc.
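
A short sketch comparing the zero-one loss with two such surrogates as functions of the margin $m = y\,\hat{y}$; the grid of margin values is illustrative.

```python
import numpy as np

# Surrogate losses as functions of the margin m = y * y_hat, where
# y_hat = w^T x is the classifier's continuous, confidence-carrying output.
m = np.linspace(-2, 2, 9)

zero_one = (m <= 0).astype(float)     # non-convex, non-differentiable
hinge    = np.maximum(0.0, 1.0 - m)   # margin / hinge loss
logistic = np.log1p(np.exp(-m))       # cross-entropy / negative log-likelihood

for name, loss in [("0-1", zero_one), ("hinge", hinge), ("logistic", logistic)]:
    print(f"{name:9s}", np.round(loss, 2))
# The hinge loss upper-bounds the 0-1 loss; both surrogates are convex in m,
# which is what makes minimising them tractable.
```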

linearly separable data

linearly separable

A binary classification data set $Z=\{(x^i, y^i)\}_{i=1}^{n}$ is linearly separable if there exists a $W^{*}$ such that:

  • $\forall i \in [n] \mid \text{SGN}(\langle x^i, W^{*} \rangle) = y^i$
  • Or, for every $i \in [n]$ we have $(W^{* T}x^i)y^i > 0$ (a quick numerical check of this condition follows this list)
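
The check below is a minimal sketch on a toy data set with a candidate $W^{*}$; both are illustrative choices, not data from the notes.

```python
import numpy as np

# Check the condition (W*^T x^i) y^i > 0 for all i on toy data.
X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
W_star = np.array([1.0, 0.2])

margins = (X @ W_star) * y
print("margins:", margins)
print("linearly separated by W*:", bool(np.all(margins > 0)))
```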

linear programming

$$\begin{aligned} \max_{w \in \mathbb{R}^d} &\langle u, w \rangle = \sum_{i=1}^{d} u_i w_i \\ &\text{s.t. } A w \ge v \end{aligned}$$

Given that the data is linearly separable:

$$\begin{aligned} \exists \, W^{*} &\mid \forall i \in [n],\ ({W^{*}}^T x^i)y^i > 0 \\ \exists \, W^{*}, \gamma > 0 &\mid \forall i \in [n],\ ({W^{*}}^T x^i)y^i \ge \gamma \\ \exists \, W^{*} &\mid \forall i \in [n],\ ({W^{*}}^T x^i)y^i \ge 1 \end{aligned}$$

(the last line follows by rescaling $W^{*}$ by $1/\gamma$)

LP for linear classification

  • Define $A = [x_j^i y^i]_{n \times d}$

  • Finding such a $W$ is then equivalent to the following feasibility LP (see the code sketch after this list):

    $$\begin{aligned} \max_{w \in \mathbb{R}^d} &\langle \vec{0}, w \rangle \\ & \text{s.t. } Aw \ge \vec{1} \end{aligned}$$
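
A sketch of this feasibility LP with `scipy.optimize.linprog`, reusing the illustrative toy data from above. `linprog` minimises $c^T w$ subject to $A_{ub} w \le b_{ub}$, so $Aw \ge \vec{1}$ is rewritten as $-Aw \le -\vec{1}$ and the objective is the zero vector.

```python
import numpy as np
from scipy.optimize import linprog

# Feasibility LP for linear classification: find w with Aw >= 1.
X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

A = X * y[:, None]                    # A = [x_j^i y^i], shape (n, d)
n, d = A.shape

res = linprog(
    c=np.zeros(d),                    # <0, w>: any feasible point is optimal
    A_ub=-A, b_ub=-np.ones(n),        # encodes Aw >= 1
    bounds=[(None, None)] * d,        # w is unconstrained (default is w >= 0)
)
print("feasible:", res.success)
print("W:", res.x, "margins:", A @ res.x)
```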

perceptron

Rosenblatt’s perceptron algorithm

"\\begin{algorithm}\n\\caption{Batch Perceptron}\n\\begin{algorithmic}\n\\REQUIRE Training set $(\\mathbf{x}_1, y_1),\\ldots,(\\mathbf{x}_m, y_m)$\n\\STATE Initialize $\\mathbf{w}^{(1)} = (0,\\ldots,0)$\n\\FOR{$t = 1,2,\\ldots$}\n \\IF{$(\\exists \\space i \\text{ s.t. } y_i\\langle\\mathbf{w}^{(t)}, \\mathbf{x}_i\\rangle \\leq 0)$}\n \\STATE $\\mathbf{w}^{(t+1)} = \\mathbf{w}^{(t)} + y_i\\mathbf{x}_i$\n \\ELSE\n \\STATE \\textbf{output} $\\mathbf{w}^{(t)}$\n \\ENDIF\n\\ENDFOR\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 5 Batch Perceptron

Require: Training set (x1,y1),,(xm,ym)(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)

1:Initialize w(1)=(0,,0)\mathbf{w}^{(1)} = (0,\ldots,0)

2:for t=1,2,t = 1,2,\ldots do

3:if ( i s.t. yiw(t),xi0)(\exists \space i \text{ s.t. } y_i\langle\mathbf{w}^{(t)}, \mathbf{x}_i\rangle \leq 0) then

4:w(t+1)=w(t)+yixi\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + y_i\mathbf{x}_i

5:else

6:output w(t)\mathbf{w}^{(t)}

7:end if

8:end for
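
A minimal NumPy sketch of the batch perceptron above; the helper name `perceptron` and the toy data set (the same illustrative one as in the LP example) are my own choices.

```python
import numpy as np

# Batch perceptron: repeatedly pick any example with y_i <w, x_i> <= 0
# and update w <- w + y_i x_i until no violating example remains.
def perceptron(X, y, max_iters=10_000):
    """X: (m, d) inputs, y: (m,) labels in {-1, +1}. Returns (w, #mistakes)."""
    m, d = X.shape
    w = np.zeros(d)                         # w^(1) = (0, ..., 0)
    mistakes = 0
    for _ in range(max_iters):
        margins = y * (X @ w)
        bad = np.flatnonzero(margins <= 0)  # examples violating the margin
        if bad.size == 0:
            return w, mistakes              # output w^(t)
        i = bad[0]
        w = w + y[i] * X[i]                 # w^(t+1) = w^(t) + y_i x_i
        mistakes += 1
    raise RuntimeError("did not converge (data may not be linearly separable)")

# Toy usage.
X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, k = perceptron(X, y)
print("w =", w, "after", k, "mistakes; margins:", y * (X @ w))
```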

greedy update

Each update strictly increases the margin term $W^T x^i y^i$ on the misclassified example:

$$\begin{aligned} W_{\text{new}}^T x^i y^i &= \langle W_{\text{old}}+ y^i x^i,\, x^i \rangle y^i \\ &= W_{\text{old}}^T x^{i} y^{i} + \|x^i\|_2^2 \,(y^{i})^2 \\ &= W_{\text{old}}^T x^{i} y^{i} + \|x^i\|_2^2 \end{aligned}$$
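
A one-line numerical check of this identity; the random values are illustrative only.

```python
import numpy as np

# The update on example i raises W^T x^i y^i by exactly ||x^i||_2^2,
# since (y^i)^2 = 1.
rng = np.random.default_rng(2)
W_old = rng.normal(size=3)
x_i = rng.normal(size=3)
y_i = -1

W_new = W_old + y_i * x_i
lhs = (W_new @ x_i) * y_i
rhs = (W_old @ x_i) * y_i + np.dot(x_i, x_i)
print(np.isclose(lhs, rhs))   # True
```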

proof

See also (Novikoff, 1962)

Theorem

Assume there exists some parameter vector $\underline{\theta}^{*}$ such that $\|\underline{\theta}^{*}\| = 1$, and $\exists \, \gamma > 0$ such that

$$y_t(\underline{x_t} \cdot \underline{\theta^{*}}) \ge \gamma$$

Assumption: $\forall \, t = 1 \ldots n,\ \|\underline{x_t}\| \le R$

Then the perceptron makes at most $\frac{R^2}{\gamma^2}$ errors.

proof by induction

definition of $\underline{\theta^k}$

the parameter vector at the point where the algorithm makes its $k^{\text{th}}$ error.

Note that we have $\underline{\theta^{1}} = \underline{0}$.

Assume that the $k^{\text{th}}$ error is made on example $t$. Then:

$$\begin{align} \underline{\theta^{k+1}} \cdot \underline{\theta^{*}} &= (\underline{\theta^k} + y_t \underline{x_t}) \cdot \underline{\theta^{*}} \\ &= \underline{\theta^k} \cdot \underline{\theta^{*}} + y_t \, \underline{x_t} \cdot \underline{\theta^{*}} \\ &\ge \underline{\theta^k} \cdot \underline{\theta^{*}} + \gamma \\[12pt] &\because \text{ assumption: } y_t \, \underline{x_t} \cdot \underline{\theta^{*}} \ge \gamma \end{align}$$

It follows by induction on $k$ that

$$\underline{\theta^{k+1}} \cdot \underline{\theta^{*}} \ge k \gamma$$

Using Cauchy-Schwarz we have $\|\underline{\theta^{k+1}}\| \times \|\underline{\theta^{*}}\| \ge \underline{\theta^{k+1}} \cdot \underline{\theta^{*}}$

$$\begin{align} \|\underline{\theta^{k+1}}\| &\ge k \gamma \\[16pt] &\because \|\underline{\theta^{*}}\| = 1 \end{align}$$

In the second part, we will find an upper bound for (5):

$$\begin{align} \|\underline{\theta^{k+1}}\|^2 &= \|\underline{\theta^k} + y_t \underline{x_t}\|^2 \\ &= \|\underline{\theta^k}\|^2 + y_t^2 \|\underline{x_t}\|^2 + 2 y_t \, \underline{x_t} \cdot \underline{\theta^k} \\ &\le \|\underline{\theta^k}\|^2 + R^2 \end{align}$$

(9) is due to:

  • $y_t^2 \|\underline{x_t}\|^2 = \|\underline{x_t}\|^2 \le R^2$ by the assumption of the theorem (and $y_t^2 = 1$)
  • $y_t \, \underline{x_t} \cdot \underline{\theta^k} \le 0$, since the parameter vector $\underline{\theta^k}$ made an error on the $t^{\text{th}}$ example.

It follows by induction on $k$ that

$$\begin{align} \|\underline{\theta^{k+1}}\|^2 \le kR^2 \end{align}$$

Combining (5) and (10) gives us

$$\begin{aligned} k^2 \gamma^2 &\le \|\underline{\theta^{k+1}}\|^2 \le kR^2 \\ k &\le \frac{R^2}{\gamma^2} \end{aligned}$$
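
An empirical sketch of the bound: run the `perceptron` helper from the earlier sketch on randomly generated separable data (with an artificially enforced margin) and compare the mistake count with $R^2/\gamma^2$. The data-generating choices are illustrative.

```python
import numpy as np

# Compare the perceptron's actual mistake count with the Novikoff bound
# R^2 / gamma^2 on random separable data. Uses the `perceptron` function
# sketched earlier in these notes.
rng = np.random.default_rng(3)
d, m = 5, 200
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)        # ||theta*|| = 1

X = rng.normal(size=(m, d))
y = np.sign(X @ theta_star)
keep = np.abs(X @ theta_star) >= 0.1            # enforce a margin of at least 0.1
X, y = X[keep], y[keep]

R = np.max(np.linalg.norm(X, axis=1))
gamma = np.min(y * (X @ theta_star))

w, k = perceptron(X, y)                          # from the earlier sketch
print(f"mistakes = {k},  bound R^2/gamma^2 = {R**2 / gamma**2:.1f}")
```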

Bibliography

Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, Vol. 12, pp. 615-622. Polytechnic Institute of Brooklyn.