
idea: maximise the margin, which makes the classifier more robust to “perturbations” of the data

The Euclidean distance between a point $x$ and the hyperplane parametrised by $W$ is:

$$\frac{\mid W^T x + b \mid}{\|W\|_2}$$

Assuming $\|W\|_2 = 1$, the distance is simply $\mid W^T x + b \mid$
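As a quick numerical check of this formula (a minimal sketch; the point and hyperplane below are made up for illustration):

```python
import numpy as np

# Hyperplane {z : W^T z + b = 0} and a point x (arbitrary values for illustration)
W = np.array([3.0, 4.0])
b = -2.0
x = np.array([1.0, 1.0])

# Euclidean distance from x to the hyperplane
dist = abs(W @ x + b) / np.linalg.norm(W)
print(dist)  # |3 + 4 - 2| / 5 = 1.0

# After normalising so that ||W||_2 = 1, the distance is just |W^T x + b|
W_hat, b_hat = W / np.linalg.norm(W), b / np.linalg.norm(W)
print(abs(W_hat @ x + b_hat))  # 1.0 again
```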

regularization

SVMs are good for high-dimensional data

We can use a convex (QP) solver or (sub)gradient descent (sketches of both appear below)

maximum margin hyperplane

$W$ has margin $\gamma$ if

$$\begin{aligned} W^T x + b \ge \gamma \quad &\forall \text{ blue } x \\ W^T x + b \le -\gamma \quad &\forall \text{ red } x \end{aligned}$$

Margin:

$Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, \quad y \in \{-1, 1\}, \quad \|W\|_2 = 1$

hard-margin SVM

this is the version with bias

"\\begin{algorithm}\n\\caption{Hard-SVM}\n\\begin{algorithmic}\n\\REQUIRE Training set $(\\mathbf{x}_1, y_1),\\ldots,(\\mathbf{x}_m, y_m)$\n\\STATE \\textbf{solve:} $(w_{0},b_{0}) = \\argmin\\limits_{(w,b)} \\|w\\|^2 \\text{ s.t } \\forall i, y_{i}(\\langle{w,x_i} \\rangle + b) \\ge 1$\n\\STATE \\textbf{output:} $\\hat{w} = \\frac{w_0}{\\|w_0\\|}, \\hat{b} = \\frac{b_0}{\\|w_0\\|}$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 2 Hard-SVM

Require: Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$

1: solve: $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \ \text{ s.t. } \ \forall i,\ y_{i}(\langle w, x_i \rangle + b) \ge 1$

2: output: $\hat{w} = \frac{w_0}{\|w_0\|}, \ \hat{b} = \frac{b_0}{\|w_0\|}$
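A minimal sketch of Hard-SVM as a quadratic program, using cvxpy as the solver and a made-up linearly separable toy set (the algorithm itself does not prescribe a particular solver):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (assumed for illustration), labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = cp.Variable(2), cp.Variable()

# solve: minimise ||w||^2  s.t.  y_i (<w, x_i> + b) >= 1  for all i
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()

# output: the normalised solution, as in the algorithm above
w0, b0 = w.value, b.value
w_hat, b_hat = w0 / np.linalg.norm(w0), b0 / np.linalg.norm(w0)
print(w_hat, b_hat)
```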

Note that this version is sensitive to outliers

it assumes that the training set is linearly separable

soft-margin SVM

can be applied even if the training set is not linearly separable

"\\begin{algorithm}\n\\caption{Soft-SVM}\n\\begin{algorithmic}\n\\REQUIRE Input $(\\mathbf{x}_1, y_1),\\ldots,(\\mathbf{x}_m, y_m)$\n\\STATE \\textbf{parameter:} $\\lambda > 0$\n\\STATE \\textbf{solve:} $\\min_{\\mathbf{w}, b, \\boldsymbol{\\xi}} \\left( \\lambda \\|\\mathbf{w}\\|^2 + \\frac{1}{m} \\sum_{i=1}^m \\xi_i \\right)$\n\\STATE \\textbf{s.t: } $\\forall i, \\quad y_i (\\langle \\mathbf{w}, \\mathbf{x}_i \\rangle + b) \\geq 1 - \\xi_i \\quad \\text{and} \\quad \\xi_i \\geq 0$\n\\STATE \\textbf{output:} $\\mathbf{w}, b$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 3 Soft-SVM

Require: Input $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$

1: parameter: $\lambda > 0$

2: solve: $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \left( \lambda \|\mathbf{w}\|^2 + \frac{1}{m} \sum_{i=1}^m \xi_i \right)$

3: s.t.: $\forall i, \quad y_i (\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0$

4: output: $\mathbf{w}, b$
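A minimal sketch of Soft-SVM as the constrained problem above, again with cvxpy; the value of $\lambda$ and the (non-separable) toy data are assumptions for illustration:

```python
import cvxpy as cp
import numpy as np

# Toy data that is NOT linearly separable (assumed for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [2.5, 2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, d = X.shape
lam = 0.1  # assumed regularisation strength

w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(m)

# solve: min  lambda * ||w||^2 + (1/m) * sum_i xi_i
objective = cp.Minimize(lam * cp.sum_squares(w) + cp.sum(xi) / m)
# s.t.  y_i (<w, x_i> + b) >= 1 - xi_i  and  xi_i >= 0
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()
print(w.value, b.value, xi.value)  # slack xi_i > 0 marks the violated points
```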

Equivalent form of soft-margin SVM:

$$\begin{aligned} \min_{w} \ &\left( \lambda \|w\|^2 + L_S^{\text{hinge}}(w) \right) \\[8pt] L_{S}^{\text{hinge}}(w) &= \frac{1}{m} \sum_{i=1}^{m} \max\{0,\ 1 - y_i \langle w, x_i \rangle\} \end{aligned}$$
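Since this form is unconstrained, a minimal subgradient-descent sketch is another way to fit it (bias omitted as in the formula above; step size and iteration count are arbitrary assumptions):

```python
import numpy as np

def soft_svm_subgradient(X, y, lam=0.1, lr=0.01, n_iters=1000):
    """Minimise lambda * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i <w, x_i>)."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)
        violated = margins < 1  # points contributing a hinge subgradient
        grad = 2 * lam * w - (X[violated] * y[violated, None]).sum(axis=0) / m
        w -= lr * grad
    return w

# Toy data (assumed for illustration), labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = soft_svm_subgradient(X, y)
print(np.sign(X @ w))  # recovers the labels on this easy toy set
```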

SVM with basis functions

$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} \max \{0,\ 1 - y^i \langle w, \phi(x^i) \rangle\} + \lambda \|w\|^2_2$$

$\phi(x^i)$ can be high-dimensional

representer theorem

$$w^{*} = \argmin_{w} \frac{1}{n} \sum_{i=1}^{n} \max \{0,\ 1 - y^i \langle w, \phi (x^i) \rangle\} + \lambda \|w\|^2_2$$

theorem

There are real values $a_{1},\ldots,a_{n}$ such that

$$w^{*} = \sum_{i=1}^{n} a_i \phi(x^i)$$

kernelized SVM

from the representer theorem, we only ever need inner products of feature vectors, i.e. the kernel:

$$K(x,z) = \langle \phi(x), \phi(z) \rangle$$
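A minimal sketch of prediction with the kernelised form, using an RBF kernel as one example choice; the dual coefficients $a_i$ below are placeholders standing in for the output of an actual kernelised solver:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """One possible kernel choice: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict(x, X_train, a, kernel=rbf_kernel):
    """Via the representer theorem, <w*, phi(x)> = sum_i a_i K(x^i, x)."""
    return np.sign(sum(a_i * kernel(x_i, x) for a_i, x_i in zip(a, X_train)))

# Placeholder training points and dual coefficients (assumed for illustration)
X_train = np.array([[2.0, 2.0], [-2.0, -2.0]])
a = np.array([1.0, -1.0])  # would come from solving the kernelised problem
print(predict(np.array([1.5, 2.5]), X_train, a))  # 1.0
```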

drawbacks

  • prediction-time complexity
  • need to store all training data
  • dealing with the $n \times n$ kernel matrix $\mathbf{K}$
  • choice of kernel, which is tricky and often rather heuristic

Remark

  1. note that we can also write $w^{*}$ as $a^T \Phi$ where $\Phi = [\phi(x^1),\ldots,\phi(x^n)]^T$