Support Vector Machine
idea: maximise the margin; this makes the classifier more robust to perturbations of the inputs
The Euclidean distance between a point $x$ and the hyperplane parametrised by $W$ (with bias $b$) is:
$\dfrac{|W^T x + b|}{\|W\|_2}$
Assuming $\|W\|_2 = 1$, the distance is simply $|W^T x + b|$
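A minimal numpy sketch of this formula; the values of `w`, `b`, `x` below are made up for illustration.

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane
b = -5.0                   # bias term
x = np.array([1.0, 2.0])   # query point

# |w^T x + b| / ||w||_2
print(abs(w @ x + b) / np.linalg.norm(w))   # 1.2

# after rescaling so that ||w||_2 = 1, the distance is just |w^T x + b|
w_hat, b_hat = w / np.linalg.norm(w), b / np.linalg.norm(w)
print(abs(w_hat @ x + b_hat))               # 1.2 again
```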
SVMs are good for high-dimensional data
To train, we can use a quadratic-programming solver or (sub)gradient descent
maximum margin hyperplane
$W$ has margin $\gamma$ if
$W^T x + b \ge \gamma \;\; \forall$ blue $x$, and $W^T x + b \le -\gamma \;\; \forall$ red $x$
Margin: given $Z = \{(x_i, y_i)\}_{i=1}^{n}$ with $y \in \{-1, 1\}$ and $\|W\|_2 = 1$, the margin is the smallest distance from a training point to the hyperplane, i.e. $\min_i y_i(W^T x_i + b)$
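A small sketch that checks whether a given unit-norm $(w, b)$ attains margin $\gamma$ on a labelled set; `X`, `y`, `w`, `b`, `gamma` are illustrative values, not from the notes.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])            # labels in {-1, +1}

w = np.array([1.0, 1.0]) / np.sqrt(2)   # unit-norm normal vector
b = 0.0
gamma = 1.0

margins = y * (X @ w + b)               # y_i (w^T x_i + b), signed distances
print(margins)
print(np.all(margins >= gamma))         # True iff every point is at distance >= gamma
```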
hard-margin SVM
this is the version with a bias term $b$
Algorithm 2 Hard-SVM
Require: Training set $(x_1, y_1), \ldots, (x_m, y_m)$
1: solve: $(w_0, b_0) = \arg\min_{(w,b)} \|w\|^2$ s.t. $\forall i,\; y_i(\langle w, x_i\rangle + b) \ge 1$
2: output: $\hat{w} = \frac{w_0}{\|w_0\|},\; \hat{b} = \frac{b_0}{\|w_0\|}$
Note that this version is sensitive to outliers
it assumes that the training set is linearly separable
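One way to see Hard-SVM in action, assuming scikit-learn is available: a linear SVC with a very large `C` approximates the hard-margin problem above. The toy data is invented and linearly separable, as Hard-SVM requires.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# a huge C makes slack essentially forbidden, approximating the hard margin
clf = SVC(kernel="linear", C=1e10)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)
print(y * (X @ w + b))   # every value should be >= 1 (up to numerical tolerance)
```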
soft-margin SVM
can be applied even if the training set is not linearly separable
Algorithm 3 Soft-SVM
Require: Input $(x_1, y_1), \ldots, (x_m, y_m)$
1: parameter: $\lambda > 0$
2: solve: $\min_{w,b,\xi}\left(\lambda\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\xi_i\right)$
3: s.t. $\forall i,\; y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$
4: output: $w, b$
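A sketch of Soft-SVM with scikit-learn's SVC, assuming the usual (constant-factor) reparametrisation $C \propto 1/(\lambda m)$ between the $C$-weighted and $\lambda$-weighted objectives. The toy data is made up and deliberately not separable: the last point carries the "wrong" label.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0], [2.5, 1.5]])
y = np.array([1, 1, -1, -1, -1])

# soft margin: a finite C trades margin width against slack, roughly C ~ 1/(lambda*m)
lam, m = 0.1, len(y)
clf = SVC(kernel="linear", C=1.0 / (lam * m))
clf.fit(X, y)
print(clf.coef_[0], clf.intercept_[0])
print(clf.predict(X))   # the mislabeled point may simply end up misclassified
```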
Equivalent form of soft-margin SVM:
$\min_{w}\left(\lambda\|w\|^2 + L_S^{\text{hinge}}(w)\right)$, where $L_S^{\text{hinge}}(w) = \frac{1}{m}\sum_{i=1}^{m}\max\{0, 1 - y_i\langle w, x_i\rangle\}$
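Since this form is an unconstrained sum of convex terms, it can be minimised by subgradient descent (the gradient-descent route mentioned earlier). A minimal sketch for the homogeneous, bias-free case; the step size and iteration budget are arbitrary illustrative choices.

```python
import numpy as np

def soft_svm_subgradient(X, y, lam=0.1, lr=0.01, epochs=1000):
    """Minimise lam*||w||^2 + (1/m) * sum_i max(0, 1 - y_i <w, x_i>)."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1                       # points with non-zero hinge loss
        grad = 2 * lam * w - (y[active] @ X[active]) / m
        w -= lr * grad
    return w

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = soft_svm_subgradient(X, y)
print(w, y * (X @ w))    # most margins should exceed 1
```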
SVM with basis functions
$\min_{w}\frac{1}{n}\sum_{i=1}^{n}\max\{0, 1 - y_i\langle w, \phi(x_i)\rangle\} + \lambda\|w\|_2^2$
$\phi(x_i)$ can be high-dimensional
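A sketch of an explicit basis expansion, assuming scikit-learn and a degree-2 polynomial map as the (arbitrary) choice of $\phi$. The XOR-style data below is made up: no linear separator exists in the original 2-D space, but the expanded feature $x_1 x_2$ makes it linearly separable.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])          # label = sign of x1 * x2

# phi(x): degree-2 polynomial features, then a linear SVM in the expanded space
clf = make_pipeline(PolynomialFeatures(degree=2), SVC(kernel="linear", C=10.0))
clf.fit(X, y)
print(clf.predict(X))                 # recovers y
```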
representer theorem
$w^* = \arg\min_{w}\frac{1}{n}\sum_{i=1}^{n}\max\{0, 1 - y_i\langle w, \phi(x_i)\rangle\} + \lambda\|w\|_2^2$
There exist real values $a_1, \ldots, a_n$ such that
$w^* = \sum_{i=1}^{n} a_i\,\phi(x_i)$
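Plugging this expansion back into the predictor and the objective shows that $\phi$ only ever appears through inner products:
$\langle w^*, \phi(x)\rangle = \sum_{i=1}^{n} a_i\,\langle \phi(x_i), \phi(x)\rangle$ and $\|w^*\|_2^2 = \sum_{i,j} a_i a_j\,\langle \phi(x_i), \phi(x_j)\rangle$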
kernelized SVM
from the representer theorem, we only need inner products of features, which we collect into the kernel:
$K(x, z) = \langle \phi(x), \phi(z)\rangle$
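A sketch of a kernelised SVM with scikit-learn, assuming an RBF kernel as the (arbitrary) choice of $K$ and reusing the XOR-style toy data from above. Note that prediction sums over the stored support vectors, which previews the drawbacks listed below.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="rbf", gamma=1.0, C=10.0)
clf.fit(X, y)
print(clf.predict(X))          # recovers y
print(clf.support_vectors_)    # prediction sums over these stored training points,
print(clf.dual_coef_)          # weighted by the learned dual coefficients
```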
drawbacks
- prediction-time complexity
- need to store all training data
- dealing with the $n \times n$ kernel matrix $K$
- choice of kernel, which is tricky and often somewhat heuristic