likelihood
maximum likelihood estimation
$$\alpha_{\text{MLE}} = \arg\max_{\alpha} P(X \mid \alpha) = \arg\min_{\alpha} \left(-\sum_i \log P(x_i \mid \alpha)\right)$$
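A minimal numerical sketch of this (an assumed example, not from the notes): for data drawn from a Gaussian with unknown mean, minimising the negative log-likelihood over a grid recovers the sample mean.

```python
import numpy as np

# MLE sketch: X ~ N(mu, sigma^2) with unknown mu. The negative log-likelihood
#   -sum_i log P(x_i | mu) = sum_i (x_i - mu)^2 / (2 sigma^2) + const
# is minimised by the sample mean.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=1.0, size=1000)

def neg_log_likelihood(mu, X, sigma=1.0):
    # drop the mu-independent constant
    return np.sum((X - mu) ** 2) / (2 * sigma ** 2)

mus = np.linspace(0.0, 6.0, 601)
mu_mle = mus[np.argmin([neg_log_likelihood(m, X) for m in mus])]
print(mu_mle, X.mean())  # grid minimiser lands next to the sample mean
```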
$P(\alpha)$ is the prior (a priori) distribution of $\alpha$.
P(α∣X) is the posterior distribution of α given X.
maximum a posteriori estimation
$$\alpha_{\text{MAP}} = \arg\max_{\alpha} P(\alpha \mid X) = \arg\max_{\alpha} \frac{P(X \mid \alpha)\, P(\alpha)}{P(X)} = \arg\min_{\alpha} \left(-\log P(\alpha) - \sum_{i=1}^{n} \log P(x_i \mid \alpha)\right)$$
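A hedged sketch of this for a concrete case (assumed setup: Gaussian mean $\alpha$ with a zero-mean Gaussian prior $P(\alpha) \propto e^{-\lambda \alpha^2}$): setting the derivative of the MAP objective to zero gives a closed form, and the prior shrinks the estimate toward 0 relative to the MLE.

```python
import numpy as np

# MAP for a Gaussian mean alpha with prior P(alpha) ∝ exp(-lambda * alpha^2):
#   alpha_MAP = argmin_alpha  lambda*alpha^2 + sum_i (x_i - alpha)^2 / (2 sigma^2)
# d/d_alpha = 0  =>  alpha_MAP = sum(x_i) / (2*lambda*sigma^2 + n)
rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, scale=1.0, size=50)
lam, sigma = 5.0, 1.0

n = len(X)
alpha_map = X.sum() / (2 * lam * sigma**2 + n)  # closed-form minimiser
alpha_mle = X.mean()                            # lambda -> 0 recovers the MLE
print(alpha_map, alpha_mle)  # prior shrinks the estimate toward 0
```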
$$\arg\max_{W} P(Z \mid W)\, P(W) = \arg\max_{W} \left[\log P(W) + \sum_i \log P(y_i \mid x_i, W)\right] = \arg\max_{W} \left[-\lambda \|W\|_2^2 - \sum_i \frac{(x_i^T W - y_i)^2}{2\sigma^2}\right]$$
(dropping the additive constants $\ln\frac{1}{\beta}$ and $\ln\frac{1}{\gamma}$)
$$P(W) = \frac{1}{\beta} e^{-\lambda \|W\|_2^2}$$
$$P(W) = \frac{1}{\beta} e^{-\frac{\lambda \|W\|_2^2}{r^2}}$$
$$\arg\max_{W} P(Z \mid W) = \arg\max_{W} \sum_i \log P(y_i \mid x_i, W)$$
$$P(y \mid x, W) = \frac{1}{\gamma} e^{-\frac{(x^T W - y)^2}{2\sigma^2}}$$
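Under the Gaussian prior and Gaussian noise assumptions above, the MAP objective is maximised by the ridge-regression solution $W = (X^T X + 2\sigma^2\lambda I)^{-1} X^T y$. A sketch verifying this (synthetic data; the specific values are illustrative):

```python
import numpy as np

# MAP objective:  -lambda*||W||^2 - sum_i (x_i^T W - y_i)^2 / (2 sigma^2)
# Its maximiser is the ridge solution  (X^T X + 2*sigma^2*lambda*I)^{-1} X^T y.
rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.normal(size=(n, d))
W_true = np.array([1.0, -2.0, 0.5])
sigma, lam = 0.5, 1.0
y = X @ W_true + rng.normal(scale=sigma, size=n)

W_map = np.linalg.solve(X.T @ X + 2 * sigma**2 * lam * np.eye(d), X.T @ y)

# sanity check: the gradient of the MAP objective vanishes at W_map
grad = -2 * lam * W_map - (X.T @ (X @ W_map - y)) / sigma**2
print(W_map, np.abs(grad).max())
```

The $\ell_2$ penalty coefficient $2\sigma^2\lambda$ shows how a tighter prior (larger $\lambda$) or noisier observations (larger $\sigma$) both increase the shrinkage.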
expected error minimisation
think of it as bias-variance tradeoff
Squared loss: $\ell(\hat{y}, y) = (y - \hat{y})^2$
The solution to $y^* = \arg\min_{\hat{y}} E_{X,Y}(Y - \hat{y}(X))^2$ is $y^*(x) = E[Y \mid X = x]$.
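A quick numerical check of this claim (illustrative setup, not from the notes): for a fixed $x$, the expected squared loss $E[(Y - c)^2 \mid X = x]$ over candidate constants $c$ is minimised at the conditional mean.

```python
import numpy as np

# Fix x and sample Y | X = x ~ N(2x, 1); scan constants c and find the one
# minimising the empirical squared loss. It should match E[Y | X = x].
rng = np.random.default_rng(3)
x = 1.0
Y = 2 * x + rng.normal(scale=1.0, size=100_000)

cs = np.linspace(0.0, 4.0, 401)
risks = [np.mean((Y - c) ** 2) for c in cs]
c_best = cs[np.argmin(risks)]
print(c_best, Y.mean())  # grid minimiser matches the conditional mean
```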
In practice we only have a finite sample $Z = \{(x_i, y_i)\}_{i=1}^{n}$.
error decomposition
$$E_{x,y}(y - \hat{y}_Z(x))^2 = \underbrace{E_{x,y}(y - y^*(x))^2}_{\text{noise}} + \underbrace{E_x(y^*(x) - \hat{y}_Z(x))^2}_{\text{estimation error}}$$
bias-variance decomposition
For the linear estimator $\hat{y}_Z(x) := W_Z^T x$:
$$E_Z E_{x,y}(y - \hat{y}_Z(x))^2 = \underbrace{E_{x,y}(y - y^*(x))^2}_{\text{noise}} + \underbrace{E_x \big(y^*(x) - E_Z[\hat{y}_Z(x)]\big)^2}_{\text{bias}^2} + \underbrace{E_x E_Z \big(\hat{y}_Z(x) - E_Z[\hat{y}_Z(x)]\big)^2}_{\text{variance}}$$
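The decomposition above can be checked by Monte Carlo (a sketch under an assumed 1-d setup: $y = wx + \varepsilon$, so $y^*(x) = wx$, with $\hat{y}_Z$ the least-squares fit on each sampled dataset $Z$):

```python
import numpy as np

# Estimate noise, bias^2 and variance separately, then compare their sum
# against the directly estimated expected error E_Z E_{x,y}(y - y_hat_Z(x))^2.
rng = np.random.default_rng(4)
w, sigma, n_train = 1.5, 1.0, 20
xs = rng.normal(size=5000)          # test points x
y_star = w * xs                     # y*(x) = E[y | x]

preds = []
for _ in range(500):                # many datasets Z
    xt = rng.normal(size=n_train)
    yt = w * xt + rng.normal(scale=sigma, size=n_train)
    w_hat = (xt @ yt) / (xt @ xt)   # least-squares slope for this Z
    preds.append(w_hat * xs)
preds = np.array(preds)             # shape (num_Z, num_test)

noise = sigma**2                    # E(y - y*(x))^2, known here
bias2 = np.mean((y_star - preds.mean(axis=0)) ** 2)
var = np.mean((preds - preds.mean(axis=0)) ** 2)
total = np.mean([(w * xs + rng.normal(scale=sigma, size=xs.size) - p) ** 2
                 for p in preds])   # fresh y for each Z and x
print(noise, bias2, var, total)     # total ≈ noise + bias2 + var
```

Least squares is unbiased here, so the bias² term is near zero and the gap between the total error and the noise floor is almost entirely variance.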