probabilistic modeling
example: assume each class is a Gaussian
discriminant analysis
$P(x \mid y=1, \mu_0, \mu_1, \beta) = \frac{1}{a_0}\, e^{-\|x - \mu_1\|_2^2}$
maximum likelihood estimate
see also: prior and posterior distributions
given $\Theta = \{\mu_0, \mu_1, \beta\}$:
$\arg\max_{\Theta} P(Z \mid \Theta) = \arg\max_{\Theta} \prod_{i=1}^{n} P(x_i, y_i \mid \Theta)$, where $Z = \{(x_i, y_i)\}_{i=1}^{n}$ is the training set
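As a minimal sketch of this fit (assuming an isotropic Gaussian per class and treating $\beta$ as a single shared scale; the NumPy code, function name, and exact estimator are illustrative, not from the notes):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """MLE sketch for the class-conditional Gaussian model.

    Assumes each class k is an isotropic Gaussian centered at mu_k with a
    single shared scale beta (density proportional to exp(-beta * ||x - mu_k||^2)).
    """
    mus = {k: X[y == k].mean(axis=0) for k in np.unique(y)}  # per-class sample means
    # shared scale: from the average squared distance of each point to its class mean
    sq_dists = np.concatenate([np.sum((X[y == k] - mus[k]) ** 2, axis=1) for k in mus])
    beta = X.shape[1] / (2.0 * sq_dists.mean())  # illustrative estimator for beta
    return mus, beta
```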
How can we predict the label of a new test point?
Or in other words, how can we run inference?
Check whether $\frac{P(y=1 \mid x, \Theta)}{P(y=0 \mid x, \Theta)} \geq 1$; if so, predict $y=1$
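A sketch of that check in code, reusing the hypothetical `mus` and `beta` from the fitting sketch above; the uniform prior is an added assumption:

```python
import numpy as np

def predict(x, mus, beta, priors=None):
    """Predict a label by comparing class posteriors (sketch).

    The Gaussian normalizer cancels in the ratio, so it is enough to
    compare log P(y=k) - beta * ||x - mu_k||^2 across classes.
    """
    classes = sorted(mus)
    if priors is None:
        priors = {k: 1.0 / len(classes) for k in classes}  # assumed uniform prior
    scores = {k: np.log(priors[k]) - beta * np.sum((x - mus[k]) ** 2) for k in classes}
    return max(scores, key=scores.get)
```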
Generalization to correlated features
Gaussian for correlated features:
$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
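A direct transcription into code (in practice, `scipy.stats.multivariate_normal` computes the same density with better numerics):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density N(x | mu, Sigma) for correlated features, as in the formula above."""
    d = mu.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm
```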
Naive Bayes Classifier
Given the label, the coordinates are statistically independent
$P(x \mid y = k, \Theta) = \prod_j P(x_j \mid y = k, \Theta)$
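One possible instantiation of this factorization, with a 1-D Gaussian per (class, coordinate); all names and the smoothing constant `eps` are illustrative:

```python
import numpy as np

def fit_gaussian_nb(X, y, eps=1e-9):
    """Fit a Naive Bayes model with one 1-D Gaussian per (class, coordinate)."""
    stats = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # independence assumption: only a mean and a variance per coordinate
        stats[k] = (Xk.mean(axis=0), Xk.var(axis=0) + eps, len(Xk) / len(X))
    return stats

def nb_log_posterior(x, stats):
    """log P(y=k) + sum_j log P(x_j | y=k) for every class k."""
    out = {}
    for k, (mu, var, prior) in stats.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        out[k] = np.log(prior) + log_lik
    return out
```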
idea: compare discriminative and generative models
😄 fun fact: despite its name, logistic regression is actually used for classification rather than regression problems
Assume there is a hyperplane in $\mathbb{R}^d$ parameterized by $W$
$P(Y=1 \mid x, W) = \phi(W^T x)$
$P(Y=0 \mid x, W) = 1 - \phi(W^T x)$, where $\phi(a) = \frac{1}{1 + e^{-a}}$ (the sigmoid)
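A small sketch of the model (no bias term; `sigmoid` and `predict_proba` are illustrative names):

```python
import numpy as np

def sigmoid(a):
    """phi(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def predict_proba(x, W):
    """P(Y=1 | x, W) under the logistic model; P(Y=0 | x, W) is 1 minus this."""
    return sigmoid(W @ x)
```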
maximum likelihood
$1 - \phi(a) = \phi(-a)$
$W_{ML} = \arg\max_W \prod_i P(x_i, y_i \mid W) = \arg\max_W \prod_i \frac{P(x_i, y_i, W)}{P(W)} = \arg\max_W \prod_i P(y_i \mid x_i, W)\, P(x_i) = \arg\max_W \left[ \prod_i P(x_i) \right] \left[ \prod_i P(y_i \mid x_i, W) \right] = \arg\max_W \sum_{i=1}^{n} \log \phi(y_i W^T x_i)$
(the $\prod_i P(x_i)$ factor does not depend on $W$, so it drops out of the argmax; the last step writes the labels as $y_i \in \{-1, +1\}$ and uses $1 - \phi(a) = \phi(-a)$)
equivalently, with labels $y_i \in \{0, 1\}$ and $p_i = \phi(W^T x_i)$, maximize the following:
$\sum_{i=1}^{n} \left( y_i \log p_i + (1 - y_i) \log(1 - p_i) \right)$
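A sketch of maximizing this objective by plain gradient ascent, using the fact that its gradient with respect to $W$ is $\sum_i (y_i - p_i)\, x_i$; the learning rate and step count are arbitrary illustrative choices:

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_steps=1000):
    """Maximize the log-likelihood above by gradient ascent (sketch).

    X is (n, d), y holds 0/1 labels, and there is no bias term.
    """
    W = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-X @ W))      # p_i = phi(W^T x_i)
        W += lr * X.T @ (y - p) / len(y)      # gradient of the (average) objective
    return W
```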
softmax
$\mathrm{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{k} e^{y_j}}$
where $y \in \mathbb{R}^k$
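A standard implementation sketch; subtracting the max is a numerical-stability trick that does not change the output:

```python
import numpy as np

def softmax(y):
    """softmax(y)_i = exp(y_i) / sum_j exp(y_j) for y in R^k."""
    z = y - np.max(y)   # shifting by the max does not change the result
    e = np.exp(z)
    return e / e.sum()

# example: softmax(np.array([1.0, 2.0, 3.0])) ≈ [0.09, 0.24, 0.67]
```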