
example: assume each class is a Gaussian

discriminant analysis

$$P(x \mid y = 1, \mu_0, \mu_1, \beta) = \frac{1}{a_0} e^{-\|x-\mu_1\|^2_2}$$
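A minimal numpy sketch of this class-conditional, assuming $\beta$ acts as a precision in the exponent and $a_0$ is the matching normalizer (both are left implicit in the formula above):

```python
import numpy as np

def class_conditional(x, mu, beta=1.0):
    """Isotropic Gaussian class-conditional P(x | y, mu, beta).

    Assumes beta scales the squared distance (a precision); a0 is
    the constant that makes the density integrate to 1.
    """
    d = len(x)
    a0 = (np.pi / beta) ** (d / 2)  # normalizer of exp(-beta * ||x - mu||^2)
    return np.exp(-beta * np.sum((x - mu) ** 2)) / a0
```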

maximum likelihood estimate

see also prior and posterior distribution

given $\Theta = \{\mu_0, \mu_1, \beta\}$:

$$\argmax_{\Theta} P(Z \mid \Theta) = \argmax_{\Theta} \prod_{i=1}^{n} P(x^i, y^i \mid \Theta)$$
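For this model the maximizer has a closed form: each class mean is the sample average of the points with that label. A minimal sketch, assuming labels in $\{0, 1\}$:

```python
import numpy as np

def fit_means(X, y):
    """ML estimates of the class means: per-class sample averages.

    X: (n, d) data matrix, y: (n,) array of labels in {0, 1}.
    """
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    return mu0, mu1
```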

How can we predict the label of a new test point?

In other words, how can we run inference?

Check whether

$$\frac{P(y=0 \mid X, \Theta)}{P(y=1 \mid X, \Theta)} \ge 1$$

and predict $y = 0$ if it holds, $y = 1$ otherwise.
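A sketch of this decision rule for the isotropic Gaussian model above. The class priors and the precision $\beta$ are assumptions; the normalizers $a_0$ cancel in the ratio, so only exponents and priors remain:

```python
import numpy as np

def predict(x, mu0, mu1, beta=1.0, prior0=0.5, prior1=0.5):
    """Predict 0 iff P(y=0 | x, Theta) / P(y=1 | x, Theta) >= 1.

    Works in log space: the Gaussian normalizers are identical for
    both classes and cancel, leaving exponents plus log priors.
    """
    log_ratio = (np.log(prior0) - beta * np.sum((x - mu0) ** 2)) \
              - (np.log(prior1) - beta * np.sum((x - mu1) ** 2))
    return 0 if log_ratio >= 0 else 1
```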

Generalization to correlated features

Gaussian for correlated features:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2 \pi)^{d/2}|\Sigma|^{1/2}} \exp \left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu) \right)$$
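A direct numpy transcription of this density (for real use, `scipy.stats.multivariate_normal.pdf` computes the same quantity):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian N(x | mu, Sigma) with full covariance."""
    d = len(x)
    diff = x - mu
    # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm
```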

Naive Bayes Classifier

assumption

Given the label, the coordinates are statistically independent

$$P(x \mid y = k, \Theta) = \prod_{j} P(x_j \mid y=k, \Theta)$$
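A minimal Gaussian naive Bayes sketch built on this factorization. Modeling each coordinate as a 1-D Gaussian is an assumption, since the note does not fix the per-coordinate model:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Per-class, per-coordinate Gaussian parameters plus class priors."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0), Xk.var(axis=0), len(Xk) / len(X))
    return params

def log_joint(x, mu, var, prior):
    """log P(y=k) + sum_j log P(x_j | y=k): independence turns the
    product over coordinates into a sum of 1-D Gaussian log-densities."""
    return np.log(prior) - 0.5 * np.sum(
        np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def predict_nb(x, params):
    """Pick the class with the largest log joint probability."""
    return max(params, key=lambda k: log_joint(x, *params[k]))
```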

idea: comparison between discriminative and generative models

Logistic regression

😄 fun fact: despite its name, logistic regression is used for classification rather than regression problems

Assume there is a hyperplane in $\mathbb{R}^d$ parameterized by $W$

$$\begin{aligned} P(Y = 1 \mid x, W) &= \phi (W^T x) \\ P(Y = 0 \mid x, W) &= 1 - \phi (W^T x) \end{aligned}$$

where $\phi (a) = \frac{1}{1+e^{-a}}$ is the sigmoid function.
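A direct transcription of the model:

```python
import numpy as np

def sigmoid(a):
    """phi(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def p_y1(x, W):
    """P(Y = 1 | x, W) for logistic regression."""
    return sigmoid(W @ x)
```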

maximum likelihood

Note that $1 - \phi (a) = \phi (-a)$, so with labels $y^i \in \{-1, +1\}$ both cases collapse to $P(y^i \mid x^i, W) = \phi(y^i W^T x^i)$:

$$\begin{aligned} W^{\text{ML}} &= \argmax_{W} \prod_i P(x^i, y^i \mid W) \\ &= \argmax_{W} \prod_i \frac{P(x^i, y^i, W)}{P(W)} \\ &= \argmax_{W} \prod_i P(y^i \mid x^i, W)\, P(x^i) \\ &= \argmax_{W} \Big[ \prod_i P(x^i) \Big] \Big[ \prod_i P(y^i \mid x^i, W) \Big] \\ &= \argmax_{W} \sum_{i=1}^{n} \log \phi (y^i W^T x^i) \end{aligned}$$

The last step drops $\prod_i P(x^i)$, which does not depend on $W$, and takes the log, which is monotone and so leaves the argmax unchanged.
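There is no closed form for $W^{\text{ML}}$, so the log-likelihood is maximized numerically. A gradient-ascent sketch using the $\{0, 1\}$ label convention of the equivalent form below; the step size, iteration count, and per-sample averaging are arbitrary choices:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Gradient ascent on the log-likelihood; y takes values in {0, 1}.

    The gradient of sum_i [y^i log p^i + (1 - y^i) log(1 - p^i)]
    with respect to W is X^T (y - p), where p^i = sigmoid(W^T x^i).
    """
    W = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ W))
        # average gradient over the n samples (an assumption, just to
        # keep the step size independent of dataset size)
        W += lr * X.T @ (y - p) / len(y)
    return W
```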

equivalent form

maximize the following, with $y^i \in \{0, 1\}$ and $p^i = \phi(W^T x^i)$:

$$\sum_{i=1}^{n} \left( y^i \log p^i + (1-y^i) \log (1-p^i) \right)$$
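Maximizing this objective is the same as minimizing binary cross entropy. A sketch that evaluates it, with clipping (an added guard) to avoid $\log 0$:

```python
import numpy as np

def log_likelihood(y, p, eps=1e-12):
    """sum_i [ y^i log p^i + (1 - y^i) log(1 - p^i) ].

    y: labels in {0, 1}; p: predicted probabilities, clipped away
    from 0 and 1 so the logs stay finite.
    """
    p = np.clip(p, eps, 1 - eps)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```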

softmax

$$\text{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j} e^{y_j}}$$

where $y \in \mathbb{R}^k$
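A numerically stable sketch: subtracting $\max(y)$ before exponentiating leaves the result unchanged (softmax is shift-invariant) but avoids overflow:

```python
import numpy as np

def softmax(y):
    """softmax(y)_i = exp(y_i) / sum_j exp(y_j), computed stably."""
    z = np.exp(y - np.max(y))  # shift-invariant, prevents overflow
    return z / z.sum()
```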


cross entropy

$H(p, q) = -\sum_{i} p_i \log q_i$; minimizing the negative of the objective above is exactly minimizing the cross entropy between the labels and the predicted probabilities.
