[Bayesian Deep Learning] Ch4.4 Linear Models for Classification - The Laplace Approximation

This post is based on lectures given by Professor Jeongtae Kim (김정태) at Ewha Womans University in the second semester of 2020.

Overview

Discriminant Functions
Probabilistic Generative Models
Probabilistic Discriminative Models
The Laplace Approximation
Bayesian Logistic Regression

We will later discuss logistic regression from a Bayesian perspective, which is more involved than the Bayesian treatment of linear regression. In particular, the posterior distribution is no longer Gaussian, so we cannot integrate exactly over the parameter $\mathbf{w}$. We therefore need some form of approximation; here we look at the Laplace approximation, which is simple but widely used.

Recap (Logistic regression)

The posterior probability of class $C_1$

$p(C_1|\phi) = y(\phi) = \sigma(\mathbf{w}^T\phi)$

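When this sigmoid likelihood is combined with a Gaussian prior over $\mathbf{w}$, the resulting posterior cannot be integrated analytically over $\mathbf{w}$ on $(-\infty, \infty)$, so its normalization has no closed form.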

The Laplace Approximation

▶ Why is the Laplace approximation needed?

The goal of the Laplace approximation is to find a Gaussian approximation to a probability density defined over a set of continuous variables.

First, consider a single continuous variable $z$ with distribution $p(z)$ defined by

$p(z) = \frac{1}{Z} f(z) \tag{1}\label{1}$

where $Z = \int f(z)\, dz$ is the normalization coefficient, which is assumed to be unknown.

First, find the mode (peak)...
Taylor series expansion around $z_0$ ... (see Reference)
Normalized distribution $q(z)$ ...
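As a sketch of the three steps above (following the standard single-variable derivation in PRML §4.4, with $A$ denoting the negative second derivative of $\ln f$ at the mode), note that the linear term of the Taylor expansion vanishes because $z_0$ is a stationary point:

```latex
\begin{aligned}
&\text{1. Mode: } \left.\frac{d f(z)}{dz}\right|_{z=z_0} = 0 \\
&\text{2. Taylor expansion of } \ln f \text{ about } z_0:\quad
  \ln f(z) \simeq \ln f(z_0) - \tfrac{1}{2} A (z - z_0)^2,
  \qquad A = -\left.\frac{d^2}{dz^2}\ln f(z)\right|_{z=z_0} \\
&\qquad\Rightarrow\; f(z) \simeq f(z_0) \exp\!\left\{-\tfrac{A}{2}(z - z_0)^2\right\} \\
&\text{3. Normalize: } q(z) = \left(\frac{A}{2\pi}\right)^{1/2}
  \exp\!\left\{-\tfrac{A}{2}(z - z_0)^2\right\} = \mathcal{N}(z \mid z_0, A^{-1})
\end{aligned}
```

This only yields a valid Gaussian if $A > 0$, i.e. $z_0$ is a local maximum.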

[Figure 1] Laplace approximation for a single variable
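A minimal numerical sketch of the single-variable case in Python with NumPy. The density $f(z) = \exp(-z^2/2)\,\sigma(20z + 4)$ is an assumption chosen for illustration (its $Z$ has no closed form), Newton's method is just one possible way to find the mode, and the helper names (`log_f`, `laplace_1d`, ...) are hypothetical. The script also compares the Laplace estimate $Z \simeq f(z_0)\sqrt{2\pi/A}$ with a brute-force grid integral.

```python
import numpy as np


def log_f(z):
    """Unnormalized log-density ln f(z) = -z^2/2 + ln sigma(20 z + 4) (toy example)."""
    a = 20.0 * z + 4.0
    # ln sigma(a) = -ln(1 + exp(-a)), computed stably with logaddexp
    return -0.5 * z**2 - np.logaddexp(0.0, -a)


def grad_log_f(z):
    # d/dz ln f(z) = -z + 20 * (1 - sigma(20 z + 4))
    s = 1.0 / (1.0 + np.exp(-(20.0 * z + 4.0)))
    return -z + 20.0 * (1.0 - s)


def hess_log_f(z):
    # d^2/dz^2 ln f(z) = -1 - 400 * sigma * (1 - sigma)
    s = 1.0 / (1.0 + np.exp(-(20.0 * z + 4.0)))
    return -1.0 - 400.0 * s * (1.0 - s)


def laplace_1d(z_init=0.0, n_iter=50):
    """Return (z0, A): the mode of f and A = -d^2/dz^2 ln f evaluated at z0."""
    z = z_init
    for _ in range(n_iter):
        # Newton step on ln f; the second derivative is always negative here,
        # so each step moves in the uphill direction of the gradient
        z = z - grad_log_f(z) / hess_log_f(z)
    return z, -hess_log_f(z)


z0, A = laplace_1d()
sigma_q = 1.0 / np.sqrt(A)  # standard deviation of q(z) = N(z | z0, 1/A)
Z_laplace = np.exp(log_f(z0)) * np.sqrt(2.0 * np.pi / A)

# Brute-force reference value of Z: Riemann sum on a fine uniform grid
grid = np.linspace(-10.0, 10.0, 200001)
Z_grid = np.sum(np.exp(log_f(grid))) * (grid[1] - grid[0])
print(f"z0={z0:.4f}, sigma_q={sigma_q:.4f}, Z_laplace={Z_laplace:.4f}, Z_grid={Z_grid:.4f}")
```

Because this $f$ is skewed, $q(z)$ matches it well near the peak, but the two values of $Z$ agree only approximately.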

Next, let us consider the $M$-dimensional multivariate case.

Multivariate distribution $p(\mathbf{z}) = \frac{f(\mathbf{z})}{Z}$
Hessian matrix
Multivariate Gaussian
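In the $M$-dimensional case the same steps give the following (a sketch of the standard result in PRML §4.4), where $\mathbf{A}$ is the negative Hessian of $\ln f$ at the mode:

```latex
\begin{aligned}
&\nabla f(\mathbf{z})\big|_{\mathbf{z}=\mathbf{z}_0} = \mathbf{0} \\
&\ln f(\mathbf{z}) \simeq \ln f(\mathbf{z}_0) - \tfrac{1}{2}(\mathbf{z}-\mathbf{z}_0)^{T}\mathbf{A}(\mathbf{z}-\mathbf{z}_0),
 \qquad \mathbf{A} = -\nabla\nabla \ln f(\mathbf{z})\big|_{\mathbf{z}=\mathbf{z}_0} \quad (M \times M) \\
&q(\mathbf{z}) = \frac{|\mathbf{A}|^{1/2}}{(2\pi)^{M/2}}
 \exp\!\left\{-\tfrac{1}{2}(\mathbf{z}-\mathbf{z}_0)^{T}\mathbf{A}(\mathbf{z}-\mathbf{z}_0)\right\}
 = \mathcal{N}(\mathbf{z} \mid \mathbf{z}_0, \mathbf{A}^{-1})
\end{aligned}
```

For $q(\mathbf{z})$ to be a valid Gaussian, $\mathbf{A}$ must be positive definite, i.e. $\mathbf{z}_0$ must be a local maximum.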

[Figure 3] Laplace approximation for a vector variable

▶ Practical use and limitations of the Laplace approximation

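For the Laplace approximation, the mode is usually found first by a numerical optimization method.
Next, the Hessian at the mode is computed, often only approximately.
By the central limit theorem, the posterior becomes increasingly well approximated by a Gaussian as the number of data points grows, so the Laplace approximation is most useful when the number of data points is relatively large.
The most serious limitation of the Laplace framework is that it depends only on the true distribution near a single point (the mode), so it can fail to capture important global properties, such as a second mode of a multimodal distribution.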
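The two practical steps above (numerical mode search, then a Hessian evaluated at the mode) can be sketched in Python/NumPy as follows. This is an illustrative sketch under simplifying assumptions, not a library API: it runs plain Newton iterations with central-difference derivatives, which assumes $\ln f$ is smooth and locally concave around the starting point (real code would typically add a line search or use an off-the-shelf optimizer). `numerical_gradient`, `numerical_hessian` and `laplace_approx` are hypothetical names. A 2-D Gaussian is used as the test case because the Laplace approximation is exact there: $\mathbf{z}_0$ should equal the mean and $\mathbf{A}$ the precision matrix.

```python
import numpy as np


def numerical_gradient(fun, z, eps=1e-5):
    """Central-difference gradient of the scalar function `fun` at point z."""
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = eps
        g[i] = (fun(z + e) - fun(z - e)) / (2.0 * eps)
    return g


def numerical_hessian(fun, z, eps=1e-4):
    """Central-difference Hessian of `fun` at z, symmetrized."""
    M = z.size
    H = np.zeros((M, M))
    for i in range(M):
        ei = np.zeros(M)
        ei[i] = eps
        for j in range(M):
            ej = np.zeros(M)
            ej[j] = eps
            H[i, j] = (fun(z + ei + ej) - fun(z + ei - ej)
                       - fun(z - ei + ej) + fun(z - ei - ej)) / (4.0 * eps**2)
    return 0.5 * (H + H.T)


def laplace_approx(log_f, z_init, n_iter=100, tol=1e-8):
    """Return (z0, A) such that q(z) = N(z | z0, A^{-1}).

    Plain Newton ascent on ln f; assumes ln f is locally concave near z_init.
    """
    z = np.asarray(z_init, dtype=float)
    for _ in range(n_iter):
        g = numerical_gradient(log_f, z)
        H = numerical_hessian(log_f, z)
        step = np.linalg.solve(H, g)  # Newton update: z_new = z - H^{-1} g
        z = z - step
        if np.linalg.norm(step) < tol:
            break
    A = -numerical_hessian(log_f, z)  # A = -grad grad ln f at the mode
    return z, A


# Test case: unnormalized 2-D Gaussian, for which the approximation is exact
mu = np.array([1.0, -2.0])
Lam = np.array([[2.0, 0.5],
                [0.5, 1.0]])  # precision matrix (positive definite)


def log_gauss(z):
    d = z - mu
    return -0.5 * d @ Lam @ d


z0, A = laplace_approx(log_gauss, z_init=np.zeros(2))
print("mode:", z0)
print("A:\n", A)
```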

[Figure 4] A bimodal distribution (source: https://socratic.org/questions/what-is-a-bimodal-distribution)
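To illustrate this limitation, the sketch below (an assumed toy example, not taken from the lecture) applies the 1-D Laplace approximation to an equal-weight mixture of two well-separated Gaussians, $\tfrac12\mathcal{N}(z\mid-2,0.5^2)+\tfrac12\mathcal{N}(z\mid2,0.5^2)$. The density is already normalized, so its true mass is 1, yet the Gaussian fitted at the mode near $z=2$ accounts for only about half of it and ignores the other mode entirely. The mode is located by a simple grid search, and the curvature by a central second difference; both are illustrative choices.

```python
import numpy as np


def mixture_pdf(z):
    """Equal-weight mixture 0.5 N(z|-2, 0.5^2) + 0.5 N(z|2, 0.5^2); integrates to 1."""
    s = 0.5
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * s)
    return 0.5 * norm * (np.exp(-0.5 * ((z + 2.0) / s) ** 2)
                         + np.exp(-0.5 * ((z - 2.0) / s) ** 2))


def log_p(z):
    return np.log(mixture_pdf(z))


# Step 1: locate a mode numerically (grid search restricted to z >= 0)
grid = np.linspace(0.0, 5.0, 50001)
z0 = grid[np.argmax(log_p(grid))]

# Step 2: curvature A = -d^2/dz^2 ln p at the mode, by a central second difference
h = 1e-4
A = -(log_p(z0 + h) - 2.0 * log_p(z0) + log_p(z0 - h)) / h**2

# Laplace estimate of the total mass: p(z0) * sqrt(2*pi / A)
mass_laplace = mixture_pdf(z0) * np.sqrt(2.0 * np.pi / A)
print(f"mode={z0:.3f}, A={A:.3f}, Laplace mass={mass_laplace:.3f} (true mass = 1)")
```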

4.4.1 Model comparison and BIC (Bayesian Information Criterion)

Besides the distribution $p(\mathbf{z})$ itself, the Laplace approximation also yields an approximation to the normalization constant $Z$.
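Written out (a sketch of the standard result in PRML §4.4.1), integrating the Gaussian form of $f$ around the mode $\mathbf{z}_0$, with $\mathbf{A} = -\nabla\nabla\ln f(\mathbf{z})|_{\mathbf{z}=\mathbf{z}_0}$, gives:

```latex
Z = \int f(\mathbf{z})\, d\mathbf{z}
  \simeq f(\mathbf{z}_0) \int \exp\!\left\{-\tfrac{1}{2}(\mathbf{z}-\mathbf{z}_0)^{T}\mathbf{A}(\mathbf{z}-\mathbf{z}_0)\right\} d\mathbf{z}
  = f(\mathbf{z}_0)\, \frac{(2\pi)^{M/2}}{|\mathbf{A}|^{1/2}}
```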

The model evidence plays an important role in Bayesian model comparison.

The Occam factor penalizes model complexity
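Applying the Laplace approximation to the model evidence $p(\mathcal{D}) = \int p(\mathcal{D}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta})\,d\boldsymbol{\theta}$, i.e. taking $f(\boldsymbol{\theta}) = p(\mathcal{D}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta})$ and $Z = p(\mathcal{D})$, gives the following (a sketch of PRML §4.4.1), where $\boldsymbol{\theta}_{\text{MAP}}$ is the mode and $M$ the number of parameters. The terms under the brace form the Occam factor:

```latex
\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}\mid\boldsymbol{\theta}_{\text{MAP}})
 + \underbrace{\ln p(\boldsymbol{\theta}_{\text{MAP}}) + \frac{M}{2}\ln(2\pi) - \frac{1}{2}\ln|\mathbf{A}|}_{\text{Occam factor}},
\qquad
\mathbf{A} = -\nabla\nabla \ln\big[p(\mathcal{D}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta})\big]\Big|_{\boldsymbol{\theta}=\boldsymbol{\theta}_{\text{MAP}}}
```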
If we assume that the Gaussian prior distribution over the parameters is broad, and that the Hessian has full rank, then... BIC
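Under those assumptions the log evidence reduces to the BIC form $\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}\mid\boldsymbol{\theta}_{\text{MAP}}) - \tfrac{1}{2}M\ln N$ (PRML §4.4.1), with $N$ data points and $M$ parameters. The Python/NumPy sketch below uses this score to compare polynomial regression models of increasing degree. The synthetic quadratic data, the Gaussian noise model, and the helper names (`gaussian_loglik_ml`, `bic_score`) are assumptions for illustration; since the prior is assumed broad, $\boldsymbol{\theta}_{\text{MAP}}$ is replaced by the maximum-likelihood fit.

```python
import numpy as np


def gaussian_loglik_ml(y, y_hat):
    """Max log-likelihood of y under y ~ N(y_hat, s2), using the ML noise variance s2."""
    N = y.size
    s2 = np.mean((y - y_hat) ** 2)
    return -0.5 * N * (np.log(2.0 * np.pi * s2) + 1.0)


def bic_score(loglik, M, N):
    """BIC approximation to the log evidence: ln p(D) ~ ln p(D|theta) - (M/2) ln N."""
    return loglik - 0.5 * M * np.log(N)


rng = np.random.default_rng(0)
N = 50
x = np.linspace(-1.0, 1.0, N)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=N)  # quadratic ground truth

scores = {}
for degree in range(6):
    coeffs = np.polyfit(x, y, degree)  # least squares = ML weights under Gaussian noise
    loglik = gaussian_loglik_ml(y, np.polyval(coeffs, x))
    M = degree + 2                     # (degree + 1) weights + 1 noise variance
    scores[degree] = bic_score(loglik, M, N)

best = max(scores, key=scores.get)
print({d: round(float(s), 2) for d, s in scores.items()}, "best degree:", best)
```

Note the sign convention: this form approximates the log evidence, so larger is better; many texts instead define BIC as $M\ln N - 2\ln p(\mathcal{D}\mid\hat{\boldsymbol{\theta}})$, where smaller is better.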

Reference

C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
PRML example code (GitHub): github.com/ctgk/PRML


DeepHaejoong
