Ch2.6 Bayesian decision theory - Discriminant functions for the normal density

This post is based on the Fall 2020 lectures of Prof. Kyunghwan Kim (김경환) at Sogang University and on the Pattern Classification textbook.

2.6 Discriminant functions for the normal density

In Section 2.4.1 we saw that minimum-error-rate classification can be achieved with the following discriminant functions.

$$ g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)$$

If the first term on the right-hand side, the likelihood $p(\mathbf{x}|\omega_i)$, is a multivariate normal density, i.e. $p(\mathbf{x}|\omega_i) \sim N(\boldsymbol{\mu}_i, \mathbf{\Sigma}_i)$, then substituting the density below yields the expanded discriminant function.

$$p(\mathbf{x}|\omega_i)=\frac{1}{(2 \pi)^{d / 2}|\mathbf{\Sigma}_i|^{1 / 2}} \exp \left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{t} \mathbf{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

$$\mathrm{g}_{i}(\mathbf{x})=-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}_{i}\right)^{t} \boldsymbol{\Sigma}_{i}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}_{i}\right)-\frac{d}{2} \ln 2 \pi-\frac{1}{2} \ln \left|\Sigma_{i}\right|+\ln P\left(\omega_{i}\right)$$
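
As a minimal sketch (not code from the lecture or the textbook), the discriminant above can be evaluated directly with NumPy. The function name `gaussian_discriminant` and the two-class parameters are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) for a class-conditional density N(mu, sigma) with the given prior."""
    x, mu, sigma = np.asarray(x, float), np.asarray(mu, float), np.asarray(sigma, float)
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)
    maha = diff @ np.linalg.solve(sigma, diff)
    # slogdet returns (sign, log|det|); the sign is +1 for a valid covariance
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * maha - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

# Hypothetical two-class setup: choose the class with the larger g_i(x)
params = [
    (np.array([0.0, 0.0]), np.eye(2), 0.5),
    (np.array([3.0, 3.0]), np.eye(2), 0.5),
]
x = np.array([0.5, 1.0])
scores = [gaussian_discriminant(x, mu, s, p) for mu, s, p in params]
predicted = int(np.argmax(scores))
```

Using `solve` and `slogdet` instead of an explicit inverse and determinant is a design choice for numerical stability; the result is the same quantity as in the formula.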

Let us examine this discriminant function $g_i(\mathbf{x})$ in a few special cases and see what each result means.

▶ Three cases of the discriminant functions:

Case 1: $\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}$
Case 2: $\mathbf{\Sigma}_i = \mathbf{\Sigma}$
Case 3: $\mathbf{\Sigma}_i$ = arbitrary

To aid understanding, the figure below visualizes two randomly sampled classes in two dimensions (be sure to note how the covariance differs between the classes in each case).

[Figure 1] Three cases by covariance (special cases)

Now let us see how the discriminant functions (decision boundaries) are drawn for each case of the class covariance.

▶ Case 1 : $\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}$

The simplest case (the most restrictive assumption)

The features are statistically independent and each feature has the same variance, $\sigma^2$.
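The samples fall in equal-size hyperspherical clusters (perfect circles in 2D; "hyper" means the same holds in any dimension 2, 3, ..., $d$).
The cluster for the $i$th class is centered about the mean vector $\boldsymbol{\mu}_i$.
The computation of the determinant and the inverse of $\mathbf{\Sigma}_i$ is easy ($|\mathbf{\Sigma}_i| = \sigma^{2d}$ and $\mathbf{\Sigma}_i^{-1} = (1/\sigma^2)\mathbf{I}$).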
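
For reference, here is a sketch of the standard simplification under this assumption (the symbols $\mathbf{w}_i$ and $w_{i0}$ are shorthand introduced here for the weight vector and threshold). Substituting $\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}$ into the general $g_i(\mathbf{x})$ and dropping the terms that do not depend on $i$ gives

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i)$$

Expanding the squared norm and dropping $\mathbf{x}^t\mathbf{x}$, which is the same for every class, leaves a function that is linear in $\mathbf{x}$:

$$g_i(\mathbf{x}) = \mathbf{w}_i^t\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{1}{\sigma^2}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2}\boldsymbol{\mu}_i^t\boldsymbol{\mu}_i + \ln P(\omega_i)$$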

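Under this assumption the discriminant functions are linear in $\mathbf{x}$. A classifier that uses linear discriminant functions is called a linear machine.

Next, for such a classifier with two classes (e.g. $i$ and $j$), let us see how the hyperplane is formed.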

Hyperplanes defined by the linear equations : $g_i(\mathbf{x}) = g_j(\mathbf{x})$

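Setting $g_i(\mathbf{x}) = g_j(\mathbf{x})$ and collecting the terms in $\mathbf{x}$ gives the hyperplane (a derivation worth working through by hand).

That is, written as a first-order (linear) equation in the input vector $\mathbf{x}$, the hyperplane is

$$\mathbf{w}^t(\mathbf{x}-\mathbf{x}_0)=0$$

$$\mathbf{w}=\boldsymbol{\mu}_i-\boldsymbol{\mu}_j,\qquad \mathbf{x}_0=\frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j)-\frac{\sigma^2}{\|\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$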

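The hyperplane is orthogonal to the line joining the means and crosses it at the point $\mathbf{x}_0$. If $P(\omega_i) = P(\omega_j)$, $\mathbf{x}_0$ lies halfway between the means. If $P(\omega_i) \neq P(\omega_j)$, $\mathbf{x}_0$ shifts away from the more likely mean.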

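However, if the variance $\sigma^2$ is small relative to the squared distance $||\boldsymbol{\mu}_i - \boldsymbol{\mu}_j||^2$, the position of the decision boundary is relatively insensitive to the exact values of the prior probabilities, because the prior-dependent shift of $\mathbf{x}_0$ is scaled by $\sigma^2 / ||\boldsymbol{\mu}_i - \boldsymbol{\mu}_j||^2$.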
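
To make the effect of the priors concrete, the sketch below evaluates $\mathbf{x}_0$ for Case 1 using the standard textbook expression $\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$. The function name `case1_boundary_point` and all numbers are hypothetical, chosen only to illustrate the behavior.

```python
import math

def case1_boundary_point(mu_i, mu_j, sigma2, p_i, p_j):
    """x_0 on the line joining the means, assuming Sigma_i = sigma2 * I."""
    diff = [a - b for a, b in zip(mu_i, mu_j)]
    dist2 = sum(c * c for c in diff)              # ||mu_i - mu_j||^2
    shift = sigma2 / dist2 * math.log(p_i / p_j)  # prior-dependent shift factor
    return [0.5 * (a + b) - shift * c for a, b, c in zip(mu_i, mu_j, diff)]

mu_i, mu_j = [0.0, 0.0], [4.0, 0.0]
# Equal priors: x_0 is the midpoint of the means
equal = case1_boundary_point(mu_i, mu_j, 1.0, 0.5, 0.5)
# Class i more likely: x_0 moves away from mu_i, toward mu_j
skewed = case1_boundary_point(mu_i, mu_j, 1.0, 0.9, 0.1)
# Same priors but a much smaller variance: the shift almost vanishes
skewed_tight = case1_boundary_point(mu_i, mu_j, 0.01, 0.9, 0.1)
```

With these made-up values, `skewed` lands noticeably to the right of the midpoint, while `skewed_tight` stays within about 0.01 of it, which is the insensitivity described above.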

▶ Case 2 : $\mathbf{\Sigma}_i = \mathbf{\Sigma}$

The covariance matrices for all of the classes are identical

This corresponds to the situation in which the samples fall in hyperellipsoidal clusters of equal size and shape.
The cluster for the $i$th class is centered about the mean vector $\boldsymbol{\mu}_i$.
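
As a reference sketch of the standard result for this case: with a shared $\mathbf{\Sigma}$, the terms $\frac{d}{2}\ln 2\pi$ and $\frac{1}{2}\ln|\mathbf{\Sigma}|$ are the same for every class and can be dropped from the general $g_i(\mathbf{x})$, leaving

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(\omega_i)$$

So when the priors are equal, a point is assigned to the class whose mean is nearest in squared Mahalanobis distance.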

Next, for such a classifier with two classes (e.g. $i$ and $j$), let us see how the hyperplane is formed.

Hyperplanes defined by the linear equations : $g_i(\mathbf{x}) = g_j(\mathbf{x})$

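Expanding the quadratic form $(\mathbf{x}-\boldsymbol{\mu}_i)^t\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)$ and dropping the term $\mathbf{x}^t\mathbf{\Sigma}^{-1}\mathbf{x}$, which is the same for every class, again gives discriminant functions that are linear in $\mathbf{x}$.

That is, written as a first-order (linear) equation in the input vector $\mathbf{x}$, the hyperplane is

$$\mathbf{w}^t(\mathbf{x}-\mathbf{x}_0)=0$$

$$\mathbf{w}=\mathbf{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j),\qquad \mathbf{x}_0=\frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j)-\frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^t\mathbf{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$

Because $\mathbf{w}$ is generally not in the direction of $\boldsymbol{\mu}_i-\boldsymbol{\mu}_j$, the hyperplane is generally not orthogonal to the line joining the means. The textbook also notes that when the prior probabilities are sufficiently unequal, the decision hyperplane need not lie between the two mean vectors. This agrees with the expression for $\mathbf{x}_0$: the shift away from the midpoint is $\ln\left[P(\omega_i)/P(\omega_j)\right]$ divided by the squared Mahalanobis distance between the means, so it grows when the covariance is large relative to the separation of the means, not when it is small.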

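The following sketch illustrates the Case 2 hyperplane numerically, using the standard expressions $\mathbf{w}=\mathbf{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$ and $\mathbf{x}_0=\frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j)-\frac{\ln[P(\omega_i)/P(\omega_j)]}{(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^t\mathbf{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$. The helper names `case2_hyperplane` and `position_on_segment`, and the covariance and priors, are assumptions made up for this illustration.

```python
import numpy as np

def case2_hyperplane(mu_i, mu_j, sigma, p_i, p_j):
    """Return (w, x0) of the boundary w^t (x - x0) = 0 for a shared covariance."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    diff = mu_i - mu_j
    w = np.linalg.solve(sigma, diff)   # Sigma^{-1} (mu_i - mu_j)
    maha2 = diff @ w                   # squared Mahalanobis distance between the means
    x0 = 0.5 * (mu_i + mu_j) - np.log(p_i / p_j) / maha2 * diff
    return w, x0

def position_on_segment(x0, mu_i, mu_j):
    """t such that x0 = mu_i + t (mu_j - mu_i); 0 <= t <= 1 means between the means."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    seg = mu_j - mu_i
    return float((x0 - mu_i) @ seg / (seg @ seg))

# Hypothetical setup: close means, fairly large covariance, very unequal priors
mu_i, mu_j = [0.0, 0.0], [2.0, 0.0]
sigma = np.array([[4.0, 1.0], [1.0, 2.0]])
w, x0 = case2_hyperplane(mu_i, mu_j, sigma, 0.95, 0.05)
t = position_on_segment(x0, mu_i, mu_j)
```

In this made-up configuration `t` comes out larger than 1, so $\mathbf{x}_0$ lies beyond $\boldsymbol{\mu}_j$ rather than between the means, and `w` has a nonzero second component even though the means differ only along the first axis.
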
▶ Case 3 : $\mathbf{\Sigma}_i$ = arbitrary (a different, arbitrary matrix for each class)

The covariance matrices are different for each category.
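
As a reference sketch of the standard form for this case (the symbols $\mathbf{W}_i$, $\mathbf{w}_i$ and $w_{i0}$ are shorthand introduced here): only the term $\frac{d}{2}\ln 2\pi$ can be dropped from the general $g_i(\mathbf{x})$, so the discriminant functions are quadratic in $\mathbf{x}$:

$$g_i(\mathbf{x}) = \mathbf{x}^t\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^t\mathbf{x} + w_{i0}$$

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{\Sigma}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{\Sigma}_i^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^t\mathbf{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \frac{1}{2}\ln|\mathbf{\Sigma}_i| + \ln P(\omega_i)$$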

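By the same principle, in the two-class case the decision surfaces are hyperquadrics and can take any of the general forms: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids of various types. Even in one dimension, for arbitrary variances the decision regions need not be simply connected.

With arbitrary covariances and $c$ classes in total (e.g. $c = 4$), the decision regions take the following shape (the principle is the same).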
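
A small one-dimensional sketch shows a decision region that is not simply connected. The two classes here are hypothetical (same mean, variances 1 and 4, equal priors) and are not taken from the textbook figure.

```python
import math

def g_1d(x, mu, var, prior):
    """One-dimensional Gaussian discriminant (class-independent constant dropped)."""
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(var) + math.log(prior)

def decide(x):
    # Hypothetical classes: same mean, different variances, equal priors
    g1 = g_1d(x, mu=0.0, var=1.0, prior=0.5)
    g2 = g_1d(x, mu=0.0, var=4.0, prior=0.5)
    return 1 if g1 > g2 else 2

# Class 2 wins on both sides of the origin, class 1 wins in the middle
labels = [decide(x) for x in (-3.0, 0.0, 3.0)]  # [2, 1, 2]

# Setting g1 = g2 gives the two boundary points x = +/- sqrt((8/3) ln 2)
threshold = math.sqrt(8.0 * math.log(2.0) / 3.0)
```

The region for class 2 consists of two disjoint intervals, $x < -\text{threshold}$ and $x > \text{threshold}$, separated by the region for class 1.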

▶ EXAMPLE 1 : Decision regions for two-dimensional Gaussian data

To get a hands-on feel for the material covered so far, let us work through the following simple example and derive the decision boundary.
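
The setup below is assumed from the textbook's Example 1; treat the numbers as an assumption, although they are consistent with the vertex and midpoint discussed in this example.

$$\boldsymbol{\mu}_1=\begin{bmatrix}3\\6\end{bmatrix},\ \mathbf{\Sigma}_1=\begin{bmatrix}1/2&0\\0&2\end{bmatrix},\qquad \boldsymbol{\mu}_2=\begin{bmatrix}3\\-2\end{bmatrix},\ \mathbf{\Sigma}_2=\begin{bmatrix}2&0\\0&2\end{bmatrix},\qquad P(\omega_1)=P(\omega_2)=0.5$$

Setting $g_1(\mathbf{x}) = g_2(\mathbf{x})$, where $|\mathbf{\Sigma}_1| = 1$ and $|\mathbf{\Sigma}_2| = 4$ and the prior terms cancel, gives

$$-(x_1-3)^2-\frac{1}{4}(x_2-6)^2 = -\frac{1}{4}(x_1-3)^2-\frac{1}{4}(x_2+2)^2-\ln 2$$

Solving for $x_2$ gives $x_2 = 2 - \frac{\ln 2}{4} + \frac{3}{16}(x_1-3)^2$, that is,

$$x_2 = 3.514 - 1.125x_1 + 0.1875x_1^2$$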

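The equation above describes a parabola with its vertex at $[3, 1.83]^t$. Even though the variance of the data along the $x_2$ direction is the same (2) for both distributions, the decision boundary does not pass through the point $[3, 2]^t$ midway between the two means. The reason is that the $\omega_1$ distribution is squeezed more in the $x_1$ direction than the $\omega_2$ distribution. Because each density integrates to one, the $\omega_1$ density, being narrower in $x_1$, is correspondingly larger along the $x_2$ direction (relative to $\omega_2$), so the decision boundary lies slightly below the midpoint between the two means.

The next post, Ch2.7, covers "Error bounds for normal distribution".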
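
As a numerical check, the sketch below evaluates the two discriminants at the stated vertex and at the midpoint of the means. The parameters are assumed from the textbook's Example 1 ($\boldsymbol{\mu}_1 = [3, 6]^t$, $\mathbf{\Sigma}_1 = \mathrm{diag}(1/2, 2)$, $\boldsymbol{\mu}_2 = [3, -2]^t$, $\mathbf{\Sigma}_2 = \mathrm{diag}(2, 2)$, equal priors); the helper names are hypothetical.

```python
import numpy as np

# Assumed parameters (textbook Example 1), equal priors
mu1, cov1 = np.array([3.0, 6.0]), np.diag([0.5, 2.0])
mu2, cov2 = np.array([3.0, -2.0]), np.diag([2.0, 2.0])
prior = 0.5

def g(x, mu, cov):
    """Gaussian discriminant with the class-independent constant dropped."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * logdet + np.log(prior)

def boundary_x2(x1):
    """Closed-form boundary obtained by solving g1 = g2 for x2."""
    return 2.0 - np.log(2.0) / 4.0 + 3.0 / 16.0 * (x1 - 3.0) ** 2

vertex = np.array([3.0, boundary_x2(3.0)])   # approximately [3, 1.83]
midpoint = np.array([3.0, 2.0])              # midpoint of the two means

# Zero (up to rounding) on the boundary, positive where class 1 wins
gap_at_vertex = g(vertex, mu1, cov1) - g(vertex, mu2, cov2)
gap_at_midpoint = g(midpoint, mu1, cov1) - g(midpoint, mu2, cov2)
```

Under these assumptions the gap at the midpoint is positive, so $[3, 2]^t$ is assigned to $\omega_1$ and the boundary passes slightly below it, in line with the explanation above.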

Reference

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley, 2001.
