Learning the Parameters of a Categorical Distribution by Maximum a Posteriori Estimation

mac · 2024-04-15

I have recently been reading Computer Vision: Models, Learning, and Inference. This post shows how to compute the maximum a posteriori (MAP) estimate of the parameters of a categorical distribution, using a Dirichlet distribution as the prior.

The basic principle is as follows:

$$
\begin{aligned}
\hat{\lambda}_{1 \ldots K} &= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \Pr\left(x_{i} \mid \lambda_{1 \ldots K}\right) \Pr\left(\lambda_{1 \ldots K}\right)\right] \\
&= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \operatorname{Cat}_{x_{i}}\left[\lambda_{1 \ldots K}\right] \cdot \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\alpha_{1 \ldots K}\right]\right] \\
&= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{k=1}^{K} \lambda_{k}^{N_{k}} \prod_{k=1}^{K} \lambda_{k}^{\alpha_{k}-1}\right] \\
&= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{k=1}^{K} \lambda_{k}^{N_{k}+\alpha_{k}-1}\right]
\end{aligned}
$$
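The step from this product form to the closed-form estimate uses a Lagrange multiplier to enforce the constraint $\sum_{k=1}^{K} \lambda_k = 1$; a sketch of that intermediate step:

$$
L = \sum_{k=1}^{K}\left(N_{k}+\alpha_{k}-1\right)\log\lambda_{k} + \nu\left(\sum_{k=1}^{K}\lambda_{k}-1\right)
$$

Setting $\partial L / \partial \lambda_k = 0$ gives $\lambda_k \propto N_k + \alpha_k - 1$, and normalizing so the parameters sum to one yields the closed-form estimate.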

The final result is:

$$
\hat{\lambda}_{k}=\frac{N_{k}+\alpha_{k}-1}{\sum_{m=1}^{K}\left(N_{m}+\alpha_{m}-1\right)}
$$

The data are generated with the method from the previous post, which is not repeated here.
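For readers who have not seen the previous post, here is a minimal sketch of what such a generator might look like. The ground-truth parameters, the fixed seed, and the use of `std::discrete_distribution` are my assumptions for illustration, not the original code:

```cpp
#include <random>
#include <vector>

// Hypothetical stand-in for the previous post's generator: draw n samples
// from a categorical distribution with assumed true parameters lambda.
std::vector<int> generate_categorical_distribution_data(int n) {
    std::vector<double> lambda = {0.1, 0.2, 0.3, 0.4}; // assumed ground truth
    std::mt19937 gen(42);                              // fixed seed for repeatability
    std::discrete_distribution<int> cat(lambda.begin(), lambda.end());
    std::vector<int> data;
    data.reserve(n);
    for (int i = 0; i < n; ++i) {
        data.push_back(cat(gen)); // each sample is a category index 0..3
    }
    return data;
}
```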

The algorithm is as follows:

    Input : Categorical training data {x_i}, i = 1..I, hyperparameters {α_k}, k = 1..K
    Output: MAP estimates of parameters θ = {λ_k}, k = 1..K
    begin
        for k = 1 to K do
            λ_k = (N_k − 1 + α_k) / (I − K + Σ_{m=1}^{K} α_m)
        end
    end

The learning code is as follows:

    void MAP_categorical_distribution_parameters() {
        vector<int> data = generate_categorical_distribution_data(100000);

        // Count occurrences of each category: hist[k] = N_k
        std::map<int, double> hist{};
        for (std::size_t i = 0; i < data.size(); i++) {
            ++hist[data[i]];
        }

        // Set all Dirichlet prior hyperparameters alpha_k to 1 (uniform prior)
        vector<double> alpha_v(hist.size(), 1.0);

        // Denominator: sum over m of (N_m + alpha_m - 1)
        double down = 0;
        for (std::size_t i = 0; i < hist.size(); i++) {
            down += hist.at(i) + alpha_v[i] - 1;
        }

        // MAP estimate: lambda_k = (N_k + alpha_k - 1) / denominator
        double total_p = 0;
        for (std::size_t i = 0; i < hist.size(); i++) {
            hist.at(i) = (hist.at(i) + alpha_v[i] - 1) / down;
            total_p += hist.at(i);
            std::cout << hist.at(i) << std::endl;
        }
        cout << "total_p: " << total_p << endl; // sanity check: should print 1
    }

Here all of the Dirichlet hyperparameters are set to 1; with this uniform prior the terms $\alpha_k - 1$ vanish, so the MAP estimate reduces to the maximum-likelihood estimate $\hat{\lambda}_k = N_k / I$.

The two figures below show the author's experimental comparison from the book.
