- Blogs
Hindon和Jordan理解的EM算法
Computational Statistics in Python
EM算法及其推广
EM算法及其推广学习笔记
从最大似然到EM算法浅解
EM算法在缺失数据下的极大似然估计R代码
Matlab极大似然估计缺失数据
Cos 424: Interacting with Data
ProbabilityCourse
关于EM算法的一些疑问 - Papers
What is the EM algorithm
1958-Maximum Likelihood Estimation from Incomplete Data
1977-Maximum Likelihood from Incomplete Data via the EM Algorithm
1998-EM algorithms for PCA and SPCA
1999-Probabilistic Principal Component Analysis - Packages
Scipy.stats.binom
impyute 0.04
PyMix - Videos
Statistical Computing for Scientists and Engineers
Youtube-Em algorithm: how it works
Youtube-Em algprithm and missing data
Youtube-Missing Data Analysis - Multiple Imputation, EM method
Em Algorithm - Coin Problem Case Study
Constructing Models to Deal with Missing Data | SciPy 2016 | Deborah Hanus
Recent Advances in missing Data Methods: Imputation and Weighting - Elizabeth Stuart - Books
Handbook of Missing Data Methodology
Missing Data Analysis in Practice
Bayesian Missing Data Problems: EM, Data Augmentation and Noniterative Computation
Statistical Analysis with Missing Data -2nd
Missing Data Analysis – Multiple Imputation
Digital Signal Processing with Python Programming - Pictures
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 15 18:58:37 2018
@author: brucelau
"""
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use('ggplot')
np.random.seed(1234)
np.set_printoptions(formatter={'all':lambda x: '%.3f' % x})
from IPython.display import Image
from numpy.core.umath_tests import matrix_multiply as mm
from scipy.optimize import minimize
from scipy.stats import bernoulli, binom
from scipy import stats
#%%
# 硬币投掷结果观测序列
observations = np.array([[1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
[1, 0, 1, 0, 0, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 0, 1, 1, 1, 0, 1]])
coin_A_pmf_observation_1 = binom.pmf(5,10,0.6)
coin_B_pmf_observation_1 = binom.pmf(5,10,0.5)
normalized_coin_A_pmf_observation_1 = coin_A_pmf_observation_1/(coin_A_pmf_observation_1+coin_B_pmf_observation_1)
normalized_coin_B_pmf_observation_1 = coin_B_pmf_observation_1/(coin_A_pmf_observation_1+coin_B_pmf_observation_1)
print ("%0.1f" %(normalized_coin_A_pmf_observation_1))
print ("%0.1f" %(normalized_coin_B_pmf_observation_1))
#%%
def em_single(priors, observations):
"""
EM算法单次迭代
Arguments
---------
priors : [theta_A, theta_B]
observations : [m X n matrix]
Returns
--------
new_priors: [new_theta_A, new_theta_B]
:param priors:
:param observations:
:return:
"""
counts = {'A': {'H': 0, 'T': 0}, 'B': {'H': 0, 'T': 0}}
theta_A = priors[0]
theta_B = priors[1]
# E step
for observation in observations:
len_observation = len(observation)
num_heads = observation.sum()
num_tails = len_observation - num_heads
contribution_A = stats.binom.pmf(num_heads, len_observation, theta_A)
contribution_B = stats.binom.pmf(num_heads, len_observation, theta_B) # 两个二项分布
weight_A = contribution_A / (contribution_A + contribution_B)
weight_B = contribution_B / (contribution_A + contribution_B)
# 更新在当前参数下A、B硬币产生的正反面次数
counts['A']['H'] += weight_A * num_heads
counts['A']['T'] += weight_A * num_tails
counts['B']['H'] += weight_B * num_heads
counts['B']['T'] += weight_B * num_tails
# M step
new_theta_A = counts['A']['H'] / (counts['A']['H'] + counts['A']['T'])
new_theta_B = counts['B']['H'] / (counts['B']['H'] + counts['B']['T'])
return [new_theta_A, new_theta_B]
#%%
def em(observations, prior, tol=1e-6, iterations=10000):
"""
EM算法
:param observations: 观测数据
:param prior: 模型初值
:param tol: 迭代结束阈值
:param iterations: 最大迭代次数
:return: 局部最优的模型参数
"""
import math
iteration = 0
while iteration < iterations:
new_prior = em_single(prior, observations)
delta_change = np.abs(prior[0] - new_prior[0])
print('The new_prior valuers are:',new_prior)
if delta_change < tol:
break
else:
prior = new_prior
iteration += 1
return [new_prior, iteration]
theta1,theta2 = em(observations,[0.6,0.5])
print(theta1,theta2)