
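BayesianLSTM (PawaritL): Energy Consumption Predictions with Bayesian LSTMs in PyTorch. Bayesian neural networks only attempt to account for epistemic model uncertainty and do not necessarily address aleatoric uncertainty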

Posted: 2024-09-18 23:22:24
Tags: uncertainty, BayesianLSTM, df, energy, Bayesian, date, test, self

https://colab.research.google.com/github/PawaritL/BayesianLSTM/blob/master/Energy_Consumption_Predictions_with_Bayesian_LSTMs_in_PyTorch.ipynb
# Energy Consumption Predictions with Bayesian LSTMs in PyTorch
Author: Pawarit Laosunthara

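Please click the Open in Colab button above to view all interactive visualizations.

This notebook demonstrates an implementation of an (approximate) Bayesian recurrent neural network in PyTorch, originally inspired by Uber's Deep and Confident Prediction for Time Series (https://arxiv.org/pdf/1709.01907.pdf).

In this approach, Monte Carlo dropout is used to approximate Bayesian inference, giving our predictions explicit uncertainties and confidence intervals.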


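This property makes Bayesian neural networks highly appealing for critical applications that require uncertainty quantification. The Appliances energy prediction dataset used in this example comes from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction).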


- date: timestamp in year-month-day hour:minute:second format, sampled every 10 minutes
- Appliances: energy use in Wh for the corresponding 10-minute timestamp
- day_of_week: where Monday corresponds to 0
- hour_of_day

import pandas as pd
energy_df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv')

energy_df['date'] = pd.to_datetime(energy_df['date'])

energy_df['month'] = energy_df['date'].dt.month.astype(int)
energy_df['day_of_month'] = energy_df['date'].dt.day.astype(int)

# day_of_week=0 corresponds to Monday
energy_df['day_of_week'] = energy_df['date'].dt.dayofweek.astype(int)
energy_df['hour_of_day'] = energy_df['date'].dt.hour.astype(int)

selected_columns = ['date', 'day_of_week', 'hour_of_day', 'Appliances']
energy_df = energy_df[selected_columns]
energy_df.head()
## Time Series Transformations

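1. The dataset is resampled at an hourly frequency for more meaningful analysis.
2. To alleviate exponential effects, the target variable is log-transformed, following the Uber paper.
3. For simplicity and speed when running this notebook, only temporal and autoregressive features are used: `day_of_week`, `hour_of_day`, and the previous values of `Appliances`.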
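As a minimal sketch of the first two steps, the snippet below averages 10-minute readings into hourly buckets and then takes the natural log, using only the standard library. The function name `hourly_log_mean` and the sample readings are illustrative assumptions; the notebook itself performs the same steps with pandas `resample` and `np.log`.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_log_mean(readings):
    """Average (timestamp, Wh) readings per hour, then take the natural log."""
    buckets = defaultdict(list)
    for ts, wh in readings:
        # Truncate each timestamp to the start of its hour
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(wh)
    return {hour: math.log(sum(v) / len(v)) for hour, v in sorted(buckets.items())}

# Hypothetical data: two hours of 10-minute readings
start = datetime(2016, 1, 11, 17, 0)
readings = [(start + timedelta(minutes=10 * i), 60.0 if i < 6 else 120.0)
            for i in range(12)]
hourly = hourly_log_mean(readings)
```

Applying `math.exp` to any value in `hourly` recovers the hourly mean in Wh, which is how log-scale predictions can be mapped back to the original unit.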

import numpy as np

resample_df = energy_df.set_index('date').resample('1H').mean()
resample_df['date'] = resample_df.index
resample_df['log_energy_consumption'] = np.log(resample_df['Appliances'])

datetime_columns = ['date', 'day_of_week', 'hour_of_day']
target_column = 'log_energy_consumption'

feature_columns = datetime_columns + ['log_energy_consumption']

resample_df = resample_df[feature_columns]
import plotly.express as px

# For clarity in visualization and presentation,
# only consider the first 150 hours of data.
plot_length = 150
plot_df = resample_df.copy(deep=True).iloc[:plot_length]
plot_df['weekday'] = plot_df['date'].dt.day_name()

fig = px.line(plot_df,
              x="date",
              y="log_energy_consumption", 
              color="weekday", 
              title="Log of Appliance Energy Consumption vs Time")
fig.show()
# Prepare Training Data

For this example, we use sliding windows of 10 points each (equivalent to 10 hours) to predict the next point. The window size can be changed via the `sequence_length` variable.

Min-Max scaling has also been fitted to the training data to aid the convergence of the neural network.
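To illustrate why the scaler is fitted on the training rows only, here is a small standard-library sketch. The helper names (`fit_min_max`, `min_max_transform`, `min_max_inverse`) and the sample values are hypothetical; the notebook relies on scikit-learn's `MinMaxScaler` instead.

```python
def fit_min_max(train_values):
    """Learn the scaling range from the training values only."""
    return min(train_values), max(train_values)

def min_max_transform(values, lo, hi):
    """Scale values relative to the fitted [lo, hi] range."""
    return [(v - lo) / (hi - lo) for v in values]

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original unit."""
    return [s * (hi - lo) + lo for s in scaled]

series = [4.0, 4.5, 5.0, 6.0, 6.5]  # hypothetical log-energy values
n_train = 3                          # first 3 points play the role of training data

lo, hi = fit_min_max(series[:n_train])
scaled = min_max_transform(series, lo, hi)
restored = min_max_inverse(scaled, lo, hi)
```

Because the range comes from the training portion alone, later values can fall outside [0, 1] after scaling. That is expected, and it avoids leaking test-set statistics into training.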
from sklearn.preprocessing import MinMaxScaler

def create_sliding_window(data, sequence_length, stride=1):
    X_list, y_list = [], []
    for i in range(len(data)):
      if (i + sequence_length) < len(data):
        X_list.append(data.iloc[i:i+sequence_length:stride, :].values)
        y_list.append(data.iloc[i+sequence_length, -1])
    return np.array(X_list), np.array(y_list)

train_split = 0.7
n_train = int(train_split * len(resample_df))
n_test = len(resample_df) - n_train

features = ['day_of_week', 'hour_of_day', 'log_energy_consumption']
feature_array = resample_df[features].values

# Fit the feature scaler on the training features only
feature_scaler = MinMaxScaler()
feature_scaler.fit(feature_array[:n_train])
# Fit the target scaler on the training targets only
target_scaler = MinMaxScaler()
target_scaler.fit(feature_array[:n_train, -1].reshape(-1, 1))

# Transform both the training and test data
scaled_array = pd.DataFrame(feature_scaler.transform(feature_array),
                            columns=features)

sequence_length = 10
X, y = create_sliding_window(scaled_array, 
                             sequence_length)

X_train = X[:n_train]
y_train = y[:n_train]

X_test = X[n_train:]
y_test = y[n_train:]
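The windowing logic can be checked on a toy input. The sketch below is a plain-Python restatement of the same idea, assuming the target is the last column of each row; `sliding_windows` and the 15-row input are made up for illustration.

```python
def sliding_windows(rows, sequence_length):
    """Pair each window of `sequence_length` rows with the next row's last column."""
    X, y = [], []
    for i in range(len(rows) - sequence_length):
        X.append(rows[i:i + sequence_length])    # input window
        y.append(rows[i + sequence_length][-1])  # target: value right after the window
    return X, y

# Hypothetical rows of [day_of_week, hour_of_day, target]
rows = [[t % 7, t % 24, float(t)] for t in range(15)]
X, y = sliding_windows(rows, sequence_length=10)
```

With 15 rows and a window of 10, this yields 5 windows, and the first target is the value at index 10. In general, a series of length `n` produces `n - sequence_length` samples.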
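# Define Bayesian LSTM Architecture

To demonstrate a simple working example of the Bayesian LSTM, we start from the model architecture in the Uber paper. The network architecture is as follows:

Encoder-Decoder Stage:

- A unidirectional LSTM with 2 stacked layers and 128 hidden units, acting as the encoding layer that constructs a fixed-dimension embedding state
- A unidirectional LSTM with 2 stacked layers and 32 hidden units, acting as the decoding layer that produces the predictions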

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class BayesianLSTM(nn.Module):

    def __init__(self, n_features, output_length, batch_size):

        super(BayesianLSTM, self).__init__()

        self.batch_size = batch_size # user-defined

        self.hidden_size_1 = 128 # number of encoder cells (from paper)
        self.hidden_size_2 = 32 # number of decoder cells (from paper)
        self.stacked_layers = 2 # number of (stacked) LSTM layers for each stage
        self.dropout_probability = 0.5 # arbitrary value (the paper suggests that performance is generally stable across all ranges)

        self.lstm1 = nn.LSTM(n_features, 
                             self.hidden_size_1, 
                             num_layers=self.stacked_layers,
                             batch_first=True)
        self.lstm2 = nn.LSTM(self.hidden_size_1,
                             self.hidden_size_2,
                             num_layers=self.stacked_layers,
                             batch_first=True)
        
        self.fc = nn.Linear(self.hidden_size_2, output_length)
        self.loss_fn = nn.MSELoss()
        
    def forward(self, x):
        batch_size, seq_len, _ = x.size()

        hidden = self.init_hidden1(batch_size)
        output, _ = self.lstm1(x, hidden)
        output = F.dropout(output, p=self.dropout_probability, training=True)
        state = self.init_hidden2(batch_size)
        output, state = self.lstm2(output, state)
        output = F.dropout(output, p=self.dropout_probability, training=True)
        output = output[:, -1, :] # take the last decoder cell's outputs
        y_pred = self.fc(output)
        return y_pred
        
    def init_hidden1(self, batch_size):
        hidden_state = Variable(torch.zeros(self.stacked_layers, batch_size, self.hidden_size_1))
        cell_state = Variable(torch.zeros(self.stacked_layers, batch_size, self.hidden_size_1))
        return hidden_state, cell_state
    
    def init_hidden2(self, batch_size):
        hidden_state = Variable(torch.zeros(self.stacked_layers, batch_size, self.hidden_size_2))
        cell_state = Variable(torch.zeros(self.stacked_layers, batch_size, self.hidden_size_2))
        return hidden_state, cell_state
    
    def loss(self, pred, truth):
        return self.loss_fn(pred, truth)

    def predict(self, X):
        return self(torch.tensor(X, dtype=torch.float32)).view(-1).detach().numpy()
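The key detail in the class above is that dropout is applied with `training=True` even at prediction time, so every forward pass is stochastic. The standard-library sketch below mimics that behavior on a tiny linear model. The names `dropout` and `stochastic_predict`, the weights, and the inputs are all illustrative assumptions, not part of the notebook.

```python
import random

def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def stochastic_predict(x, weights, p, rng):
    """One forward pass of a linear model with dropout left switched on."""
    hidden = dropout(x, p, rng)
    return sum(w * h for w, h in zip(weights, hidden))

rng = random.Random(0)            # seeded for reproducibility
x = [0.2, 0.4, 0.6, 0.8]          # hypothetical input features
weights = [1.0, -0.5, 0.25, 2.0]  # hypothetical fixed (already trained) weights

# Repeated predictions on the same input differ because a new mask is drawn each time
samples = [stochastic_predict(x, weights, 0.5, rng) for _ in range(2000)]
```

The spread of `samples` is what the later sections turn into a confidence interval. With `p=0.0` the same function becomes deterministic and returns the ordinary dot product.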
### Begin Training

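To train the Bayesian LSTM, we use the ADAM optimizer together with mini-batch gradient descent (batch_size = 128). For quick demonstration purposes, the model is trained for 150 epochs.

The Bayesian LSTM is trained on the first 70% of the data points, using the sliding windows of size 10 described earlier. The remaining 30% of the dataset is held out exclusively for testing.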


n_features = scaled_array.shape[-1]
sequence_length = 10
output_length = 1

batch_size = 128
n_epochs = 150
learning_rate = 0.01

bayesian_lstm = BayesianLSTM(n_features=n_features,
                             output_length=output_length,
                             batch_size = batch_size)

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(bayesian_lstm.parameters(), lr=learning_rate)
bayesian_lstm.train()

for e in range(1, n_epochs+1):
    for b in range(0, len(X_train), batch_size):
        features = X_train[b:b+batch_size,:,:]
        target = y_train[b:b+batch_size]    

        X_batch = torch.tensor(features,dtype=torch.float32)    
        y_batch = torch.tensor(target,dtype=torch.float32)

        output = bayesian_lstm(X_batch)
        loss = criterion(output.view(-1), y_batch)  

        loss.backward()
        optimizer.step()        
        optimizer.zero_grad() 

    if e % 10 == 0:
      print('epoch', e, 'loss: ', loss.item())
# Evaluating Model Performance
The Bayesian LSTM implemented is shown to produce reasonably accurate and sensible results on both the training and test sets, often comparable to other existing frequentist machine learning and deep learning methods.

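To evaluate the performance of a Bayesian LSTM model, you can follow these steps:

1. Data splitting: make sure the dataset has been split into training, validation, and test sets. A common ratio is 70% training, 15% validation, and 15% test. (This notebook itself uses a 70/30 train/test split with no validation set.)
2. Model training: train the Bayesian LSTM on the training set and use the validation set to tune hyperparameters.
3. Performance metrics: choose appropriate metrics. For regression problems such as this one, use mean squared error (MSE), root mean squared error (RMSE), or mean absolute error (MAE). For classification problems, use accuracy, precision, recall, F1 score, or the AUC-ROC curve.
4. Model evaluation:
   - Evaluate the model on the training set to check for overfitting.
   - Evaluate the model on the test set to understand how well it generalizes to unseen data.
5. Baseline comparison: compare the Bayesian LSTM with other existing frequentist machine learning and deep learning methods by computing the same metrics for each method.
6. Visualization: use charts to visualize model performance, for example learning curves, or confusion matrices and ROC curves for classification tasks.
7. Sensitivity analysis: examine how the model responds to changes in input features and hyperparameters.
8. Error analysis: inspect the model's errors on the test set to identify possible areas for improvement.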
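The regression metrics named in step 3 can be sketched in a few lines of plain Python. The function names and the sample predictions below are illustrative assumptions; the notebook itself evaluates the model visually.

```python
import math

def mse(pred, truth):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(mse(pred, truth))

def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

# Hypothetical predictions and true values on the log-energy scale
pred = [4.0, 4.5, 5.0]
truth = [4.0, 5.0, 4.0]

scores = {"mse": mse(pred, truth), "rmse": rmse(pred, truth), "mae": mae(pred, truth)}
```

Computing the same `scores` on the training and test predictions gives a quick overfitting check: a test error far above the training error suggests the model does not generalize well.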


offset = sequence_length

def inverse_transform(y):
  return target_scaler.inverse_transform(y.reshape(-1, 1))

training_df = pd.DataFrame()
training_df['date'] = resample_df['date'].iloc[offset:n_train + offset:1] 
training_predictions = bayesian_lstm.predict(X_train)
training_df['log_energy_consumption'] = inverse_transform(training_predictions)
training_df['source'] = 'Training Prediction'

training_truth_df = pd.DataFrame()
training_truth_df['date'] = training_df['date']
training_truth_df['log_energy_consumption'] = resample_df['log_energy_consumption'].iloc[offset:n_train + offset:1] 
training_truth_df['source'] = 'True Values'

testing_df = pd.DataFrame()
testing_df['date'] = resample_df['date'].iloc[n_train + offset::1] 
testing_predictions = bayesian_lstm.predict(X_test)
testing_df['log_energy_consumption'] = inverse_transform(testing_predictions)
testing_df['source'] = 'Test Prediction'

testing_truth_df = pd.DataFrame()
testing_truth_df['date'] = testing_df['date']
testing_truth_df['log_energy_consumption'] = resample_df['log_energy_consumption'].iloc[n_train + offset::1] 
testing_truth_df['source'] = 'True Values'

evaluation = pd.concat([training_df, 
                        testing_df,
                        training_truth_df,
                        testing_truth_df
                        ], axis=0)
fig = px.line(evaluation.loc[evaluation['date'].between('2016-04-14', '2016-04-23')],
                 x="date",
                 y="log_energy_consumption",
                 color="source",
                 title="Log of Appliance Energy Consumption in Wh vs Time")
fig.show()
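# Uncertainty Quantification

Because stochastic dropout is applied after each LSTM layer in the Bayesian LSTM, users can interpret the model outputs as random samples from the posterior distribution of the target variable.

This means that by running multiple experiments/predictions, we can approximate the parameters of the posterior distribution, namely the mean and variance, and use them to create a confidence interval for each prediction.

In this example, we construct 99% confidence intervals, taken as three standard deviations on either side of the mean prediction.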
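As a rough sketch of the interval construction, the snippet below turns repeated stochastic predictions for one timestamp into a mean and symmetric bounds. `mc_interval` and the sample values are hypothetical. The normal-distribution lines are only a sanity check on what a three-sigma band nominally covers, assuming the samples are roughly Gaussian.

```python
from statistics import NormalDist, mean, stdev

def mc_interval(samples, z=3.0):
    """Mean and symmetric z-sigma bounds from repeated stochastic predictions."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation (ddof=1), the pandas .std() default
    return mu, mu - z * sigma, mu + z * sigma

# Hypothetical MC-dropout predictions for a single timestamp (log Wh)
samples = [4.1, 4.3, 4.2, 4.4, 4.0]
mu, lower, upper = mc_interval(samples)

# Nominal two-sided coverage of +/- 3 sigma under a normal approximation
coverage_3_sigma = NormalDist().cdf(3.0) - NormalDist().cdf(-3.0)
# z-value that would give an exact 99% two-sided normal interval
z_99 = NormalDist().inv_cdf(0.995)
```

Under the normal approximation, three standard deviations cover about 99.7% of the mass, slightly wider than an exact 99% interval (z of about 2.58). The 99% label in this notebook should therefore be read as approximate.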


n_experiments = 100

test_uncertainty_df = pd.DataFrame()
test_uncertainty_df['date'] = testing_df['date']

for i in range(n_experiments):
  experiment_predictions = bayesian_lstm.predict(X_test)
  test_uncertainty_df['log_energy_consumption_{}'.format(i)] = inverse_transform(experiment_predictions)

log_energy_consumption_df = test_uncertainty_df.filter(like='log_energy_consumption', axis=1)
test_uncertainty_df['log_energy_consumption_mean'] = log_energy_consumption_df.mean(axis=1)
test_uncertainty_df['log_energy_consumption_std'] = log_energy_consumption_df.std(axis=1)

test_uncertainty_df = test_uncertainty_df[['date', 'log_energy_consumption_mean', 'log_energy_consumption_std']]
test_uncertainty_df['lower_bound'] = test_uncertainty_df['log_energy_consumption_mean'] - 3*test_uncertainty_df['log_energy_consumption_std']
test_uncertainty_df['upper_bound'] = test_uncertainty_df['log_energy_consumption_mean'] + 3*test_uncertainty_df['log_energy_consumption_std']
import plotly.graph_objects as go

test_uncertainty_plot_df = test_uncertainty_df.copy(deep=True)
test_uncertainty_plot_df = test_uncertainty_plot_df.loc[test_uncertainty_plot_df['date'].between('2016-05-01', '2016-05-09')]
truth_uncertainty_plot_df = testing_truth_df.copy(deep=True)
truth_uncertainty_plot_df = truth_uncertainty_plot_df.loc[testing_truth_df['date'].between('2016-05-01', '2016-05-09')]

upper_trace = go.Scatter(
    x=test_uncertainty_plot_df['date'],
    y=test_uncertainty_plot_df['upper_bound'],
    mode='lines',
    fill=None,
    name='99% Upper Confidence Bound'
    )
lower_trace = go.Scatter(
    x=test_uncertainty_plot_df['date'],
    y=test_uncertainty_plot_df['lower_bound'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(255, 211, 0, 0.1)',
    name='99% Lower Confidence Bound'
    )
real_trace = go.Scatter(
    x=truth_uncertainty_plot_df['date'],
    y=truth_uncertainty_plot_df['log_energy_consumption'],
    mode='lines',
    fill=None,
    name='Real Values'
    )

data = [upper_trace, lower_trace, real_trace]

fig = go.Figure(data=data)
fig.update_layout(title='Uncertainty Quantification for Energy Consumption Test Data',
                   xaxis_title='Time',
                   yaxis_title='log_energy_consumption (log Wh)')

fig.show()
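#### Evaluating Uncertainty

Using the multiple experiments above, 99% confidence intervals have been constructed for each prediction of the target variable (the logarithm of appliance power consumption). While we can visually observe that the model generally captures the behavior of the time series, only about 50% of the real data points lie within the 99% confidence interval around the mean prediction.

Coverage this far below the nominal 99% indicates that the intervals are too narrow, which is consistent with Monte Carlo dropout capturing only the model's epistemic uncertainty and not the noise in the data. Even so, the intervals still provide a quantitative measure of model uncertainty. They show where the model is less certain about its predictions, which is valuable in high-stakes applications because it helps decision-makers understand the model's limitations and take appropriate risk-mitigation measures.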

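To further improve the quality of the model's uncertainty estimates, consider the following strategies:

1. Increase the number of experiments: more stochastic forward passes give a more accurate estimate of the parameters of the posterior distribution.
2. Adjust the dropout rate: changing the dropout rate can affect the uncertainty estimates. A higher dropout rate may produce larger uncertainty, but it may also lead to underfitting.
3. Use more sophisticated Bayesian methods: for example, variational autoencoders (VAEs) or Bayesian neural networks (BNNs) can provide richer uncertainty estimates.
4. Feature engineering: better features may help the model capture the structure of the time series, which in turn improves the accuracy of the uncertainty estimates.
5. Model ensembling: combining the predictions of several models can reduce the uncertainty of any single model and improve overall performance.

In summary, although the proportion of points inside the confidence interval is low, the Bayesian LSTM still provides a way to quantify uncertainty, which is essential for applications that require uncertainty quantification.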

bounds_df = pd.DataFrame()

# Using 99% confidence bounds
bounds_df['lower_bound'] = test_uncertainty_plot_df['lower_bound']
bounds_df['prediction'] = test_uncertainty_plot_df['log_energy_consumption_mean']
bounds_df['real_value'] = truth_uncertainty_plot_df['log_energy_consumption']
bounds_df['upper_bound'] = test_uncertainty_plot_df['upper_bound']

bounds_df['contained'] = ((bounds_df['real_value'] >= bounds_df['lower_bound']) &
                          (bounds_df['real_value'] <= bounds_df['upper_bound']))

print("Proportion of points contained within 99% confidence interval:", 
      bounds_df['contained'].mean())
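The containment check above can also be written without pandas. In the sketch below, `coverage` is a hypothetical helper and the numbers are made up so that half of the points fall inside their intervals.

```python
def coverage(truth, lower, upper):
    """Fraction of true values that fall inside their [lower, upper] interval."""
    inside = [lo <= t <= hi for t, lo, hi in zip(truth, lower, upper)]
    return sum(inside) / len(inside)

# Hypothetical true values and interval bounds (log Wh)
truth = [4.0, 4.6, 5.2, 3.9]
lower = [3.8, 4.0, 4.5, 4.0]
upper = [4.4, 4.5, 5.5, 4.6]

empirical_coverage = coverage(truth, lower, upper)
```

Comparing `empirical_coverage` with the nominal level (here 99%) is a simple calibration check. A value well below the nominal level means the intervals are too narrow.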
# Conclusions

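- The Bayesian LSTM is able to achieve performance comparable to its frequentist counterpart under the same conditions.
- Stochastic dropout lets users approximate the posterior distribution of the target variable and thereby construct a confidence interval for each prediction.
- Bayesian neural networks only attempt to account for epistemic model uncertainty and do not necessarily address aleatoric uncertainty.
- After initial training, repeated Bayesian LSTM predictions require no retraining: each Monte Carlo sample is a single forward pass with a freshly drawn dropout mask, which makes this a comparatively cheap form of approximate Bayesian inference. The cost of building an interval still grows linearly with the number of forward passes.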

From: https://blog.csdn.net/zhangfeng1133/article/details/142345202
