
Power Load Time-Series Forecasting with an Informer Network: Cross Validation and Hyperopt Hyperparameter Tuning


Preface

Series column: [Deep Learning: Algorithm Projects in Practice] ✨︎
The series covers healthcare, finance, retail, food and beverage, sports and fitness, transportation, environmental science, social media, and text and image processing, and discusses a wide range of deep neural network ideas, including convolutional neural networks, recurrent neural networks, generative adversarial networks, gated recurrent units, long short-term memory, natural language processing, deep reinforcement learning, large language models, and transfer learning.

The Informer architecture has three distinguishing features: (1) a ProbSparse self-attention mechanism with O(L log L) time and memory complexity; (2) a self-attention distilling process that privileges dominant attention scores and efficiently handles long input sequences; (3) an MLP (multi-layer perceptron) multi-step decoder that predicts a long horizon in a single forward pass rather than step by step. (See the results figure below.)
[Figure: Informer forecast results]

import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast.core import NeuralForecast
from neuralforecast.models import Informer
from neuralforecast.losses.numpy import mae, mse, rmse, mape
from neuralforecast.losses.pytorch import MAE

from datasetsforecast.long_horizon import LongHorizon
from torchinfo import summary

1. Loading the Dataset

datasetsforecast is a library for working with datasets commonly used in time-series forecasting. Its main purpose is to make it easy to download, load, and preprocess datasets suited to forecasting tasks. Having a high-quality, appropriate dataset is a key step in time-series analysis and forecasting, and this library helps us get there more efficiently.

# Change this to your own data to try the model
Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTh1')

2. Data Preprocessing

Y_df['ds'] = pd.to_datetime(Y_df['ds'])
# For this exercise we are going to take 20% of the dataset
n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)

Y_df.groupby('unique_id').head(5)

(Output: the first five rows of each unique_id.)
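As a quick sanity check of the split sizes, a minimal sketch is below; the expected counts assume the standard ETTh1 long-horizon dataset with 17,420 hourly timestamps per series, so adjust them for your own data.

print(n_time, val_size, test_size)
# Expected for ETTh1 (assumption): 17420 3484 3484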

3. Data Visualization

plt.style.use('ggplot')
plt.plot(Y_df['y'], color='darkorange', label='Trend')
plt.show()

[Figure: trend plot of the target series]
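Because Y_df stacks several series (HUFL, HULL, ..., OT) in long format, the plot above mixes them together. A sketch that plots only the OT target series follows, assuming 'OT' is one of the unique_id values in the ETTh1 long-horizon layout:

# Filter the long-format frame down to the OT series and plot it against its timestamps
ot_df = Y_df[Y_df['unique_id'] == 'OT']
plt.figure(figsize=(20, 5))
plt.plot(ot_df['ds'], ot_df['y'], color='darkorange', label='OT')
plt.legend()
plt.show()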

4. Building the Model

ProbAttention is the core innovation of the Informer model. It lowers the complexity of attention by sampling query-key pairs (keys are sampled to score the queries, and only the top queries are kept), while positions that are not selected are filled with a mean-based (or, under causal masking, cumulative) context.

import math

import numpy as np
import torch
import torch.nn as nn


class ProbMask:
    """
    Causal mask applied to the sampled attention scores used by ProbAttention.
    """

    def __init__(self, B, H, L, index, scores, device="cpu"):
        _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool, device=device).triu(1)
        _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
        indicator = _mask_ex[
            torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :
        ].to(device)
        self._mask = indicator.view(scores.shape).to(device)

    @property
    def mask(self):
        return self._mask


class ProbAttention(nn.Module):
    """
    ProbSparse self-attention: only the top-u "active" queries (selected by a
    sparsity measurement) attend over all keys; the remaining positions keep a
    mean (or cumulative, when masked) context.
    """

    def __init__(
        self,
        mask_flag=True,
        factor=5,
        scale=None,
        attention_dropout=0.1,
        output_attention=False,
    ):
        super(ProbAttention, self).__init__()
        self.factor = factor
        self.scale = scale
        self.mask_flag = mask_flag
        self.output_attention = output_attention
        self.dropout = nn.Dropout(attention_dropout)

    def _prob_QK(self, Q, K, sample_k, n_top):  # n_top: c*ln(L_q)
        # Q [B, H, L, D]
        B, H, L_K, E = K.shape
        _, _, L_Q, _ = Q.shape

        # calculate the sampled Q_K
        K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)

        index_sample = torch.randint(
            L_K, (L_Q, sample_k)
        )  # real U = U_part(factor*ln(L_k))*L_q
        K_sample = K_expand[:, :, torch.arange(L_Q).unsqueeze(1), index_sample, :]
        Q_K_sample = torch.matmul(Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze()

        # find the Top_k query with sparsity measurement
        M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
        M_top = M.topk(n_top, sorted=False)[1]

        # use the reduced Q to calculate Q_K
        Q_reduce = Q[
            torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], M_top, :
        ]  # factor*ln(L_q)
        Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1))  # factor*ln(L_q)*L_k

        return Q_K, M_top

    def _get_initial_context(self, V, L_Q):
        B, H, L_V, D = V.shape
        if not self.mask_flag:
            # V_sum = V.sum(dim=-2)
            V_sum = V.mean(dim=-2)
            context = V_sum.unsqueeze(-2).expand(B, H, L_Q, V_sum.shape[-1]).clone()
        else:  # use mask
            assert L_Q == L_V  # requires that L_Q == L_V, i.e. for self-attention only
            context = V.cumsum(dim=-2)
        return context

    def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
        B, H, L_V, D = V.shape

        if self.mask_flag:
            attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
            scores.masked_fill_(attn_mask.mask, -np.inf)

        attn = torch.softmax(scores, dim=-1)  # nn.Softmax(dim=-1)(scores)

        context_in[
            torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :
        ] = torch.matmul(attn, V).type_as(context_in)
        if self.output_attention:
            attns = (torch.ones([B, H, L_V, L_V], device=attn.device) / L_V).type_as(
                attn
            )
            attns[
                torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :
            ] = attn
            return (context_in, attns)
        else:
            return (context_in, None)

    def forward(self, queries, keys, values, attn_mask):
        B, L_Q, H, D = queries.shape
        _, L_K, _, _ = keys.shape

        queries = queries.transpose(2, 1)
        keys = keys.transpose(2, 1)
        values = values.transpose(2, 1)

        U_part = self.factor * np.ceil(np.log(L_K)).astype("int").item()  # c*ln(L_k)
        u = self.factor * np.ceil(np.log(L_Q)).astype("int").item()  # c*ln(L_q)

        U_part = U_part if U_part < L_K else L_K
        u = u if u < L_Q else L_Q

        scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)

        # add scale factor
        scale = self.scale or 1.0 / math.sqrt(D)
        if scale is not None:
            scores_top = scores_top * scale
        # get the context
        context = self._get_initial_context(values, L_Q)
        # update the context with selected top_k queries
        context, attn = self._update_context(
            context, values, scores_top, index, L_Q, attn_mask
        )

        return context.contiguous(), attn

The ProbMask class builds the probability (causal) mask, while the ProbAttention class implements the probability-based sparse attention mechanism. Inside ProbAttention, _prob_QK computes the sampled dot products between Q and K and selects the top queries, _get_initial_context builds the initial context, and _update_context overwrites that context at the selected query positions. Finally, forward ties these pieces together to implement the full forward pass of the attention mechanism.
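As a minimal smoke test of the ProbAttention module defined above (the shapes and hyperparameters below are illustrative, not from the original post):

# Dummy inputs: batch of 2, sequence length 96, 4 heads, 16 dims per head
B, L, H, D = 2, 96, 4, 16
queries = torch.randn(B, L, H, D)
keys = torch.randn(B, L, H, D)
values = torch.randn(B, L, H, D)

attention = ProbAttention(mask_flag=False, factor=5, output_attention=False)
context, _ = attention(queries, keys, values, attn_mask=None)
print(context.shape)  # torch.Size([2, 4, 96, 16]) -- (B, H, L, D) after the internal transpose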

The models module of the NeuralForecast library can be used to build the Informer model. NeuralForecast is a high-level wrapper on top of PyTorch that provides a convenient interface for building and training many time-series forecasting models, including Informer. The Informer model is known for handling long sequences efficiently and accurately, which makes it well suited to tasks that need to capture long-range dependencies.

horizon = 1
model = Informer(
    h = 1,                                    # Forecasting horizon
    input_size = 10,                          # Input size
    stat_exog_list=None,                      # static exogenous columns.
    hist_exog_list=None,                      # historic exogenous
    futr_exog_list=None,                      # future exogenous columns.
    exclude_insample_y=False,                 # bool=False, the model skips the autoregressive features y[t-input_size:t] if True.
    decoder_input_size_multiplier = 0.5,      # float = 0.5,
    hidden_size = 128,                        # units of embeddings and encoders.
    dropout = 0.05,                           # float (0, 1)
    factor = 3,                               # Prob sparse attention factor.
    n_head = 4,                               # controls number of multi-head's attention.
    conv_hidden_size = 32,                    # channels of the convolutional encoder.
    activation = 'gelu',                      # activation from ['ReLU', 'Softplus', 'Tanh', 'SELU', 'LeakyReLU', 'PReLU', 'Sigmoid', 'GELU'].
    encoder_layers = 2,                       # number of layers for the TCN encoder.
    decoder_layers = 1,                       # number of layers for the MLP decoder.
    distil = True,                            # bool = True. whether the Informer decoder uses bottlenecks.
    loss=MAE(),                               # PyTorch module, instantiated train loss class from [losses collection]
    valid_loss=None,                          # PyTorch module=`loss`, instantiated valid loss class from [losses collection]
    max_steps = 1000,                         # Maximum number of training iterations
    learning_rate = 1e-4,                     # float=1e-3, Learning rate between (0, 1).
    num_lr_decays = -1,                       # int=-1, Number of learning rate decays, evenly distributed across max_steps.
    early_stop_patience_steps = -1,           # int=-1, Number of validation iterations before early stopping.
    val_check_steps = 100,                    # Compute validation loss every 100 steps
    batch_size = 32,                          # number of different series in each batch.
    valid_batch_size = None,                  # number of different series in each validation and test batch.
    windows_batch_size=1024,                  # number of windows to sample in each training batch.
    inference_windows_batch_size=1024,        # number of windows to sample in each inference batch.
    start_padding_enabled=False,              # bool=False, if True, the model will pad the time series with zeros at the beginning, by input size.
    step_size = 1,                            # step size between each window of temporal data.
    scaler_type = "identity",                 # str='robust', type of scaler for temporal inputs normalization see temporal scaler
    random_seed = 1,                          # random_seed for pytorch initializer and numpy generators.
    drop_last_loader = False,                 # bool=False, if True `TimeSeriesDataLoader` drops last non-full batch.
    optimizer=None,                           # Subclass of 'torch.optim.Optimizer', optional, user specified optimizer instead of the default choice (Adam).
    optimizer_kwargs=None,                    # dict, optional, list of parameters used by the user specified `optimizer`.
    lr_scheduler=None,                        # Subclass of 'torch.optim.lr_scheduler.LRScheduler', optional, user specified lr_scheduler instead of the default choice (StepLR).
    lr_scheduler_kwargs=None,                 # dict, optional, list of parameters used by the user specified `lr_scheduler`.
    dataloader_kwargs=None,                   # dict, optional, list of parameters passed into the PyTorch Lightning dataloader by the `TimeSeriesDataLoader`.
)

Note on exclude_insample_y (bool=False): if True, the model skips the autoregressive features y[t-input_size:t]. In other words, the model ignores the slice of the target series from y[t-input_size] up to y[t]. Normally these values are part of the model input and are an important signal for learning the temporal patterns of the series and producing forecasts. With the flag enabled, this autoregressive information path is cut off, and the model has to rely on whatever other inputs are available (for example exogenous variables or other historical features, if any) for its predictions.
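A purely illustrative instantiation with this flag flipped is shown below (hypothetical; the remaining arguments can stay as in the configuration above):

# Hypothetical: ignore the autoregressive window y[t-input_size:t] and rely on
# other available inputs (e.g. exogenous features) only.
model_no_ar = Informer(h=1, input_size=10, exclude_insample_y=True,
                       loss=MAE(), max_steps=1000)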

5. Model Summary

summary(model=model)
======================================================================
Layer (type:depth-idx)                        Param #
======================================================================
Informer                                      --
├─MAE: 1-1                                    --
├─MAE: 1-2                                    --
├─ConstantPad1d: 1-3                          --
├─TemporalNorm: 1-4                           --
├─DataEmbedding: 1-5                          --
│    └─TokenEmbedding: 2-1                    --
│    │    └─Conv1d: 3-1                       384
│    └─PositionalEmbedding: 2-2               --
│    └─Dropout: 2-3                           --
├─DataEmbedding: 1-6                          --
│    └─TokenEmbedding: 2-4                    --
│    │    └─Conv1d: 3-2                       384
│    └─PositionalEmbedding: 2-5               --
│    └─Dropout: 2-6                           --
├─TransEncoder: 1-7                           --
│    └─ModuleList: 2-7                        --
│    │    └─TransEncoderLayer: 3-3            74,912
│    │    └─TransEncoderLayer: 3-4            74,912
│    └─ModuleList: 2-8                        --
│    │    └─ConvLayer: 3-5                    49,536
│    └─LayerNorm: 2-9                         256
├─TransDecoder: 1-8                           --
│    └─ModuleList: 2-10                       --
│    │    └─TransDecoderLayer: 3-6            141,216
│    └─LayerNorm: 2-11                        256
│    └─Linear: 2-12                           129
======================================================================
Total params: 341,985
Trainable params: 341,985
Non-trainable params: 0
======================================================================

6. Cross Validation

The cross_validation method returns the model's predictions on the test set. Here we run cross validation by passing explicit val_size and test_size splits (so n_windows is set to None).

nf = NeuralForecast(
    models = [model],
    freq='H'
)
Y_hat_df = nf.cross_validation(df=Y_df, val_size=val_size,
                               test_size=test_size, n_windows=None)

  | Name          | Type          | Params | Mode 
--------------------------------------------------------
0 | loss          | MAE           | 0      | train
1 | padder_train  | ConstantPad1d | 0      | train
2 | scaler        | TemporalNorm  | 0      | train
3 | enc_embedding | DataEmbedding | 384    | train
4 | dec_embedding | DataEmbedding | 384    | train
5 | encoder       | TransEncoder  | 199 K  | train
6 | decoder       | TransDecoder  | 141 K  | train
--------------------------------------------------------
341 K     Trainable params
0         Non-trainable params
341 K     Total params
1.368     Total estimated model params size (MB)
73        Modules in train mode
0         Modules in eval mode
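Each row of Y_hat_df pairs one forecast with its ground truth. A quick look at its structure is below; the column set in the comment is what NeuralForecast's cross_validation typically returns and is stated here as an assumption.

print(Y_hat_df.columns.tolist())
# Typically: ['unique_id', 'ds', 'cutoff', 'Informer', 'y']
Y_hat_df.head()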

7. Prediction Results

Y_plot = Y_hat_df.copy() # OT dataset
cutoffs = Y_hat_df['cutoff'].unique()[::horizon]
Y_plot = Y_plot[Y_hat_df['cutoff'].isin(cutoffs)]

plt.figure(figsize=(20,5))
plt.plot(Y_plot['ds'], Y_plot['y'], label='True')
plt.plot(Y_plot['ds'], Y_plot['Informer'], label='Informer')
plt.title('Informer Prediction', fontdict={'family': 'Times New Roman'})
plt.xlabel('Datestamp')
plt.ylabel('OT')
plt.grid()
plt.legend()

[Figure: Informer predictions vs. true values]

8. Model Evaluation

The code below uses several common evaluation metrics to measure forecasting performance: mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE). We call the mae, mse, mape, and rmse functions from the neuralforecast.losses.numpy module to evaluate the model's predictions.

mae_informer = mae(Y_hat_df['y'], Y_hat_df['Informer'])
mse_informer = mse(Y_hat_df['y'], Y_hat_df['Informer'])

mape_informer = mape(Y_hat_df['y'], Y_hat_df['Informer'])
rmse_informer = rmse(Y_hat_df['y'], Y_hat_df['Informer'])
print(f'Informer_mae: {mae_informer:.3f}')
print(f'Informer_mse: {mse_informer:.3f}')
print(f'Informer_mape: {mape_informer * 100:.3f}%')
print(f'Informer_rmse: {rmse_informer:.3f}')
Informer_mae: 0.069
Informer_mse: 0.007
Informer_mape: 5.403%
Informer_rmse: 0.085
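9. Hyperopt Hyperparameter Tuning

Finally, the Informer hyperparameters can be searched with Hyperopt. Below is a minimal sketch using hyperopt's fmin with the TPE algorithm; the search space, trial budget, and objective are assumptions for illustration rather than a prescribed recipe, and the snippet reuses Y_df, val_size, test_size and the imports defined earlier. For a cleaner protocol, the objective should be computed on a validation split rather than on the final test windows.

from hyperopt import fmin, tpe, hp, Trials

# Hypothetical search space over a few Informer hyperparameters.
space = {
    'hidden_size': hp.choice('hidden_size', [64, 128, 256]),
    'n_head': hp.choice('n_head', [2, 4, 8]),
    'learning_rate': hp.loguniform('learning_rate', -9, -5),  # roughly 1e-4 to 7e-3
    'dropout': hp.uniform('dropout', 0.0, 0.3),
}

def objective(params):
    # Build an Informer with the sampled hyperparameters; fewer steps keep each trial cheap.
    model = Informer(
        h=1,
        input_size=10,
        hidden_size=params['hidden_size'],
        n_head=params['n_head'],
        learning_rate=params['learning_rate'],
        dropout=params['dropout'],
        loss=MAE(),
        max_steps=200,
        scaler_type='identity',
        random_seed=1,
    )
    nf = NeuralForecast(models=[model], freq='H')
    cv_df = nf.cross_validation(df=Y_df, val_size=val_size,
                                test_size=test_size, n_windows=None)
    # Minimize MAE on the held-out windows.
    return mae(cv_df['y'], cv_df['Informer'])

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=20, trials=trials)
print(best)  # note: hp.choice entries are reported as indices into their lists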

From: https://blog.csdn.net/m0_63287589/article/details/144965543
