Informer: time-series forecasting model
1 Introduction
LSTF (Long Sequence Time-series Forecasting)
Three significant limitations of the vanilla Transformer in LSTF:
- The quadratic computation of self-attention. The atomic operation of the self-attention mechanism, the canonical dot-product, causes the time complexity and memory usage per layer to be \(O(L^2)\).
- The memory bottleneck in stacking layers for long inputs. The stack of J encoder/decoder layers makes the total memory usage \(O(J \cdot L^2)\), which limits the model's scalability in receiving long sequence inputs.
- The speed plunge in predicting long outputs. The dynamic decoding of the vanilla Transformer makes step-by-step inference as slow as an RNN-based model (Fig. 1b).
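To see where the quadratic cost comes from, here is a minimal NumPy sketch of canonical scaled dot-product attention; the L x L score matrix is what drives the \(O(L^2)\) time and memory per layer (shapes and names are illustrative, not from the paper's code).

```python
import numpy as np

def canonical_attention(Q, K, V):
    """Canonical scaled dot-product attention.

    Q, K, V: arrays of shape (L, d). The score matrix Q @ K.T has shape
    (L, L), so time and memory grow quadratically with the input length L,
    which is the first limitation noted above.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (L, L): the O(L^2) term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (L, d)

L, d = 4096, 64
Q, K, V = (np.random.randn(L, d) for _ in range(3))
out = canonical_attention(Q, K, V)                    # builds a 4096 x 4096 score matrix
```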
Prior works:
- Vanilla Transformer(2017)
- The Sparse Transformer(2019)
- LogSparse Transformer(2019)
- Longformer(2020)
- Reformer(2019)
- Linformer(2020)
- Transformer-XL(2019)
- Compressive Transformer(2019)
2 Preliminary
3 Methodology
Efficient Self-attention Mechanism
The i-th query's attention is defined as a kernel smoother in a probability form:
\(\mathcal{A}(q_i, K, V) = \mathbb{E}_{p(k_j|q_i)}[v_j]\)
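Writing the kernel smoother out in full (following the paper's notation, with the asymmetric exponential kernel):

\[
\mathcal{A}(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}\, v_j = \mathbb{E}_{p(k_j|q_i)}[v_j],
\qquad
p(k_j|q_i) = \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)},
\qquad
k(q_i, k_j) = \exp\!\left(\frac{q_i k_j^{\top}}{\sqrt{d}}\right)
\]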
- The Sparse Transformer: "self-attention probability has potential sparsity"
Query Sparsity Measurement
- a few dot-product pairs contribute to the major attention,
- others generate trivial attention.
Distinguishing the "important" queries:
- compare the query's attention distribution with the uniform distribution via the Kullback-Leibler divergence
- dropping the constant term yields the query's sparsity measurement \(M(q_i, K)\)
- its first term is the Log-Sum-Exp (LSE) of the query's dot-products over all keys, the second term is their arithmetic mean
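Concretely, the sparsity measurement from the paper is

\[
M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{\frac{q_i k_j^{\top}}{\sqrt{d}}} \;-\; \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}}
\]

where the first term is the LSE and the second the arithmetic mean; a query with a larger \(M(q_i, K)\) has a more "diverse" attention probability and is more likely to contain the dominant dot-product pairs.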
ProbSparse Self-attention
- ProbSparse self-attention allows each key to attend only to the u dominant queries, i.e. the top-u queries under the sparsity measurement \(M(q, K)\), as formalized below
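Following the paper:

\[
\mathcal{A}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{\bar{Q} K^{\top}}{\sqrt{d}}\right) V
\]

where \(\bar{Q}\) is a sparse matrix of the same size as \(Q\) containing only the top-\(u\) queries under \(M(q, K)\), with \(u = c \cdot \ln L_Q\) for a constant sampling factor \(c\); this brings the per-layer time and memory down to \(O(L \ln L)\).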
Encoder
- extracts the robust long-range dependency of the long sequential inputs
Self-attention Distilling
- distilling operation (inspired by dilated convolution): between attention blocks, privilege the dominating features and halve the input length of the next layer (see the sketch after this list)
- Attention Block
- Conv1d( ): 1-D convolution along the time dimension
- ELU( ): activation function
- MaxPool( ): stride-2 max pooling that halves the sequence length
reduces the total memory usage to \(O((2-\epsilon) L \log L)\)
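A minimal PyTorch sketch of one distilling step between attention blocks, assuming a (batch, length, d_model) layout; the layer name and hyperparameters here are illustrative, not taken from the official implementation.

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """Conv1d -> ELU -> MaxPool between attention blocks.

    Halves the temporal length so the next attention block runs on a
    shorter, "distilled" feature map (illustrative sketch, not the
    official Informer code).
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, d_model); Conv1d expects (batch, channels, L)
        x = self.conv(x.transpose(1, 2))
        x = self.pool(self.act(x))
        return x.transpose(1, 2)             # (batch, ceil(L/2), d_model)

x = torch.randn(8, 96, 512)                  # e.g. an encoder feature map of length 96
print(DistillingLayer(512)(x).shape)         # torch.Size([8, 48, 512])
```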
Decoder
- a stack of two identical multi-head attention layers: masked ProbSparse self-attention followed by canonical cross-attention over the encoder's feature map
Generative Inference
- sample an \(L_{token}\)-long sequence from the input (an earlier slice before the output sequence)
- e.g. take the known 5 days before the target sequence as the "start token"
- feed the generative-style inference decoder with the start token concatenated with a placeholder for the target sequence (values set to zero, timestamps kept)
- a single forward procedure predicts all outputs, instead of step-by-step dynamic decoding (see the sketch after this list)
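A small PyTorch sketch (names are illustrative, not the paper's code) of how the generative decoder input can be assembled: the known start-token slice is concatenated with zero placeholders for the L_y target positions, and the whole block is decoded in one forward pass.

```python
import torch

def build_decoder_input(x_enc: torch.Tensor, L_token: int, L_y: int) -> torch.Tensor:
    """Concatenate a start token with zero placeholders for the targets.

    x_enc: (batch, L_x, d) known input sequence; the last L_token steps are
    reused as the "start token". Illustrative sketch -- the paper's placeholder
    zeroes only the target values while keeping their timestamps.
    """
    x_token = x_enc[:, -L_token:, :]                           # (batch, L_token, d)
    x_zeros = x_enc.new_zeros(x_enc.size(0), L_y, x_enc.size(2))  # (batch, L_y, d)
    return torch.cat([x_token, x_zeros], dim=1)                # (batch, L_token + L_y, d)

x_enc = torch.randn(8, 96, 7)          # e.g. 96 known steps, 7 features
x_dec = build_decoder_input(x_enc, L_token=48, L_y=24)
print(x_dec.shape)                     # torch.Size([8, 72, 7])
```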
Loss function
- MSE loss on the prediction w.r.t. the target sequences; the loss is propagated back from the decoder's outputs across the entire model
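In the usual mean-squared-error form (notation mine, with \(L_y\) predicted points):

\[
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{L_y} \sum_{t=1}^{L_y} \left\lVert \hat{y}_t - y_t \right\rVert^2
\]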
4 Experiment
Datasets
Four datasets: 2 collected real-world datasets for LSTF (ETT at hour-level and 15-minute-level granularity) and 2 public benchmark datasets (ECL and Weather).
ETT (Electricity Transformer Temperature)
ECL (Electricity Consuming Load)
Weather
Experimental Details
Baselines:
- ARIMA(2014)
- Prophet(2018)
- LSTMa(2015)
- LSTnet(2018)
- DeepAR(2017)
Self-attention variants compared:
- the canonical self-attention variant
- Reformer(2019)
- LogSparse self-attention(2019)
Metrics
- MSE
- MAE
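For reference, a minimal NumPy version of the two metrics as usually defined (not taken from the paper's evaluation code):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all predicted points."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all predicted points."""
    return float(np.mean(np.abs(y_true - y_pred)))
```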
Platform:
- a single Nvidia V100 32GB GPU