
PyTorchStepByStep - Chapter 9: Sequence-to-Sequence


 

First, let's generate 256 sequences of square corners (generate_sequences comes from the book's helper code and isn't defined in this excerpt):

points, directions = generate_sequences(n=256, seed=13)
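Purely for illustration, a minimal sketch of a helper with the same interface might look like the one below: noisy corners of a square, each sequence drawn clockwise or counterclockwise starting from a random corner (this is an assumption about the data, not the book's actual implementation).

import numpy as np

def generate_sequences(n=128, seed=13):
    # Illustrative sketch only, not the book's code: noisy square corners,
    # each sequence drawn clockwise (direction 0) or counterclockwise
    # (direction 1), starting from a randomly chosen corner.
    basic_corners = np.array([[-1, -1], [-1, 1], [1, 1], [1, -1]])
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, 4, size=n)
    directions = rng.integers(0, 2, size=n)
    points = []
    for start, direction in zip(starts, directions):
        order = [(start + i) % 4 for i in range(4)]
        corners = basic_corners[order].astype(float)
        if direction == 1:
            corners = corners[::-1]  # reverse the traversal -> opposite direction
        points.append(corners + rng.normal(scale=0.1, size=(4, 2)))
    return np.array(points), directions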

And then let's visualize the first five squares (figure omitted). Next, the encoder: a single GRU layer that takes the source sequence in and returns the corresponding hidden states for every step:

class Encoder(nn.Module):
    def __init__(self, n_features, hidden_dim):
        super().__init__()
        self.n_features = n_features
        self.hidden_dim = hidden_dim
        self.hidden = None
        self.basic_rnn = nn.GRU(self.n_features, self.hidden_dim, batch_first=True)

    def forward(self, x):
        rnn_out, self.hidden = self.basic_rnn(x)

        return rnn_out  # N, L, H (hidden states for every step)

 

Let's take the coordinates of a "perfect" square and split them into source and target sequences:

full_seq = torch.tensor([[-1, -1], [-1, 1], [1, 1], [1, -1]]).float().view(1, 4, 2)
source_seq = full_seq[:, :2] # first two corners
target_seq = full_seq[:, 2:] # last two corners
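Both halves keep the batch-first (N, L, F) layout, each holding two of the four corners:

source_seq.shape, target_seq.shape
# (torch.Size([1, 2, 2]), torch.Size([1, 2, 2]))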

Now, let’s encode the source sequence and take the final hidden state:

torch.manual_seed(21)
encoder = Encoder(n_features=2, hidden_dim=2)
hidden_seq = encoder(source_seq)    # output is N, L, H
hidden_final = hidden_seq[:, -1:]   # takes last hidden state
hidden_final

# tensor([[[ 0.3105, -0.5263]]], grad_fn=<SliceBackward0>)
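Since the encoder's GRU has a single layer and a single direction, the last step of its output sequence is the same hidden state the layer stored in self.hidden, just batch-first instead of sequence-first. A quick check under that assumption:

# encoder.hidden is 1, N, H (sequence-first); hidden_final is N, 1, H (batch-first)
torch.allclose(hidden_final, encoder.hidden.permute(1, 0, 2))
# True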

 

The decoder model is actually quite similar to the models we developed in Chapter 8:

class Decoder(nn.Module):
    def __init__(self, n_features, hidden_dim):
        super().__init__()
        self.n_features = n_features
        self.hidden_dim = hidden_dim
        self.hidden = None
        self.basic_rnn = nn.GRU(self.n_features, self.hidden_dim, batch_first=True) 
        self.regression = nn.Linear(self.hidden_dim, self.n_features)

    def init_hidden(self, hidden_seq):
        # We only need the final hidden state
        hidden_final = hidden_seq[:, -1:]  # N, 1, H
        
        # Initialize decoder’s hidden state using encoder’s final hidden state.
        # But we need to make it sequence-first
        self.hidden = hidden_final.permute(1, 0, 2)  # 1, N, H

    def forward(self, x):
        # x is N, 1, F
        # The recurrent layer both uses and updates the hidden state.
        batch_first_output, self.hidden = self.basic_rnn(x, self.hidden) 

        last_output = batch_first_output[:, -1:]
        out = self.regression(last_output)

        # The output has the same shape as the input (N, 1, F).
        return out.view(-1, 1, self.n_features)
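Note the permute inside init_hidden: even with batch_first=True, PyTorch's recurrent layers expect the hidden state itself to be sequence-first, shaped (num_layers * num_directions, N, H), so the encoder's batch-first output has to be rearranged before it can seed the decoder. A quick shape check with made-up values:

dummy_hidden_seq = torch.randn(1, 2, 2)     # N=1, L=2, H=2 (batch-first output)
dummy_final = dummy_hidden_seq[:, -1:]      # N, 1, H
dummy_final.permute(1, 0, 2).shape
# torch.Size([1, 1, 2])  -> 1, N, H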

 

torch.manual_seed(21)
decoder = Decoder(n_features=2, hidden_dim=2)

# Initial hidden state will be encoder's final hidden state
decoder.init_hidden(hidden_seq)
# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)   # Predicts coordinates
    print(f'Output: {out}\n')
    # Predicted coordinates are next step's inputs
    inputs = out

 

Hidden: tensor([[[ 0.3105, -0.5263]]], grad_fn=<PermuteBackward0>)
Output: tensor([[[-0.2339,  0.4702]]], grad_fn=<ViewBackward0>)

Hidden: tensor([[[ 0.3913, -0.6853]]], grad_fn=<StackBackward0>)
Output: tensor([[[-0.0226,  0.4628]]], grad_fn=<ViewBackward0>)

 

# Initial hidden state will be encoder's final hidden state
decoder.init_hidden(hidden_seq)
# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs) # Predicts coordinates    
    print(f'Output: {out}\n')
    # But completely ignores the predictions and uses real data instead
    inputs = target_seq[:, i:i+1]

 

Hidden: tensor([[[ 0.3105, -0.5263]]], grad_fn=<PermuteBackward0>)
Output: tensor([[[-0.2339,  0.4702]]], grad_fn=<ViewBackward0>)

Hidden: tensor([[[ 0.3913, -0.6853]]], grad_fn=<StackBackward0>)
Output: tensor([[[0.2265, 0.4529]]], grad_fn=<ViewBackward0>)

Now, a bad prediction can only be traced to the model itself, and any bad predictions in previous steps have no effect whatsoever.

# Initial hidden state is encoder's final hidden state
decoder.init_hidden(hidden_seq)
# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

teacher_forcing_prob = 0.5
target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)
    print(f'Output: {out}\n')
    # If it is teacher forcing
    if torch.rand(1) <= teacher_forcing_prob:
        # Takes the actual element
        inputs = target_seq[:, i:i+1]
    else:
        # Otherwise uses the last predicted output
        inputs = out

 

Hidden: tensor([[[ 0.3105, -0.5263]]], grad_fn=<PermuteBackward0>)
Output: tensor([[[-0.2339,  0.4702]]], grad_fn=<ViewBackward0>)

Hidden: tensor([[[ 0.3913, -0.6853]]], grad_fn=<StackBackward0>)
Output: tensor([[[-0.0226,  0.4628]]], grad_fn=<ViewBackward0>)

 

class EncoderDecoder(nn.Module):
    def __init__(self, encoder, decoder, input_len, target_len, teacher_forcing_prob=0.5):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.input_len = input_len
        self.target_len = target_len
        self.teacher_forcing_prob = teacher_forcing_prob
        self.outputs = None

    def init_outputs(self, batch_size):
        device = next(self.parameters()).device
        # N, L (target), F
        self.outputs = torch.zeros(batch_size, 
                              self.target_len, 
                              self.encoder.n_features).to(device)

    def store_output(self, i, out):
        # Stores the output
        self.outputs[:, i:i+1, :] = out

    def forward(self, x):               
        # splits the data in source and target sequences
        # the target seq will be empty in testing mode
        # N, L, F
        source_seq = x[:, :self.input_len, :]
        target_seq = x[:, self.input_len:, :]
        self.init_outputs(x.shape[0])        

        # Encoder expected N, L, F
        hidden_seq = self.encoder(source_seq)
        # Output is N, L, H
        self.decoder.init_hidden(hidden_seq)

        # The last input of the encoder is also
        # the first input of the decoder
        dec_inputs = source_seq[:, -1:, :]

        # Generates as many outputs as the target length
        for i in range(self.target_len):
            # Output of decoder is N, 1, F
            out = self.decoder(dec_inputs)
            self.store_output(i, out)

            prob = self.teacher_forcing_prob
            # In evaluation/test the target sequence is
            # unknown, so we cannot use teacher forcing
            if not self.training:
                prob = 0

            # If it is teacher forcing
            if torch.rand(1) <= prob:
                # Takes the actual element
                dec_inputs = target_seq[:, i:i+1, :]
            else:
                # Otherwise uses the last predicted output
                dec_inputs = out

        return self.outputs
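Putting the pieces together, here is a minimal usage sketch (the seed, the variable names, and the loss are illustrative, not the book's full training loop). In training mode the model expects source and target concatenated along the sequence dimension; in evaluation mode the source alone is enough, since teacher forcing is disabled:

torch.manual_seed(23)
encoder = Encoder(n_features=2, hidden_dim=2)
decoder = Decoder(n_features=2, hidden_dim=2)
model = EncoderDecoder(encoder, decoder, input_len=2, target_len=2, teacher_forcing_prob=0.5)

model.train()
predictions = model(full_seq)            # N, 2, 2: predicted last two corners
loss = nn.MSELoss()(predictions, target_seq)

model.eval()
predictions = model(source_seq)          # the target part of x is empty, no teacher forcing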

 

 

    第二章从协议栈这部分来看网络中的通信如何实现,准备从两部分来进行分解。本篇是第一部分:详细介绍TCP协议栈收发数据的过程。首先来看下面的图。从应用程序到网卡需要经过如下几部分,上面的部分通过委托下面的部分来完成工作。首先是应用程序,通过Socket库来委托协议栈完成工......