CrossEntropyLoss on a PyTorch LSTM model, with one classification per time step

Tags: python machine-learning pytorch neural-network lstm

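I am trying to create an LSTM model to detect anomalies in time-series data. It takes 5 inputs and produces 1 boolean output (True/False depending on whether an anomaly is detected). The anomaly pattern usually spans 3 to 4 consecutive time steps. Unlike most LSTM examples, which predict future data or classify a whole sequence, I am trying to output a True/False detection flag at every time step (True at the last time step of the pattern, if one is detected).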

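Unfortunately, CrossEntropyLoss does not seem to allow output tensors of more than 1D, and in this case the output would be 2D: [number of sequences, sequence length], holding boolean data.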

Here is some sample code showing what I want to produce:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define LSTM classifier model
class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTMClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

# Input - 100 examples containing 5 data points per timestep (where there are 10 timesteps)
X_train = np.random.rand(100, 10, 5)
# Output - 100 examples containing 1 True/False output per timestep to match the input
y_train = np.random.choice(a=[True, False], size=(100, 10))  # Binary labels (True or False)

# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.bool)

# Define model parameters
input_size = X_train.shape[2] # 5 inputs per timestep
hidden_size = 4 # Pattern we are trying to detect is usually 4 timesteps long
num_layers = 1
output_size = 1 # True/False

# Instantiate the model
model = LSTMClassifier(input_size, hidden_size, num_layers, output_size)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

# Test the model
X_test = np.random.rand(10, 10, 5) # Generate some test data - same dimensions as input
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
with torch.no_grad():
    predictions = model(X_test_tensor)
    predicted_outputs = torch.argmax(predictions, dim=1)
    print("Predicted Outputs:", predicted_outputs)

Do I need to reshape the output, or perhaps use a different loss function, or a model other than an LSTM?

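The problem is a shape and loss mismatch rather than the LSTM itself. Your forward method keeps only the last time step (out[:, -1, :]), so the model returns a tensor of shape [100, 1], while your labels have shape [100, 10]. In addition, CrossEntropyLoss expects one logit per class along dimension 1 and integer class indices as targets (it does accept extra dimensions after the class dimension), so a single output unit with boolean labels does not fit it. You have a few options: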
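
It can help to print the shapes involved. The following is a minimal sketch that reuses the dimensions from the question (100 sequences, 10 time steps, 5 features, hidden size 4); the variable names and the two-unit output layer are illustrative assumptions, not part of the original code. It also shows that CrossEntropyLoss can handle one target per time step once the class dimension is moved to position 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=5, hidden_size=4, batch_first=True)
x = torch.rand(100, 10, 5)
targets = torch.rand(100, 10) > 0.5          # (100, 10) boolean labels

out, _ = lstm(x)                             # (100, 10, 4): one hidden state per time step

# What the question's forward() does: keep only the last time step
last_only = nn.Linear(4, 1)(out[:, -1, :])   # (100, 1), cannot match (100, 10) labels

# Assumed alternative: two logits per time step, class dimension moved to dim 1
two_class = nn.Linear(4, 2)(out)             # (100, 10, 2)
logits = two_class.permute(0, 2, 1)          # (100, 2, 10) = (N, C, L)
loss = nn.CrossEntropyLoss()(logits, targets.long())  # targets: (N, L) class indices

print(last_only.shape, logits.shape, loss.item())
```

For a two-class problem this is equivalent in spirit to the binary cross-entropy options below, which need only one output unit.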

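1. Use BCELoss with a sigmoid output:

Modify your LSTM model so that it outputs a value between 0 and 1 for every time step. You can do this by applying the fc layer to all time steps (not just the last one) and adding a sigmoid activation after it: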

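    def forward(self, x):
        # ... your existing h0/c0 and self.lstm(...) code ...
        out = self.fc(out)           # Apply fc to every time step: (batch, seq_len, 1)
        out = out.squeeze(-1)        # Drop the last dim to match the labels: (batch, seq_len)
        out = torch.sigmoid(out)     # Apply sigmoid activation
        return out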

Use BCELoss (binary cross-entropy loss) as your loss function, which is suited to binary classification problems:

criterion = nn.BCELoss()

For training, you need to convert your y_train_tensor to a float type:

y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
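
A commonly used variant, not the one proposed above, is to drop the explicit sigmoid and use nn.BCEWithLogitsLoss, which applies the sigmoid internally and is numerically more stable. The sketch below assumes the same data shapes as the question; the class name PerStepLSTM is hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class PerStepLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=5, hidden_size=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)              # (batch, seq_len, hidden_size)
        return self.fc(out).squeeze(-1)    # raw logits, (batch, seq_len)

model = PerStepLSTM()
x = torch.rand(100, 10, 5)
y = (torch.rand(100, 10) > 0.5).float()    # float labels, same shape as the logits

criterion = nn.BCEWithLogitsLoss()
logits = model(x)
loss = criterion(logits, y)
loss.backward()

# Sanity check: same value as sigmoid followed by BCELoss
probs = torch.sigmoid(logits)
same = torch.allclose(loss, nn.BCELoss()(probs, y), atol=1e-5)
print(loss.item(), same)
```

With this variant, torch.sigmoid is applied only at prediction time to turn logits into probabilities.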

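2. Compute the loss for each time step, then aggregate:

Keep your model output as one prediction per time step.
Use BCELoss to compute the loss at each time step.
Average the loss over all time steps to get a single loss value for backpropagation.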

criterion = nn.BCELoss(reduction='none') # Do not reduce across time steps

# ... inside the training loop:
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor.float())  # Calculate loss for each time step
    loss = loss.mean() # Average the loss over all time steps
    loss.backward()
    # ...
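
Note that reduction='none' followed by a plain mean() gives the same value as the default reduction='mean'. Keeping the per-step losses pays off when you want to treat steps differently, for example weighting the rare anomaly steps more heavily. The sketch below uses random stand-in probabilities instead of a real model, and the 5x weight on positive steps is an arbitrary assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

probs = torch.rand(100, 10)                      # stand-in for sigmoid model outputs
targets = (torch.rand(100, 10) > 0.9).float()    # roughly 10% anomaly steps

per_step = nn.BCELoss(reduction='none')(probs, targets)  # (100, 10), one loss per step
plain_mean = per_step.mean()
default_mean = nn.BCELoss()(probs, targets)              # default reduction='mean'

# Hypothetical weighting: each positive step counts 5x as much as a negative one
weights = 1.0 + 4.0 * targets
weighted = (per_step * weights).sum() / weights.sum()

print(torch.allclose(plain_mean, default_mean), weighted.item())
```

The same pattern works for masking padded time steps in variable-length sequences: multiply by a 0/1 mask and divide by the mask sum.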

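3. Use the LSTM for sequence classification:

If you only care whether the whole sequence contains an anomaly, rather than a specific prediction at each time step, you can use the LSTM for sequence classification.
Modify your model to output a single prediction from the last time step only.
Use BCELoss or CrossEntropyLoss (if you output one logit per class), with a single label indicating whether the whole sequence contains an anomaly.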
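
As a rough sketch of this option, the code below classifies each whole sequence with CrossEntropyLoss and two output logits. The class name SequenceLSTM is hypothetical, and deriving the sequence label as "any time step is True" is an assumption about how the per-step labels would be collapsed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class SequenceLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=5, hidden_size=4, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])    # logits from the last step: (batch, num_classes)

x = torch.rand(100, 10, 5)
per_step_labels = torch.rand(100, 10) > 0.9
# Assumed rule: a sequence is anomalous if any of its time steps is flagged
seq_labels = per_step_labels.any(dim=1).long()   # (100,) integer class indices

model = SequenceLSTM()
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, seq_labels)
pred = logits.argmax(dim=1)              # predicted class per sequence
print(loss.item(), pred.shape)
```

This loses the information about where in the sequence the anomaly occurs, so it does not meet the per-time-step requirement stated in the question.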

Here is example code showing how to implement option 2:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# ... (Your LSTMClassifier class, with the per-time-step sigmoid forward from option 1) ...

# ... (Your data generation remains the same) ...

# ... (Your model parameters remain the same) ...

# Instantiate the model
model = LSTMClassifier(input_size, hidden_size, num_layers, output_size)

# Define loss function and optimizer
criterion = nn.BCELoss(reduction='none') # Don't reduce loss across time steps
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor.float()) # Calculate loss for each time step
    loss = loss.mean() # Average the loss over all time steps
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

# Test the model
# ... (Your test code remains similar, but you'll get predictions for each time step) ...
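
With one sigmoid probability per time step, torch.argmax from the original test code no longer applies, because there is no class dimension to take the maximum over. Thresholding the probabilities is the usual replacement. The sketch below uses random values as a stand-in for model(X_test_tensor), and the 0.5 threshold is an assumption you would tune on validation data.

```python
import torch

torch.manual_seed(0)

# Stand-in for model(X_test_tensor): (sequences, time steps) probabilities
probs = torch.rand(10, 10)

threshold = 0.5                       # assumed cut-off, tune for your data
flags = probs > threshold             # (10, 10) boolean flag per time step
hits = flags.nonzero()                # rows of (sequence index, time step index)
any_anomaly = flags.any(dim=1)        # (10,) whether each sequence has any flag

print("Predicted flags:", flags)
print("Sequences with an anomaly:", any_anomaly)
```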

Keep in mind that the best approach for your specific problem depends on your goals and the characteristics of your data.

From: 78781313
