Introduction to torch.compile

Original: pytorch.org/tutorials/intermediate/torch_compile_tutorial.html

Translator: 飞龙

License: CC BY-NC-SA 4.0

Author: William Wen

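torch.compile is the latest method to speed up your PyTorch code! It makes PyTorch code run faster by JIT-compiling it into optimized kernels, while requiring minimal code changes.

In this tutorial, we cover basic torch.compile usage and demonstrate its advantages over previous PyTorch compiler solutions such as TorchScript and FX Tracing.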

Basic Usage
Demonstrating Speedups
Comparison to TorchScript and FX Tracing
TorchDynamo and FX Graphs
Conclusion

Required pip dependencies

torch >= 2.0
torchvision
numpy
scipy
tabulate

Note: to reproduce the speedup numbers shown below and documented elsewhere, a modern NVIDIA GPU (H100, A100, or V100) is recommended for this tutorial.

import torch
import warnings

gpu_ok = False
if torch.cuda.is_available():
    device_cap = torch.cuda.get_device_capability()
    if device_cap in ((7, 0), (8, 0), (9, 0)):
        gpu_ok = True

if not gpu_ok:
    warnings.warn(
        "GPU is not NVIDIA V100, A100, or H100. Speedup numbers may be lower "
        "than expected."
    )

/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py:48: UserWarning:

GPU is not NVIDIA V100, A100, or H100. Speedup numbers may be lower than expected.

Basic Usage

torch.compile is included in the latest PyTorch. Running TorchInductor on GPU requires Triton, which is included with the PyTorch 2.0 nightly binaries. If Triton is still missing, try installing torchtriton via pip (pip install torchtriton --extra-index-url "https://download.pytorch.org/whl/nightly/cu117" for CUDA 11.7).
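Before installing anything, it may help to check whether Triton is already importable. The following is a minimal sketch using only the standard library. It assumes the package is importable under the module name `triton`, and it says nothing about whether the installed version matches your PyTorch build.

```python
import importlib.util

# find_spec returns None when the package cannot be found on the import path.
has_triton = importlib.util.find_spec("triton") is not None
print("Triton importable:", has_triton)
```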

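Arbitrary Python functions can be optimized by passing the callable to torch.compile. We can then call the returned optimized function in place of the original function.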

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

tensor([[ 1.6850,  1.9924,  1.7090,  0.0034,  1.1414, -0.1822,  0.4861, -0.0536,
         -0.2252,  1.9398],
        [ 0.3693, -0.0695,  0.1748,  0.3436,  0.1939,  1.5721,  1.9882, -0.2235,
          0.3161,  1.2642],
        [ 0.2480,  1.8793,  1.7152,  1.6772,  1.8881,  1.4748,  1.3466,  1.7763,
          0.7469,  1.0407],
        [-0.1121,  1.6015, -0.0188,  0.2128,  0.5218,  1.9838,  0.8185,  0.5093,
         -0.3603,  0.1793],
        [-1.7890,  1.7532, -0.4040,  0.1222, -0.0029,  1.7975, -0.3877,  0.5123,
          0.1673,  0.1330],
        [ 1.0627,  0.9609,  0.1019,  1.8814,  0.1142, -0.2338, -0.9621,  0.7631,
          0.6506,  0.1853],
        [ 0.4584,  1.7648, -0.0444,  1.9610,  1.5884,  0.7353,  1.2190,  1.3662,
          1.0938, -0.1587],
        [-0.7502,  1.6640,  0.3495,  1.3496,  0.8187,  1.1719,  0.5820,  0.1498,
          0.0885,  0.1036],
        [ 0.3961,  0.6043, -0.0861, -0.3371,  0.8622,  1.4341,  1.2988,  0.5023,
          0.3074,  0.1277],
        [ 0.9748,  0.4117,  1.2616,  1.6314,  0.4693,  0.4092,  0.0401,  1.1196,
          1.2458,  1.3280]])

Alternatively, we can decorate the function.

@torch.compile
def opt_foo2(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b
print(opt_foo2(torch.randn(10, 10), torch.randn(10, 10)))

tensor([[ 0.5360,  0.1697, -0.0561,  0.1890, -0.1310,  1.2276,  1.1739,  0.1944,
         -0.1561,  1.6990],
        [ 1.0421,  1.9472,  0.2682,  0.2701,  1.3346,  0.7651,  1.0897,  1.1730,
          0.6161,  0.9223],
        [ 1.5756,  1.5294,  0.0112, -0.1522, -0.7674,  1.8515, -0.2443,  0.3696,
          0.2693,  0.8735],
        [-0.3701,  1.1190,  1.4164,  1.8648,  1.2080,  0.0732,  1.5274,  0.6868,
          1.2440,  1.0715],
        [-1.2454, -0.0159,  0.4315,  0.1317,  1.0530, -1.0603, -0.0532,  0.6661,
          1.7101, -0.2076],
        [-0.7091,  0.7824,  1.7161,  1.2750,  0.6368,  1.2488,  0.4897,  1.2429,
          1.3409,  1.3735],
        [ 0.8345,  0.0653,  0.3462,  1.2383, -0.4092,  1.6438, -0.0962,  0.4011,
          0.2463, -0.5802],
        [ 1.6349,  0.7297,  1.2547, -0.3113,  0.9310,  0.1162,  1.7618,  0.4882,
          0.7640,  0.2930],
        [ 1.1669, -0.7775,  1.2000,  0.6008, -0.2814,  0.5541,  0.5753,  1.4731,
          1.6835,  0.7370],
        [ 1.5087,  0.6195,  0.1153,  1.2966,  1.8815,  1.1678,  1.5686,  1.6018,
          0.2193,  1.3500]])

We can also optimize torch.nn.Module instances.

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

mod = MyModule()
opt_mod = torch.compile(mod)
print(opt_mod(torch.randn(10, 100)))

tensor([[-0.0000, -0.0000, 0.2419, 0.0446, 0.9011, 0.2674, 0.3633, 0.4984, -0.0000,
         0.0988],
        [0.6906, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, 0.8490, -0.0000, -0.0000,
         0.5475],
        [0.0852, 0.2762, 0.7441, -0.0000, -0.0000, 0.1820, -0.0000, -0.0000, -0.0000,
         0.0334],
        [0.3024, 0.0077, 1.2572, -0.0000, -0.0000, 0.6520, -0.0000, -0.0000, -0.0000,
         0.8976],
        [0.1998, 0.3333, -0.0000, 0.7803, 0.4202, 0.0915, -0.0000, 1.2543, -0.0000,
         0.4615],
        [0.2487, 0.4187, -0.0000, -0.0000, 0.5124, -0.0000, 0.2512, -0.0000, 0.5850,
         -0.0000],
        [-0.0000, 0.0048, -0.0000, -0.0000, -0.0000, 0.2287, -0.0000, 0.4841, 0.3915,
         -0.0000],
        [0.2017, -0.0000, 0.0896, 1.4135, 0.0593, 0.3788, -0.0000, -0.0000, -0.0000,
         0.4972],
        [-0.0000, -0.0000, 1.6580, 0.6414, -0.0000, -0.0000, -0.0000, -0.0000, 0.6491,
         0.7755],
        [-0.0000, -0.0000, 0.6442, 0.0260, 0.7456, 0.1000, -0.0000, -0.0000, 0.5366,
         0.1193]], grad_fn=<CompiledFunctionBackward>)

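Demonstrating Speedups

Let's now demonstrate that torch.compile can speed up real models. We will compare standard eager mode and torch.compile by evaluating and training a torchvision model on random data.

Before we start, we need to define some utility functions.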

# Returns the result of running `fn()` and the time it took for `fn()` to run,
# in seconds. We use CUDA events and synchronization for the most accurate
# measurements.
def timed(fn):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    result = fn()
    end.record()
    torch.cuda.synchronize()
    return result, start.elapsed_time(end) / 1000

# Generates random input and targets data for the model, where `b` is
# batch size.
def generate_data(b):
    return (
        torch.randn(b, 3, 128, 128).to(torch.float32).cuda(),
        torch.randint(1000, (b,)).cuda(),
    )

N_ITERS = 10

from torchvision.models import densenet121
def init_model():
    return densenet121().to(torch.float32).cuda()
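The `timed` helper above needs a CUDA device. For readers following along on a CPU-only machine, here is a hedged variant that falls back to a wall-clock timer. The name `timed_any_device` is hypothetical and not part of the tutorial, and wall-clock timings are not directly comparable to CUDA event timings.

```python
import time
import torch

def timed_any_device(fn):
    """Return (result, seconds). Uses CUDA events on GPU, wall clock otherwise."""
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        result = fn()
        end.record()
        torch.cuda.synchronize()
        return result, start.elapsed_time(end) / 1000  # elapsed_time is in ms
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0

result, seconds = timed_any_device(lambda: torch.randn(64, 64) @ torch.randn(64, 64))
print(f"took {seconds:.6f} s")
```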

First, let's compare inference.

Note that in the call to torch.compile, we pass the additional mode argument, which we discuss below.

model = init_model()

# Reset since we are using a different mode.
import torch._dynamo
torch._dynamo.reset()

model_opt = torch.compile(model, mode="reduce-overhead")

inp = generate_data(16)[0]
with torch.no_grad():
    print("eager:", timed(lambda: model(inp))[1])
    print("compile:", timed(lambda: model_opt(inp))[1])

eager: 0.3166423034667969
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning:

TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.

compile: 76.9008984375

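Notice that torch.compile takes much longer to complete than eager mode (about 77 s versus 0.32 s in the run above). This is because torch.compile compiles the model into optimized kernels as it executes. In our example, the structure of the model doesn't change, so recompilation is not needed. If we run our optimized model several more times, we should therefore see a significant improvement over eager mode.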
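The compile-once behavior described above can be sketched on a toy function. This example is an illustration under assumptions, not part of the original tutorial: it uses a custom `backend` callable (the same mechanism the TorchDynamo section of this tutorial uses) to record each compilation, and the names `make_counting_backend` and `fn` are hypothetical.

```python
import torch
import torch._dynamo

def make_counting_backend():
    calls = []

    def backend(gm, example_inputs):
        # TorchDynamo calls the backend each time a new graph must be compiled.
        calls.append(len(example_inputs))
        return gm.forward  # run the captured graph without further optimization

    return backend, calls

def fn(x):
    return torch.relu(x) + 1

torch._dynamo.reset()
backend, calls = make_counting_backend()
opt_fn = torch.compile(fn, backend=backend)

opt_fn(torch.randn(4, 4))
opt_fn(torch.randn(4, 4))  # same shape and dtype: the cached code is reused
count_after_repeat = len(calls)

opt_fn(torch.randn(4, 4, dtype=torch.float64))  # new dtype: triggers a recompile
count_after_new_dtype = len(calls)
print(count_after_repeat, count_after_new_dtype)
```

The counter should stay at 1 across the repeated call and rise to 2 only when the input dtype changes, which is why the one-time compilation cost is amortized over repeated calls with similar inputs.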

eager_times = []
for i in range(N_ITERS):
    inp = generate_data(16)[0]
    with torch.no_grad():
        _, eager_time = timed(lambda: model(inp))
    eager_times.append(eager_time)
    print(f"eager eval time {i}: {eager_time}")

print("~" * 10)

compile_times = []
for i in range(N_ITERS):
    inp = generate_data(16)[0]
    with torch.no_grad():
        _, compile_time = timed(lambda: model_opt(inp))
    compile_times.append(compile_time)
    print(f"compile eval time {i}: {compile_time}")
print("~" * 10)

import numpy as np
eager_med = np.median(eager_times)
compile_med = np.median(compile_times)
speedup = eager_med / compile_med
assert speedup > 1
print(f"(eval) eager median: {eager_med}, compile median: {compile_med}, speedup: {speedup}x")
print("~" * 10)

eager eval time 0: 0.018123775482177733
eager eval time 1: 0.01638707160949707
eager eval time 2: 0.015945728302001954
eager eval time 3: 0.015856639862060547
eager eval time 4: 0.016062463760375977
eager eval time 5: 0.016149408340454103
eager eval time 6: 0.01600307273864746
eager eval time 7: 0.01600614356994629
eager eval time 8: 0.015964159965515135
eager eval time 9: 0.015935487747192383
~~~~~~~~~~
compile eval time 0: 0.708474853515625
compile eval time 1: 0.008540160179138183
compile eval time 2: 0.00828006362915039
compile eval time 3: 0.008294400215148925
compile eval time 4: 0.00828825569152832
compile eval time 5: 0.008264703750610352
compile eval time 6: 0.008274944305419921
compile eval time 7: 0.008263680458068847
compile eval time 8: 0.008263680458068847
compile eval time 9: 0.00827187156677246
~~~~~~~~~~
(eval) eager median: 0.016004608154296874, compile median: 0.008277503967285157, speedup: 1.9335065519208734x
~~~~~~~~~~
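The script summarizes each run with the median rather than the mean. The tutorial does not say why; one likely reason (an inference, not a statement from the author) is that the median is robust to the slow warm-up iteration. The sketch below recomputes both statistics from the compile timings printed above.

```python
from statistics import mean, median

# "compile eval time" 0-9 copied from the output above (seconds).
compile_times = [
    0.708474853515625, 0.008540160179138183, 0.00828006362915039,
    0.008294400215148925, 0.00828825569152832, 0.008264703750610352,
    0.008274944305419921, 0.008263680458068847, 0.008263680458068847,
    0.00827187156677246,
]
eager_median = 0.016004608154296874  # eager median reported above

compile_median = median(compile_times)
compile_mean = mean(compile_times)
print("median:", compile_median, "speedup:", eager_median / compile_median)
print("mean:  ", compile_mean, "speedup:", eager_median / compile_mean)
```

With the median, the single 0.71 s iteration has no effect and the speedup is about 1.93x, matching the printed result. With the mean, that one iteration dominates and the compiled model would appear slower than eager.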

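Indeed, we can see that running our model with torch.compile results in a significant speedup. The speedup mainly comes from reducing Python overhead and GPU reads and writes, so the observed speedup may vary with factors such as model architecture and batch size. For example, if a model's architecture is simple and the amount of data is large, the bottleneck will be GPU compute and the observed speedup may be less significant.

You may also see different speedup results depending on the chosen mode argument. The "reduce-overhead" mode uses CUDA graphs to further reduce Python overhead. For your own models, you may need to experiment with different modes to maximize speedup. See the torch.compile documentation for more about modes.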

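You may also notice that the second time we run our model with torch.compile (compile eval time 0 above, about 0.71 s) is significantly slower than the later runs, although it is much faster than the first run. This is because the "reduce-overhead" mode runs a few warm-up iterations for CUDA graphs.

For general PyTorch benchmarking, you can try torch.utils.benchmark instead of the timed function we defined above. We wrote our own timing function in this tutorial to show torch.compile's compilation latency.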
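As a rough illustration of the torch.utils.benchmark route, the sketch below times a small eager function on CPU so that it runs without a GPU. The function and tensor sizes are arbitrary choices for the example; to benchmark a compiled callable, pass it through `globals` in the same way.

```python
import torch
import torch.utils.benchmark as benchmark

def foo(x, y):
    return torch.sin(x) + torch.cos(y)

x = torch.randn(256, 256)
y = torch.randn(256, 256)

timer = benchmark.Timer(
    stmt="foo(x, y)",
    globals={"foo": foo, "x": x, "y": y},
    label="sin + cos",
)
measurement = timer.timeit(100)  # Timer performs its own warm-up before timing
print(measurement)
print("median seconds per call:", measurement.median)
```

Because `Timer` warms up before measuring, it hides exactly the first-call compilation latency that the hand-written `timed` function is meant to expose.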

Now, let's compare training.

model = init_model()
opt = torch.optim.Adam(model.parameters())

def train(mod, data):
    opt.zero_grad(True)
    pred = mod(data[0])
    loss = torch.nn.CrossEntropyLoss()(pred, data[1])
    loss.backward()
    opt.step()

eager_times = []
for i in range(N_ITERS):
    inp = generate_data(16)
    _, eager_time = timed(lambda: train(model, inp))
    eager_times.append(eager_time)
    print(f"eager train time {i}: {eager_time}")
print("~" * 10)

model = init_model()
opt = torch.optim.Adam(model.parameters())
train_opt = torch.compile(train, mode="reduce-overhead")

compile_times = []
for i in range(N_ITERS):
    inp = generate_data(16)
    _, compile_time = timed(lambda: train_opt(model, inp))
    compile_times.append(compile_time)
    print(f"compile train time {i}: {compile_time}")
print("~" * 10)

eager_med = np.median(eager_times)
compile_med = np.median(compile_times)
speedup = eager_med / compile_med
assert speedup > 1
print(f"(train) eager median: {eager_med}, compile median: {compile_med}, speedup: {speedup}x")
print("~" * 10)

eager train time 0: 0.3557437438964844
eager train time 1: 0.0508171501159668
eager train time 2: 0.04858163070678711
eager train time 3: 0.048674816131591796
eager train time 4: 0.04914883041381836
eager train time 5: 0.04877619171142578
eager train time 6: 0.048503807067871094
eager train time 7: 0.048318462371826174
eager train time 8: 0.04821299362182617
eager train time 9: 0.04865331268310547
~~~~~~~~~~
compile train time 0: 208.459546875
compile train time 1: 5.33654541015625
compile train time 2: 0.0332677116394043
compile train time 3: 0.023565311431884766
compile train time 4: 0.023459840774536132
compile train time 5: 0.02349772834777832
compile train time 6: 0.023554048538208007
compile train time 7: 0.02490163230895996
compile train time 8: 0.023513023376464843
compile train time 9: 0.02345062446594238
~~~~~~~~~~
(train) eager median: 0.048664064407348634, compile median: 0.023559679985046385, speedup: 2.065565595043579x
~~~~~~~~~~

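Again, we can see that torch.compile takes longer in the first few iterations, as it must compile the model, but in subsequent iterations we see significant speedups compared to eager execution.

We remark that the speedup numbers presented in this tutorial are for demonstration purposes only. Official speedup values can be seen on the TorchInductor performance dashboard.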

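Comparison to TorchScript and FX Tracing

We have seen that torch.compile can speed up PyTorch code. Beyond that, why should we use torch.compile over existing PyTorch compiler solutions such as TorchScript or FX Tracing? Primarily, the advantage of torch.compile lies in its ability to handle arbitrary Python code with minimal changes to existing code.

One case that torch.compile can handle and that other compiler solutions struggle with is data-dependent control flow (the if x.sum() < 0: line below).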

def f1(x, y):
    if x.sum() < 0:
        return -y
    return y

# Test that `fn1` and `fn2` return the same result, given
# the same arguments `args`. Typically, `fn1` will be an eager function
# while `fn2` will be a compiled function (torch.compile, TorchScript, or FX graph).
def test_fns(fn1, fn2, args):
    out1 = fn1(*args)
    out2 = fn2(*args)
    return torch.allclose(out1, out2)

inp1 = torch.randn(5, 5)
inp2 = torch.randn(5, 5)

TorchScript tracing f1 produces incorrect results, since only the control flow path actually taken during tracing is recorded.

traced_f1 = torch.jit.trace(f1, (inp1, inp2))
print("traced 1, 1:", test_fns(f1, traced_f1, (inp1, inp2)))
print("traced 1, 2:", test_fns(f1, traced_f1, (-inp1, inp2)))

/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py:274: TracerWarning:

Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

traced 1, 1: True
traced 1, 2: False

FX tracing f1 results in an error due to the presence of data-dependent control flow.

import traceback as tb
try:
    torch.fx.symbolic_trace(f1)
except Exception:
    tb.print_exc()

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 304, in <module>
    torch.fx.symbolic_trace(f1)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 1154, in symbolic_trace
    graph = tracer.trace(root, concrete_args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 821, in trace
    (self.create_arg(fn(*args)),),
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 274, in f1
    if x.sum() < 0:
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 441, in __bool__
    return self.tracer.to_bool(self)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 301, in to_bool
    raise TraceError('symbolically traced variables cannot be used as inputs to control flow')
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow

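If we provide a value for x when we FX trace f1, we run into the same problem as TorchScript tracing, because the data-dependent control flow is removed from the traced function.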

fx_f1 = torch.fx.symbolic_trace(f1, concrete_args={"x": inp1})
print("fx 1, 1:", test_fns(f1, fx_f1, (inp1, inp2)))
print("fx 1, 2:", test_fns(f1, fx_f1, (-inp1, inp2)))

/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py:638: UserWarning:

Was not able to add assertion to guarantee correct input x to specialized function. It is up to the user to make sure that your inputs match the inputs you specialized the function with.

fx 1, 1: True
fx 1, 2: False

Now we can see that torch.compile correctly handles data-dependent control flow.

# Reset since we are using a different mode.
torch._dynamo.reset()

compile_f1 = torch.compile(f1)
print("compile 1, 1:", test_fns(f1, compile_f1, (inp1, inp2)))
print("compile 1, 2:", test_fns(f1, compile_f1, (-inp1, inp2)))
print("~" * 10)

compile 1, 1: True
compile 1, 2: True
~~~~~~~~~~

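TorchScript scripting can handle data-dependent control flow, but this solution comes with its own set of problems. Namely, TorchScript scripting can require major code changes and will raise errors when unsupported Python is used.

In the example below, we forget the TorchScript type annotations and receive a TorchScript error, because the input type for argument y, an int, does not match the default argument type, torch.Tensor.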

def f2(x, y):
    return x + y

inp1 = torch.randn(5, 5)
inp2 = 3

script_f2 = torch.jit.script(f2)
try:
    script_f2(inp1, inp2)
except Exception:
    tb.print_exc()

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 347, in <module>
    script_f2(inp1, inp2)
RuntimeError: f2() Expected a value of type 'Tensor (inferred)' for argument 'y' but instead found type 'int'.
Inferred 'y' to be of type 'Tensor' because it was not annotated with an explicit type.
Position: 1
Value: 3
Declaration: f2(Tensor x, Tensor y) -> Tensor
Cast error details: Unable to cast 3 to Tensor
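For comparison, the error above goes away once the annotation is supplied. The sketch below is an illustration, not part of the original tutorial, and the name `f2_annotated` is hypothetical. It shows the kind of source change TorchScript scripting asks for.

```python
import torch

# Annotating y as int tells TorchScript not to infer it as a Tensor.
def f2_annotated(x: torch.Tensor, y: int) -> torch.Tensor:
    return x + y

script_f2_annotated = torch.jit.script(f2_annotated)

x = torch.randn(5, 5)
same = torch.equal(script_f2_annotated(x, 3), f2_annotated(x, 3))
print("script 2 (annotated):", same)
```

The change is small here, but every function with non-Tensor arguments needs the same treatment.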

However, torch.compile is easily able to handle f2.

compile_f2 = torch.compile(f2)
print("compile 2:", test_fns(f2, compile_f2, (inp1, inp2)))
print("~" * 10)

compile 2: True
~~~~~~~~~~

Another case that torch.compile handles well compared to previous compiler solutions is the use of non-PyTorch functions.

import scipy
def f3(x):
    x = x * 2
    x = scipy.fft.dct(x.numpy())
    x = torch.from_numpy(x)
    x = x * 2
    return x

TorchScript tracing treats results from non-PyTorch function calls as constants, so our results can be silently wrong.

inp1 = torch.randn(5, 5)
inp2 = torch.randn(5, 5)
traced_f3 = torch.jit.trace(f3, (inp1,))
print("traced 3:", test_fns(f3, traced_f3, (inp2,)))

/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py:365: TracerWarning:

Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py:366: TracerWarning:

torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.

traced 3: False

TorchScript scripting and FX tracing disallow non-PyTorch function calls.

try:
    torch.jit.script(f3)
except Exception:
    tb.print_exc()

try:
    torch.fx.symbolic_trace(f3)
except Exception:
    tb.print_exc()

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 383, in <module>
    torch.jit.script(f3)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/jit/_script.py", line 1395, in script
    fn = torch._C._jit_script_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_jit_internal.py", line 1216, in _try_get_dispatched_fn
    return boolean_dispatched.get(fn)
  File "/opt/conda/envs/py_3.10/lib/python3.10/weakref.py", line 453, in get
    return self.data.get(ref(key),default)
TypeError: cannot create weak reference to 'uarray._Function' object
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 388, in <module>
    torch.fx.symbolic_trace(f3)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 1154, in symbolic_trace
    graph = tracer.trace(root, concrete_args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 821, in trace
    (self.create_arg(fn(*args)),),
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 365, in f3
    x = scipy.fft.dct(x.numpy())
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/scipy/fft/_backend.py", line 25, in __ua_function__
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/scipy/fft/_pocketfft/realtransforms.py", line 19, in _r2r
    tmp = _asfarray(x)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/scipy/fft/_pocketfft/helper.py", line 89, in _asfarray
    if x.dtype == np.float16:
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 546, in impl
    return tracer.create_proxy('call_function', target, args, kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 185, in create_proxy
    args_ = self.create_arg(args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 389, in create_arg
    return super().create_arg(a)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 256, in create_arg
    return type(a)(self.create_arg(elem) for elem in a)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 256, in <genexpr>
    return type(a)(self.create_arg(elem) for elem in a)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 389, in create_arg
    return super().create_arg(a)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/proxy.py", line 292, in create_arg
    raise NotImplementedError(f"argument of type: {type(a)}")
NotImplementedError: argument of type: <class 'type'>

In comparison, torch.compile is easily able to handle the non-PyTorch function call.

compile_f3 = torch.compile(f3)
print("compile 3:", test_fns(f3, compile_f3, (inp2,)))

compile 3: True

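TorchDynamo and FX Graphs

One important component of torch.compile is TorchDynamo. TorchDynamo is responsible for JIT-compiling arbitrary Python code into FX graphs, which can then be further optimized. TorchDynamo extracts FX graphs by analyzing Python bytecode and detecting calls to PyTorch operations.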
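To get a feel for what an FX graph contains, it can help to look at a very small one first. The sketch below builds the graph with torch.fx.symbolic_trace rather than TorchDynamo, purely to illustrate the data structure. The function `foo` is the one from the Basic Usage section.

```python
import torch
import torch.fx

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

gm = torch.fx.symbolic_trace(foo)

# Each node records an opcode, a name, a target, and its arguments.
for node in gm.graph.nodes:
    print(node.op, node.name, node.target, node.args)

ops = [node.op for node in gm.graph.nodes]
```

The graph has two placeholder nodes for the inputs, three call_function nodes (sin, cos, and the addition), and one output node.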

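Normally, TorchInductor, another component of torch.compile, further compiles the FX graphs into optimized kernels, but TorchDynamo allows different backends to be used. To inspect the FX graphs that TorchDynamo outputs, let us create a custom backend that prints the FX graph and simply returns the graph's unoptimized forward method.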

from typing import List
def custom_backend(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    print("custom backend called with FX graph:")
    gm.graph.print_tabular()
    return gm.forward

# Reset since we are using a different backend.
torch._dynamo.reset()

opt_model = torch.compile(init_model(), backend=custom_backend)
opt_model(generate_data(16)[0])

custom backend called with FX graph:
opcode         name                                               target                                                      args                                                                                                                                                                                                                                                                                                                                                                                                                                                                kwargs
-------------  -------------------------------------------------  ----------------------------------------------------------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  -----------------
placeholder    l_x_                                               L_x_                                                        ()                                                                                                                                                                                                                                                                                                                                                                                                                                                                  {}
call_module    l__self___features_conv0                           L__self___features_conv0                                    (l_x_,)                                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_norm0                           L__self___features_norm0                                    (l__self___features_conv0,)                                                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_module    l__self___features_relu0                           L__self___features_relu0                                    (l__self___features_norm0,)                                                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_module    l__self___features_pool0                           L__self___features_pool0                                    (l__self___features_relu0,)                                                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_function  concated_features                                  <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0], 1)                                                                                                                                                                                                                                                                                                                                                                                                                                     {}
call_module    l__self___features_denseblock1_denselayer1_norm1   L__self___features_denseblock1_denselayer1_norm1            (concated_features,)                                                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock1_denselayer1_relu1   L__self___features_denseblock1_denselayer1_relu1            (l__self___features_denseblock1_denselayer1_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output                                  L__self___features_denseblock1_denselayer1_conv1            (l__self___features_denseblock1_denselayer1_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer1_norm2   L__self___features_denseblock1_denselayer1_norm2            (bottleneck_output,)                                                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock1_denselayer1_relu2   L__self___features_denseblock1_denselayer1_relu2            (l__self___features_denseblock1_denselayer1_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features                                       L__self___features_denseblock1_denselayer1_conv2            (l__self___features_denseblock1_denselayer1_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_1                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features], 1)                                                                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock1_denselayer2_norm1   L__self___features_denseblock1_denselayer2_norm1            (concated_features_1,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer2_relu1   L__self___features_denseblock1_denselayer2_relu1            (l__self___features_denseblock1_denselayer2_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_2                                L__self___features_denseblock1_denselayer2_conv1            (l__self___features_denseblock1_denselayer2_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer2_norm2   L__self___features_denseblock1_denselayer2_norm2            (bottleneck_output_2,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer2_relu2   L__self___features_denseblock1_denselayer2_relu2            (l__self___features_denseblock1_denselayer2_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_2                                     L__self___features_denseblock1_denselayer2_conv2            (l__self___features_denseblock1_denselayer2_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_2                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features, new_features_2], 1)                                                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock1_denselayer3_norm1   L__self___features_denseblock1_denselayer3_norm1            (concated_features_2,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer3_relu1   L__self___features_denseblock1_denselayer3_relu1            (l__self___features_denseblock1_denselayer3_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_4                                L__self___features_denseblock1_denselayer3_conv1            (l__self___features_denseblock1_denselayer3_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer3_norm2   L__self___features_denseblock1_denselayer3_norm2            (bottleneck_output_4,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer3_relu2   L__self___features_denseblock1_denselayer3_relu2            (l__self___features_denseblock1_denselayer3_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_4                                     L__self___features_denseblock1_denselayer3_conv2            (l__self___features_denseblock1_denselayer3_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_3                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features, new_features_2, new_features_4], 1)                                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock1_denselayer4_norm1   L__self___features_denseblock1_denselayer4_norm1            (concated_features_3,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer4_relu1   L__self___features_denseblock1_denselayer4_relu1            (l__self___features_denseblock1_denselayer4_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_6                                L__self___features_denseblock1_denselayer4_conv1            (l__self___features_denseblock1_denselayer4_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer4_norm2   L__self___features_denseblock1_denselayer4_norm2            (bottleneck_output_6,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer4_relu2   L__self___features_denseblock1_denselayer4_relu2            (l__self___features_denseblock1_denselayer4_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_6                                     L__self___features_denseblock1_denselayer4_conv2            (l__self___features_denseblock1_denselayer4_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_4                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features, new_features_2, new_features_4, new_features_6], 1)                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock1_denselayer5_norm1   L__self___features_denseblock1_denselayer5_norm1            (concated_features_4,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer5_relu1   L__self___features_denseblock1_denselayer5_relu1            (l__self___features_denseblock1_denselayer5_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_8                                L__self___features_denseblock1_denselayer5_conv1            (l__self___features_denseblock1_denselayer5_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer5_norm2   L__self___features_denseblock1_denselayer5_norm2            (bottleneck_output_8,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer5_relu2   L__self___features_denseblock1_denselayer5_relu2            (l__self___features_denseblock1_denselayer5_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_8                                     L__self___features_denseblock1_denselayer5_conv2            (l__self___features_denseblock1_denselayer5_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_5                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features, new_features_2, new_features_4, new_features_6, new_features_8], 1)                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock1_denselayer6_norm1   L__self___features_denseblock1_denselayer6_norm1            (concated_features_5,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock1_denselayer6_relu1   L__self___features_denseblock1_denselayer6_relu1            (l__self___features_denseblock1_denselayer6_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_10                               L__self___features_denseblock1_denselayer6_conv1            (l__self___features_denseblock1_denselayer6_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock1_denselayer6_norm2   L__self___features_denseblock1_denselayer6_norm2            (bottleneck_output_10,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock1_denselayer6_relu2   L__self___features_denseblock1_denselayer6_relu2            (l__self___features_denseblock1_denselayer6_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_10                                    L__self___features_denseblock1_denselayer6_conv2            (l__self___features_denseblock1_denselayer6_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  cat_6                                              <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_pool0, new_features, new_features_2, new_features_4, new_features_6, new_features_8, new_features_10], 1)                                                                                                                                                                                                                                                                                                                                      {}
call_module    l__self___features_transition1_norm                L__self___features_transition1_norm                         (cat_6,)                                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_transition1_relu                L__self___features_transition1_relu                         (l__self___features_transition1_norm,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition1_conv                L__self___features_transition1_conv                         (l__self___features_transition1_relu,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition1_pool                L__self___features_transition1_pool                         (l__self___features_transition1_conv,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_function  concated_features_6                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool], 1)                                                                                                                                                                                                                                                                                                                                                                                                                          {}
call_module    l__self___features_denseblock2_denselayer1_norm1   L__self___features_denseblock2_denselayer1_norm1            (concated_features_6,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock2_denselayer1_relu1   L__self___features_denseblock2_denselayer1_relu1            (l__self___features_denseblock2_denselayer1_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_12                               L__self___features_denseblock2_denselayer1_conv1            (l__self___features_denseblock2_denselayer1_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer1_norm2   L__self___features_denseblock2_denselayer1_norm2            (bottleneck_output_12,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer1_relu2   L__self___features_denseblock2_denselayer1_relu2            (l__self___features_denseblock2_denselayer1_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_12                                    L__self___features_denseblock2_denselayer1_conv2            (l__self___features_denseblock2_denselayer1_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_7                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12], 1)                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_module    l__self___features_denseblock2_denselayer2_norm1   L__self___features_denseblock2_denselayer2_norm1            (concated_features_7,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock2_denselayer2_relu1   L__self___features_denseblock2_denselayer2_relu1            (l__self___features_denseblock2_denselayer2_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_14                               L__self___features_denseblock2_denselayer2_conv1            (l__self___features_denseblock2_denselayer2_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer2_norm2   L__self___features_denseblock2_denselayer2_norm2            (bottleneck_output_14,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer2_relu2   L__self___features_denseblock2_denselayer2_relu2            (l__self___features_denseblock2_denselayer2_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_14                                    L__self___features_denseblock2_denselayer2_conv2            (l__self___features_denseblock2_denselayer2_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_8                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14], 1)                                                                                                                                                                                                                                                                                                                                                                                        {}
call_module    l__self___features_denseblock2_denselayer3_norm1   L__self___features_denseblock2_denselayer3_norm1            (concated_features_8,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock2_denselayer3_relu1   L__self___features_denseblock2_denselayer3_relu1            (l__self___features_denseblock2_denselayer3_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_16                               L__self___features_denseblock2_denselayer3_conv1            (l__self___features_denseblock2_denselayer3_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer3_norm2   L__self___features_denseblock2_denselayer3_norm2            (bottleneck_output_16,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer3_relu2   L__self___features_denseblock2_denselayer3_relu2            (l__self___features_denseblock2_denselayer3_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_16                                    L__self___features_denseblock2_denselayer3_conv2            (l__self___features_denseblock2_denselayer3_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_9                                <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16], 1)                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock2_denselayer4_norm1   L__self___features_denseblock2_denselayer4_norm1            (concated_features_9,)                                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock2_denselayer4_relu1   L__self___features_denseblock2_denselayer4_relu1            (l__self___features_denseblock2_denselayer4_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_18                               L__self___features_denseblock2_denselayer4_conv1            (l__self___features_denseblock2_denselayer4_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer4_norm2   L__self___features_denseblock2_denselayer4_norm2            (bottleneck_output_18,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer4_relu2   L__self___features_denseblock2_denselayer4_relu2            (l__self___features_denseblock2_denselayer4_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_18                                    L__self___features_denseblock2_denselayer4_conv2            (l__self___features_denseblock2_denselayer4_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_10                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18], 1)                                                                                                                                                                                                                                                                                                                                                      {}
call_module    l__self___features_denseblock2_denselayer5_norm1   L__self___features_denseblock2_denselayer5_norm1            (concated_features_10,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer5_relu1   L__self___features_denseblock2_denselayer5_relu1            (l__self___features_denseblock2_denselayer5_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_20                               L__self___features_denseblock2_denselayer5_conv1            (l__self___features_denseblock2_denselayer5_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer5_norm2   L__self___features_denseblock2_denselayer5_norm2            (bottleneck_output_20,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer5_relu2   L__self___features_denseblock2_denselayer5_relu2            (l__self___features_denseblock2_denselayer5_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_20                                    L__self___features_denseblock2_denselayer5_conv2            (l__self___features_denseblock2_denselayer5_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_11                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20], 1)                                                                                                                                                                                                                                                                                                                                     {}
call_module    l__self___features_denseblock2_denselayer6_norm1   L__self___features_denseblock2_denselayer6_norm1            (concated_features_11,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer6_relu1   L__self___features_denseblock2_denselayer6_relu1            (l__self___features_denseblock2_denselayer6_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_22                               L__self___features_denseblock2_denselayer6_conv1            (l__self___features_denseblock2_denselayer6_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer6_norm2   L__self___features_denseblock2_denselayer6_norm2            (bottleneck_output_22,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer6_relu2   L__self___features_denseblock2_denselayer6_relu2            (l__self___features_denseblock2_denselayer6_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_22                                    L__self___features_denseblock2_denselayer6_conv2            (l__self___features_denseblock2_denselayer6_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_12                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22], 1)                                                                                                                                                                                                                                                                                                                    {}
call_module    l__self___features_denseblock2_denselayer7_norm1   L__self___features_denseblock2_denselayer7_norm1            (concated_features_12,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer7_relu1   L__self___features_denseblock2_denselayer7_relu1            (l__self___features_denseblock2_denselayer7_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_24                               L__self___features_denseblock2_denselayer7_conv1            (l__self___features_denseblock2_denselayer7_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer7_norm2   L__self___features_denseblock2_denselayer7_norm2            (bottleneck_output_24,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer7_relu2   L__self___features_denseblock2_denselayer7_relu2            (l__self___features_denseblock2_denselayer7_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_24                                    L__self___features_denseblock2_denselayer7_conv2            (l__self___features_denseblock2_denselayer7_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_13                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24], 1)                                                                                                                                                                                                                                                                                                   {}
call_module    l__self___features_denseblock2_denselayer8_norm1   L__self___features_denseblock2_denselayer8_norm1            (concated_features_13,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer8_relu1   L__self___features_denseblock2_denselayer8_relu1            (l__self___features_denseblock2_denselayer8_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_26                               L__self___features_denseblock2_denselayer8_conv1            (l__self___features_denseblock2_denselayer8_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer8_norm2   L__self___features_denseblock2_denselayer8_norm2            (bottleneck_output_26,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer8_relu2   L__self___features_denseblock2_denselayer8_relu2            (l__self___features_denseblock2_denselayer8_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_26                                    L__self___features_denseblock2_denselayer8_conv2            (l__self___features_denseblock2_denselayer8_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_14                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24, new_features_26], 1)                                                                                                                                                                                                                                                                                  {}
call_module    l__self___features_denseblock2_denselayer9_norm1   L__self___features_denseblock2_denselayer9_norm1            (concated_features_14,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer9_relu1   L__self___features_denseblock2_denselayer9_relu1            (l__self___features_denseblock2_denselayer9_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_28                               L__self___features_denseblock2_denselayer9_conv1            (l__self___features_denseblock2_denselayer9_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer9_norm2   L__self___features_denseblock2_denselayer9_norm2            (bottleneck_output_28,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer9_relu2   L__self___features_denseblock2_denselayer9_relu2            (l__self___features_denseblock2_denselayer9_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_28                                    L__self___features_denseblock2_denselayer9_conv2            (l__self___features_denseblock2_denselayer9_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_15                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24, new_features_26, new_features_28], 1)                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock2_denselayer10_norm1  L__self___features_denseblock2_denselayer10_norm1           (concated_features_15,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer10_relu1  L__self___features_denseblock2_denselayer10_relu1           (l__self___features_denseblock2_denselayer10_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_30                               L__self___features_denseblock2_denselayer10_conv1           (l__self___features_denseblock2_denselayer10_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock2_denselayer10_norm2  L__self___features_denseblock2_denselayer10_norm2           (bottleneck_output_30,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer10_relu2  L__self___features_denseblock2_denselayer10_relu2           (l__self___features_denseblock2_denselayer10_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_30                                    L__self___features_denseblock2_denselayer10_conv2           (l__self___features_denseblock2_denselayer10_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_16                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24, new_features_26, new_features_28, new_features_30], 1)                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock2_denselayer11_norm1  L__self___features_denseblock2_denselayer11_norm1           (concated_features_16,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer11_relu1  L__self___features_denseblock2_denselayer11_relu1           (l__self___features_denseblock2_denselayer11_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_32                               L__self___features_denseblock2_denselayer11_conv1           (l__self___features_denseblock2_denselayer11_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock2_denselayer11_norm2  L__self___features_denseblock2_denselayer11_norm2           (bottleneck_output_32,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer11_relu2  L__self___features_denseblock2_denselayer11_relu2           (l__self___features_denseblock2_denselayer11_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_32                                    L__self___features_denseblock2_denselayer11_conv2           (l__self___features_denseblock2_denselayer11_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_17                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24, new_features_26, new_features_28, new_features_30, new_features_32], 1)                                                                                                                                                                                                                               {}
call_module    l__self___features_denseblock2_denselayer12_norm1  L__self___features_denseblock2_denselayer12_norm1           (concated_features_17,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer12_relu1  L__self___features_denseblock2_denselayer12_relu1           (l__self___features_denseblock2_denselayer12_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_34                               L__self___features_denseblock2_denselayer12_conv1           (l__self___features_denseblock2_denselayer12_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock2_denselayer12_norm2  L__self___features_denseblock2_denselayer12_norm2           (bottleneck_output_34,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock2_denselayer12_relu2  L__self___features_denseblock2_denselayer12_relu2           (l__self___features_denseblock2_denselayer12_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_34                                    L__self___features_denseblock2_denselayer12_conv2           (l__self___features_denseblock2_denselayer12_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  cat_19                                             <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition1_pool, new_features_12, new_features_14, new_features_16, new_features_18, new_features_20, new_features_22, new_features_24, new_features_26, new_features_28, new_features_30, new_features_32, new_features_34], 1)                                                                                                                                                                                                              {}
call_module    l__self___features_transition2_norm                L__self___features_transition2_norm                         (cat_19,)                                                                                                                                                                                                                                                                                                                                                                                                                                                           {}
call_module    l__self___features_transition2_relu                L__self___features_transition2_relu                         (l__self___features_transition2_norm,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition2_conv                L__self___features_transition2_conv                         (l__self___features_transition2_relu,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition2_pool                L__self___features_transition2_pool                         (l__self___features_transition2_conv,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_function  concated_features_18                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool], 1)                                                                                                                                                                                                                                                                                                                                                                                                                          {}
call_module    l__self___features_denseblock3_denselayer1_norm1   L__self___features_denseblock3_denselayer1_norm1            (concated_features_18,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer1_relu1   L__self___features_denseblock3_denselayer1_relu1            (l__self___features_denseblock3_denselayer1_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_36                               L__self___features_denseblock3_denselayer1_conv1            (l__self___features_denseblock3_denselayer1_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer1_norm2   L__self___features_denseblock3_denselayer1_norm2            (bottleneck_output_36,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer1_relu2   L__self___features_denseblock3_denselayer1_relu2            (l__self___features_denseblock3_denselayer1_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_36                                    L__self___features_denseblock3_denselayer1_conv2            (l__self___features_denseblock3_denselayer1_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_19                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36], 1)                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_module    l__self___features_denseblock3_denselayer2_norm1   L__self___features_denseblock3_denselayer2_norm1            (concated_features_19,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer2_relu1   L__self___features_denseblock3_denselayer2_relu1            (l__self___features_denseblock3_denselayer2_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_38                               L__self___features_denseblock3_denselayer2_conv1            (l__self___features_denseblock3_denselayer2_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer2_norm2   L__self___features_denseblock3_denselayer2_norm2            (bottleneck_output_38,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer2_relu2   L__self___features_denseblock3_denselayer2_relu2            (l__self___features_denseblock3_denselayer2_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_38                                    L__self___features_denseblock3_denselayer2_conv2            (l__self___features_denseblock3_denselayer2_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_20                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38], 1)                                                                                                                                                                                                                                                                                                                                                                                        {}
call_module    l__self___features_denseblock3_denselayer3_norm1   L__self___features_denseblock3_denselayer3_norm1            (concated_features_20,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer3_relu1   L__self___features_denseblock3_denselayer3_relu1            (l__self___features_denseblock3_denselayer3_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_40                               L__self___features_denseblock3_denselayer3_conv1            (l__self___features_denseblock3_denselayer3_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer3_norm2   L__self___features_denseblock3_denselayer3_norm2            (bottleneck_output_40,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer3_relu2   L__self___features_denseblock3_denselayer3_relu2            (l__self___features_denseblock3_denselayer3_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_40                                    L__self___features_denseblock3_denselayer3_conv2            (l__self___features_denseblock3_denselayer3_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_21                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40], 1)                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock3_denselayer4_norm1   L__self___features_denseblock3_denselayer4_norm1            (concated_features_21,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer4_relu1   L__self___features_denseblock3_denselayer4_relu1            (l__self___features_denseblock3_denselayer4_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_42                               L__self___features_denseblock3_denselayer4_conv1            (l__self___features_denseblock3_denselayer4_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer4_norm2   L__self___features_denseblock3_denselayer4_norm2            (bottleneck_output_42,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer4_relu2   L__self___features_denseblock3_denselayer4_relu2            (l__self___features_denseblock3_denselayer4_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_42                                    L__self___features_denseblock3_denselayer4_conv2            (l__self___features_denseblock3_denselayer4_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_22                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42], 1)                                                                                                                                                                                                                                                                                                                                                      {}
call_module    l__self___features_denseblock3_denselayer5_norm1   L__self___features_denseblock3_denselayer5_norm1            (concated_features_22,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer5_relu1   L__self___features_denseblock3_denselayer5_relu1            (l__self___features_denseblock3_denselayer5_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_44                               L__self___features_denseblock3_denselayer5_conv1            (l__self___features_denseblock3_denselayer5_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer5_norm2   L__self___features_denseblock3_denselayer5_norm2            (bottleneck_output_44,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer5_relu2   L__self___features_denseblock3_denselayer5_relu2            (l__self___features_denseblock3_denselayer5_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_44                                    L__self___features_denseblock3_denselayer5_conv2            (l__self___features_denseblock3_denselayer5_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_23                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44], 1)                                                                                                                                                                                                                                                                                                                                     {}
call_module    l__self___features_denseblock3_denselayer6_norm1   L__self___features_denseblock3_denselayer6_norm1            (concated_features_23,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer6_relu1   L__self___features_denseblock3_denselayer6_relu1            (l__self___features_denseblock3_denselayer6_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_46                               L__self___features_denseblock3_denselayer6_conv1            (l__self___features_denseblock3_denselayer6_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer6_norm2   L__self___features_denseblock3_denselayer6_norm2            (bottleneck_output_46,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer6_relu2   L__self___features_denseblock3_denselayer6_relu2            (l__self___features_denseblock3_denselayer6_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_46                                    L__self___features_denseblock3_denselayer6_conv2            (l__self___features_denseblock3_denselayer6_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_24                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46], 1)                                                                                                                                                                                                                                                                                                                    {}
call_module    l__self___features_denseblock3_denselayer7_norm1   L__self___features_denseblock3_denselayer7_norm1            (concated_features_24,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer7_relu1   L__self___features_denseblock3_denselayer7_relu1            (l__self___features_denseblock3_denselayer7_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_48                               L__self___features_denseblock3_denselayer7_conv1            (l__self___features_denseblock3_denselayer7_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer7_norm2   L__self___features_denseblock3_denselayer7_norm2            (bottleneck_output_48,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer7_relu2   L__self___features_denseblock3_denselayer7_relu2            (l__self___features_denseblock3_denselayer7_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_48                                    L__self___features_denseblock3_denselayer7_conv2            (l__self___features_denseblock3_denselayer7_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_25                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48], 1)                                                                                                                                                                                                                                                                                                   {}
call_module    l__self___features_denseblock3_denselayer8_norm1   L__self___features_denseblock3_denselayer8_norm1            (concated_features_25,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer8_relu1   L__self___features_denseblock3_denselayer8_relu1            (l__self___features_denseblock3_denselayer8_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_50                               L__self___features_denseblock3_denselayer8_conv1            (l__self___features_denseblock3_denselayer8_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer8_norm2   L__self___features_denseblock3_denselayer8_norm2            (bottleneck_output_50,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer8_relu2   L__self___features_denseblock3_denselayer8_relu2            (l__self___features_denseblock3_denselayer8_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_50                                    L__self___features_denseblock3_denselayer8_conv2            (l__self___features_denseblock3_denselayer8_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_26                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50], 1)                                                                                                                                                                                                                                                                                  {}
call_module    l__self___features_denseblock3_denselayer9_norm1   L__self___features_denseblock3_denselayer9_norm1            (concated_features_26,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer9_relu1   L__self___features_denseblock3_denselayer9_relu1            (l__self___features_denseblock3_denselayer9_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_52                               L__self___features_denseblock3_denselayer9_conv1            (l__self___features_denseblock3_denselayer9_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer9_norm2   L__self___features_denseblock3_denselayer9_norm2            (bottleneck_output_52,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer9_relu2   L__self___features_denseblock3_denselayer9_relu2            (l__self___features_denseblock3_denselayer9_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_52                                    L__self___features_denseblock3_denselayer9_conv2            (l__self___features_denseblock3_denselayer9_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_27                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52], 1)                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock3_denselayer10_norm1  L__self___features_denseblock3_denselayer10_norm1           (concated_features_27,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer10_relu1  L__self___features_denseblock3_denselayer10_relu1           (l__self___features_denseblock3_denselayer10_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_54                               L__self___features_denseblock3_denselayer10_conv1           (l__self___features_denseblock3_denselayer10_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer10_norm2  L__self___features_denseblock3_denselayer10_norm2           (bottleneck_output_54,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer10_relu2  L__self___features_denseblock3_denselayer10_relu2           (l__self___features_denseblock3_denselayer10_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_54                                    L__self___features_denseblock3_denselayer10_conv2           (l__self___features_denseblock3_denselayer10_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_28                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54], 1)                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer11_norm1  L__self___features_denseblock3_denselayer11_norm1           (concated_features_28,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer11_relu1  L__self___features_denseblock3_denselayer11_relu1           (l__self___features_denseblock3_denselayer11_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_56                               L__self___features_denseblock3_denselayer11_conv1           (l__self___features_denseblock3_denselayer11_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer11_norm2  L__self___features_denseblock3_denselayer11_norm2           (bottleneck_output_56,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer11_relu2  L__self___features_denseblock3_denselayer11_relu2           (l__self___features_denseblock3_denselayer11_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_56                                    L__self___features_denseblock3_denselayer11_conv2           (l__self___features_denseblock3_denselayer11_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_29                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56], 1)                                                                                                                                                                                                                               {}
call_module    l__self___features_denseblock3_denselayer12_norm1  L__self___features_denseblock3_denselayer12_norm1           (concated_features_29,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer12_relu1  L__self___features_denseblock3_denselayer12_relu1           (l__self___features_denseblock3_denselayer12_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_58                               L__self___features_denseblock3_denselayer12_conv1           (l__self___features_denseblock3_denselayer12_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer12_norm2  L__self___features_denseblock3_denselayer12_norm2           (bottleneck_output_58,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer12_relu2  L__self___features_denseblock3_denselayer12_relu2           (l__self___features_denseblock3_denselayer12_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_58                                    L__self___features_denseblock3_denselayer12_conv2           (l__self___features_denseblock3_denselayer12_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_30                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58], 1)                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock3_denselayer13_norm1  L__self___features_denseblock3_denselayer13_norm1           (concated_features_30,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer13_relu1  L__self___features_denseblock3_denselayer13_relu1           (l__self___features_denseblock3_denselayer13_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_60                               L__self___features_denseblock3_denselayer13_conv1           (l__self___features_denseblock3_denselayer13_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer13_norm2  L__self___features_denseblock3_denselayer13_norm2           (bottleneck_output_60,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer13_relu2  L__self___features_denseblock3_denselayer13_relu2           (l__self___features_denseblock3_denselayer13_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_60                                    L__self___features_denseblock3_denselayer13_conv2           (l__self___features_denseblock3_denselayer13_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_31                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60], 1)                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer14_norm1  L__self___features_denseblock3_denselayer14_norm1           (concated_features_31,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer14_relu1  L__self___features_denseblock3_denselayer14_relu1           (l__self___features_denseblock3_denselayer14_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_62                               L__self___features_denseblock3_denselayer14_conv1           (l__self___features_denseblock3_denselayer14_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer14_norm2  L__self___features_denseblock3_denselayer14_norm2           (bottleneck_output_62,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer14_relu2  L__self___features_denseblock3_denselayer14_relu2           (l__self___features_denseblock3_denselayer14_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_62                                    L__self___features_denseblock3_denselayer14_conv2           (l__self___features_denseblock3_denselayer14_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_32                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62], 1)                                                                                                                                                                            {}
call_module    l__self___features_denseblock3_denselayer15_norm1  L__self___features_denseblock3_denselayer15_norm1           (concated_features_32,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer15_relu1  L__self___features_denseblock3_denselayer15_relu1           (l__self___features_denseblock3_denselayer15_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_64                               L__self___features_denseblock3_denselayer15_conv1           (l__self___features_denseblock3_denselayer15_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer15_norm2  L__self___features_denseblock3_denselayer15_norm2           (bottleneck_output_64,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer15_relu2  L__self___features_denseblock3_denselayer15_relu2           (l__self___features_denseblock3_denselayer15_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_64                                    L__self___features_denseblock3_denselayer15_conv2           (l__self___features_denseblock3_denselayer15_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_33                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64], 1)                                                                                                                                                           {}
call_module    l__self___features_denseblock3_denselayer16_norm1  L__self___features_denseblock3_denselayer16_norm1           (concated_features_33,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer16_relu1  L__self___features_denseblock3_denselayer16_relu1           (l__self___features_denseblock3_denselayer16_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_66                               L__self___features_denseblock3_denselayer16_conv1           (l__self___features_denseblock3_denselayer16_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer16_norm2  L__self___features_denseblock3_denselayer16_norm2           (bottleneck_output_66,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer16_relu2  L__self___features_denseblock3_denselayer16_relu2           (l__self___features_denseblock3_denselayer16_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_66                                    L__self___features_denseblock3_denselayer16_conv2           (l__self___features_denseblock3_denselayer16_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_34                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66], 1)                                                                                                                                          {}
call_module    l__self___features_denseblock3_denselayer17_norm1  L__self___features_denseblock3_denselayer17_norm1           (concated_features_34,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer17_relu1  L__self___features_denseblock3_denselayer17_relu1           (l__self___features_denseblock3_denselayer17_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_68                               L__self___features_denseblock3_denselayer17_conv1           (l__self___features_denseblock3_denselayer17_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer17_norm2  L__self___features_denseblock3_denselayer17_norm2           (bottleneck_output_68,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer17_relu2  L__self___features_denseblock3_denselayer17_relu2           (l__self___features_denseblock3_denselayer17_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_68                                    L__self___features_denseblock3_denselayer17_conv2           (l__self___features_denseblock3_denselayer17_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_35                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68], 1)                                                                                                                         {}
call_module    l__self___features_denseblock3_denselayer18_norm1  L__self___features_denseblock3_denselayer18_norm1           (concated_features_35,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer18_relu1  L__self___features_denseblock3_denselayer18_relu1           (l__self___features_denseblock3_denselayer18_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_70                               L__self___features_denseblock3_denselayer18_conv1           (l__self___features_denseblock3_denselayer18_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer18_norm2  L__self___features_denseblock3_denselayer18_norm2           (bottleneck_output_70,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer18_relu2  L__self___features_denseblock3_denselayer18_relu2           (l__self___features_denseblock3_denselayer18_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_70                                    L__self___features_denseblock3_denselayer18_conv2           (l__self___features_denseblock3_denselayer18_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_36                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70], 1)                                                                                                        {}
call_module    l__self___features_denseblock3_denselayer19_norm1  L__self___features_denseblock3_denselayer19_norm1           (concated_features_36,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer19_relu1  L__self___features_denseblock3_denselayer19_relu1           (l__self___features_denseblock3_denselayer19_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_72                               L__self___features_denseblock3_denselayer19_conv1           (l__self___features_denseblock3_denselayer19_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer19_norm2  L__self___features_denseblock3_denselayer19_norm2           (bottleneck_output_72,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer19_relu2  L__self___features_denseblock3_denselayer19_relu2           (l__self___features_denseblock3_denselayer19_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_72                                    L__self___features_denseblock3_denselayer19_conv2           (l__self___features_denseblock3_denselayer19_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_37                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72], 1)                                                                                       {}
call_module    l__self___features_denseblock3_denselayer20_norm1  L__self___features_denseblock3_denselayer20_norm1           (concated_features_37,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer20_relu1  L__self___features_denseblock3_denselayer20_relu1           (l__self___features_denseblock3_denselayer20_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_74                               L__self___features_denseblock3_denselayer20_conv1           (l__self___features_denseblock3_denselayer20_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer20_norm2  L__self___features_denseblock3_denselayer20_norm2           (bottleneck_output_74,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer20_relu2  L__self___features_denseblock3_denselayer20_relu2           (l__self___features_denseblock3_denselayer20_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_74                                    L__self___features_denseblock3_denselayer20_conv2           (l__self___features_denseblock3_denselayer20_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_38                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72, new_features_74], 1)                                                                      {}
call_module    l__self___features_denseblock3_denselayer21_norm1  L__self___features_denseblock3_denselayer21_norm1           (concated_features_38,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer21_relu1  L__self___features_denseblock3_denselayer21_relu1           (l__self___features_denseblock3_denselayer21_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_76                               L__self___features_denseblock3_denselayer21_conv1           (l__self___features_denseblock3_denselayer21_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer21_norm2  L__self___features_denseblock3_denselayer21_norm2           (bottleneck_output_76,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer21_relu2  L__self___features_denseblock3_denselayer21_relu2           (l__self___features_denseblock3_denselayer21_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_76                                    L__self___features_denseblock3_denselayer21_conv2           (l__self___features_denseblock3_denselayer21_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_39                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72, new_features_74, new_features_76], 1)                                                     {}
call_module    l__self___features_denseblock3_denselayer22_norm1  L__self___features_denseblock3_denselayer22_norm1           (concated_features_39,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer22_relu1  L__self___features_denseblock3_denselayer22_relu1           (l__self___features_denseblock3_denselayer22_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_78                               L__self___features_denseblock3_denselayer22_conv1           (l__self___features_denseblock3_denselayer22_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer22_norm2  L__self___features_denseblock3_denselayer22_norm2           (bottleneck_output_78,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer22_relu2  L__self___features_denseblock3_denselayer22_relu2           (l__self___features_denseblock3_denselayer22_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_78                                    L__self___features_denseblock3_denselayer22_conv2           (l__self___features_denseblock3_denselayer22_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_40                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72, new_features_74, new_features_76, new_features_78], 1)                                    {}
call_module    l__self___features_denseblock3_denselayer23_norm1  L__self___features_denseblock3_denselayer23_norm1           (concated_features_40,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer23_relu1  L__self___features_denseblock3_denselayer23_relu1           (l__self___features_denseblock3_denselayer23_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_80                               L__self___features_denseblock3_denselayer23_conv1           (l__self___features_denseblock3_denselayer23_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer23_norm2  L__self___features_denseblock3_denselayer23_norm2           (bottleneck_output_80,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer23_relu2  L__self___features_denseblock3_denselayer23_relu2           (l__self___features_denseblock3_denselayer23_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_80                                    L__self___features_denseblock3_denselayer23_conv2           (l__self___features_denseblock3_denselayer23_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_41                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72, new_features_74, new_features_76, new_features_78, new_features_80], 1)                   {}
call_module    l__self___features_denseblock3_denselayer24_norm1  L__self___features_denseblock3_denselayer24_norm1           (concated_features_41,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer24_relu1  L__self___features_denseblock3_denselayer24_relu1           (l__self___features_denseblock3_denselayer24_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_82                               L__self___features_denseblock3_denselayer24_conv1           (l__self___features_denseblock3_denselayer24_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock3_denselayer24_norm2  L__self___features_denseblock3_denselayer24_norm2           (bottleneck_output_82,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock3_denselayer24_relu2  L__self___features_denseblock3_denselayer24_relu2           (l__self___features_denseblock3_denselayer24_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_82                                    L__self___features_denseblock3_denselayer24_conv2           (l__self___features_denseblock3_denselayer24_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  cat_44                                             <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition2_pool, new_features_36, new_features_38, new_features_40, new_features_42, new_features_44, new_features_46, new_features_48, new_features_50, new_features_52, new_features_54, new_features_56, new_features_58, new_features_60, new_features_62, new_features_64, new_features_66, new_features_68, new_features_70, new_features_72, new_features_74, new_features_76, new_features_78, new_features_80, new_features_82], 1)  {}
call_module    l__self___features_transition3_norm                L__self___features_transition3_norm                         (cat_44,)                                                                                                                                                                                                                                                                                                                                                                                                                                                           {}
call_module    l__self___features_transition3_relu                L__self___features_transition3_relu                         (l__self___features_transition3_norm,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition3_conv                L__self___features_transition3_conv                         (l__self___features_transition3_relu,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_module    l__self___features_transition3_pool                L__self___features_transition3_pool                         (l__self___features_transition3_conv,)                                                                                                                                                                                                                                                                                                                                                                                                                              {}
call_function  concated_features_42                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool], 1)                                                                                                                                                                                                                                                                                                                                                                                                                          {}
call_module    l__self___features_denseblock4_denselayer1_norm1   L__self___features_denseblock4_denselayer1_norm1            (concated_features_42,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer1_relu1   L__self___features_denseblock4_denselayer1_relu1            (l__self___features_denseblock4_denselayer1_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_84                               L__self___features_denseblock4_denselayer1_conv1            (l__self___features_denseblock4_denselayer1_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer1_norm2   L__self___features_denseblock4_denselayer1_norm2            (bottleneck_output_84,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer1_relu2   L__self___features_denseblock4_denselayer1_relu2            (l__self___features_denseblock4_denselayer1_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_84                                    L__self___features_denseblock4_denselayer1_conv2            (l__self___features_denseblock4_denselayer1_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_43                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84], 1)                                                                                                                                                                                                                                                                                                                                                                                                         {}
call_module    l__self___features_denseblock4_denselayer2_norm1   L__self___features_denseblock4_denselayer2_norm1            (concated_features_43,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer2_relu1   L__self___features_denseblock4_denselayer2_relu1            (l__self___features_denseblock4_denselayer2_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_86                               L__self___features_denseblock4_denselayer2_conv1            (l__self___features_denseblock4_denselayer2_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer2_norm2   L__self___features_denseblock4_denselayer2_norm2            (bottleneck_output_86,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer2_relu2   L__self___features_denseblock4_denselayer2_relu2            (l__self___features_denseblock4_denselayer2_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_86                                    L__self___features_denseblock4_denselayer2_conv2            (l__self___features_denseblock4_denselayer2_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_44                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86], 1)                                                                                                                                                                                                                                                                                                                                                                                        {}
call_module    l__self___features_denseblock4_denselayer3_norm1   L__self___features_denseblock4_denselayer3_norm1            (concated_features_44,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer3_relu1   L__self___features_denseblock4_denselayer3_relu1            (l__self___features_denseblock4_denselayer3_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_88                               L__self___features_denseblock4_denselayer3_conv1            (l__self___features_denseblock4_denselayer3_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer3_norm2   L__self___features_denseblock4_denselayer3_norm2            (bottleneck_output_88,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer3_relu2   L__self___features_denseblock4_denselayer3_relu2            (l__self___features_denseblock4_denselayer3_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_88                                    L__self___features_denseblock4_denselayer3_conv2            (l__self___features_denseblock4_denselayer3_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_45                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88], 1)                                                                                                                                                                                                                                                                                                                                                                       {}
call_module    l__self___features_denseblock4_denselayer4_norm1   L__self___features_denseblock4_denselayer4_norm1            (concated_features_45,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer4_relu1   L__self___features_denseblock4_denselayer4_relu1            (l__self___features_denseblock4_denselayer4_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_90                               L__self___features_denseblock4_denselayer4_conv1            (l__self___features_denseblock4_denselayer4_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer4_norm2   L__self___features_denseblock4_denselayer4_norm2            (bottleneck_output_90,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer4_relu2   L__self___features_denseblock4_denselayer4_relu2            (l__self___features_denseblock4_denselayer4_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_90                                    L__self___features_denseblock4_denselayer4_conv2            (l__self___features_denseblock4_denselayer4_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_46                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90], 1)                                                                                                                                                                                                                                                                                                                                                      {}
call_module    l__self___features_denseblock4_denselayer5_norm1   L__self___features_denseblock4_denselayer5_norm1            (concated_features_46,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer5_relu1   L__self___features_denseblock4_denselayer5_relu1            (l__self___features_denseblock4_denselayer5_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_92                               L__self___features_denseblock4_denselayer5_conv1            (l__self___features_denseblock4_denselayer5_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer5_norm2   L__self___features_denseblock4_denselayer5_norm2            (bottleneck_output_92,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer5_relu2   L__self___features_denseblock4_denselayer5_relu2            (l__self___features_denseblock4_denselayer5_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_92                                    L__self___features_denseblock4_denselayer5_conv2            (l__self___features_denseblock4_denselayer5_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_47                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92], 1)                                                                                                                                                                                                                                                                                                                                     {}
call_module    l__self___features_denseblock4_denselayer6_norm1   L__self___features_denseblock4_denselayer6_norm1            (concated_features_47,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer6_relu1   L__self___features_denseblock4_denselayer6_relu1            (l__self___features_denseblock4_denselayer6_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_94                               L__self___features_denseblock4_denselayer6_conv1            (l__self___features_denseblock4_denselayer6_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer6_norm2   L__self___features_denseblock4_denselayer6_norm2            (bottleneck_output_94,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer6_relu2   L__self___features_denseblock4_denselayer6_relu2            (l__self___features_denseblock4_denselayer6_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_94                                    L__self___features_denseblock4_denselayer6_conv2            (l__self___features_denseblock4_denselayer6_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_48                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94], 1)                                                                                                                                                                                                                                                                                                                    {}
call_module    l__self___features_denseblock4_denselayer7_norm1   L__self___features_denseblock4_denselayer7_norm1            (concated_features_48,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer7_relu1   L__self___features_denseblock4_denselayer7_relu1            (l__self___features_denseblock4_denselayer7_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_96                               L__self___features_denseblock4_denselayer7_conv1            (l__self___features_denseblock4_denselayer7_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer7_norm2   L__self___features_denseblock4_denselayer7_norm2            (bottleneck_output_96,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer7_relu2   L__self___features_denseblock4_denselayer7_relu2            (l__self___features_denseblock4_denselayer7_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_96                                    L__self___features_denseblock4_denselayer7_conv2            (l__self___features_denseblock4_denselayer7_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_49                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96], 1)                                                                                                                                                                                                                                                                                                   {}
call_module    l__self___features_denseblock4_denselayer8_norm1   L__self___features_denseblock4_denselayer8_norm1            (concated_features_49,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer8_relu1   L__self___features_denseblock4_denselayer8_relu1            (l__self___features_denseblock4_denselayer8_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_98                               L__self___features_denseblock4_denselayer8_conv1            (l__self___features_denseblock4_denselayer8_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer8_norm2   L__self___features_denseblock4_denselayer8_norm2            (bottleneck_output_98,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer8_relu2   L__self___features_denseblock4_denselayer8_relu2            (l__self___features_denseblock4_denselayer8_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_98                                    L__self___features_denseblock4_denselayer8_conv2            (l__self___features_denseblock4_denselayer8_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_50                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98], 1)                                                                                                                                                                                                                                                                                  {}
call_module    l__self___features_denseblock4_denselayer9_norm1   L__self___features_denseblock4_denselayer9_norm1            (concated_features_50,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer9_relu1   L__self___features_denseblock4_denselayer9_relu1            (l__self___features_denseblock4_denselayer9_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    bottleneck_output_100                              L__self___features_denseblock4_denselayer9_conv1            (l__self___features_denseblock4_denselayer9_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    l__self___features_denseblock4_denselayer9_norm2   L__self___features_denseblock4_denselayer9_norm2            (bottleneck_output_100,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer9_relu2   L__self___features_denseblock4_denselayer9_relu2            (l__self___features_denseblock4_denselayer9_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_module    new_features_100                                   L__self___features_denseblock4_denselayer9_conv2            (l__self___features_denseblock4_denselayer9_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                 {}
call_function  concated_features_51                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100], 1)                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer10_norm1  L__self___features_denseblock4_denselayer10_norm1           (concated_features_51,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer10_relu1  L__self___features_denseblock4_denselayer10_relu1           (l__self___features_denseblock4_denselayer10_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_102                              L__self___features_denseblock4_denselayer10_conv1           (l__self___features_denseblock4_denselayer10_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer10_norm2  L__self___features_denseblock4_denselayer10_norm2           (bottleneck_output_102,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer10_relu2  L__self___features_denseblock4_denselayer10_relu2           (l__self___features_denseblock4_denselayer10_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_102                                   L__self___features_denseblock4_denselayer10_conv2           (l__self___features_denseblock4_denselayer10_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_52                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102], 1)                                                                                                                                                                                                                                              {}
call_module    l__self___features_denseblock4_denselayer11_norm1  L__self___features_denseblock4_denselayer11_norm1           (concated_features_52,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer11_relu1  L__self___features_denseblock4_denselayer11_relu1           (l__self___features_denseblock4_denselayer11_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_104                              L__self___features_denseblock4_denselayer11_conv1           (l__self___features_denseblock4_denselayer11_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer11_norm2  L__self___features_denseblock4_denselayer11_norm2           (bottleneck_output_104,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer11_relu2  L__self___features_denseblock4_denselayer11_relu2           (l__self___features_denseblock4_denselayer11_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_104                                   L__self___features_denseblock4_denselayer11_conv2           (l__self___features_denseblock4_denselayer11_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_53                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104], 1)                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer12_norm1  L__self___features_denseblock4_denselayer12_norm1           (concated_features_53,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer12_relu1  L__self___features_denseblock4_denselayer12_relu1           (l__self___features_denseblock4_denselayer12_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_106                              L__self___features_denseblock4_denselayer12_conv1           (l__self___features_denseblock4_denselayer12_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer12_norm2  L__self___features_denseblock4_denselayer12_norm2           (bottleneck_output_106,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer12_relu2  L__self___features_denseblock4_denselayer12_relu2           (l__self___features_denseblock4_denselayer12_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_106                                   L__self___features_denseblock4_denselayer12_conv2           (l__self___features_denseblock4_denselayer12_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_54                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104, new_features_106], 1)                                                                                                                                                                                                          {}
call_module    l__self___features_denseblock4_denselayer13_norm1  L__self___features_denseblock4_denselayer13_norm1           (concated_features_54,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer13_relu1  L__self___features_denseblock4_denselayer13_relu1           (l__self___features_denseblock4_denselayer13_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_108                              L__self___features_denseblock4_denselayer13_conv1           (l__self___features_denseblock4_denselayer13_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer13_norm2  L__self___features_denseblock4_denselayer13_norm2           (bottleneck_output_108,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer13_relu2  L__self___features_denseblock4_denselayer13_relu2           (l__self___features_denseblock4_denselayer13_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_108                                   L__self___features_denseblock4_denselayer13_conv2           (l__self___features_denseblock4_denselayer13_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_55                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104, new_features_106, new_features_108], 1)                                                                                                                                                                                        {}
call_module    l__self___features_denseblock4_denselayer14_norm1  L__self___features_denseblock4_denselayer14_norm1           (concated_features_55,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer14_relu1  L__self___features_denseblock4_denselayer14_relu1           (l__self___features_denseblock4_denselayer14_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_110                              L__self___features_denseblock4_denselayer14_conv1           (l__self___features_denseblock4_denselayer14_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer14_norm2  L__self___features_denseblock4_denselayer14_norm2           (bottleneck_output_110,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer14_relu2  L__self___features_denseblock4_denselayer14_relu2           (l__self___features_denseblock4_denselayer14_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_110                                   L__self___features_denseblock4_denselayer14_conv2           (l__self___features_denseblock4_denselayer14_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_56                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104, new_features_106, new_features_108, new_features_110], 1)                                                                                                                                                                      {}
call_module    l__self___features_denseblock4_denselayer15_norm1  L__self___features_denseblock4_denselayer15_norm1           (concated_features_56,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer15_relu1  L__self___features_denseblock4_denselayer15_relu1           (l__self___features_denseblock4_denselayer15_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_112                              L__self___features_denseblock4_denselayer15_conv1           (l__self___features_denseblock4_denselayer15_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer15_norm2  L__self___features_denseblock4_denselayer15_norm2           (bottleneck_output_112,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer15_relu2  L__self___features_denseblock4_denselayer15_relu2           (l__self___features_denseblock4_denselayer15_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_112                                   L__self___features_denseblock4_denselayer15_conv2           (l__self___features_denseblock4_denselayer15_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  concated_features_57                               <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104, new_features_106, new_features_108, new_features_110, new_features_112], 1)                                                                                                                                                    {}
call_module    l__self___features_denseblock4_denselayer16_norm1  L__self___features_denseblock4_denselayer16_norm1           (concated_features_57,)                                                                                                                                                                                                                                                                                                                                                                                                                                             {}
call_module    l__self___features_denseblock4_denselayer16_relu1  L__self___features_denseblock4_denselayer16_relu1           (l__self___features_denseblock4_denselayer16_norm1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    bottleneck_output_114                              L__self___features_denseblock4_denselayer16_conv1           (l__self___features_denseblock4_denselayer16_relu1,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    l__self___features_denseblock4_denselayer16_norm2  L__self___features_denseblock4_denselayer16_norm2           (bottleneck_output_114,)                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
call_module    l__self___features_denseblock4_denselayer16_relu2  L__self___features_denseblock4_denselayer16_relu2           (l__self___features_denseblock4_denselayer16_norm2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_module    new_features_114                                   L__self___features_denseblock4_denselayer16_conv2           (l__self___features_denseblock4_denselayer16_relu2,)                                                                                                                                                                                                                                                                                                                                                                                                                {}
call_function  cat_61                                             <built-in method cat of type object at 0x7fcfe8e34840>      ([l__self___features_transition3_pool, new_features_84, new_features_86, new_features_88, new_features_90, new_features_92, new_features_94, new_features_96, new_features_98, new_features_100, new_features_102, new_features_104, new_features_106, new_features_108, new_features_110, new_features_112, new_features_114], 1)                                                                                                                                  {}
call_module    features                                           L__self___features_norm5                                    (cat_61,)                                                                                                                                                                                                                                                                                                                                                                                                                                                           {}
call_function  out                                                <function relu at 0x7fcf10a65510>                           (features,)                                                                                                                                                                                                                                                                                                                                                                                                                                                         {'inplace': True}
call_function  out_1                                              <function adaptive_avg_pool2d at 0x7fcf10a65000>            (out, (1, 1))                                                                                                                                                                                                                                                                                                                                                                                                                                                       {}
call_function  out_2                                              <built-in method flatten of type object at 0x7fcfe8e34840>  (out_1, 1)                                                                                                                                                                                                                                                                                                                                                                                                                                                          {}
call_module    out_3                                              L__self___classifier                                        (out_2,)                                                                                                                                                                                                                                                                                                                                                                                                                                                            {}
output         output                                             output                                                      ((out_3,),)                                                                                                                                                                                                                                                                                                                                                                                                                                                         {}

tensor([[ 0.0614, -0.4023, -0.2792,  ..., -0.5549,  0.0976, -0.0634],
        [-0.2032, -0.2706, -0.0935,  ..., -0.4815,  0.0758, -0.1038],
        [ 0.0637, -0.3492, -0.1492,  ..., -0.4841,  0.1776, -0.0723],
        ...,
        [-0.1050, -0.3393,  0.0092,  ..., -0.4862,  0.0555, -0.1058],
        [ 0.0018, -0.2431, -0.1656,  ..., -0.5072,  0.0977, -0.1387],
        [ 0.1192, -0.3563, -0.1147,  ..., -0.4839,  0.1770, -0.0659]],
       device='cuda:0', grad_fn=<AddmmBackward0>)

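With our custom backend, we can now see how TorchDynamo handles data-dependent control flow. Consider the function below, where the line if b.sum() < 0 is the source of data-dependent control flow.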

def bar(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b

opt_bar = torch.compile(bar, backend=custom_backend)
inp1 = torch.randn(10)
inp2 = torch.randn(10)
opt_bar(inp1, inp2)
opt_bar(inp1, -inp2)

custom backend called with FX graph:
opcode         name    target                                                  args         kwargs
-------------  ------  ------------------------------------------------------  -----------  --------
placeholder    l_a_    L_a_                                                    ()           {}
placeholder    l_b_    L_b_                                                    ()           {}
call_function  abs_1   <built-in method abs of type object at 0x7fcfe8e34840>  (l_a_,)      {}
call_function  add     <built-in function add>                                 (abs_1, 1)   {}
call_function  x       <built-in function truediv>                             (l_a_, add)  {}
call_method    sum_1   sum                                                     (l_b_,)      {}
call_function  lt      <built-in function lt>                                  (sum_1, 0)   {}
output         output  output                                                  ((x, lt),)   {}
custom backend called with FX graph:
opcode         name    target                   args          kwargs
-------------  ------  -----------------------  ------------  --------
placeholder    l_x_    L_x_                     ()            {}
placeholder    l_b_    L_b_                     ()            {}
call_function  mul     <built-in function mul>  (l_x_, l_b_)  {}
output         output  output                   ((mul,),)     {}
custom backend called with FX graph:
opcode         name    target                   args         kwargs
-------------  ------  -----------------------  -----------  --------
placeholder    l_b_    L_b_                     ()           {}
placeholder    l_x_    L_x_                     ()           {}
call_function  b       <built-in function mul>  (l_b_, -1)   {}
call_function  mul_1   <built-in function mul>  (l_x_, b)    {}
output         output  output                   ((mul_1,),)  {}

tensor([-0.0176,  1.0753,  0.0282,  0.0756, -0.0176,  0.0633, -0.9161,  0.1333,
        -0.1971, -0.3406])

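The output shows that TorchDynamo extracted 3 different FX graphs, corresponding to the following code (the order may differ from the output above):

1. x = a / (torch.abs(a) + 1)
2. b = b * -1; return x * b
3. return x * b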

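When TorchDynamo encounters an unsupported Python feature, such as data-dependent control flow, it breaks the computation graph, lets the default Python interpreter handle the unsupported code, and then resumes capturing the graph.

Let's walk through how TorchDynamo steps through bar. If b.sum() < 0, TorchDynamo runs graph 1, lets Python determine the result of the conditional, and then runs graph 2. Otherwise, if not b.sum() < 0, TorchDynamo runs graph 1, lets Python determine the result of the conditional, and then runs graph 3.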
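The split described above can be modeled in plain Python. This is only a conceptual sketch: Python lists stand in for tensors, and the names graph_1, graph_2, graph_3 and run_bar are hypothetical stand-ins for the compiled graphs, not anything TorchDynamo actually generates.

```python
# Conceptual model only: lists of floats stand in for tensors, and each
# function stands in for one FX graph extracted from bar.

def graph_1(a, b):
    """Everything before the branch: compute x and the branch condition."""
    x = [ai / (abs(ai) + 1) for ai in a]
    cond = sum(b) < 0
    return x, cond


def graph_2(x, b):
    """Branch taken: negate b, then multiply."""
    b = [bi * -1 for bi in b]
    return [xi * bi for xi, bi in zip(x, b)]


def graph_3(x, b):
    """Branch not taken: multiply directly."""
    return [xi * bi for xi, bi in zip(x, b)]


def run_bar(a, b):
    x, cond = graph_1(a, b)   # "compiled" graph 1
    if cond:                  # the Python interpreter evaluates the condition
        return graph_2(x, b)  # "compiled" graph 2
    return graph_3(x, b)      # "compiled" graph 3


def bar_reference(a, b):
    """The original, unsplit function, written with the same list operations."""
    x = [ai / (abs(ai) + 1) for ai in a]
    if sum(b) < 0:
        b = [bi * -1 for bi in b]
    return [xi * bi for xi, bi in zip(x, b)]


a = [1.0, -2.0, 3.0]
b_neg = [0.5, -1.0, -2.0]  # sum < 0, so graph 2 runs after graph 1
b_pos = [0.5, 1.0, 2.0]    # sum >= 0, so graph 3 runs after graph 1
print(run_bar(a, b_neg) == bar_reference(a, b_neg))
print(run_bar(a, b_pos) == bar_reference(a, b_pos))
```

Whichever branch is taken, graph 1 always runs first, and only the condition itself is evaluated outside a graph.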

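This highlights a major difference between TorchDynamo and previous PyTorch compiler solutions. When they encounter unsupported Python features, previous solutions either raise errors or silently fail. TorchDynamo, on the other hand, breaks the computation graph.

We can see where TorchDynamo breaks the graph by using torch._dynamo.explain: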

# Reset since we are using a different backend.
torch._dynamo.reset()
explain_output = torch._dynamo.explain(bar)(torch.randn(10), torch.randn(10))
print(explain_output)

Graph Count: 2
Graph Break Count: 1
Op Count: 6
Break Reasons:
  Break Reason 1:
    Reason: generic_jump TensorVariable()
    User Stack:
      <FrameSummary file /var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py, line 434 in bar>
Ops per Graph:
  Ops 1:
    <built-in method abs of type object at 0x7fcfe8e34840>
    <built-in function add>
    <built-in function truediv>
    <built-in function lt>
  Ops 2:
    <built-in function mul>
    <built-in function mul>
Out Guards:
  Guard 1:
    Name: "L['a']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: ['TENSOR_MATCH']
    Code List: ["hasattr(L['a'], '_dynamo_dynamic_indices') == False"]
    Object Weakref: <weakref at 0x7fce93c22110; dead>
    Guarded Class Weakref: <weakref at 0x7fcf10f646d0; to 'torch._C._TensorMeta' at 0x60eec90 (Tensor)>
  Guard 2:
    Name: ''
    Source: shape_env
    Create Function: SHAPE_ENV
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 3:
    Name: ''
    Source: global
    Create Function: GRAD_MODE
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 4:
    Name: ''
    Source: global
    Create Function: CONFIG_HASH_MATCH
    Guard Types: ['CONFIG_HASH_MATCH']
    Code List: ["___compile_config_hash() == 'a9446d0645a24f8e5db15f38d621b2a5'"]
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 5:
    Name: ''
    Source: global
    Create Function: DETERMINISTIC_ALGORITHMS
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 6:
    Name: ''
    Source: global
    Create Function: HAS_GRAPH_BREAK
    Guard Types: ['HAS_GRAPH_BREAK']
    Code List: ['not ___needs_nopython()']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 7:
    Name: ''
    Source: global
    Create Function: TORCH_FUNCTION_STATE
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 8:
    Name: "L['b']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: ['TENSOR_MATCH']
    Code List: ["hasattr(L['b'], '_dynamo_dynamic_indices') == False"]
    Object Weakref: <weakref at 0x7fce89d67c90; dead>
    Guarded Class Weakref: <weakref at 0x7fcf10f646d0; to 'torch._C._TensorMeta' at 0x60eec90 (Tensor)>
  Guard 9:
    Name: ''
    Source: global
    Create Function: DEFAULT_DEVICE
    Guard Types: ['DEFAULT_DEVICE']
    Code List: ['utils_device.CURRENT_DEVICE == None']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 10:
    Name: "G['torch']"
    Source: global
    Create Function: FUNCTION_MATCH
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 11:
    Name: ''
    Source: global
    Create Function: BACKEND_MATCH
    Guard Types: ['BACKEND_MATCH']
    Code List: ['(___skip_backend_check() or ___current_backend() == ___lookup_backend(140524583700368))']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 12:
    Name: ''
    Source: shape_env
    Create Function: SHAPE_ENV
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 13:
    Name: ''
    Source: global
    Create Function: GRAD_MODE
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 14:
    Name: ''
    Source: global
    Create Function: CONFIG_HASH_MATCH
    Guard Types: ['CONFIG_HASH_MATCH']
    Code List: ["___compile_config_hash() == 'a9446d0645a24f8e5db15f38d621b2a5'"]
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 15:
    Name: "L['b']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: ['TENSOR_MATCH']
    Code List: ["hasattr(L['b'], '_dynamo_dynamic_indices') == False"]
    Object Weakref: <weakref at 0x7fce89d67c90; dead>
    Guarded Class Weakref: <weakref at 0x7fcf10f646d0; to 'torch._C._TensorMeta' at 0x60eec90 (Tensor)>
  Guard 16:
    Name: ''
    Source: global
    Create Function: DETERMINISTIC_ALGORITHMS
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 17:
    Name: ''
    Source: global
    Create Function: TORCH_FUNCTION_STATE
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 18:
    Name: ''
    Source: global
    Create Function: DEFAULT_DEVICE
    Guard Types: ['DEFAULT_DEVICE']
    Code List: ['utils_device.CURRENT_DEVICE == None']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 19:
    Name: ''
    Source: global
    Create Function: BACKEND_MATCH
    Guard Types: ['BACKEND_MATCH']
    Code List: ['(___skip_backend_check() or ___current_backend() == ___lookup_backend(140524583700368))']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 20:
    Name: "L['x']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: ['TENSOR_MATCH']
    Code List: ["hasattr(L['x'], '_dynamo_dynamic_indices') == False"]
    Object Weakref: <weakref at 0x7fce8830e1b0; dead>
    Guarded Class Weakref: <weakref at 0x7fcf10f646d0; to 'torch._C._TensorMeta' at 0x60eec90 (Tensor)>
Compile Times: TorchDynamo compilation metrics:
Function                         Runtimes (s)
-------------------------------  --------------
_compile.<locals>.compile_inner  0.0143, 0.0078
OutputGraph.call_user_compiler   0.0010, 0.0000

To maximize speedup, graph breaks should be limited. We can force TorchDynamo to raise an error at the first graph break it encounters by using fullgraph=True:

opt_bar = torch.compile(bar, fullgraph=True)
try:
    opt_bar(torch.randn(10), torch.randn(10))
except Exception:
    # fullgraph=True turns the graph break into an error; print its traceback.
    tb.print_exc()

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 482, in <module>
    opt_bar(torch.randn(10), torch.randn(10))
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 655, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
    compiled_product = _compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 646, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 562, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1033, in transform_code_object
    transformations(instructions, code_options)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 151, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 527, in transform
    tracer.run()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2128, in run
    super().run()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 818, in run
    and self.step()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 781, in step
    getattr(self, inst.opname)(inst)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 445, in inner
    raise exc.UserError(
torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands

from user code:
   File "/var/lib/jenkins/workspace/intermediate_source/torch_compile_tutorial.py", line 434, in bar
    if b.sum() < 0:

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

Below, we demonstrate that TorchDynamo does not break the graph on the model we used earlier to demonstrate speedups.

opt_model = torch.compile(init_model(), fullgraph=True)
print(opt_model(generate_data(16)[0]))

tensor([[ 0.1346,  0.1981, -0.2058,  ...,  0.1868, -0.1626, -0.1644],
        [ 0.2679,  0.2347, -0.1904,  ...,  0.2167, -0.0060,  0.0307],
        [ 0.0166,  0.2182, -0.1113,  ...,  0.1708, -0.1683, -0.0637],
        ...,
        [ 0.0808,  0.2680, -0.1887,  ...,  0.0378, -0.2078, -0.1444],
        [-0.0211,  0.0857, -0.2459,  ...,  0.1863, -0.1282, -0.0283],
        [-0.0388,  0.0728, -0.1961,  ...,  0.0860, -0.2200, -0.1485]],
       device='cuda:0', grad_fn=<CompiledFunctionBackward>)

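We can use torch.export (available from PyTorch 2.1+) to extract a single, exportable FX graph from the input PyTorch program. The exported graph is intended to run in different (i.e. Python-less) environments. One important restriction is that torch.export does not support graph breaks. See the torch.export tutorial for more details.

Conclusion

In this tutorial, we introduced torch.compile by covering basic usage, demonstrating speedups over eager mode, comparing it to previous PyTorch compiler solutions, and briefly investigating TorchDynamo and its interactions with FX graphs. We hope you will give torch.compile a try!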

Inductor CPU backend debugging and profiling

Original: pytorch.org/tutorials/intermediate/inductor_debug_cpu.html

Translator: 飞龙

License: CC BY-NC-SA 4.0

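Authors: Xuan Liao, Haozhe Zhu, Jiong Gong, Weihan Wang

Overview

PyTorch 2.0 introduced the compilation API called torch.compile. This new feature offers a significant speedup over eager mode execution through graph-level optimization powered by the default Inductor backend.

This tutorial is intended to provide an in-depth introduction to debugging and performance profiling on the Inductor CPU backend by delving into the intricacies of torch.compile.

Meanwhile, you may also find related tutorials about torch.compile covering basic usage, comprehensive troubleshooting, and GPU-specific knowledge such as GPU performance profiling.

We will start debugging with a motivating example that triggers compilation issues and accuracy issues, and demonstrate the debugging process used to pinpoint the problems.

By enabling logging and exploring the underlying generated code, you can learn how to narrow down the failure step by step and finally figure out the root cause.

Following that, we will discuss how to profile the compiled code and, through a performance comparison with eager mode, explain why torch.compile can provide an additional performance boost.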

Debugging

Here is a simple example that runs torch.compile with Inductor so that its result can be compared with eager mode:

import torch

def foo1(x1, x2):
    a = torch.neg(x1)
    b = torch.maximum(x2, a)
    y = torch.cat([b], dim=0)
    return y

x1 = torch.randint(256, (1, 8), dtype=torch.uint8)
x2 = torch.randint(256, (8390, 8), dtype=torch.uint8)

compiled_foo1 = torch.compile(foo1)
result = compiled_foo1(x1, x2)

The correct implementation of neg in the cpp codegen is as follows:

def neg1(x):
    return f"decltype({x})(-{x})"

To demonstrate debugging, we will later modify this function into an incorrect one.

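Get more logging information

No debugging information is provided if you run this simple example with the default settings. To get more useful debugging and logging information, we usually add the TORCH_COMPILE_DEBUG environment variable, as shown below:

TORCH_COMPILE_DEBUG=1 python xx.py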

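This prints more debug information in the output logs and also dumps the intermediate IRs generated during the codegen process. You can find the path of the dumped files in the log, as shown below:

torch._inductor.debug: [WARNING] model___20 debug trace: /tmp/torchinductor_root/rx/crxfi2ybd7yp5sbj2pnhw33wfhtdw7wumvrobyp5sjvdui5ktjc2.debug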
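When the log is long, it can help to pull the dump path out programmatically. The following is a minimal sketch that assumes the log line keeps the "debug trace: <path>" shape shown above; the helper name find_debug_trace_dirs is hypothetical and not part of PyTorch.

```python
import re

# Log line copied from the example above; the path will differ on your machine.
LOG_LINE = (
    "torch._inductor.debug: [WARNING] model___20 debug trace: "
    "/tmp/torchinductor_root/rx/crxfi2ybd7yp5sbj2pnhw33wfhtdw7wumvrobyp5sjvdui5ktjc2.debug"
)


def find_debug_trace_dirs(log_text):
    """Return every path that follows 'debug trace:' in the given log text."""
    return re.findall(r"debug\s+trace:\s+(\S+)", log_text)


print(find_debug_trace_dirs(LOG_LINE))
```

The pattern tolerates variable whitespace, so it also works on logs where the fields are separated by more than one space.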

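In this directory, the following files are saved for debugging purposes:

File	Description
`fx_graph_runnable.py`	Executable FX graph, after decomposition, before pattern match
`fx_graph_transformed.py`	Transformed FX graph, after pattern match
`ir_pre_fusion.txt`	Inductor IR before fusion
`ir_post_fusion.txt`	Inductor IR after fusion
`output_code.py`	Generated Python code for the graph, with C++/Triton kernels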
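As a quick sanity check before digging into a dump, one could verify which of these files were actually written. This is a small sketch that assumes the files sit directly inside the debug directory reported in the log; missing_debug_files is a hypothetical helper, not a PyTorch API.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_DEBUG_FILES = [
    "fx_graph_runnable.py",
    "fx_graph_transformed.py",
    "ir_pre_fusion.txt",
    "ir_post_fusion.txt",
    "output_code.py",
]


def missing_debug_files(debug_dir):
    """Return the expected dump files that are not present in debug_dir."""
    root = Path(debug_dir)
    return [name for name in EXPECTED_DEBUG_FILES if not (root / name).is_file()]
```

An empty list means every file from the table is present; otherwise the returned names indicate which stage did not produce a dump.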

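Note that fx_graph_runnable.py and output_code.py are both runnable and editable to make debugging easier. Here are the main parts of the code extracted from these files, with the generated C++ lines annotated with the corresponding FX code lines.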

fx_graph_runnable:

def forward1(self, arg0_1, arg1_1):
    neg = torch.ops.aten.neg.default(arg0_1);  arg0_1 = None
    maximum = torch.ops.aten.maximum.default(arg1_1, neg);  arg1_1 = neg = None
    clone = torch.ops.aten.clone.default(maximum);  maximum = None
    return (clone,)

C++ kernel in output_code:

from torch._inductor.codecache import AsyncCompile
async_compile = AsyncCompile()

cpp_fused_cat_maximum_neg_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/gv/cgv6n5aotqjo5w4vknjibhengeycuattfto532hkxpozszcgxr3x.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       const unsigned char* in_ptr1,
                       unsigned char* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(8390L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(8L); i1+=static_cast<long>(1L))
            {
                auto tmp0 = in_ptr0[static_cast<long>(i1 + (8L*i0))];
                auto tmp1 = in_ptr1[static_cast<long>(i1)];
                // Corresponding FX code line: neg = torch.ops.aten.neg.default(arg0_1);  arg0_1 = None
                auto tmp2 = decltype(tmp1)(-tmp1);
                // Corresponding FX code line: maximum = torch.ops.aten.maximum.default(arg1_1, neg);  arg1_1 = neg = None
                auto tmp3 = max_propagate_nan(tmp0, tmp2);
                // Corresponding FX code line: clone = torch.ops.aten.clone.default(maximum);  maximum = None
                out_ptr0[static_cast<long>(i1 + (8L*i0))] = tmp3;
            }
        }
    }
}''')

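Determine the component of error

When you encounter errors or accuracy problems, a straightforward way to find the bug is to narrow down the problem. The first step is to determine the component in which the error occurs. Fortunately, this can be done simply by changing the backend of torch.compile.

Code	Description
`torch.compile(fn, backend="eager")`	Enable Dynamo
`torch.compile(fn, backend="aot_eager")`	Enable Dynamo + AOT Autograd
`torch.compile(fn, backend="inductor")`	Enable Dynamo + AOT Autograd + Inductor

If the model runs successfully when the backend is set to eager or aot_eager but fails with inductor, we can narrow down the failure to Inductor.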
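The narrowing procedure can be written as a small loop over the three backends. The sketch below is an illustration under assumptions: first_failing_backend and fake_compile are hypothetical names, and fake_compile only simulates a failure so that the example runs without PyTorch installed.

```python
# Backends ordered from the fewest to the most components enabled.
BACKENDS = ["eager", "aot_eager", "inductor"]

# Component that each backend adds on top of the previous one.
ADDED_COMPONENT = {
    "eager": "Dynamo",
    "aot_eager": "AOT Autograd",
    "inductor": "Inductor",
}


def first_failing_backend(fn, args, compile_fn, backends=BACKENDS):
    """Return (backend, exception) for the first backend that fails, or None."""
    for backend in backends:
        try:
            compile_fn(fn, backend=backend)(*args)
        except Exception as exc:
            return backend, exc
    return None


def fake_compile(fn, backend):
    """Hypothetical stand-in for torch.compile that fails only for inductor."""
    def wrapped(*args):
        if backend == "inductor":
            raise RuntimeError("simulated Inductor failure")
        return fn(*args)
    return wrapped


result = first_failing_backend(lambda x: x + 1, (1,), fake_compile)
print(result[0], "->", ADDED_COMPONENT[result[0]])  # inductor -> Inductor
```

With PyTorch available, passing compile_fn=torch.compile should work in the same way, since torch.compile accepts a backend keyword; it is probably wise to call torch._dynamo.reset() between backends so that cached compilations do not carry over.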

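Compilation error

As we know, the evolution chain of graph-level optimization looks like this:

torch.neg (Python) -> torch.ops.aten.neg.default (within FX graph) -> ops.neg (within IR node) -> tmp2 = -tmp1 (within C++ kernel)

If you encounter a compilation error, something went wrong when compiling the C++ kernels in the output code. This type of error indicates that a bug was introduced when lowering IR nodes to output code. The root cause of a compilation error is usually shown in the traceback log.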
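To make the last step of the chain concrete, here is a toy model of lowering IR ops to C++ statements. It is a deliberately simplified sketch, not Inductor's actual code generator: CPP_OPS and lower are hypothetical names, and only the two ops used in this tutorial are covered.

```python
# Toy lowering table: IR op name -> function emitting a C++ expression string.
CPP_OPS = {
    "neg": lambda x: f"decltype({x})(-{x})",
    "maximum": lambda a, b: f"max_propagate_nan({a}, {b})",
}


def lower(ir_ops, first_tmp):
    """Emit one C++ statement per IR op, naming the results tmp<N>."""
    lines = []
    names = {}  # IR result name -> C++ temporary name
    n = first_tmp
    for result, op, operands in ir_ops:
        # Replace operands that refer to earlier IR results by their temporaries.
        args = [names.get(operand, operand) for operand in operands]
        tmp = f"tmp{n}"
        lines.append(f"auto {tmp} = {CPP_OPS[op](*args)};")
        names[result] = tmp
        n += 1
    return lines


# (result name, op, operands), mirroring ops.neg and ops.maximum in the IR node.
ir = [
    ("neg", "neg", ["tmp1"]),
    ("maximum", "maximum", ["tmp0", "neg"]),
]
for line in lower(ir, first_tmp=2):
    print(line)
# auto tmp2 = decltype(tmp1)(-tmp1);
# auto tmp3 = max_propagate_nan(tmp0, tmp2);
```

Because each C++ statement comes from exactly one entry in the table, a wrong entry for a single op shows up directly in the generated kernel, which is what makes this kind of bug traceable from the compiler error back to the IR node.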

For example, the neg function is modified like this:

def neg2(x):
    return f"-{x}"

The log shows the following compilation error, and the reason is fairly clear.

 torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
 CppCompileError: C++ compile error
 /tmp/torchinductor_root/xg/cxga5tk3b4lkwoxyigrtocjp5s7vc5cg2ikuscf6bk6pjqip2bhx.cpp: In function ‘void kernel(const unsigned char*, const unsigned char*, unsigned char*)’:
 /tmp/torchinductor_root/xg/cxga5tk3b4lkwoxyigrtocjp5s7vc5cg2ikuscf6bk6pjqip2bhx.cpp:17:57: error: no matching function for call to ‘max_propagate_nan(unsigned char&, int&)’
   17 |                 auto tmp3 = max_propagate_nan(tmp0, tmp2);
        |                                                         ^
 In file included from /tmp/torchinductor_root/xg/cxga5tk3b4lkwoxyigrtocjp5s7vc5cg2ikuscf6bk6pjqip2bhx.cpp:2:
 /tmp/torchinductor_root/gv/cgv6n5aotqjo5w4vknjibhengeycuattfto532hkxpozszcgxr3x.h:27:17: note: candidate: ‘template<class scalar_t> scalar_t max_propagate_nan(scalar_t, scalar_t)’
 27 | inline scalar_t max_propagate_nan(scalar_t a, scalar_t b) {
      |                 ^~~~~~~~~~~~~~~~~
 /tmp/torchinductor_root/gv/cgv6n5aotqjo5w4vknjibhengeycuattfto532hkxpozszcgxr3x.h:27:17: note:   template argument deduction/substitution failed:
/tmp/torchinductor_root/xg/cxga5tk3b4lkwoxyigrtocjp5s7vc5cg2ikuscf6bk6pjqip2bhx.cpp:17:57: note:   deduced conflicting types for parameter ‘scalar_t’ (‘unsigned char’ and ‘int’)
 17 |                 auto tmp3 = max_propagate_nan(tmp0, tmp2);
      |                                                         ^

Let us also look at the corresponding C++ kernel in the output code and at the IR node.

C++ kernel:

#include "/tmp/torchinductor_root/gv/cgv6n5aotqjo5w4vknjibhengeycuattfto532hkxpozszcgxr3x.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       const unsigned char* in_ptr1,
                       unsigned char* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(8390L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(8L); i1+=static_cast<long>(1L))
            {
                auto tmp0 = in_ptr0[static_cast<long>(i1 + (8L*i0))];
                auto tmp1 = in_ptr1[static_cast<long>(i1)];
                auto tmp2 = -tmp1;
                auto tmp3 = max_propagate_nan(tmp0, tmp2);
                out_ptr0[static_cast<long>(i1 + (8L*i0))] = tmp3;
            }
        }
    }
}

IR node:

buf0: SchedulerNode(ComputedBuffer)
buf0.writes = [MemoryDep('buf0', c0, {c0: 67120})]
buf0.unmet_dependencies = []
buf0.met_dependencies =
    [   MemoryDep('arg0_1', c1, {c0: 8390, c1: 8}),
        MemoryDep('arg1_1', c0, {c0: 67120})]
buf0.users = [NodeUser(node=OUTPUT, can_inplace=False)]
buf0.group.device = cpu
buf0.group.iteration = ((8390, 8), ())
buf0.sizes = ([8390, 8], [])
class buf0_loop_body:
    var_ranges = {z0: 8390, z1: 8}
    index0 = 8*z0 + z1
    index1 = z1
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg1_1', get_index)
        get_index_1 = self.get_index('index1')
        load_1 = ops.load('arg0_1', get_index_1)
        neg = ops.neg(load_1)
        maximum = ops.maximum(load, neg)
        get_index_2 = self.get_index('index0')
        store = ops.store('buf0', get_index_2, maximum, None)
        return store

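According to the traceback logging, the compilation error is caused by inconsistent data types in the inputs of max_propagate_nan. By checking the C++ kernel, we can see that tmp0 and tmp1 are both unsigned char, but tmp2 is no longer unsigned char after the - is applied: C++ integer promotion turns -tmp1 into an int, which is why the compiler reports conflicting types (unsigned char and int). We can also easily match - and max_propagate_nan in the C++ kernel with ops.neg and ops.maximum in the IR node, respectively.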

Now we have found the root cause: the implementation of ops.neg in the cpp codegen silently changes the data type when performing neg.
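To make the root cause concrete, the sketch below compares the C++ expression strings that two variants of the code-generation helper would emit. This is an illustration only: `neg_with_cast` is an assumed dtype-preserving form, modeled on the `decltype(...)` cast that appears in the neg3 variant later in this section, and neither function is claimed to be Inductor's actual source.

```python
def neg2(x: str) -> str:
    # Variant from this section: a bare unary minus. In C++ the operand of a
    # narrow integer type (such as unsigned char) is promoted to int.
    return f"-{x}"


def neg_with_cast(x: str) -> str:
    # Assumed fix (hypothetical name): cast the result back to the operand's
    # own type so the generated expression keeps the input dtype.
    return f"decltype({x})(-{x})"


for codegen in (neg2, neg_with_cast):
    # Print the kernel line that each variant would generate for tmp1.
    print(f"{codegen.__name__}: auto tmp2 = {codegen('tmp1')};")
```

With the cast in place, tmp2 would have the same type as tmp1, so template argument deduction for max_propagate_nan(tmp0, tmp2) would no longer see two different types.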

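Accuracy debugging

Otherwise, if the model runs into other errors or an accuracy problem, you can use the PyTorch debugging tool called Minifier.

The core idea of Minifier is to keep removing nodes and inputs from the graph until it finds the minimal graph that still has the problem. It automatically generates a minified problematic graph through 4 strategies: truncating the suffix, delta debugging, eliminating dead code, and removing unused inputs.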
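Two of these strategies can be illustrated with a toy model. The sketch below is not the Minifier's implementation: it treats a graph as a flat list of node names, ignores data dependencies between nodes, and assumes a hypothetical `still_fails` predicate that reports whether a candidate graph still reproduces the problem.

```python
def truncate_suffix(nodes, still_fails):
    """Return the shortest prefix of `nodes` that still reproduces the failure."""
    for end in range(1, len(nodes) + 1):
        if still_fails(nodes[:end]):
            return nodes[:end]
    return nodes


def delta_debug(nodes, still_fails):
    """Remove chunks of nodes, halving the chunk size, while the failure persists."""
    granularity = max(len(nodes) // 2, 1)
    while granularity >= 1:
        start = 0
        while start < len(nodes):
            candidate = nodes[:start] + nodes[start + granularity:]
            if candidate and still_fails(candidate):
                nodes = candidate  # the chunk was irrelevant; retry at the same offset
            else:
                start += granularity  # the chunk is needed; move on
        granularity //= 2
    return nodes


# Toy graph with six nodes; the "bug" is assumed to live in the neg node.
graph = ["load_x1", "load_x2", "neg", "maximum", "cat", "output"]
still_fails = lambda g: "neg" in g

minimized = delta_debug(truncate_suffix(graph, still_fails), still_fails)
print(minimized)
```

In this toy setting the search ends with only the node that triggers the failure, which mirrors the single-node graph that the real tool produces for the example in this section.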

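Now we will show how to debug an accuracy problem with Minifier. An accuracy problem refers to the case where the outputs of the eager and inductor backends differ.

For example, we modify the example like this: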

import torch
from torch._dynamo.utils import same

def foo2(x1, x2):
    a = torch.neg(x1)
    b = torch.maximum(x2, a)
    y = torch.cat([b], dim=0)
    return y

x1 = torch.randn((1, 8), dtype=torch.float32)
x2 = torch.randn((8390, 8), dtype=torch.float32)

expected_result = foo2(x1, x2)

compiled_foo2 = torch.compile(foo2)
actual_result = compiled_foo2(x1, x2)

assert same(expected_result, actual_result) == True

We also modify the neg function:

def neg3(x):
    return f"decltype({x})(2 * {x})"

The accuracy problem is then raised as follows:

torch._dynamo.utils: [ERROR] Accuracy failed: allclose not within tol=0.0001
Traceback (most recent call last):
  File "test_script.py", line 18, in <module>
    assert same(expected_result, actual_result) == True
AssertionError

To debug an accuracy problem with Minifier, two environment variables are needed:

TORCHDYNAMO_REPRO_AFTER="aot" TORCHDYNAMO_REPRO_LEVEL=4 python xx.py

This gives us logging information that shows the minifying steps:

Started off with 6 nodes

Trying granularity 2
Strategy: Truncate suffix (G: 2) (6 nodes, 2 inputs)
SUCCESS: Went from 6 to 4 nodes

Trying granularity 4
Strategy: Remove unused inputs (G: 4) (4 nodes, 2 inputs)
SUCCESS: Went from 4 to 3 nodes

After running, we get the final minified graph with the target node neg:

def forward2(self, arg0_1):
    neg = torch.ops.aten.neg.default(arg0_1);  arg0_1 = None
    return (neg,)

For more usage details about Minifier, please refer to Troubleshooting.

Performance profiling

In this section, we demonstrate the process of performance analysis for a model compiled with the Inductor CPU backend. In the example below, we benchmark the Hugging Face Transformer model MobileBertForQuestionAnswering in both eager mode and Inductor graph mode. The execution time and the speedup ratio of Inductor are printed after the benchmark. We use an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz and run the benchmark on the first socket to demonstrate the optimizations in this section. We set the following environment variables as a best practice for benchmarking on Intel(R) CPUs.

export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
numactl -C 0-31 -m 0 python bench.py

# bench.py
import torch
from transformers import MobileBertForQuestionAnswering
# Initialize an eager model
model = MobileBertForQuestionAnswering.from_pretrained("csarron/mobilebert-uncased-squad-v2")
seq_length = 128
bs = 128
vocab_size = model.config.vocab_size
input = torch.randint(0, vocab_size, (bs, seq_length), dtype=torch.int64)
input_dict = {"input_ids": input}

# Initialize the inductor model
compiled_model = torch.compile(model)
with torch.no_grad():
    compiled_model(**input_dict)

NUM_ITERS=50
import timeit
with torch.no_grad():
    # warmup
    for _ in range(10):
        model(**input_dict)
    eager_t = timeit.timeit("model(**input_dict)", number=NUM_ITERS, globals=globals())

with torch.no_grad():
    # warmup
    for _ in range(10):
        compiled_model(**input_dict)
    inductor_t = timeit.timeit("compiled_model(**input_dict)", number=NUM_ITERS, globals=globals())
# print(f"eager use: {eager_t * 1000 / NUM_ITERS} ms/iter")
# print(f"inductor use: {inductor_t * 1000 / NUM_ITERS} ms/iter")
# print(f"speed up ratio: {eager_t / inductor_t}")

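Output:

eager use: 802.1023553796113 ms/iter
inductor use: 339.95180135127157 ms/iter
speed up ratio: 2.359459053287382

In our own testing, we find that the Inductor CPU backend speeds up the model by about 2.36x.

Next, let us dive into the performance at the operation level to understand where the speedup comes from. PyTorch Profiler is a good tool to help us. The Inductor CPU backend can report the time of the fused kernels to the profiler through the enable_kernel_profile configuration option: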

from torch._inductor import config
config.cpp.enable_kernel_profile = True

Following the steps in PyTorch Profiler, we are able to get the profiling table and trace files.

# bench.py
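import os
from torch.profiler import profile, schedule, ProfilerActivity
RESULT_DIR = "./prof_trace"
os.makedirs(RESULT_DIR, exist_ok=True)  # make sure the trace directory exists before exporting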
my_schedule = schedule(
    skip_first=10,
    wait=5,
    warmup=5,
    active=1,
    repeat=5)

def trace_handler(p):
    output = p.key_averages().table(sort_by="self_cpu_time_total", row_limit=20)
    # print(output)
    p.export_chrome_trace(f"{RESULT_DIR}/{p.step_num}.json")

for _ in range(10):
    model(**input_dict)  # compiled_model(**input_dict) to get inductor model profiling

total = 0
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=my_schedule,
    on_trace_ready=trace_handler
) as p:
    for _ in range(50):
        model(**input_dict)  # compiled_model(**input_dict) to get inductor model profiling
        p.step()

We get the following performance profiling table for the eager-mode model (omitting some columns):

-------------------------  ------------  ------------  ------------
Name                        CPU total %     CPU total    # of Calls
-------------------------  ------------  ------------  ------------
aten::addmm                      45.73%     370.814ms           362
aten::add                        19.89%     161.276ms           363
aten::copy_                      14.97%     121.416ms           488
aten::mul                         9.02%      73.154ms           194
aten::clamp_min                   8.81%      71.444ms            96
aten::bmm                         5.46%      44.258ms            48
ProfilerStep*                   100.00%     810.920ms             1
aten::div                         2.89%      23.447ms            24
aten::_softmax                    1.00%       8.087ms            24
aten::linear                     46.48%     376.888ms           362
aten::clone                       2.77%      22.430ms            98
aten::t                           0.31%       2.502ms           362
aten::view                        0.14%       1.161ms           850
aten::transpose                   0.17%       1.377ms           386
aten::index_select                0.12%     952.000us             3
aten::expand                      0.12%     986.000us           458
aten::matmul                      8.31%      67.420ms            48
aten::cat                         0.09%     703.000us             1
aten::as_strided                  0.08%     656.000us           963
aten::relu                        8.86%      71.864ms            96
-------------------------  ------------  ------------  ------------
Self CPU time total: 810.920ms

Similarly, we also get the table for the model compiled with Inductor (omitting some columns):

-----------------------------------------------  ------------  ------------  ------------
Name                                              CPU total %     CPU total    # of Calls
-----------------------------------------------  ------------  ------------  ------------
mkl::_mkl_linear                                       68.79%     231.573ms           362
aten::bmm                                               8.02%      26.992ms            48
ProfilerStep*                                         100.00%     336.642ms             1
graph_0_cpp_fused_constant_pad_nd_embedding_0           0.27%     915.000us             1
aten::empty                                             0.27%     911.000us           362
graph_0_cpp_fused__mkl_linear_add_mul_relu_151          0.27%     901.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_226          0.27%     899.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_361          0.27%     898.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_121          0.27%     895.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_31           0.27%     893.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_76           0.26%     892.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_256          0.26%     892.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_346          0.26%     892.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_241          0.26%     891.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_316          0.26%     891.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_91           0.26%     890.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_106          0.26%     890.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_211          0.26%     890.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_61           0.26%     889.000us             1
graph_0_cpp_fused__mkl_linear_add_mul_relu_286          0.26%     889.000us             1
-----------------------------------------------  ------------  ------------  ------------
Self CPU time total: 336.642ms

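From the profiling table of the eager model, we can see that the most time-consuming ops are [aten::addmm, aten::add, aten::copy_, aten::mul, aten::clamp_min, aten::bmm]. Comparing with the profiling table of the Inductor model, we notice an mkl::_mkl_linear entry and multiple fused kernels of the form graph_0_cpp_fused_*. They are the major optimizations that the Inductor model performs. Let us discuss them separately.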

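(1) Regarding mkl::_mkl_linear: you may notice that the number of calls to this kernel is 362, which is exactly the same as aten::linear in the profiling table of the eager model. The CPU total of aten::linear is 376.888ms, while it is 231.573ms for mkl::_mkl_linear. This suggests a speedup of about 1.63x for the "linear" part. The speedup mainly comes from packing the weight tensor into a block memory format and invoking cblas_sgemm_compute within the Inductor CPU backend to get better cache behavior during GEMM computation.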
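The ratio for the linear part, and the estimate for the remaining ops that follows from it, can be re-derived from the numbers reported in this section. The snippet below is a back-of-the-envelope check that assumes the rounded end-to-end latencies of 802 ms and 339 ms, and that treats everything outside the linear kernels as "other" work.

```python
# Timings taken from the benchmark output and the two profiling tables.
eager_total_ms, inductor_total_ms = 802.0, 339.0
eager_linear_ms, inductor_linear_ms = 376.888, 231.573

# Speedup of the linear part alone.
linear_speedup = eager_linear_ms / inductor_linear_ms

# Assumption: whatever is not spent in linear is attributed to the other ops.
other_speedup = (eager_total_ms - eager_linear_ms) / (
    inductor_total_ms - inductor_linear_ms
)

print(f"linear part: {linear_speedup:.2f}x")
print(f"other ops:   {other_speedup:.2f}x")
```

The second ratio moves by a few hundredths depending on whether the rounded or the unrounded latencies are used, so it should be read as a rough estimate rather than an exact figure.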

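(2) Regarding the other memory-intensive ops: the end-to-end latency of the eager/Inductor model is 802/339ms in our test. Subtracting the time of the linear part from each total, we can roughly infer that the speedup for the other memory-intensive ops is around 3.94x. Let us read the generated code to understand how Inductor achieves this impressive optimization. You can find the generated code by searching for cpp_fused__mkl_linear_add_mul_relu_151 in output_code.py.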

cpp_fused__mkl_linear_add_mul_relu_151 = async_compile.cpp('''
#include <ATen/record_function.h>
#include "/tmp/torchinductor_root/lr/clrlgu27q4ggd472umdzwsu6qcpqxcuusjxqvx2hwitjbujiiz7z.h"
extern "C" void kernel(float* in_out_ptr0,
 const float* in_ptr0,
 const float* in_ptr1,
 const float* in_ptr2,
 const float* in_ptr3)
{
 RECORD_FUNCTION("graph_0_cpp_fused__mkl_linear_add_mul_relu_151", c10::ArrayRef<c10::IValue>({}));
 #pragma omp parallel num_threads(32)
 {
 {
 #pragma omp for
 for(long i0=static_cast<long>(0L); i0<static_cast<long>(16384L); i0+=static_cast<long>(1L))
 {
 for(long i1=static_cast<long>(0L); i1<static_cast<long>(512L); i1+=static_cast<long>(8L))
 {
 auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(i1 + (512L*i0)));
 auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + static_cast<long>(i1));
 auto tmp3 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + static_cast<long>(i1 + (512L*i0)));
 auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr2 + static_cast<long>(i1));
 auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr3 + static_cast<long>(i1));
 auto tmp2 = tmp0 + tmp1;
 auto tmp4 = tmp2 + tmp3;
 auto tmp6 = tmp4 * tmp5;
 auto tmp8 = tmp6 + tmp7;
 tmp8.store(in_out_ptr0 + static_cast<long>(i1 + (512L*i0)));
 }
 }
 }
 }
}''')

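From the generated code above, we can see that this kernel performs a typical loop fusion on [add, add, mul, add]. This pattern is a memory-bound bottleneck that prevents good performance. To get a more intuitive feeling for this optimization, we can infer the sizes and strides of the inputs and benchmark this [add, add, mul, add] pattern on its own.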

# bench.py
def func(arg_0, arg_1, arg_2, arg_3, arg_4):
    add_0 = arg_0 + arg_1
    add_1 = add_0 + arg_2
    mul_1 = add_1 * arg_3
    add_2 = mul_1 + arg_4
    arg_2 = add_2
    return arg_2

arg_0 = torch.rand(16384, 512)
arg_1 = torch.rand(1, 512)
arg_2 = torch.zeros(16384, 512)
arg_3 = torch.rand(1, 512)
arg_4 = torch.rand(1, 512)

input = (arg_0, arg_1, arg_2, arg_3, arg_4)
inductor_func = torch.compile(func)
with torch.no_grad():
    inductor_func(*input)

import timeit
NUM_ITERS=100
with torch.no_grad():
    # warmup
    for _ in range(10):
        func(*input)
    eager_t = timeit.timeit("func(*input)", number=NUM_ITERS, globals=globals())

with torch.no_grad():
    # warmup
    for _ in range(10):
        inductor_func(*input)
    inductor_t = timeit.timeit("inductor_func(*input)", number=NUM_ITERS, globals=globals())
# print(f"eager use: {eager_t * 1000 / NUM_ITERS} ms/iter")
# print(f"inductor use: {inductor_t * 1000 / NUM_ITERS} ms/iter")
# print(f"speed up ratio: {eager_t / inductor_t}")

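Output:

eager use: 5.780875144992024 ms/iter
inductor use: 0.9588955780491233 ms/iter
speed up ratio: 6.0286805751604735

This is just one example. The profiling table shows that all element-wise ops in this model are fused automatically within Inductor. You can read more kernels in output_code.py.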
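One way to build intuition for why fusing this memory-bound pattern pays off is to count the bytes moved to and from memory. The sketch below is a simplified traffic model, not a measurement: it assumes float32 data, assumes every eager op reads its inputs from memory and writes a full-size intermediate, and assumes the fused kernel touches each large tensor exactly once.

```python
ELEM_BYTES = 4                      # float32
big = 16384 * 512 * ELEM_BYTES      # one full-size (16384, 512) tensor
row = 512 * ELEM_BYTES              # one broadcast (1, 512) tensor

# Eager mode: four separate ops, each reading its inputs and writing a result.
eager_traffic = (
    (big + row + big)     # add_0 = arg_0 + arg_1
    + (big + big + big)   # add_1 = add_0 + arg_2
    + (big + row + big)   # mul_1 = add_1 * arg_3
    + (big + row + big)   # add_2 = mul_1 + arg_4
)

# Fused kernel: read arg_0 and arg_2 once, read the three rows, write the result once.
fused_traffic = big + big + 3 * row + big

traffic_ratio = eager_traffic / fused_traffic
print(f"eager: {eager_traffic / 2**20:.0f} MiB, fused: {fused_traffic / 2**20:.0f} MiB")
print(f"traffic ratio: {traffic_ratio:.2f}x")
```

Under these assumptions the fused kernel moves roughly a third of the data. The measured speedup above is larger than that, so memory traffic is only part of the explanation; the model ignores effects such as intermediate allocation and per-op dispatch overhead.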

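Conclusion

This document gives an in-depth tutorial for the Inductor CPU backend.

With motivating examples, we walk through the process of debugging and profiling. The main idea is to narrow down the problem.

We demonstrate step by step how to dig into an issue and find the root cause of a failure, with the help of debugging logs and the Minifier tool. First determine in which component the failure occurs, then try to generate the smallest snippet of code that can reproduce it.

When the performance with Inductor is better than that of eager mode, we provide a solid analytical method for performance profiling. We show how to find the time-consuming hotspots with PyTorch Profiler and how to figure out the operator-level or kernel-level reason behind what we observe.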

Total running time of the script: (9 minutes 21.695 seconds)