The torch.autograd.grad function is part of PyTorch's automatic differentiation package and is used to compute the gradients of given outputs with respect to given inputs. This function is useful when you need to compute gradients explicitly, rather than accumulating them in the .grad attribute of the input tensors.
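To make the contrast with accumulation in .grad concrete, here is a minimal sketch comparing torch.autograd.grad with the backward() method. The variable names are chosen only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# torch.autograd.grad returns the gradient and leaves x.grad untouched.
y = x ** 2
(dy_dx,) = torch.autograd.grad(outputs=y, inputs=x)
grad_attr_after_grad = x.grad  # still None at this point

# backward() returns nothing and accumulates the gradient into x.grad.
y2 = x ** 2
y2.backward()
grad_attr_after_backward = x.grad.clone()

print(dy_dx, grad_attr_after_grad, grad_attr_after_backward)
```

In this sketch the returned gradient and the accumulated one hold the same value; the difference is only where the result ends up.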
Parameters:
- outputs: A tensor or a sequence of tensors representing the outputs of the differentiated function.
- inputs: A tensor or a sequence of tensors with respect to which the gradients will be calculated.
- grad_outputs: The "vector" in the vector-Jacobian product, usually gradients with respect to each output. It can be left as None only when the outputs are scalars. Default is None.
- retain_graph: If set to False, the graph used to compute the gradients will be freed after the call. Defaults to the value of the create_graph parameter.
- create_graph: If set to True, the graph of the derivative will be constructed, allowing higher-order derivative products. Default is False.
- allow_unused: If set to False, specifying inputs that were not used when computing the outputs will raise an error. Default is False.
- is_grads_batched: If set to True, the first dimension of each tensor in grad_outputs will be interpreted as the batch dimension. Default is False.
Return type:
A tuple containing the gradients with respect to each input tensor.
Example:
Consider a simple example of computing the gradient of a function y = x^2 with respect to x. Here, x is the input and y is the output.
import torch
# Define the input tensor and enable gradient tracking
x = torch.tensor(2.0, requires_grad=True)
# Define the function y = x^2
y = x ** 2
# Compute the gradient of y with respect to x
grads = torch.autograd.grad(outputs=y, inputs=x)
print(grads) # Output: (tensor(4.),)
In this example, we first define the input tensor x with a value of 2.0 and enable gradient tracking by setting requires_grad=True. Then, we define the function y = x^2. Next, we compute the gradient of y with respect to x using torch.autograd.grad(outputs=y, inputs=x). The result is a tuple containing the gradient (4.0 in this case), which is the derivative of x^2 with respect to x evaluated at x=2.
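The create_graph and allow_unused parameters from the list above can be sketched in the same style. This is only an illustration: the function y = x^3 and the tensor named unused are hypothetical choices, not part of the original example.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
unused = torch.tensor(5.0, requires_grad=True)  # hypothetical input that the outputs never use

y = x ** 3

# create_graph=True keeps the gradient itself differentiable,
# so it can be passed to torch.autograd.grad a second time.
(first,) = torch.autograd.grad(y, x, create_graph=True)  # 3 * x^2 = 27
(second,) = torch.autograd.grad(first, x)                # 6 * x = 18

# allow_unused=True returns None for inputs that did not contribute to the output
# instead of raising an error.
z = x ** 2
dz_dx, dz_dunused = torch.autograd.grad(z, (x, unused), allow_unused=True)

print(first, second, dz_dx, dz_dunused)
```

Without allow_unused=True, the last call would raise an error because unused plays no part in computing z.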
The grad_outputs parameter in the torch.autograd.grad function represents the "vector" in the vector-Jacobian product. It is a sequence of tensors, one per output and each with the same shape as its output, usually containing the gradients with respect to that output. The grad_outputs parameter is used when you want to compute a specific vector-Jacobian product, without ever building the full Jacobian matrix.
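When the gradient is computed using torch.autograd.grad, PyTorch computes the vector-Jacobian product v^T J, where J is the Jacobian matrix (the matrix of partial derivatives of the outputs with respect to the inputs) and v is the provided grad_outputs vector. If grad_outputs is not provided (i.e., left as None), the output must be a scalar, and PyTorch uses a gradient of 1.0 for it. For a non-scalar output, omitting grad_outputs raises a RuntimeError, so a tensor with the same shape as the output (for example, a tensor of ones) has to be passed explicitly.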
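A minimal sketch of how the default behaves for a non-scalar output, assuming a two-element input chosen only for illustration:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x ** 2  # non-scalar output with two elements

# Leaving grad_outputs as None is rejected for a non-scalar output.
try:
    torch.autograd.grad(outputs=y, inputs=x, retain_graph=True)
    raised = False
except RuntimeError:
    raised = True

# Passing a tensor of ones explicitly weights every output element by 1.
(grads,) = torch.autograd.grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y))

print(raised, grads)
```

Passing ones here gives the same result as differentiating y.sum(), which is a scalar and therefore needs no grad_outputs.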
Here's an example to help illustrate the concept:
import torch
# Define input tensors and enable gradient tracking
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
# Define the output function: z = x^2 + y^2
z = x ** 2 + y ** 2
# Compute the gradients of z with respect to x and y using different grad_outputs values
# retain_graph=True keeps the graph alive so that z can be differentiated again
# Case 1: Default grad_outputs (None)
grads1 = torch.autograd.grad(outputs=z, inputs=(x, y), retain_graph=True)
print("Case 1 - Default grad_outputs:", grads1) # Output: (tensor(4.), tensor(6.))
# Case 2: Custom grad_outputs (scalar value)
grad_outputs_scalar = torch.tensor(2.0)
grads2 = torch.autograd.grad(outputs=z, inputs=(x, y), grad_outputs=grad_outputs_scalar, retain_graph=True)
print("Case 2 - Custom grad_outputs (scalar):", grads2) # Output: (tensor(8.), tensor(12.))
# Case 3: Custom grad_outputs (tensor value)
grad_outputs_tensor = torch.tensor(3.0)
grads3 = torch.autograd.grad(outputs=z, inputs=(x, y), grad_outputs=grad_outputs_tensor)
print("Case 3 - Custom grad_outputs (tensor):", grads3) # Output: (tensor(12.), tensor(18.))
In this example, we define two input tensors x and y with values 2.0 and 3.0 respectively, and enable gradient tracking by setting requires_grad=True. Then, we define the output function z = x^2 + y^2. We compute the gradients of z with respect to x and y using three different values for grad_outputs.
- Case 1 - Default grad_outputs: The gradients are (4.0, 6.0), which correspond to the partial derivatives of z with respect to x and y (2x and 2y) evaluated at x=2 and y=3.
- Case 2 - Custom grad_outputs (scalar): We provide a scalar value of 2.0 as grad_outputs. The gradients are (8.0, 12.0), which are the original gradients (4.0, 6.0) multiplied by the scalar value 2.
- Case 3 - Custom grad_outputs (tensor): We provide a value of 3.0 as grad_outputs. As in Case 2, this is a zero-dimensional tensor matching the shape of z, so the two cases differ only in the weight. The gradients are (12.0, 18.0), which are the original gradients (4.0, 6.0) multiplied by 3.
As you can see from the examples, providing different values for grad_outputs affects the resulting gradients, as it represents the vector in the vector-Jacobian product. This parameter can be useful when you want to weight the gradients differently, or when you need to compute a specific vector-Jacobian product.
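Weighting gradients differently is easiest to see with several separate outputs. The sketch below is an illustration only; the outputs a and b and the weights w_a and w_b are hypothetical names, not part of the examples above.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

a = x ** 2  # da/dx = 2x = 4
b = 3 * x   # db/dx = 3

# One grad_outputs entry per output. The result is the weighted sum
# w_a * da/dx + w_b * db/dx = 0.5 * 4 + 2.0 * 3 = 8.
w_a = torch.tensor(0.5)
w_b = torch.tensor(2.0)
(weighted,) = torch.autograd.grad(outputs=(a, b), inputs=x, grad_outputs=(w_a, w_b))

print(weighted)
```

The contributions of all outputs are summed into a single gradient per input, which is why the call returns one tensor for x rather than one per output.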
Here's another example with a multi-output function to further illustrate the concept:
import torch
# Define input tensor and enable gradient tracking
x = torch.tensor([2.0, 3.0], requires_grad=True)
# Define the multi-output function: y = [x0^2, x1^2]
y = x ** 2
# Compute the gradients of y with respect to x using different grad_outputs values
# y is not a scalar, so grad_outputs must be passed explicitly in both cases
# Case 1: grad_outputs of ones (retain_graph=True keeps the graph for Case 2)
grads1 = torch.autograd.grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y), retain_graph=True)
print("Case 1 - grad_outputs of ones:", grads1) # Output: (tensor([4., 6.]),)
# Case 2: Custom grad_outputs (tensor)
grad_outputs_tensor = torch.tensor([1.0, 2.0])
grads2 = torch.autograd.grad(outputs=y, inputs=x, grad_outputs=grad_outputs_tensor)
print("Case 2 - Custom grad_outputs (tensor):", grads2) # Output: (tensor([ 4., 12.]),)
In this example, we define an input tensor x with two elements and enable gradient tracking. We then define a multi-output function y = [x0^2, x1^2]. We compute the gradients of y with respect to x using different values for grad_outputs.
- Case 1 - grad_outputs of ones: Because y is not a scalar, grad_outputs cannot be left as None, so we pass a tensor of ones with the same shape as y. The gradients are (4.0, 6.0), which correspond to the partial derivatives of y with respect to x (2x0 and 2x1) evaluated at x0=2 and x1=3.
- Case 2 - Custom grad_outputs (tensor): We provide a tensor with values [1.0, 2.0] as grad_outputs. The gradients are (4.0, 12.0), which are the gradients from Case 1 (4.0, 6.0) multiplied element-wise by the grad_outputs tensor.
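In the second case, the gradients are computed as the vector-Jacobian product of the provided grad_outputs tensor and the Jacobian matrix. Because each output depends only on the matching input, the Jacobian is diagonal with entries 4.0 and 6.0, so the product reduces to an element-wise multiplication. This allows us to compute specific vector-Jacobian products or weight the gradients differently for each output.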
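One way to check this interpretation is to build the full Jacobian and multiply it by the vector by hand. The sketch below assumes torch.autograd.functional.jacobian is available in your PyTorch version; the helper f and the names v and J are chosen only for illustration.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Same multi-output function as above: y = [x0^2, x1^2]
    return x ** 2


x = torch.tensor([2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 2.0])

# Full Jacobian of f at x: a 2x2 diagonal matrix with entries 2 * x_i.
J = jacobian(f, x.detach())

# Vector-Jacobian product computed directly by torch.autograd.grad.
(vjp,) = torch.autograd.grad(outputs=f(x), inputs=x, grad_outputs=v)

print(J)
print(vjp, v @ J)
```

Both routes should agree, but torch.autograd.grad gets there without materializing J, which matters when the Jacobian is large.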