Replies: 2 comments 1 reply
-
Yes, you are correct. In PyTorch, the optimizer, criterion, and model objects interact in a coordinated way. If you understand how backpropagation and gradient descent work, you will get the gist of what happens: the criterion produces a loss, backpropagating that loss computes gradients for the model's parameters, and the optimizer then uses those gradients to update the parameters.
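As a minimal sketch (the layer sizes, data, and learning rate below are made up purely for illustration), this is the standard loop where those pieces meet: loss.backward() writes gradients into each parameter's .grad attribute, and optimizer.step() reads those same attributes to update the parameters.

```python
import torch
import torch.nn as nn

# Illustrative model, criterion, and optimizer (sizes and lr are arbitrary).
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 10)   # dummy inputs
y = torch.randn(32, 1)    # dummy targets

optimizer.zero_grad()          # clear any old .grad values on the parameters
loss = criterion(model(x), y)  # forward pass builds the autograd graph
loss.backward()                # autograd fills param.grad for every model parameter
optimizer.step()               # optimizer reads param.grad and updates each param in place
```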
-
Your assumptions are correct. The sharing of gradient and parameter information between the optimizer, criterion, and model happens implicitly in the background through PyTorch's autograd system. Here's a breakdown of how it works:
1. When you construct the optimizer with model.parameters(), it stores references to the model's parameter tensors; nothing is copied.
2. The criterion never holds the parameters at all. Calling it on the model's output returns a loss tensor whose autograd graph reaches back to those parameters through the forward pass.
3. loss.backward() walks that graph and writes each parameter's gradient into its .grad attribute.
4. optimizer.step() then iterates over the parameter references it stored and applies the update using those .grad values, and optimizer.zero_grad() clears them for the next iteration.
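Here is a short sketch of what that looks like in practice (the layer sizes and data are made up for illustration). It checks that the optimizer holds the very same tensor objects as the model, and that .grad only gets populated after backward():

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)               # toy model, arbitrary sizes
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The optimizer stores references to the same tensors the model owns;
# passing model.parameters() to the constructor does not copy anything.
first_param = next(model.parameters())
print(first_param is optimizer.param_groups[0]["params"][0])  # True

# The criterion never sees the parameters directly; it only returns a loss
# tensor whose autograd graph reaches them through the forward pass.
loss = criterion(model(torch.randn(8, 4)), torch.randn(8, 2))
print(first_param.grad)        # None: no gradients yet

loss.backward()                # autograd writes into each parameter's .grad
print(first_param.grad.shape)  # now populated, same shape as the parameter

optimizer.step()               # reads .grad from its stored references and updates them
```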
-
My Question
I'm just wondering how gradient information is passed between the optimizer, the criterion (the loss function object), and the model.
My code
What I assume
I assume that the optimizer, criterion, and model all share references to the gradients and the parameters? Since this is not done explicitly (with the exception of passing the parameters into the optimizer), I assume this happens in the background?