PyTorch comes with several packages that make working with neural nets easy.
In the previous lesson, we learnt about torch.autograd, which allows automatic calculation of gradients during backpropagation.
This lesson introduces torch.nn and torch.optim. They are often imported with:
import torch.nn as nn
import torch.optim as optim
The torch.nn.functional module contains all the functions of the torch.nn package. By convention, it is imported as F:
import torch.nn.functional as F
These functions include loss functions, activation functions, pooling functions … i.e. all the functions used to build and train a neural net. Since torch.autograd records operations on tensors, it can differentiate any function built from them, so you can also create and use your own functions.
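For example, a hand-written loss made only of tensor operations is differentiated by torch.autograd just like a built-in one. The sketch below uses a hypothetical helper, mean_abs_error, which is not part of torch.nn.functional, and made-up tensor values:

```python
import torch

def mean_abs_error(predicted, real):
    # Hypothetical custom loss: built only from differentiable tensor operations
    return (predicted - real).abs().mean()

predicted = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
real = torch.tensor([1.5, 2.0, 3.0])

loss = mean_abs_error(predicted, real)
loss.backward()  # autograd computes d(loss)/d(predicted)

print(loss.item())     # 0.5
print(predicted.grad)  # approximately [-1/3, 0, 1/3]
```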
In our previous lesson, we calculated a loss function manually with:
loss = (predicted - real).pow(2).sum()
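This manual loss is the sum of squared errors. As a small check with made-up tensors, F.mse_loss computes the same quantity when its reduction argument is 'sum'; its default, 'mean', divides by the number of elements instead:

```python
import torch
import torch.nn.functional as F

predicted = torch.tensor([[0.5, 1.0], [2.0, 3.0]])
real = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

manual = (predicted - real).pow(2).sum()                # sum of squared errors
builtin = F.mse_loss(predicted, real, reduction='sum')  # same quantity
mean_builtin = F.mse_loss(predicted, real)              # default reduction='mean'

print(manual.item(), builtin.item(), mean_builtin.item())  # 5.25 5.25 1.3125
```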
Within torch.nn.functional, you can select from a large range of loss functions, for example:
- a function to calculate the binary cross entropy between the target and the output,
- binary_cross_entropy_with_logits to calculate the binary cross entropy between the target and the output logits,
- poisson_nll_loss for the Poisson negative log likelihood loss.
Go to the documentation for a full list.
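As an illustration with made-up values, binary_cross_entropy_with_logits takes raw, unbounded scores (logits), whereas F.binary_cross_entropy expects probabilities. Passing the scores through torch.sigmoid first should therefore give (nearly) the same loss:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, -1.0, 0.0])  # raw model outputs
target = torch.tensor([1.0, 0.0, 1.0])   # binary labels, as floats

# Works directly on logits (applies the sigmoid internally, in a numerically stable way)
loss_logits = F.binary_cross_entropy_with_logits(logits, target)

# Equivalent: convert to probabilities first, then use binary_cross_entropy
loss_probs = F.binary_cross_entropy(torch.sigmoid(logits), target)

print(loss_logits.item(), loss_probs.item())  # nearly identical values
```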
If we want to use the negative log likelihood loss function, we can run:
loss = F.nll_loss(predicted, real)
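F.nll_loss expects predicted to hold log-probabilities of shape (batch, classes) and real to hold integer class indices. A sketch with random scores (the shapes and values are arbitrary) shows that applying F.log_softmax first gives the same value as F.cross_entropy, which does both steps at once:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 3)         # 4 samples, 3 classes (made-up values)
real = torch.tensor([0, 2, 1, 2])  # class indices, dtype int64

predicted = F.log_softmax(scores, dim=1)  # nll_loss expects log-probabilities
loss = F.nll_loss(predicted, real)        # averaged over the batch by default

# Same value: cross_entropy applies log_softmax itself
same = F.cross_entropy(scores, real)
print(loss.item(), same.item())
```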
As mentioned earlier, torch.nn.functional also has activation functions. For example, the rectified linear unit (ReLU) can be called with torch.nn.functional.relu() and softmax with torch.nn.functional.softmax().
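A quick sketch on a made-up tensor: ReLU zeroes out negative entries, while softmax turns each row into probabilities along the dimension passed as dim:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0],
                  [ 1.0, 1.0, 1.0]])

relu_out = F.relu(x)         # negative entries become 0
probs = F.softmax(x, dim=1)  # normalise each row into probabilities

print(relu_out)              # rows: [0, 0, 2] and [1, 1, 1]
print(probs.sum(dim=1))      # each row sums to 1
```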
torch.nn.Module is the base class for all neural network modules. To build your model, you create a subclass of torch.nn.Module:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
Python’s class inheritance gives our subclass all the functionality of torch.nn.Module while allowing us to customize it.
Then, you can create submodules and assign them as attributes:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
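Submodules assigned as attributes are registered automatically, so their weights and biases appear in model.parameters(), which is what an optimizer later receives. A minimal sketch that repeats the class above so it runs on its own:

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

model = Net()

# Registered parameters, named after the attributes that hold them
names = [name for name, _ in model.named_parameters()]
for name, param in model.named_parameters():
    print(name, tuple(param.shape))  # e.g. fc1.weight (128, 784)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 784*128 + 128 + 128*10 + 10 = 101770
```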
If this Python syntax is obscure to you, you should have a look at the answers to this question, as well as this answer to a similar question.
Finally, you define the forward() method, which describes the forward pass, in your subclass of torch.nn.Module:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
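To check the forward pass without downloading any data, a minimal sketch can feed a random batch shaped like MNIST images (64 single-channel 28x28 images; the batch size is an arbitrary choice) through the same network. Calling model(x) runs forward(), and because the output is log_softmax, exponentiating each row should give probabilities summing to 1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)  # (64, 1, 28, 28) -> (64, 784)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net()
x = torch.randn(64, 1, 28, 28)  # fake batch shaped like MNIST images
output = model(x)               # calling the model runs forward()
print(output.shape)             # torch.Size([64, 10])
print(output.exp().sum(dim=1)[:3])  # each row sums to 1
```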
Now that we have our network architecture, we can create an instance of it:
model = Net()
Even better, we can send it to our device of choice (CPU or GPU):
model = Net().to(device)
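The lines above assume that device already exists. One common pattern (an assumption here, not taken from this lesson) picks a GPU when PyTorch can see one and falls back to the CPU otherwise. The input data must then be moved to the same device as the model; in this sketch, a single nn.Linear layer stands in for Net to keep it short:

```python
import torch
import torch.nn as nn

# Pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(784, 10).to(device)  # stand-in for Net() in this sketch
data = torch.randn(8, 784).to(device)  # inputs must live on the same device
output = model(data)
print(device, output.device)
```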
The package torch.optim contains classic optimization algorithms such as optim.SGD(), optim.Adam(), or optim.Adadelta(). To use them, you define an optimizer with one of these algorithms:
optimizer = optim.Adadelta(model.parameters(), lr=1.0)
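Then, for each batch, reset the gradients, compute them by backpropagation, and update the parameters:
optimizer.zero_grad()  # resets the gradients to 0
loss.backward()        # computes the gradients
optimizer.step()       # updates the parameters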
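A minimal, self-contained sketch of this loop, using made-up regression data, a single nn.Linear layer, and optim.SGD instead of the MNIST model (all assumptions chosen only to keep it short). Gradients accumulate in .grad across calls to backward(), which is why they are reset at every step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
# Synthetic data (made up for this sketch): y = x @ w_true + 0.5
x = torch.randn(100, 3)
w_true = torch.tensor([[1.0], [-2.0], [0.5]])
y = x @ w_true + 0.5

model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(50):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = F.mse_loss(model(x), y)  # forward pass and loss
    loss.backward()                 # backpropagation fills .grad
    optimizer.step()                # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```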
Log in to the training cluster.
Create a directory for this project and cd into it:
mkdir mnist; cd mnist
Start a first Python script:
nano # use the text editor of your choice
Let’s start with the simplest possible neural net: a multilayer perceptron (MLP).
It is a feed-forward (i.e. no loop), fully-connected (i.e. each neuron of one layer is connected to all the neurons of the adjacent layers) neural network with a single hidden layer.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

def train(model, device, train_loader, optimizer, epoch):
    model.train()  # training mode (matters for layers such as dropout)
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()  # reset the gradients to 0
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()        # backpropagation
        optimizer.step()       # update the parameters
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()  # evaluation mode (disables dropout)
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

def main():
    epochs = 3
    device = torch.device('cpu')
    transform = transforms.Compose([
        transforms.ToTensor(),  # PIL image -> tensor with values in [0, 1]
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    data_dir = 'data'  # placeholder download directory (assumption): adjust to your setup
    train_data = datasets.MNIST(
        data_dir, train=True, download=True, transform=transform)
    test_data = datasets.MNIST(
        data_dir, train=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=64)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000)
    model = Net().to(device)
    optimizer = optim.Adadelta(model.parameters(), lr=1.0)
    scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
    for epoch in range(1, epochs + 1):
        train(model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()  # decay the learning rate once per epoch

main()
We will only train our model for 3 epochs to save time; in practice, you would normally train it for much longer.
We are using CPUs.
We will use the Adadelta algorithm as the optimizer.
Finally, we run the whole model by calling main().
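The script also creates a StepLR scheduler with step_size=1 and gamma=0.7. StepLR only changes the learning rate when scheduler.step() is called, usually once per epoch, multiplying it by gamma every step_size calls. A minimal sketch with a single dummy parameter (an assumption, standing in for the model) shows the resulting schedule for Adadelta starting at lr=1.0:

```python
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter for the sketch
optimizer = optim.Adadelta([param], lr=1.0)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)

lrs = []
for epoch in range(1, 4):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.zero_grad()
    loss = (param - 1.0).pow(2).sum()  # stands in for one epoch of training
    loss.backward()
    optimizer.step()
    scheduler.step()                   # lr <- lr * gamma after every epoch

print(lrs)  # approximately [1.0, 0.7, 0.49]
```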
Write a Slurm job script:
#!/bin/bash
#SBATCH --time=0:5:0
#SBATCH --cpus-per-task=1
#SBATCH --mem=3G
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
# Activate your virtual env
source ~/env/bin/activate
# Run your Python script
python ~/mnist/
Monitor its status with:
Let’s step this up and build a convolutional neural network (CNN). CNNs are particularly well suited to image data.
The figure below is not an exact diagram of the model we will build, but it represents a similar model made of convolution, pooling, and fully-connected layers.
Our new script will be very similar to the MLP script: we will only change the model architecture, so you can copy the MLP script and edit the copy:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)  # channel-wise dropout on the 4D feature maps
        self.dropout2 = nn.Dropout(0.5)     # plain dropout on the 2D output of fc1
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
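The 9216 input features of fc1 follow from the shapes produced by the convolution and pooling layers. A quick sketch with one random 28x28 input (the MNIST image size), using the same layer settings as the CNN above, traces them step by step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 32, 3, 1)   # same settings as in the CNN above
conv2 = nn.Conv2d(32, 64, 3, 1)

x = torch.randn(1, 1, 28, 28)    # one fake MNIST-sized image
x = F.relu(conv1(x))             # 3x3 kernel, stride 1, no padding: 28 -> 26
print(x.shape)                   # torch.Size([1, 32, 26, 26])
x = F.relu(conv2(x))             # 26 -> 24
print(x.shape)                   # torch.Size([1, 64, 24, 24])
x = F.max_pool2d(x, 2)           # 2x2 pooling halves each side: 24 -> 12
print(x.shape)                   # torch.Size([1, 64, 12, 12])
flat = torch.flatten(x, 1)
print(flat.shape)                # torch.Size([1, 9216]), i.e. 64 * 12 * 12
```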
Write a Slurm job script. We need more time here, as a CNN takes longer to train.
#!/bin/bash
#SBATCH --time=0:15:0
#SBATCH --cpus-per-task=1
#SBATCH --mem=3G
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
# Activate your virtual env
source ~/env/bin/activate
# Run your Python script
python ~/mnist/
Monitor its status with: