1943: Warren McCulloch & Walter Pitts—mathematical model of artificial neuron.
1958: Frank Rosenblatt—perceptron.
1961: Arthur Samuel’s checkers program.
1986: James McClelland, David Rumelhart & PDP Research Group—book: “Parallel Distributed Processing”.
A neural network (NN) is a parameterized function which can, in theory, approximate any function to any level of accuracy.
The learning process consists of adjusting the parameters so that the inputs of a training set get mapped to the corresponding outputs.
Image by <a href="https://news.berkeley.edu/2020/03/19/high-speed-microscope-captures-fleeting-brain-signals/" target="_blank">Na Ji, UC Berkeley</a>
Modified from <a href="https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0163" target="_blank">O.C. Akgun & J. Mei 2019</a>
Single layer of artificial neurons → unable to learn even some simple mathematical functions, such as XOR (Marvin Minsky & Seymour Papert).
Two layers → can theoretically approximate any mathematical function, but in practice too slow to be useful.
More layers → deeper networks → deep learning.
Used for spatially structured data.
Convolution layers → each neuron receives input only from a subarea of the previous layer.
Pooling → combines the outputs of neurons in a subarea to reduce the data dimensions.
Not fully connected.
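A minimal sketch of such a convolutional network in PyTorch (the layer sizes, depth, and number of classes below are arbitrary choices for illustration, not part of the example used later):

```python
import torch.nn as nn

# Illustrative only: sizes and depth are arbitrary.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # each output value depends only on a 3x3 subarea
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: combines each 2x2 subarea, halving height and width
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 2),                  # assumes 128x128 input images and 2 classes
)
```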
Used for chain-structured data (e.g. text).
Not feedforward: the hidden state computed at one step is fed back as an input to the next step.
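A minimal sketch of a recurrent layer in PyTorch (sizes are arbitrary, for illustration only):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=64, batch_first=True)
x = torch.randn(8, 20, 100)  # a batch of 8 sequences of 20 steps, each step a vector of size 100
out, h = rnn(x)              # out: hidden state at every step; h: final hidden state
# The hidden state computed at one step is fed back as input at the next step.
```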
First, we need an architecture (size, depth, types of layers, etc.).
This is set before training and does not change.
A model also comprises parameters.
Those are set to some initial values, but will change during training.
To train the model, we need labelled data in the form of input/output pairs.
Inputs and parameters are fed to the architecture.
We get predictions as outputs.
A metric (e.g. error rate) compares predictions and labels and is a measure of model performance.
Because it is not always sensitive enough to changes in parameter values, we compute a loss function …
… which allows us to adjust the parameters slightly through backpropagation.
This cycle gets repeated for a number of steps.
At the end of the training process, what matters is the combination of architecture and trained parameters.
That’s what constitutes a model.
A model can then be considered a regular program …
… and be used to obtain outputs from inputs.
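As a minimal, self-contained illustration of this cycle in plain PyTorch (every name and value below is made up for the sketch; this is not the fastai workflow used later):

```python
import torch

# Toy problem: learn the parameters of y = a*x + b from noisy data.
torch.manual_seed(0)
inputs = torch.linspace(0, 1, 100)
labels = 3 * inputs + 2 + 0.1 * torch.randn(100)   # labelled data (input/output pairs)

params = torch.randn(2, requires_grad=True)        # parameters, set to initial values

def model(x, p):                                   # the "architecture"
    return p[0] * x + p[1]

def loss_func(preds, targets):                     # loss: mean squared error
    return ((preds - targets) ** 2).mean()

lr = 0.1
for step in range(200):                            # the training cycle
    preds = model(inputs, params)                  # inputs + parameters → predictions
    loss = loss_func(preds, labels)                # compare predictions and labels
    loss.backward()                                # backpropagation: compute the gradients
    with torch.no_grad():
        params -= lr * params.grad                 # adjust the parameters slightly
        params.grad = None                         # reset the gradients for the next step
```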
fastai
is a deep learning library that builds on top of PyTorch, adding a higher level of functionality.
[It] is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable.
Manual
Tutorials
Peer-reviewed paper
Paperback version
Free MOOC version of part 1 of the book
Jupyter notebooks version of the book
Create iterators with the training and validation data.
Train the model.
Get predictions from our model.
In our case, we need the vision
library:
from fastai.vision.all import *
Other domains available:
from fastai.text.all import *
from fastai.tabular.all import *
from fastai.collab import *
Note that import *
is not recommended in Python outside the context of fastai.
A fastai class.
A simple wrapper around the PyTorch DataLoader
class with added functionality (a DataLoader
creates batches of data and sends them to the CPU or GPU as you iterate through it).
Creates an object of class DataLoaders which contains a validation DataLoader and a training DataLoader.
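To illustrate the batching behaviour, here is what a plain PyTorch DataLoader does with a toy dataset (this is only a sketch, not the painting example below):

```python
from torch.utils.data import DataLoader

dataset = list(zip(range(10), "abcdefghij"))   # toy (input, label) pairs
dl = DataLoader(dataset, batch_size=4, shuffle=True)
for xb, yb in dl:
    print(xb, yb)                              # one mini-batch of inputs and labels per iteration
```

A fastai DataLoaders object simply bundles a training and a validation DataLoader, e.g. DataLoaders(train_dl, valid_dl).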
Using search_images_bing, a convenience function to search for images with the Bing Image Search API (free registration required).
key = os.environ.get('AZURE_SEARCH_KEY', '<your-private-key>')
Let’s download paintings from Monet:
monet = search_images_bing(key, 'monet')
ims = monet.attrgot('content_url')
path = Path('dataset')
fns = get_image_files(path)
fns
Note that this last output is of a fastai class L
: a Python list with added functionality.
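Note that search_images_bing only returns image URLs; the images have to be downloaded into path before get_image_files can find them. A minimal sketch of that step (the destination folder name is an assumption; the same applies to the Van Gogh images below):

```python
dest = Path('dataset/monet')            # hypothetical destination folder inside the dataset
dest.mkdir(exist_ok=True, parents=True)
download_images(dest, urls=ims)         # fastai helper: downloads each URL in the list
```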
We can do the same with Van Gogh:
vangogh = search_images_bing(key, 'vangogh')
ims = vangogh.attrgot('content_url')
path = Path('dataset')
fns = get_image_files(path)
fns
Data block API:
paintings = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,                        # how to list the items (image files under path)
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # random training/validation split
    get_y=parent_label,                               # the label is the name of the parent folder
    item_tfms=Resize(128))                            # resize every image to 128x128
valid_pct=0.2
: keep a validation set (20% of the data) to test the model.
dls = paintings.dataloaders(path)
We now have our DataLoaders
object dls
.
Let’s have a look at 4 items:
dls.valid.show_batch(max_n=4, nrows=1)
To standardize the size of images, Resize
cropped them, but there are alternative methods:
paintings = paintings.new(item_tfms=Resize(128, ResizeMethod.Squish))
dls = paintings.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)
Another alternative is to pad the images (here with zeros):
paintings = paintings.new(item_tfms=Resize(128, ResizeMethod.Pad,
pad_mode='zeros'))
dls = paintings.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)
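A further option is to crop a different random part of the image each time it is used, which also acts as data augmentation; a sketch with assumed values:

```python
paintings = paintings.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = paintings.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)
```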
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.398155 | 0.167293 | 0.068152 | 00:03 |
1 | 0.204518 | 0.135118 | 0.050011 | 00:02 |
2 | 0.199835 | 0.099103 | 0.040012 | 00:05 |
3 | 0.179214 | 0.046119 | 0.025813 | 00:03 |
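The last step listed earlier was getting predictions from our model; with the trained learner this can be done with learn.predict (the image path below is hypothetical):

```python
img = PILImage.create('dataset/monet/example.jpg')   # hypothetical test image
pred_class, pred_idx, probs = learn.predict(img)     # predicted label, its index, and class probabilities
print(pred_class, probs[pred_idx])
```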
For each function, there is another function (its derivative, or gradient) which represents, not the values, but the rate of change of the values of the first function.
We need the gradients of the loss with respect to the parameters to know in which direction and with which magnitude to adjust them at each step.
To start tracking all operations performed on our model parameters:
params = tensor.requires_grad_()
Get the values predicted by our model with our parameters:
preds = model(params)
Calculate the loss:
loss = loss_func(preds, targets)
Backpropagation:
loss.backward()
Get the gradients:
params.grad
Update the parameters:
params.data -= params.grad.data * lr
.data stops the gradient from being calculated on this operation.
lr: learning rate.
Reset the gradient:
params.grad = None
Putting it all together:
def apply_step(params, prn=True):
    preds = model(params)                  # predictions from the current parameters
    loss = loss_func(preds, targets)       # compute the loss
    loss.backward()                        # backpropagation: compute the gradients
    params.data -= params.grad.data * lr   # update the parameters
    params.grad = None                     # reset the gradients
    if prn: print(loss.item())
    return preds
for i in range(4): apply_step(params)
Bias is always present in data.
Document the limitations and scope of your data as thoroughly as possible.
Problems to watch for:
The last one is particularly problematic whenever the model's outputs determine the next round of data through the interactions of the current round of data with the real world (a feedback loop).
Solution: ensure there are human circuit breakers and oversight.