This notebook can be found on my github: https://github.com/tonygallen/JPUG
(a tutorial by someone who knows nothing about Machine Learning)
I should mention Knet.jl seems like a good option as well.
Note: If you want GPU support, check out CuArrays.jl (https://fluxml.ai/Flux.jl/stable/gpu/) if you have an appropriate NVIDIA GPU. I do not use it in this tutorial.
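Not used below, but for reference, here is roughly what that looks like (a minimal sketch; assumes a CUDA-capable GPU with CuArrays.jl installed, so it is left commented out):
# not run in this tutorial: with CuArrays loaded, `gpu` moves arrays and models to the GPU
# using CuArrays
# W = rand(5, 10) |> gpu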
# using Pkg
# Pkg.add("Flux") # uncomment these two lines to install Flux first
using Flux
## The soul of Machine Learning
using Flux.Tracker
# executable math
f(x) = x^2+1
# f'(x) = 2x
df(x) = gradient(f, x; nest = true)[1] # gradient returns a tuple; [1] takes the first (and only) component
df(4)
# f''(x) = 2
ddf(x) = gradient(df, x; nest = true)[1]
ddf(0)
h(x) = -cos(x)^cos(x)
# h'(x) = tan(x)cos(x)^(cos(x)+1)(log(cos(x))+1) obviously
dh(x) = gradient(h, x)[1]
dh(pi/4)
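As a sanity check, the analytic formula from the comment above gives the same value:
# the derivative h'(x) from the comment above, evaluated at the same point
tan(pi/4) * cos(pi/4)^(cos(pi/4) + 1) * (log(cos(pi/4)) + 1)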
But in ML, the functions are over something like $\mathbb{R}^{\text{bajillion}}$.
So for functions of multiple variables:
f(x,y,z) = x^2 + y^2 + z^2
#grad(f) = (2x,2y,2z)
gradient(f,1,2,3)
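# returns (2, 4, 6): the gradient of f evaluated at (1, 2, 3)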
And if we have a bunch of different parameters:
# Quick Example to introduce Params(): Linear Regression
# random initial parameters
W = rand(5,10)
b = rand(5)
fhat(x) = W*x + b
function loss(x,y)
    yhat = fhat(x) # our prediction for y
    return sum((y - yhat).^2)
end
x = rand(10)
y = rand(5)
loss(x,y) # big loss with random parameters
# I have 50+ parameters, how do I pass them all at once?
W = param(W)
b = param(b)
grads = gradient(() -> loss(x, y), Params([W, b]))
# gradient descent
alpha = 0.01 # learning rate, step size, etc
gW = grads[W]
gb = grads[b]
Tracker.update!(W, -alpha*gW) # essentially W = W - alpha * gW; it also zeroes the accumulated gradient so the next step starts fresh
Tracker.update!(b,-alpha*gb);
loss(x,y)
Run this several times and watch the loss go down!
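Or wrap the same steps in a loop (a minimal sketch using only the pieces defined above):
# repeat the descent step many times
for i in 1:100
    grads = gradient(() -> loss(x, y), Params([W, b]))
    Tracker.update!(W, -alpha * grads[W])
    Tracker.update!(b, -alpha * grads[b])
end
loss(x, y) # much smaller now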
Enough about gradients; there are other packages if you're interested in doing more: ForwardDiff.jl for forward mode, Calculus.jl for symbolic/finite differences, and Zygote.jl for source-to-source reverse mode.
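For instance, a minimal sketch of forward mode with ForwardDiff.jl (assumes the package is installed, so it is left commented out):
# forward-mode AD on our earlier f(x) = x^2 + 1
# using ForwardDiff
# ForwardDiff.derivative(x -> x^2 + 1, 4.0) # 8.0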
To build a neural network, just repeat the linear regression example above and compose:
function sigmoid(x)
    return 1/(1 + exp(-x))
end
W1 = param(rand(7, 10))
b1 = param(rand(7))
layer1(x) = W1 * x .+ b1
W2 = param(rand(5, 7))
b2 = param(rand(5))
layer2(x) = W2 * x .+ b2
model(x) = layer2(sigmoid.(layer1(x)))
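A quick sanity check that the shapes line up:
model(rand(10)) # 10-dimensional input in, 5-dimensional output out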
# Flux's built-in Dense layer does the same thing
# (fresh names, since layer1 and layer2 are already bound to our handwritten functions)
dense1 = Dense(10, 7, sigmoid)
dense2 = Dense(7, 5)
model(x) = dense2(dense1(x))
# or equivalently
model2 = Chain(dense1, dense2)
# a cool thing about Chain is that it supports indexing
model2[1] # returns dense1
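And Chains are callable like any other model:
model2(rand(10)) # same 5-dimensional output as the handwritten version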
Flux has the common objective functions and optimizers built in.
# To train a model, call something like
Flux.train!(objective, parameters, data, optimizer, cb = () -> println("still training..."))
# cb stands for callback; it's useful for progress updates during training (e.g. the current loss)
# By default it is called after every batch. Use Flux.throttle() to limit how often it runs
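For example, to run the callback at most once every 10 seconds:
Flux.train!(objective, parameters, data, optimizer, cb = Flux.throttle(() -> println("still training..."), 10))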
This only trains for one epoch. To train for more, use the @epochs macro, e.g.
using Flux: @epochs
@epochs 5 Flux.train!(...)
using Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated
# using CuArrays # uncomment if you want to use the GPU
imgs = Flux.Data.MNIST.images()
labels = Flux.Data.MNIST.labels();
The training data consists of 60,000 images of handwritten digits like this one:
imgs[27454] # pick a number 1-60000
The goal is to learn how to identify them:
labels[27454]
## Boring Preprocessing
X = hcat(float.(reshape.(imgs, :))...) # flatten each 28x28 image to a length-784 column and stack them
Y = onehotbatch(labels, 0:9) # one-hot encode the labels, a common way to handle categorical variables
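For instance, the encoded label of the image we looked at earlier:
Y[:, 27454] # a one-hot vector: all zeros except a 1 at the position of the digit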
Let's create our model!
# Our model, just like before, chaining dense layers
# Go from 28^2 dimensional space (images are 28x28) to 10 dimensional space (labels are 0-9)
m = Chain(
    Dense(28^2, 32, relu),
    Dense(32, 10),
    softmax)
# softmax just converts output to probability distribution
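Since softmax normalizes the outputs, every prediction sums to 1, even before training:
sum(m(X[:, 1])) # ≈ 1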
Now, to choose our objective function (loss) and optimizer!
loss(x, y) = crossentropy(m(x), y)
opt = ADAM(); # popular stochastic gradient descent variant
accuracy(x, y) = mean(onecold(m(x)) .== onecold(y)) # fraction of correct guesses; onecold inverts the one-hot encoding
dataset = repeated((X,Y), 200) # repeat the data set 200 times, as opposed to using @epochs 200 ...
evalcb = () -> @show(loss(X, Y)) # callback to show loss
Time to train!
Flux.train!(loss, params(m), dataset, opt, cb = throttle(evalcb, 10)); #took me ~5 minutes to train on CPU
10,000 images were set aside to test our model. Let's look at one of them.
Flux.Data.MNIST.images(:test)[5287] # give me a number 1-10000
# Same preprocessing
test_X = hcat(float.(reshape.(Flux.Data.MNIST.images(:test), :))...)
test_Y = onehotbatch(Flux.Data.MNIST.labels(:test), 0:9);
m(test_X[:,5287]) # note the 7th entry (corresponding to the digit 6) is nearly 1
# decode the prediction
onecold(m(test_X[:,5287])) - 1 # minus 1 since the digits start at 0 but Julia indexing starts at 1
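Alternatively, onecold can take the label set directly, skipping the manual shift:
onecold(m(test_X[:,5287]), 0:9) # same answer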
Overall, here's how our model does:
# Training set accuracy
accuracy(X, Y)
# Test set accuracy
accuracy(test_X, test_Y)
Talk at JuliaCon 2017: https://www.youtube.com/watch?v=9KBaRS2gy-U
Documentation: https://fluxml.ai/Flux.jl/stable/
Many examples here! https://github.com/FluxML/model-zoo/