How to build a simple neural network in a few lines of Python code

Building a simple neural network can be done in a few lines of Python code.

Formula for calculating the neuron’s output

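To calculate the neuron’s output, first take the weighted sum of the neuron’s inputs:

$$\sum_i \text{weight}_i \cdot \text{input}_i = \text{weight}_1 \cdot \text{input}_1 + \text{weight}_2 \cdot \text{input}_2 + \text{weight}_3 \cdot \text{input}_3$$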

Next we normalise this, so the result is between 0 and 1. For this, we use a mathematically convenient function, called the Sigmoid function:

$$\frac{1}{1 + e^{-x}}$$

If plotted on a graph, the Sigmoid function draws an S shaped curve.

Diagram 4

So by substituting the first equation into the second, the final formula for the output of the neuron is:

$$\text{output} = \frac{1}{1 + e^{-\sum_i \text{weight}_i \cdot \text{input}_i}}$$

You might have noticed that we’re not using a minimum firing threshold, to keep things simple.
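As a rough illustration of the two steps above, the sketch below computes the output of a single neuron with three inputs. The input and weight values are made up for the example, and the helper names `sigmoid` and `neuron_output` are assumptions rather than names taken from the original code.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(inputs, weights):
    """Weighted sum of the inputs, passed through the Sigmoid function."""
    weighted_sum = np.dot(inputs, weights)
    return sigmoid(weighted_sum)

# Illustrative values only (assumed, not from the article).
inputs = np.array([1.0, 0.0, 1.0])
weights = np.array([0.5, -0.2, 0.1])

output = neuron_output(inputs, weights)
print(output)  # a single value strictly between 0 and 1
```

Because there is no firing threshold, the neuron always produces a graded value rather than a hard 0 or 1.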

Formula for adjusting the weights

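During the training cycle (Diagram 3), we adjust the weights. But how much do we adjust the weights by? We can use the “Error Weighted Derivative” formula:

$$\text{adjustment} = \text{error} \cdot \text{input} \cdot \text{SigmoidCurveGradient}(\text{output})$$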

Why this formula? First we want to make the adjustment proportional to the size of the error. Secondly, we multiply by the input, which is either a 0 or a 1. If the input is 0, the weight isn’t adjusted. Finally, we multiply by the gradient of the Sigmoid curve (Diagram 4). To understand this last one, consider that:

  1. We used the Sigmoid curve to calculate the output of the neuron.
  2. If the weighted sum is a large positive or negative number, the output is close to 1 or 0, which signifies the neuron was quite confident one way or the other.
  3. From Diagram 4, we can see that at large numbers, the Sigmoid curve has a shallow gradient.
  4. If the neuron is confident that the existing weight is correct, it doesn’t want to adjust it very much. Multiplying by the Sigmoid curve gradient achieves this.
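The flattening described in the steps above can be checked numerically. The sketch below evaluates the Sigmoid curve and its slope at a few weighted sums; the function names and the sample points are assumptions chosen for illustration, and the slope uses the standard closed form of the Sigmoid derivative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_gradient(x):
    """Slope of the Sigmoid curve at weighted sum x."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The slope is steepest at 0 and shrinks as the weighted sum grows.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid(x), sigmoid_gradient(x))
```

The slope peaks at 0.25 when the weighted sum is 0 and falls towards 0 in both directions, which is why a confident neuron barely changes its weights.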

The gradient of the Sigmoid curve can be found by taking the derivative:

$$\text{SigmoidCurveGradient}(\text{output}) = \text{output} \cdot (1 - \text{output})$$

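So by substituting the second equation into the first equation, the final formula for adjusting the weights is:

$$\text{adjustment} = \text{error} \cdot \text{input} \cdot \text{output} \cdot (1 - \text{output})$$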

There are alternative formulae, which would allow the neuron to learn more quickly, but this one has the advantage of being fairly simple.
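To make the adjustment rule concrete, here is a minimal sketch of one update for a single training example. It assumes the error is the desired output minus the actual output; the values and the name `weight_adjustments` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weight_adjustments(inputs, weights, target):
    """Return error * input * output * (1 - output) for each weight."""
    output = sigmoid(np.dot(inputs, weights))
    error = target - output  # assumed sign convention: desired minus actual
    return error * inputs * output * (1.0 - output)

inputs = np.array([1.0, 0.0, 1.0])    # assumed example
weights = np.array([0.5, -0.2, 0.1])  # assumed starting weights
target = 1.0                          # assumed desired output

adjustments = weight_adjustments(inputs, weights, target)
print(adjustments)  # the middle entry is 0 because its input is 0
```

Adding `adjustments` to `weights` nudges the output towards the target, and the weight whose input was 0 is left untouched.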

There are a couple of articles that I can point you to:


How to build a simple neural network in 9 lines of Python code

As part of my quest to learn about AI, I set myself the goal of building a simple neural network in Python. To ensure I truly understood it, I had to build it from scratch without using a neural network library. Thanks to an excellent blog post by Andrew Trask I achieved my goal. Here it is in just 9 lines of code:

In this blog post, I’ll explain how I did it, so you can build your own. I’ll also provide a longer, but more beautiful version of the source code.

But first, what is a neural network? The human brain consists of 100 billion cells called neurons, connected together by synapses. If sufficient synaptic inputs to a neuron fire, that neuron will also fire. We call this process “thinking”.

Diagram 1

We can model this process by creating a neural network on a computer. It’s not necessary to model the biological complexity of the human brain at a molecular level, just its higher level rules. We use a mathematical technique called matrices, which are grids of numbers. To make it really simple, we will just model a single neuron, with three inputs and one output.


Constructing the Python code

Although we won’t use a neural network library, we will import four methods from a Python mathematics library called numpy. These are:

  • exp — the natural exponential
  • array — creates a matrix
  • dot — multiplies matrices
  • random — gives us random numbers

For example we can use the array() method to represent the training set shown earlier:

The ‘.T’ attribute transposes the matrix from horizontal to vertical, so the computer stores the four outputs as a single column rather than a single row.
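A small sketch of how `array()` and `.T` fit together is shown below. The training values and the variable names are assumptions for illustration, since the training set itself is not reproduced in this text.

```python
from numpy import array

# Assumed example values: four training examples with three inputs each.
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])

# One desired output per example, written as a row and then transposed.
training_set_outputs = array([[0, 1, 1, 0]]).T

print(training_set_inputs.shape)   # (4, 3)
print(training_set_outputs.shape)  # (4, 1)
print(training_set_outputs)
```

Transposing makes each output line up with the row of inputs it belongs to.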

Ok. I think we’re ready for the more beautiful version of the source code. Once I’ve given it to you, I’ll conclude with some final thoughts.

I have added comments to my source code to explain everything, line by line. Note that in each iteration we process the entire training set simultaneously. Therefore our variables are matrices, which are grids of numbers. Here is a complete working example written in Python:
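The original listing is not reproduced in this text. As a stand-in, here is a minimal sketch of a single neuron trained on the whole training set at once, using only the four numpy imports mentioned above. The training data, the `train` helper, the iteration count, and the seed are all assumptions for illustration, not the author's code.

```python
from numpy import exp, array, random, dot

def sigmoid(x):
    return 1 / (1 + exp(-x))

def train(inputs, outputs, iterations=10000, seed=1):
    """Train one neuron; returns a column of weights, one per input."""
    random.seed(seed)
    # Random starting weights in the range -1 to 1.
    weights = 2 * random.random((inputs.shape[1], 1)) - 1
    for _ in range(iterations):
        # Process every training example in one matrix operation.
        output = sigmoid(dot(inputs, weights))
        error = outputs - output
        # error * input * output * (1 - output), summed over the examples.
        weights += dot(inputs.T, error * output * (1 - output))
    return weights

# Assumed training set: the output follows the first input column.
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T

weights = train(training_set_inputs, training_set_outputs)
prediction = sigmoid(dot(array([1, 0, 0]), weights))
print(prediction)  # close to 1 for this unseen input
```

The `dot(inputs.T, ...)` call is what lets the whole training set be handled in one step: it multiplies each example's adjustment by its inputs and adds the results together.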


A Neural Network in 11 lines of Python (Part 1)

A bare bones neural network implementation to describe the inner workings of backpropagation.

Posted by iamtrask on July 12, 2015

Summary: I learn best with toy code that I can play with. This tutorial teaches backpropagation via a very simple toy example, a short python implementation.

Edit: Some folks have asked about a followup article, and I’m planning to write one. I’ll tweet it out when it’s complete at @iamtrask. Feel free to follow if you’d be interested in reading it and thanks for all the feedback!

Just Give Me The Code:

import numpy as np  # the only dependency, not counted in the 11 lines
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1
for j in range(60000):
    l1 = 1/(1+np.exp(-(np.dot(X,syn0))))   # hidden layer
    l2 = 1/(1+np.exp(-(np.dot(l1,syn1))))  # output layer
    l2_delta = (y - l2)*(l2*(1-l2))
    l1_delta = l2_delta.dot(syn1.T) * (l1 * (1-l1))
    syn1 += l1.T.dot(l2_delta)
    syn0 += X.T.dot(l1_delta)

Other Languages: D, C++, CUDA

However, this is a bit terse... let’s break it apart into a few simple parts.

Part 1: A Tiny Toy Network

A neural network trained with backpropagation is attempting to use input to predict output.



Consider trying to predict the output column given the three input columns. We could solve this problem by simply measuring statistics between the input values and the output values. If we did so, we would see that the leftmost input column is perfectly correlated with the output. Backpropagation, in its simplest form, measures statistics like this to make a model. Let’s jump right in and use it to do this.
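One simple way to "measure statistics" here is to count how often each input column matches the output. The sketch below does that with the dataset from the 2 Layer Neural Network listing. The agreement measure is an assumption chosen for illustration, not something backpropagation computes literally.

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

# Fraction of training examples where each input column equals the output.
agreement = (X == y).mean(axis=0)
print(agreement)  # leftmost column agrees on every example
```

Only the leftmost column agrees with the output on every row, so a model that learns to rely on that column solves the problem.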

2 Layer Neural Network:

01. import numpy as np
02.
03. # sigmoid function
04. def nonlin(x,deriv=False):
05.     if(deriv==True):
06.         return x*(1-x)
07.     return 1/(1+np.exp(-x))
08.
09. # input dataset
10. X = np.array([  [0,0,1],
11.                 [0,1,1],
12.                 [1,0,1],
13.                 [1,1,1] ])
14.
15. # output dataset
16. y = np.array([[0,0,1,1]]).T
17.
18. # seed random numbers to make calculation
19. # deterministic (just a good practice)
20. np.random.seed(1)
21.
22. # initialize weights randomly with mean 0
23. syn0 = 2*np.random.random((3,1)) - 1
24.
25. for iter in range(10000):
26.
27.     # forward propagation
28.     l0 = X
29.     l1 = nonlin(np.dot(l0,syn0))
30.
31.     # how much did we miss?
32.     l1_error = y - l1
33.
34.     # multiply how much we missed by the
35.     # slope of the sigmoid at the values in l1
36.     l1_delta = l1_error * nonlin(l1,True)
37.
38.     # update weights
39.     syn0 += np.dot(l0.T,l1_delta)
40.
41. print("Output After Training:")
42. print(l1)
Output After Training:
[[ 0.00966449]
 [ 0.00786506]
 [ 0.99358898]
 [ 0.99211957]]


Variable    Definition
X           Input dataset matrix where each row is a training example
y           Output dataset matrix where each row is a training example
l0          First Layer of the Network, specified by the input data
l1          Second Layer of the Network, otherwise known as the hidden layer
syn0        First layer of weights, Synapse 0, connecting l0 to l1.
*           Elementwise multiplication, so two vectors of equal size are multiplying corresponding values 1-to-1 to generate a final vector of identical size.
-           Elementwise subtraction, so two vectors of equal size are subtracting corresponding values 1-to-1 to generate a final vector of identical size.
x.dot(y)    If x and y are vectors, this is a dot product. If both are matrices, it’s a matrix-matrix multiplication. If only one is a matrix, then it’s vector matrix multiplication.
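The difference between the elementwise operators and `dot` is easy to see on small arrays. The values below are made up purely for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise_product = a * b   # 1*4, 2*5, 3*6
elementwise_diff = a - b      # 1-4, 2-5, 3-6
vector_dot = a.dot(b)         # 1*4 + 2*5 + 3*6, a single number

M = np.array([[1, 0, 0], [0, 2, 0]])  # a 2x3 matrix
matrix_vector = M.dot(a)      # one value per row of M
matrix_matrix = M.dot(M.T)    # 2x3 times 3x2 gives a 2x2 matrix

print(elementwise_product, elementwise_diff, vector_dot)
print(matrix_vector)
print(matrix_matrix)
```

The elementwise operators keep the shape of their operands, while `dot` collapses the shared dimension.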

As you can see in the “Output After Training”, it works!!! Before I describe processes, I recommend playing around with the code to get an intuitive feel for how it works. You should be able to run it “as is” in an ipython notebook (or a script if you must, but I HIGHLY recommend the notebook). Here are some good places to look in the code:

• Compare l1 after the first iteration and after the last iteration.
• Check out the “nonlin” function. This is what gives us a probability as output.
• Check out how l1_error changes as you iterate.
• Take apart line 36. Most of the secret sauce is here.
• Check out line 39. Everything in the network prepares for this operation.

Let’s walk through the code line by line.

Recommendation: open this blog in two screens so you can see the code while you read it. That’s kinda what I did while I wrote it. 🙂

Line 01: This imports numpy, which is a linear algebra library. This is our only dependency.

Line 04: This is our “nonlinearity”. While it can be several kinds of functions, this nonlinearity is a function called a “sigmoid”. A sigmoid function maps any value to a value between 0 and 1. We use it to convert numbers to probabilities. It also has several other desirable properties for training neural networks.
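One detail worth checking by hand is that `nonlin(x, deriv=True)` expects a value that has already been passed through the sigmoid, which is how the listing uses it with `l1`. The sketch below mirrors the listing's `nonlin` and compares its slope against a finite-difference estimate. The sample values and the step size `h` are assumptions for illustration.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        # x is assumed to already be a sigmoid output here
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])    # raw values (illustrative)
out = nonlin(z)                   # sigmoid outputs, all between 0 and 1
slope = nonlin(out, deriv=True)   # slope computed from the outputs

# Central finite difference of the sigmoid at z, for comparison.
h = 1e-6
numeric = (nonlin(z + h) - nonlin(z - h)) / (2 * h)
print(np.allclose(slope, numeric))
```

Passing the raw values `z` with `deriv=True` instead of `out` would give a different, incorrect slope.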


See also: