Deep Learning is a way to teach computers to hear, see, talk, or, more generally, think in order to solve a problem or create new ideas. To do this, we show the computer a lot of examples of how the task is done such that when presented with a problem that is similar but not the same as the examples, the computer will know how to perform the same task on the new problem. It’s no coincidence that this process sounds like the way you and I would learn a new skill. Deep Learning was inspired by how the brain works.
Over the next three months, I will be going through some Deep Learning material and sharing what I learn in simple English. I will try to use as little math or technical language as possible. I’ll also share ways in which you can use Deep Learning on your own. Deep Learning has shown the potential to completely change our lives, and it is my hope that everyone who is interested understands how it works and benefits from the impending revolution that it will usher in.
The best place to start looking at Deep Learning is its origins and foundations. The fundamental technique behind Deep Learning is the Neural Network, an idea first developed in the 1940s in a paper titled A Logical Calculus of the Ideas Immanent in Nervous Activity. The paper’s authors, Warren McCulloch and Walter Pitts, built a model of how they thought neurons (brain cells) worked, and over the years the idea was developed into what we have now. Each brain cell, or neuron, receives small electric charges from neighboring brain cells, and depending on all these charges the brain cell either does nothing or produces its own charge to send to other neighboring brain cells, which then do the same with the charges they receive. To portray this behavior in a very simple model, we replace the electric charges with numbers and have our neurons run a calculation on the numbers to produce a result. Where the brain cell sends a charge to neighboring brain cells, our model neuron sends the result of its calculation to neighboring neurons.
Simple Neuron Model
Sometimes the result of the calculation will be zero and the neuron will send the zero to its neighboring neurons. This is important because it portrays the case where the brain cell receives electric charges but does not produce or send any electric charges. In other words, sending a zero in the simple neuron model is like doing nothing in the real case of a brain cell.
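The simple neuron model above can be sketched in a few lines of code. This is only an illustration, not the exact model from the figure: here I assume the “calculation” is a weighted sum followed by a fire-or-do-nothing rule, which is one common choice.

```python
def neuron(inputs, weights, threshold):
    """A toy neuron: combine the incoming numbers, then either
    'fire' (send 1) or do nothing (send 0)."""
    # Each incoming number counts more or less depending on its weight,
    # just as some electric charges matter more to a brain cell.
    total = sum(x * w for x, w in zip(inputs, weights))
    # If the combined 'charge' is strong enough, the neuron fires.
    return 1 if total >= threshold else 0

# Example: a neuron that fires only when both of its inputs are 1.
print(neuron([1, 1], [1, 1], threshold=2))  # 1
print(neuron([1, 0], [1, 1], threshold=2))  # 0
```

Note that sending a 0 here plays the role of “doing nothing” described above.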
The neuron described in the simple neuron model is a building block. On its own, the neuron does not perform any important tasks. It simply receives information, processes the information in a calculation of some form, produces a result, and then sends the result to some neighboring neurons. However, when several neurons are connected together and their respective calculations are carefully crafted, that group of neurons can be made to perform useful tasks. Such a group of neurons is called a neural network. The following is an example of a neural network with three neurons.
The diagram is read from left to right and information also flows from left to right (input to output). The arrows pointing from a number to a neuron indicate that the neuron is given a copy of that number. The arrows pointing from one neuron to another neuron indicate that one neuron sends the result of its calculation to the other neuron. To summarize the network from the left to the right:
- A copy of the inputs 1 and 0 on the far left flows along the lines to each of the 2 neurons in layer 1
- Each of the 2 neurons in layer 1 ends up with a copy of both inputs, 1 and 0
- The 2 neurons process the 2 numbers in some calculation to produce a result
- The results of the neurons in layer 1 flow along the arrows to the neuron in layer 2
- The neuron in layer 2 ends up with 2 numbers (the results from the previous layer)
- The neuron in layer 2 processes the 2 numbers in a calculation and the result, which is 1, flows along the arrow to the output
Building a simple Neural Network
The neural network in Figure 2.1 is not particularly important to us yet, but we can train it to do interesting things. For example, by carefully crafting the calculations that each neuron carries out, we can get the neural network to always produce a result of 1 in the output when the input numbers are 1 and 0, regardless of their arrangement. In addition, we can get the network to produce a result of 0 when both input numbers are 1 or both are 0. The diagrams below show all four possibilities.
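One way to craft those calculations by hand is sketched below. The weights and thresholds here are my own illustrative choices (the diagrams do not specify them); any set of numbers that produces the same four outcomes would work just as well.

```python
def neuron(inputs, weights, threshold):
    # Combine the incoming numbers; fire (1) if the total is big enough.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def network(a, b):
    # Layer 1: two neurons, each receives a copy of both inputs.
    h1 = neuron([a, b], [1, 1], threshold=1)  # fires if at least one input is 1
    h2 = neuron([a, b], [1, 1], threshold=2)  # fires only if both inputs are 1
    # Layer 2: one neuron combines the two results from layer 1.
    return neuron([h1, h2], [1, -1], threshold=1)

# All four possibilities from the diagrams:
for a, b in [(1, 0), (0, 1), (1, 1), (0, 0)]:
    print(a, b, "->", network(a, b))  # 1, 1, 0, 0
```

The output is 1 exactly when the inputs are 1 and 0 in either arrangement, and 0 when both inputs match.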
Building a Neural Network for more complicated Tasks
Comparing 2 numbers is interesting but it doesn’t seem very useful. It’s also a far cry from the definition of Deep Learning that I gave at the beginning. To get a little closer to that definition, consider the following images.
It is easy to tell that these images are the numbers 2, 7, 4 and 1 even with slight alterations, e.g. if a digit sits on the bottom-most line (7th row), in the middle, or is 2 rows shorter. I want to build a neural network that will do the same, that is, see the digits and correctly report that they are 2, 7, 4 and 1 regardless of whether a digit is all the way on the left, on the right, or a little squished together. To make this simpler I will not use a camera. Instead, I will tell the neural network which of the little boxes are shaded and which are not: 1 for a shaded box and 0 for a clear box. I chose 0 and 1 because they are easy to work with. Here is an example for the digit 2.
In Figure 4.2 the digit 2 on the left is first transformed into a grid of 1s and 0s. 1 represents a dark box and 0 represents a clear box. The rows are then joined together into a single column of 49 numbers which become the input of the neural network. Note: Some of the rows are not shown in the input because there is not enough space.
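The transformation in Figure 4.2 can be sketched like this. The grid below is a hypothetical stand-in for the digit in the figure; the point is only how the rows are joined into one column of 49 numbers.

```python
# A 7x7 grid of 1s (shaded boxes) and 0s (clear boxes). This particular
# pattern is just a stand-in, not the exact digit from Figure 4.2.
grid = [
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Join the rows into a single column of 49 numbers - the network's input.
inputs = [cell for row in grid for cell in row]
print(len(inputs))  # 49
```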
Starting from the left, each of the neurons in layer 1 of the neural network in Fig 4.3 receives a copy of the 49 numbers. Likewise, each of the neurons in layer 2 receives 3 results from the 3 neurons in the previous layer, and so forth. Based on past experience, three neurons each in layer 1 and layer 2 should be enough. The calculations in each of the neurons are carefully crafted such that the last column, containing 10 neurons, will produce the number 1 (or close to 1) only in the neuron that corresponds to the correct digit, and approximately 0 in the rest of the neurons. Here are some examples.
Now the neural network correctly reports the digits that are fed into the input. It took a lot more neurons but our neural network can ‘see’ the digits albeit with a few compromises. This is a significant step up from a network that can only compare two numbers. The important difference between the two networks is the number of neurons and the columns of neurons used. The concept is the same.
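The flow through this bigger network (49 inputs, two layers of 3 neurons, a final column of 10) can be sketched as follows. The weights here are placeholders of my own; in the real network they would be carefully crafted, through training, so that the right output neuron produces a value near 1.

```python
import math

def layer(inputs, weights_per_neuron):
    """Each neuron combines its copy of the inputs into one result
    between 0 and 1 (a smooth version of fire / don't fire)."""
    results = []
    for weights in weights_per_neuron:
        total = sum(x * w for x, w in zip(inputs, weights))
        results.append(1 / (1 + math.exp(-total)))  # squash into 0..1
    return results

# Placeholder weights: 3 neurons with 49 weights each, then 3 with 3,
# then 10 with 3. A trained network would have carefully chosen values.
layer1 = [[0.1] * 49 for _ in range(3)]
layer2 = [[0.1] * 3 for _ in range(3)]
layer3 = [[0.1] * 3 for _ in range(10)]

inputs = [0] * 49  # a blank grid, just to show the flow of numbers
out = layer(layer(layer(inputs, layer1), layer2), layer3)

print(len(out))             # 10 - one result per digit, 0 through 9
print(out.index(max(out)))  # the digit the network 'reports'
```

With trained weights, the last line would pick out the neuron closest to 1, which is how the network reports what it “sees”.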
Similarly, to build a network that can see photos of dogs or cats and correctly report the pet, we need more neurons and more columns. A photo typically has a much larger grid, say 300×300, and each little box can be shaded all the way, not at all, or somewhere in between. Moreover, each little box can have one of millions of colors, each of which is represented by 3 numbers that show how much red, green and blue is in that color. Needless to say, photos carry vastly more information, and to capture all of it we need a much bigger neural network with more layers and neurons. When a neural network is this size and has this many layers, it is called a Deep Neural Network, and the field concerned with building such networks is called Deep Learning.
Deep learning uses very simple building blocks like neurons to perform complex tasks like seeing a number and reporting the number seen. In the next post I will explain how the neurons are configured in order to be able to work together to perform a task.