So far we have looked at how Deep Learning works, different kinds of deep learning techniques and specific examples of deep learning. Perhaps one of the most important aspects of Deep Learning, though, is preparing the data that we use to train a neural network. In this post I will describe common pitfalls and best practices for handling data. I will also go through an example inspired by the exercise from the previous post. Making the wrong choices about data can cause problems that are not easy to spot until they occur. The conditions that trigger a problem may also not arise until after the neural network has been deployed, which is a terrible time to discover it. Also, neural networks do not provide an explanation along with their results, so some problems may never be detected at all. That is why it’s really important to take precautions when preparing data.
Relationship between input and output
Deep learning presupposes a relationship between the input and the output. In fact, the techniques that we have looked at work because they find the patterns that link the input to the output and generalize them enough to apply them to new input. When there is only a weak relationship between the input and the output, the neural network ends up being trained on the wrong information. In order to train a convolutional neural network to see faces in photos, one has to pick relevant information to train the model on. Training the network using, say, information about the colors in the photo would produce incorrect results because a photo of an arm might be read as a face simply because it has the same colors as a face. To make matters worse, if all the photos of faces used in training have just about the same set of colors, the network might seem to be working correctly until it is tested on a face of a different color. To prevent such a problem, it’s important to carry out a sanity check to make sure that there is indeed a strong relationship between the input and the output.
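One possible way to run such a sanity check is a label-shuffling test: fit a very simple model once on the real labels and once on randomly shuffled labels, and compare the two accuracies. If the real labels do not clearly beat the shuffled ones, the input probably says little about the output. The sketch below is only an illustration under stated assumptions: the data is synthetic, the nearest-centroid classifier is a stand-in I chose for simplicity, and all function and variable names are hypothetical.

```python
import random

def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on train, return accuracy on test.

    train and test are lists of (features, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

    def predict(features):
        # Pick the label whose centroid is closest (squared distance).
        return min(
            centroids,
            key=lambda label: sum(
                (f - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    correct = sum(predict(features) == label for features, label in test)
    return correct / len(test)

rng = random.Random(0)

# Synthetic data (an assumption for illustration): two features per example,
# centred on 0 for label 0 and on 5 for label 1.
examples = []
for label, centre in ((0, 0.0), (1, 5.0)):
    for _ in range(100):
        features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        examples.append((features, label))
rng.shuffle(examples)

# Same inputs, but with the labels randomly reassigned.
shuffled_labels = [label for _, label in examples]
rng.shuffle(shuffled_labels)
shuffled_examples = [
    (features, new_label)
    for (features, _), new_label in zip(examples, shuffled_labels)
]

split = len(examples) // 2
real_accuracy = nearest_centroid_accuracy(examples[:split], examples[split:])
shuffled_accuracy = nearest_centroid_accuracy(
    shuffled_examples[:split], shuffled_examples[split:]
)

print(f"accuracy with real labels:     {real_accuracy:.2f}")
print(f"accuracy with shuffled labels: {shuffled_accuracy:.2f}")
```

A large gap between the two numbers suggests the input carries real information about the output. Note that this check would not catch the skin-color problem described above on its own; for that, the test data also has to include examples that differ from the training data.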
The Number of Training Examples
The quantity of data that is used to train a network can affect how well the network performs. Complex and nuanced data requires more training examples than simple data. An example of complex and simple data is a 28×28 pixel color photo and a 7×7 grid respectively. The digit nine can be represented on both, but the grid has only one color whereas the photo can have thousands of colors for the same digit. There are far fewer configurations on the grid for the network to learn than on the photo. With about ten grid drawings of the digit nine, one can represent all or almost all variations of the digit, but for a color photo one would need hundreds if not thousands of photos to represent sufficient variations.
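To get a rough feel for the gap between the two, we can count how many distinct inputs each format allows. This is a back-of-the-envelope sketch, assuming each grid cell is simply filled or empty and each photo pixel has three color channels with 256 levels each; neither detail is fixed by the example above.

```python
import math

# Assumption: each of the 7x7 grid cells is either filled or empty.
grid_cells = 7 * 7
grid_configurations = 2 ** grid_cells

# Assumption: each of the 28x28 pixels has 3 channels of 8 bits each.
photo_pixels = 28 * 28
photo_bits = photo_pixels * 3 * 8

# The photo has 2 ** photo_bits configurations, which is too large to print
# comfortably, so we report how many decimal digits that number has instead.
photo_digits = math.floor(photo_bits * math.log10(2)) + 1

print(f"7x7 grid: {grid_configurations:,} possible configurations")
print(f"28x28 color photo: a number with {photo_digits:,} decimal digits")
```

Only a tiny fraction of these configurations look like a nine in either case, but the count gives a sense of how much more room for variation the photo has.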
The model of a neuron that is the basis of Deep Learning is inspired by the human brain. As we have seen in the previous posts, the model can be extended to do interesting tasks like seeing and writing English words to mimic Shakespeare, and all of this is done using numbers. The process of converting information like words to numbers is integral to Deep Learning, and it has to be carried out carefully to preserve the relationships between different pieces of information. For example, representing the words ‘house’ and ‘building’ with the numbers 20 and 5400 causes some important information to be lost, particularly the fact that the words are close together in meaning. Numbers that are closer together, like 20 and 23, are better at showing the closeness of the words. Better yet, we can use a series of numbers (known as a vector) to represent each word so that we have more room to show the relationship between any two words. House and building could be represented by (2, 3, 8, 1) and (2, 3, 6, 1) respectively. In fact, there is a process for determining this series of numbers for each word we want to use in our neural network, and that process is called word vectorization.
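To make the idea of closeness concrete, here is a small sketch that measures the distance between word vectors. The vectors for house and building are the ones from the example above; the vector for ‘banana’ is made up purely for contrast, and straight-line (Euclidean) distance is just one of several measures one could use.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

word_vectors = {
    "house": (2, 3, 8, 1),
    "building": (2, 3, 6, 1),
    "banana": (9, 0, 1, 7),  # hypothetical vector, invented for contrast
}

house_to_building = euclidean_distance(
    word_vectors["house"], word_vectors["building"]
)
house_to_banana = euclidean_distance(
    word_vectors["house"], word_vectors["banana"]
)

print(f"house <-> building: {house_to_building:.2f}")
print(f"house <-> banana:   {house_to_banana:.2f}")
```

Words with related meanings end up a short distance apart, while unrelated words sit further away, which is exactly the information a single arbitrary number per word would lose.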
Sometimes the training examples are so numerous that we cannot train on all of them together. What we do instead is create smaller batches and train with them one at a time, i.e. use one batch to nudge each parameter in the right direction, move on to the next batch and repeat. The batches have to be created carefully so that they are not skewed in one direction or another. For example, if the first few batches we use to train the convolutional neural network mentioned above contain only 1% faces, then the neural network does not learn enough about faces and any changes made to the parameters will be skewed and therefore incorrect. Each batch has to represent the variety of examples found in the entire set of examples. If 50% of the photos in our dataset are faces, then each batch should have roughly 50% faces. If there are different kinds of faces (e.g. faces with hats and faces without hats), then it is best for the batches to reflect that variety in roughly the same proportions.
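One simple way to build batches like this is to group the examples by label and then deal each group out across the batches, much like dealing cards. The following is a minimal sketch of that idea, not the only way to do it; the function name and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_batches(labels, num_batches, seed=0):
    """Split example indices into batches with similar label proportions."""
    rng = random.Random(seed)

    # Group the example indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    batches = [[] for _ in range(num_batches)]
    offset = 0
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal this label's examples out round-robin, starting where the
        # previous label stopped so that batch sizes stay even.
        for position, index in enumerate(indices):
            batches[(offset + position) % num_batches].append(index)
        offset += len(indices)

    # Shuffle within each batch so examples are not ordered by label.
    for batch in batches:
        rng.shuffle(batch)
    return batches

# Toy dataset (an assumption): 50 photos of faces and 50 of other things.
labels = ["face"] * 50 + ["other"] * 50
batches = stratified_batches(labels, num_batches=10)

for number, batch in enumerate(batches):
    faces = sum(1 for index in batch if labels[index] == "face")
    print(f"batch {number}: {len(batch)} examples, {faces} faces")
```

The same approach covers finer distinctions such as faces with and without hats: use a label like "face with hat" for each kind and the batches will follow those proportions too.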
Taking these precautions with data prevents some common problems in Deep Learning. Also, understanding the sort of issues that can be caused by poor handling of data can be helpful in troubleshooting issues when they occur.