Starting with the last two posts, I decided to do a deep dive into Recurrent Neural Networks because they are a broad topic and some of the cutting-edge applications of Deep Learning use RNNs. This post is the third installment of the RNN series, but it will be slightly different: I’m going to break from precedent and write in technical language in the bottom half of the post. I realized that while there is some fairly substantial learning material for building RNNs from scratch using Tensorflow, like the RNN tutorial here, the material seems to have been written for deep learning experts, leaving a gap for beginners and even intermediates. My hope is that this post will make it easier to approach and write Tensorflow code for a basic RNN.
In the last post, I modified and ran Andrej Karpathy’s basic RNN that learns to predict the next character in a sentence from Shakespeare’s works. The RNN did not come close to composing verse, but it learned the structure of sentences and paragraphs until it started producing what looked like Shakespeare-like paragraphs and some English words. The RNN learned some things, but it did not pick up all the patterns. To fix this we could try something that we learned from regular neural networks and convolutional neural networks: we add layers. Also, it turns out that the basic RNN that I used does not do a good job of remembering clues. As the sequence gets longer, the network loses some clues, and to solve this we use a special RNN called Long Short Term Memory, or LSTM for short. In an LSTM, there is a second set of clues that does not go through the layers of neurons. At each step of the sequence, the LSTM adds or subtracts some numbers from this clue, which corresponds to remembering new clues and forgetting some old ones.
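To make the "adding and subtracting clues" idea concrete, here is a minimal NumPy sketch of a single LSTM step. The function name, the gate ordering, and the weight shapes are my own illustration of the standard LSTM equations, not the exact code Tensorflow uses internally:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. c is the 'second set of clues' (the cell state):
    at each step some old clues are forgotten and some new ones added."""
    z = np.dot(W, np.concatenate([x, h_prev])) + b  # all four gates in one matrix multiply
    n = h_prev.shape[0]
    f = sigmoid(z[0*n:1*n])   # forget gate: how much of each old clue to keep
    i = sigmoid(z[1*n:2*n])   # input gate: how much of each new clue to add
    g = np.tanh(z[2*n:3*n])   # candidate new clues
    o = sigmoid(z[3*n:4*n])   # output gate
    c = f * c_prev + i * g    # the add/subtract update to the clues
    h = o * np.tanh(c)        # the output passed on to the next layer and step
    return h, c
```

With, say, a 26-value one-hot input and 64 hidden units, `W` would have shape `(4 * 64, 26 + 64)`, since the four gates are stacked in one matrix.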
Building An LSTM
The following is a synopsis of how to build an LSTM like the one above, but for the task from the previous post: predicting characters to make up sentences. As before, the network is trained on Shakespeare’s work.
- Start building the network from the input. The input is a text file of all of Shakespeare’s work, available here. Split the text into sentences that are 25 characters long (regardless of where actual sentences start and end) and group these sequences into batches of, say, 20 each.
- Convert the characters in each sentence to numbers. The character ‘a’ could be 1, ‘b’ could be 2, and so forth. From there, change each number to a series of zeros and ones: the character ‘a’, which is represented by one, would become 00000000000000000000000001, ‘b’ would become 00000000000000000000000010, and so forth.
- Stack two LSTM units so that the output of the first is the input of the second.
- Add one layer of neurons to the output of the second LSTM unit. The purpose of this layer is to make sure that the output has enough values to represent the ones and zeros of a character, e.g. 00000000000000000000000100 for ‘c’. This layer also makes sure that the output values are between 0 and 1.
- Calculate the correctness score of the network
- Train the network on one batch of inputs at a time, i.e. find the values of the parameters that give the highest correctness score.
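The first two steps above (cutting the text into 25-character sequences, grouping them into batches of 20, and turning characters into zeros and ones) can be sketched in plain NumPy. The function names and the exact layout are my own illustration, not the tutorial's code:

```python
import numpy as np

SEQ_LEN = 25     # characters per "sentence", regardless of real sentence boundaries
BATCH_SIZE = 20  # sequences per batch

def make_batches(text):
    """Map characters to integers, cut into fixed-length sequences, group into batches."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}  # 'a' -> some integer, etc.
    ids = [char_to_id[ch] for ch in text]
    n_seqs = len(ids) // SEQ_LEN                        # drop the leftover tail
    seqs = np.array(ids[:n_seqs * SEQ_LEN]).reshape(n_seqs, SEQ_LEN)
    n_batches = n_seqs // BATCH_SIZE
    batches = seqs[:n_batches * BATCH_SIZE].reshape(n_batches, BATCH_SIZE, SEQ_LEN)
    return batches, len(chars)

def one_hot(batch, vocab_size):
    """Turn each integer id into a row of zeros with a single one at its index."""
    out = np.zeros(batch.shape + (vocab_size,))
    for idx in np.ndindex(batch.shape):
        out[idx + (batch[idx],)] = 1.0
    return out
```

Running `make_batches` on the Shakespeare text would give an array of shape `(num_batches, 20, 25)` of character ids, and `one_hot` expands each id into the zeros-and-ones representation described above.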
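The final layer of neurons described above, which gives one value per character and squashes everything to between 0 and 1, is a softmax layer. Here is a hedged NumPy sketch; the weights and names are illustrative, assuming a 26-character vocabulary and 64 LSTM outputs:

```python
import numpy as np

def output_layer(h, W, b):
    """Project an LSTM output vector to one score per character in the
    vocabulary, then squash the scores into probabilities between 0 and 1."""
    logits = np.dot(W, h) + b           # W has shape (vocab_size, lstm_size)
    e = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return e / e.sum()                  # softmax: non-negative values summing to 1
```

The position with the largest probability corresponds to the predicted next character, and the correctness score in the next step compares these probabilities against the actual next character's one-hot row.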
Here is the actual code, with comments explaining each piece (work in progress). Some snippets of the code are from the Tensorflow RNN tutorial.