Deep Learning with Sequences in Practice

In the last post I described how deep neural networks, particularly Recurrent Neural Networks (RNNs), can be used on inputs of varying size, such as sequences. In this post I’m going to describe an example of an RNN and show the results it produces. Before I do that, I’m going to explain what is inside the RNN block that I called ‘neural network’ in the previous post.


Fig 1.0 RNN: A Recurrent Neural Network with an input sequence of four letters drawn on a 7×7 grid. The neural network starts with the first input on the left, produces two results (an output, #, and a clue, s), then moves on to the next input. The neural network uses the clues to guess the word in the sequence, and it reports its guesses in the output at the top.

RNN Unit

The neural network block in an RNN is called an RNN unit. It is similar to the neural network from the first post, with some minor differences. Instead of taking one set of input values and producing one set of results, an RNN unit takes in two sets of values (the input and the clue) and produces two results (an output and another clue). Here is an illustrated example of the steps that the RNN in Figure 1.0 would take to produce one output and one clue.

Fig. 2.0 Step 1: Convert the input from a grid drawing to a series of numbers


Fig. 2.1 Step 2: The neurons in layer 1 of the RNN unit receive both the input from the sequence and the clues about inputs seen in the earlier parts of the sequence. If the RNN unit is at the beginning of the sequence the clues are just a column of zeros because the RNN unit has not seen any inputs yet.
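
To make Steps 1 and 2 concrete, here is a minimal sketch in Python with NumPy. The letter drawn on the grid and the number of clue values (`hidden_size = 5`) are my own assumptions for illustration; they are not taken from the figures.

```python
import numpy as np

# Hypothetical 7x7 drawing of the letter "T": 1 = filled cell, 0 = empty cell.
grid = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
])

# Step 1: unroll the grid row by row into a series of 49 numbers.
x = grid.flatten().astype(float)

# Step 2: at the start of the sequence the clue is a column of zeros,
# because the RNN unit has not seen any inputs yet.
hidden_size = 5  # assumed number of neurons in layer 1
clue = np.zeros(hidden_size)

print(x.shape, clue)
```

The only thing that matters here is the shape: 49 input values, plus one clue value per neuron in layer 1.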


RNN Unit Output
Step 3: After receiving the clues and the input from the sequence, the neurons calculate the new clues. The new clues flow to the right to be used further along the sequence, and to the top to be used by the next layer to produce the output. Each neuron in layer 2 gets a copy of the clue values from the layer below.

To calculate the new clue, the neurons in layer 1 multiply each value from the sequence input, as well as each of the old clue values, by a parameter and add the results together. Each neuron has a parameter for each of the 49 input values and also a parameter for each of the clue values.


Sequence input:   0 x parameter1
                + 0 x parameter2
                + …
                + 0 x parameter49
                = input subtotal

Clue:            -0.246 x parameter50
                + 0.342 x parameter51
                + (-1.010) x parameter52
                + …
                = clue subtotal

Total = input subtotal + clue subtotal + bias


Just like the neurons we have seen before, the RNN neurons apply an activation function to the total. The activation commonly used for RNNs is called tanh, and it gives results between -1 and 1.

                   New clue = tanh ( Total )

Since each neuron in layer 1 produces one value for the new clue, there are as many values in the clue as there are neurons in layer 1.
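
The clue calculation for the whole of layer 1 can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the parameters are random rather than trained, the layer has 5 neurons, and the names `W_in`, `W_clue` and `new_clue` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 49, 5  # 49 grid values; 5 neurons is an assumption

# One row of parameters per neuron: 49 for the input, 5 for the old clue.
W_in = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_clue = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
bias = np.zeros(hidden_size)

def new_clue(x, old_clue):
    """Return the new clue: one value per neuron in layer 1."""
    input_subtotal = W_in @ x        # sum of input value x parameter
    clue_subtotal = W_clue @ old_clue  # sum of old clue value x parameter
    total = input_subtotal + clue_subtotal + bias
    return np.tanh(total)            # squashes each total into (-1, 1)

x = np.zeros(input_size)
x[:7] = 1.0                          # a made-up input with the top row filled
clue = new_clue(x, np.zeros(hidden_size))
print(clue)
```

Each matrix row plays the role of one neuron's list of parameters, so the matrix product computes every neuron's subtotal at once.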

Finally, the neurons in layer 2 produce an output but, unlike the previous layer, no activation is used.
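
As a rough sketch, layer 2 is the same weighted sum without the tanh. The number of outputs (4), the random parameters and the example clue values below are assumptions chosen only to make the snippet runnable.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size, output_size = 5, 4  # both sizes are assumptions

W_out = rng.normal(size=(output_size, hidden_size))
b_out = np.zeros(output_size)

def layer2_output(clue):
    """Weighted sum of the clue values plus a bias, with no activation."""
    return W_out @ clue + b_out

clue = np.array([-0.246, 0.342, -0.5, 0.1, 0.9])  # made-up clue values
output = layer2_output(clue)
print(output)
```

Because there is no activation, the output values are not limited to the range between -1 and 1.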


Char RNN

Following the three steps above, I used Andrej Karpathy’s code to train an RNN to guess the next character or letter of a sentence given the first letter. I trained the RNN on works written by Shakespeare, so the RNN should hopefully produce sentences that look like Shakespeare’s work. The first letter of the sentence is fed into the input of the RNN unit to get it to guess the next letter. Whatever the RNN unit guesses as the next letter is then fed back into the RNN unit as the next input, along with the clue from the previous letter. This process can be repeated as often as desired.
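
The feedback loop described above can be sketched as follows. This is not Andrej Karpathy’s code: it is a minimal illustration with untrained random parameters and a tiny made-up vocabulary, and it assumes each letter is fed in as a one-hot vector and that the next letter is drawn at random according to a softmax over the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical vocabulary; a real run would use every character in the text.
chars = sorted(set("to be or not to be"))
char_to_ix = {c: i for i, c in enumerate(chars)}
vocab_size, hidden_size = len(chars), 8  # 8 neurons is an assumption

# Untrained random parameters, so the generated text is gibberish.
W_in = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_clue = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_out = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
b_clue = np.zeros(hidden_size)
b_out = np.zeros(vocab_size)

def generate(first_char, length):
    """Generate `length` more characters, feeding each guess back in as input."""
    clue = np.zeros(hidden_size)          # no inputs seen yet
    ix = char_to_ix[first_char]
    text = first_char
    for _ in range(length):
        x = np.zeros(vocab_size)
        x[ix] = 1.0                       # one-hot encoding of the current letter
        clue = np.tanh(W_in @ x + W_clue @ clue + b_clue)
        scores = W_out @ clue + b_out     # layer 2, no activation
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()              # softmax turns scores into probabilities
        ix = rng.choice(vocab_size, p=probs)  # the guess becomes the next input
        text += chars[ix]
    return text

sample = generate("t", 20)
print(repr(sample))
```

With random parameters the result looks like the gibberish from the first rounds of training; training is what adjusts the parameters so that the guesses start to follow the patterns in the text.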


Before I show the results, here is a snippet of the text used to train the RNN:



                         SCENE XIV.

                    CLEOPATRA’S palace

                  Enter ANTONY and EROS

 ANTONY. Eros, thou yet behold’st me?

 EROS. Ay, noble lord.

 ANTONY. Sometime we see a cloud that’s dragonish;

   A vapour sometime like a bear or lion,

   A tower’d citadel, a pendent rock,

   A forked mountain, or blue promontory

   With trees upon’t that nod unto the world

   And mock our eyes with air. Thou hast seen these signs;

   They are black vesper’s pageants.

 EROS. Ay, my lord.

 ANTONY. That which is now a horse, even with a thought

   The rack dislimns, and makes it indistinct,

   As water is in water.

 EROS. It does, my lord.


After the first rounds of training, the RNN produced gibberish:


}>T1iOo yF}gf1;Fl A1sF8F”1I1?a&FU1>1. LFjF61) &Fd d1>]<F0Fpn}1C;tvT1DGXFyFbUq1″FP1HF)Fx1?KL1wzaFSFjF.1u1E,nF5n.1WF) J1|Fxkv19Fx7C7& SFE1n:wF” S )F(SD1|1AFWF>Fe1GFp T}m 71:F61HFl1K1lFf h1″FQZ`pGrv1! 6R


Slowly, the gibberish started to change to nonsense words after 1100 cycles of training. Paragraphs started to appear and the length of the lines started to resemble the source text:


k ouw nh dro la a  ios,
hy,  h i e    nqG r eeeu-omo gt
og yus o gys isJo  e s oked
te thlot
s e sa Fnaheur esrb nLatea te t b  h s,a e l nlea nuer Wtee
iy ol   

enng o g ls<s a d ono tf r vh


After a couple of hours, some real words appeared, and the formatting of sentences and paragraphs started to resemble the original text:


Cleret will difdens you letteadle th’ll that splech candond am the bead. Iy brides forcengary meards!

   by grebhounch: genition amendal this the, are so suck.
   And evers live her wands your of that thai’d wisburicuspsting kny,
   Nis. Heads
   Whit thou printious to whille is a fanod
   Weel rithe fitil’d ‘As yet Hat?

 KING furrsiav to of to tro.

   Sorstemstyen And will
   Roors, buh,
   As des his truplly bacicholes this gent your of trown-et? Havild?

It appears the RNN is learning so much simply from picking up patterns in the order of characters. I will share the RNN so you can try it for yourself. For the next post, I will use a more powerful RNN to get better results faster.
