Implement LSTM Network using Python with TensorFlow and Keras for prediction and classification

A powerful type of Recurrent Neural Networks – Long Short-Term Memory (LSTM) is not only transmitting output information to the next time step, but they are also storing and transmitting the state of the so-called LSTM cell. This cell contains four neural networks – gates that determine which information is stored in the cell state and pushed to output. As a result, the output of the network at a one-time step is dependent on n previous time steps rather than just the previous time step.

In this article, we will look at two similar language modeling problems and see how they can be solved using two different APIs. To begin, we will build a network that can predict words based on the provided text, and we will use TensorFlow for this. In the second implementation, we will use Keras to classify reviews from the IMDB dataset.

Implement a Model with Tensorflow

The DataHandler class in DataHandler.py will be used. This class serves two functions: it loads data from a file and assigns a number to each symbol. The code is as follows:

import numpy as np
import collections

class DataHandler:
    def read_data(self, fname):
        with open(fname) as f:
            content = f.readlines()
        content = [x.strip() for x in content]
        content = [content[i].split() for i in range(len(content))]
        content = np.array(content)
        content = np.reshape(content, [-1, ])
        return content
    
    def build_datasets(self, words):
        count = collections.Counter(words).most_common()
        dictionary = dict()
        for word, _ in count:
            dictionary[word] = len(dictionary)
        reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
        return dictionary, reverse_dictionary

We will create a new class RNNGenerator in rnn_generator.py that can generate an LSTM network based on the parameters passed in.

import tensorflow as tf
from tensorflow.contrib import rnn

class RNNGenerator:
    def create_LSTM(self, inputs, weights, biases, seq_size, num_units):
        # Reshape input to [1, sequence_size] and split it into sequences
        inputs = tf.reshape(inputs, [–1, seq_size])
        inputs = tf.split(inputs, seq_size, 1)
    
        # LSTM with 2 layers
        rnn_model = rnn.MultiRNNCell([rnn.BasicLSTMCell(num_units),rnn.BasicLSTMCell(num_units)])
    
        # Generate prediction
        outputs, states = rnn.static_rnn(rnn_model, inputs, dtype=tf.float32)
    
        return tf.matmul(outputs[–1], weights['out']) + biases['out']

TensorFlow itself, as well as the RNN class from tensorflow.contrib, were imported. We will use our LSTM Network, which is a subtype of RNNs, to create our model. First, we reshaped our input and then divided it into three-symbol sequences. The model was then created.

Using the BasicLSTMCell method, we created two LSTM layers. The parameter num units specify the number of units in each of these layers. Aside from that, we use MultiRNNCell to integrate these two layers into a single network. The static RNN method was then used to build the network and generate predictions.

Finally, we will employ the SessionRunner class in session_runner.py. This class contains the environment used to run and evaluate our model. Here’s how the code works:

import tensorflow as tf
import random
import numpy as np

class SessionRunner():
    training_iters = 50000
            
    def __init__(self, optimizer, accuracy, cost, lstm, initilizer, writer):
        self.optimizer = optimizer
        self.accuracy = accuracy
        self.cost = cost
        self.lstm = lstm
        self.initilizer = initilizer
        self.writer = writer
    
    def run_session(self, x, y, n_input, dictionary, reverse_dictionary, training_data):
        
        with tf.Session() as session:
            session.run(self.initilizer)
            step = 0
            offset = random.randint(0, n_input + 1)
            acc_total = 0
        
            self.writer.add_graph(session.graph)
        
            while step < self.training_iters: if offset > (len(training_data) - n_input - 1):
                    offset = random.randint(0, n_input+1)
        
                sym_in_keys = [ [dictionary[ str(training_data[i])]] for i in range(offset, offset+n_input) ]
                sym_in_keys = np.reshape(np.array(sym_in_keys), [-1, n_input, 1])
        
                sym_out_onehot = np.zeros([len(dictionary)], dtype=float)
                sym_out_onehot[dictionary[str(training_data[offset+n_input])]] = 1.0
                sym_out_onehot = np.reshape(sym_out_onehot,[1,-1])
        
                _, acc, loss, onehot_pred = session.run([self.optimizer, self.accuracy, self.cost, self.lstm], feed_dict={x: sym_in_keys, y: sym_out_onehot})
                acc_total += acc
                
                if (step + 1) % 1000 == 0:
                    print("Iteration = " + str(step + 1) + ", Average Accuracy= " + "{:.2f}%".format(100*acc_total/1000))
                    acc_total = 0
                step += 1
                offset += (n_input+1)

Our model is being run through 50000 iterations. We injected the model, optimizer, loss function, and other information into the constructor so that the class could use it. Naturally, the first step is to slice up the data in the provided dictionary and generate encoded outputs. In addition, we are introducing random sequences into the model to avoid overfitting. The offset variable handles this. Finally, we’ll run the session to determine accuracy. Don’t be confused by the final if statement in the code; it’s just for show (at every 1000 iterations present the average accuracy).

Our main script main.py combines all of this into one, as shown below:

import tensorflow as tf
from DataHandler import DataHandler
from RNN_generator import RNNGenerator
from SessionRunner import SessionRunner

log_path = '/output/tensorflow/'
writer = tf.summary.FileWriter(log_path)

# Load and prepare data
data_handler = DataHandler()

training_data =  data_handler.read_data('meditations.txt')

dictionary, reverse_dictionary = data_handler.build_datasets(training_data)

# TensorFlow Graph input
n_input = 3
n_units = 512

x = tf.placeholder("float", [None, n_input, 1])
y = tf.placeholder("float", [None, len(dictionary)])

# RNN output weights and biases
weights = {
    'out': tf.Variable(tf.random_normal([n_units, len(dictionary)]))
}
biases = {
    'out': tf.Variable(tf.random_normal([len(dictionary)]))
}

rnn_generator = RNNGenerator()
lstm = rnn_generator.create_LSTM(x, weights, biases, n_input, n_units)

# Loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=lstm, labels=y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(cost)

# Model evaluation
correct_pred = tf.equal(tf.argmax(lstm,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
initilizer = tf.global_variables_initializer()

session_runner = SessionRunner(optimizer, accuracy, cost, lstm, initilizer, writer)
session_runner.run_session(x, y, n_input, dictionary, reverse_dictionary, training_data)

The content of meditations.txt is “In a sense, people are our proper occupation. Our job is to do them good and put up with them. But when they obstruct our proper tasks, they become irrelevant to us—like sun, wind, and animals. Our actions may be impeded by them, but there can be no impeding our intentions or our dispositions. Because we can accommodate and adapt. The mind adapts and converts to its own purposes the obstacle to our acting. The impediment to action advances action. What stands in the way becomes the way .”

We run the code and get accuracy above 95% with iteration 50000

Implement a model with Keras

This TensorFlow example was straightforward and simple. We used a small amount of data, and the network learned this fairly quickly. What if we have a more complicated issue? Assume we want to categorize the sentiment of each movie review on a website. Fortunately, there is already a dataset dedicated to this issue – The Large Movie Review Dataset (often referred to as the IMDB dataset).

Stanford researchers collected this dataset in 2011. It includes 25000 movie reviews (both positive and negative) for training and the same number of reviews for testing. Our goal is to build a network that can determine which reviews are positive and which are negative.

The power of Keras is that it abstracts a lot of what we had to worry about while using TensorFlow. However, it provides us with less flexibility. Of course, everything has a cost. So, let’s begin by importing the necessary classes and libraries.

There is a slight difference in imports between examples where we used standard ANN and examples where we used Convolutional Neural Network. We brought in Sequential, Dense, and Dropout. Nonetheless, we can see a couple of new imports. We used Embedding and LSTM from keras.layers. As you might expect, LSTM is used to create LSTM layers in networks. In contrast, embedding is used to provide a dense representation of words.

This is an interesting technique for mapping each movie review into a real vector domain. Words are encoded as real-valued vectors in a high-dimensional space, with similarity in meaning corresponding to closeness in the vector space.

We are loading the top 1000 words dataset. Following that, we must divide the dataset and generate and pad sequences. This is accomplished by utilizing sequence from keras.preprocessing

from keras.preprocessing import sequence 
from keras.models import Sequential 
from keras.layers import Dense, Dropout, Embedding, LSTM 
from keras.datasets import imdb 

num_words = 1000 
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=num_words) 

X_train = sequence.pad_sequences(X_train, maxlen=200) 
X_test = sequence.pad_sequences(X_test, maxlen=200)

# Define network architecture and compile 
model = Sequential() 
model.add(Embedding(num_words, 50, input_length=200)) 
model.add(Dropout(0.2)) 
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2)) 
model.add(Dense(250, activation='relu')) 
model.add(Dropout(0.2)) 
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 


model.fit(X_train, y_train, batch_size=64, epochs=10) 

print('\nAccuracy: {}'. format(model.evaluate(X_test, y_test)[1]))

We used the number 200 in the padding to indicate that our sequences will be 200 words long. This is how the training input data looks:

Sequential is used for model composition, as we have seen in previous articles. The first layer added to it is Embedding, which we discussed in the previous chapter. We added one LSTM layer after the word embedding was completed. Finally, because this is a classification problem, we add a dense layer with a sigmoid function to determine whether the review was good or bad. Finally, the model is compiled using binary cross-entropy and the Adam optimizer.

We got an accuracy of 85.05%

When developing LSTM networks, we observed two approaches. Both approaches dealt with simple problems and employed a different API. As can be seen, TensorFlow is more detailed and flexible; however, you must take care of a lot more details than when using Keras. Keras is simpler and easier to use, but it lacks the flexibility and possibilities that pure TensorFlow provides. Both of these examples produced acceptable results, but they could have been better. Especially in the second example, where we typically use a combination of CNN and RNN to improve accuracy, but that is a topic for another article.

Training and deploying machine learning models with TensorFlow.js JavaScript Library

Deep learning and machine learning algorithms can be executed on Javascript with TensorFlow.js. In the article, we will introduce how to define, train and run your machine learning models using the API of TensorFlow.js.

With a few lines of Javascript, developers can implement pre-trained models for complex tasks such as visual recognition, generating music, or human poses detection. Node.js allows TensorFlow.js can be used in backend Javascript applications rather than use Python.

TensorFlow Javascript

TensorFlow is a popular open-source software library for machine learning applications. Many neural networks and other deep learning algorithms use the TensorFlow library, which was originally a Python library released by Google in November 2015. TensorFlow can use either CPU or GPU-based computation for training and evaluating machine learning models. The library was originally developed to operate on high-performance servers with GPUs.

In May 2017, Tensorflow Lite, a lightweight version of the library for mobile and embedded devices, was released. This was accompanied by MobileNet, a new series of pre-trained deep learning models for vision recognition tasks. MobileNet models were created to perform well in resource-constrained environments such as mobile devices.

TensorFlow.js follow TensorFlow Lite, was announced in March 2018. This version of the library was built on an earlier project called deeplearn.js and was designed to run in the browser. WebGL allows for GPU access to the library. To train, load, and run models, developers use a JavaScript API.

TensorFlow.js is a JavaScript library that can be used to train and deploy machine learning models in the browser and Node.js. TensorFlow.js was recently extended to run on Node.js by using the tfjs-node extension library.

Are you familiar with concepts such as Tensors, Layers, Optimizers, and Loss Functions (or willing to learn them)? TensorFlow.js is a JavaScript library that provides flexible building blocks for neural network programming.

Learn how to use TensorFlow.js code in the browser or Node.js to get started.

Get Setup

Importing Existing Models Into TensorFlow.js

The TensorFlow.js library can be used to run existing TensorFlow and Keras models. Models must be converted to a new format using this tool before they can be executed. Github hosts pre-trained and converted models for image classification, pose detection, and k-nearest neighbors are available on Github.

Learn how to convert pre-trained Python models to TensorFlow.js here.

Learn by looking at existing TensorFlow.js code. tfjs-examples provide small code examples that use TensorFlow.js to implement various ML tasks. See it on GitHub

Loading TensorFlow Libraries

TensorFlow’s JavaScript API is accessible via the core library. Node.js extension modules do not expose any additional APIs.

const tf = require('@tensorflow/tfjs')
// Load the binding (CPU computation)
require('@tensorflow/tfjs-node')
// Or load the binding (GPU computation)
require('@tensorflow/tfjs-node-gpu')

Loading TensorFlow Models

TensorFlow.js includes an NPM library (tfjs-models) to make it easier to load pre-trained and converted models for image classification, pose detection, and k-nearest neighbors.

The MobileNet image classification model is a deep neural network that has been trained to recognize 1000 different classes.

The following example code is used to load the model in the project’s README.

import * as mobilenet from '@tensorflow-models/mobilenet';

// Load the model.
const model = await mobilenet.load();

One of the first issues I ran into was that this does not work on Node.js.

Error: browserHTTPRequest is not supported outside the web browser.

The mobilenet library is a wrapper around the underlying tf.Model class, according to the source code. When the load() method is invoked, the correct model files are automatically downloaded from an external HTTP address and the TensorFlow model is instantiated.

The Node.js extension does not yet support HTTP requests to retrieve models dynamically. Models must instead be manually loaded from the filesystem.

After reading the library’s source code, I was able to devise a workaround…

Loading Models From a Filesystem

If the MobileNet class is created manually, rather than calling the module’s load method, the auto-generated path variable containing the model’s HTTP address can be overwritten with a local filesystem path. After that, calling the load method on the class instance will invoke the filesystem loader class rather than the browser-based HTTP loader.

const path = "mobilenet/model.json"
const mn = new mobilenet.MobileNet(1, 1);
mn.path = `file://${path}`
await mn.load()

MobileNet Models

TensorFlow.js models are made up of two file types: a model configuration file in JSON and model weights in binary format. Model weights are frequently sharded into multiple files for better browser caching.

The automatic loading code for MobileNet models retrieves model configuration and weight shards from a public storage bucket at this address.

https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v${version}_${alpha}_${size}/

The URL template parameters refer to the model versions listed here. On that page, the classification accuracy results for each version are also displayed.

According to the source code, the tensorflow-models/mobilenet library can only load MobileNet v1 models.

The HTTP retrieval code loads the model.json file from this location and then recursively retrieves all model weights shards that are referenced. These files are in the groupX-shard1of1 format.