
TensorFlow.js Crash Course – Machine Learning For The Web – Handwriting Recognition



Welcome to the second episode of the CodingTheSmartWay.com TensorFlow.js Crash Course for absolute beginners.

In the first part, TensorFlow.js Crash Course – Machine Learning For The Web – Getting Started, we covered the following topics:

  • What TensorFlow.js is
  • How TensorFlow.js is added to your web application
  • How TensorFlow.js can be used to add machine learning capabilities to your web application

In this part we’re going one step further and will explore another use case: the recognition of handwritten digits. It is assumed that you’re familiar with the basic building blocks of TensorFlow.js which were introduced in the first episode.




What We’re Going To Build

Let’s take a look at the application which we’re going to build in this tutorial. The application will use the MNIST data set to train a neural network. The model is built and trained when the website is loaded. The progress can be seen in the Log Output area:


Once the training procedure is completed the user is informed with the message “Training complete” and the button in the Predict area is activated.


Pressing the button randomly selects one image from the MNIST data source and performs a prediction with the trained model. The output looks like the following:


The image of the handwritten digit is presented together with the original value and the predicted value. If the prediction is correct, the text “Value recognized successfully” is shown as well. This tells us that the trained neural network was able to recognize the digit from the image correctly.

The user is able to use the button multiple times. The output is extended as you can see in the following screenshot:

The MNIST Database Of Handwritten Digits

MNIST is a data set which contains images of handwritten digits from 0 to 9. We’ll use that database of images to train the model of our application. Furthermore we’ll use randomly selected images from the MNIST data set to test whether the neural network is able to perform predictions.

Preparing The Project

Again let’s start with setting up the project by creating a new folder:

$ mkdir tfjs02

Change into that newly created project folder:

$ cd tfjs02

Inside the folder we’re now ready to create a package.json file, so that we’re able to manage dependencies by using the Node.js Package Manager:

$ npm init -y

Because we’ll be installing the dependencies (e.g. the TensorFlow.js library) locally in our project folder, we need to use a module bundler for our web application. To keep things as easy as possible we’re going to use the Parcel web application bundler because Parcel works with zero configuration. Let’s install the Parcel bundler by executing the following command (the -g flag installs it globally, so it is available in every project):

$ npm install -g parcel-bundler

Next, let’s create two new empty files for our implementation:

$ touch index.html index.js

Finally let’s add the Bootstrap library as a dependency because we will be using some Bootstrap CSS classes for our user interface elements:

$ npm install bootstrap

Now let’s add further dependencies to the project to make sure that we’re able to use the latest ECMAScript features like async / await:

$ npm install --save-dev babel-plugin-transform-runtime babel-runtime

Create a .babelrc file and add:

{
    "plugins": [
        ["transform-runtime",
        {
            "polyfill": false,
            "regenerator": true
        }]
    ]
}

Last but not least, we must not forget to install TensorFlow.js as well:

$ npm install @tensorflow/tfjs

Building The Convolutional Neural Network Model

Creating A Sequential Model Instance

Before we start to build the convolutional neural network model, we define a variable model which will hold the model and a function createModel which will contain the code needed to create and compile the machine learning model:

var model;

function createModel() {
    // Insert the following pieces of code here
}

Let’s first create the sequential model instance as already learned in episode 1 of this series and insert the following code in function createModel:

createLogEntry('Create model ...');
model = tf.sequential();
createLogEntry('Model created');

Additionally we’re making use of a function named createLogEntry. This function will be implemented later on and is used to output text messages to the Log Output area.

Adding The First Layer

First, let’s add a two-dimensional convolutional layer by using the following code:

createLogEntry('Add layers ...');
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],
  kernelSize: 5,
  filters: 8,
  strides: 1,
  activation: 'relu',
  kernelInitializer: 'VarianceScaling'
}));

The layer is created via tf.layers.conv2d. To configure the layer, a configuration object is passed as a parameter to this method. The new layer is added to the model by passing it to the method model.add.

The configuration object which is passed to conv2d is containing six configuration properties in total:

  • inputShape: This is the shape of the input data of the first layer. The MNIST data contains images of 28×28 pixels. The images are grayscale and therefore have a single color channel, so we’re assigning the shape [28, 28, 1] here.
  • kernelSize: The kernelSize value is the size of the filter window of the convolutional layer which is applied to the input data. We’re using the value 5 here to define a square filter window of 5×5 pixels.
  • filters: This is the number of filter windows (of size kernelSize) which are applied to the input data.
  • strides: This value specifies by how many pixels the filter window slides over the input image in each step.
  • activation: The activation function which is applied to the data once the filter windows have been applied. Here we’re using the Rectified Linear Unit (ReLU) function, which is a very common activation function in machine learning.
  • kernelInitializer: We’re using VarianceScaling (which is a common initializer) to initialize the model weights.
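To make the idea of a sliding filter window more tangible, here is a minimal plain-JavaScript sketch of a single filter applied to a small grayscale image, followed by ReLU. It is only an illustration and not how TensorFlow.js implements conv2d internally. The function name conv2dValid, the 4×4 input and the 3×3 kernel are made up for this example, and the sketch assumes that no padding is used.

```javascript
// Illustration only: one square filter sliding over a 2D grayscale image.
// Assumes no padding, so the output is smaller than the input.
function conv2dValid(image, kernel, stride = 1) {
  const k = kernel.length; // kernel is k x k
  const outSize = Math.floor((image.length - k) / stride) + 1;
  const output = [];
  for (let row = 0; row < outSize; row++) {
    const outRow = [];
    for (let col = 0; col < outSize; col++) {
      // Multiply the window element-wise with the kernel and sum up.
      let sum = 0;
      for (let i = 0; i < k; i++) {
        for (let j = 0; j < k; j++) {
          sum += image[row * stride + i][col * stride + j] * kernel[i][j];
        }
      }
      outRow.push(Math.max(0, sum)); // ReLU: negative sums become 0
    }
    output.push(outRow);
  }
  return output;
}

// Hypothetical 4x4 input and 3x3 kernel, chosen only for illustration.
const sampleImage = [
  [1, 0, 0, 1],
  [0, 1, 1, 0],
  [0, 1, 1, 0],
  [1, 0, 0, 1]
];
const sampleKernel = [
  [1, 0, -1],
  [1, 0, -1],
  [1, 0, -1]
];

// A 3x3 window fits into a 4x4 image at 2x2 positions.
const featureMap = conv2dValid(sampleImage, sampleKernel);
console.log(featureMap); // [[0, 1], [0, 1]]
```

In the layer above, eight such filters of size 5×5 are learned during training, and each one produces its own output map.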

Adding The Second Layer

The next layer we’re going to add to our neural network model is a two dimensional max pooling layer. We’re using that layer to down-sample the image so it is half the size of the input from the previous layer by defining the max pooling layer in the following way:

model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],
  strides: [2, 2]
}));

The layer is configured by passing over a configuration object with two configuration properties:

  • poolSize: This is the size of the sliding window (2×2 pixels) which is applied to the input.
  • strides: This value specifies by how many pixels the filter window is sliding over the input image.

Since both values are set to [2, 2], the pooling windows do not overlap. As a result, the width and the height of the input from the previous layer are each cut in half.
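The effect of non-overlapping 2×2 max pooling can be sketched in a few lines of plain JavaScript. This is an illustration with a made-up function name (maxPool2x2) and made-up input values, not the TensorFlow.js implementation.

```javascript
// Illustration only: 2x2 max pooling with a stride of 2 on a 2D array.
function maxPool2x2(input) {
  const output = [];
  for (let row = 0; row + 1 < input.length; row += 2) {
    const outRow = [];
    for (let col = 0; col + 1 < input[row].length; col += 2) {
      // Keep only the largest value of each 2x2 window.
      outRow.push(Math.max(
        input[row][col], input[row][col + 1],
        input[row + 1][col], input[row + 1][col + 1]
      ));
    }
    output.push(outRow);
  }
  return output;
}

// Hypothetical 4x4 input; the result has half the width and half the height.
const pooled = maxPool2x2([
  [1, 3, 2, 0],
  [4, 2, 1, 1],
  [0, 0, 5, 6],
  [1, 2, 7, 8]
]);
console.log(pooled); // [[4, 2], [2, 8]]
```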

Adding Another Convolutional Layer

A common pattern in convolutional neural network models used for image recognition is to repeat the first convolutional layer and the second max pooling layer. So let’s add again a two dimensional convolutional layer as the third layer in our model:

model.add(tf.layers.conv2d({
  kernelSize: 5,
  filters: 16,
  strides: 1,
  activation: 'relu',
  kernelInitializer: 'VarianceScaling'
}));

This time we do not need to define the input shape because the shape is determined by the output shape of the previous layer automatically.

Adding Another MaxPooling Layer

The fourth layer is again a max pooling layer to further down-sample the result:

model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],
  strides: [2, 2]
}));

Adding A Flatten Layer

Having repeated the pattern of a convolutional layer and a max pooling layer a second time, we can now add a flatten layer as the fifth layer in our model:

model.add(tf.layers.flatten());

This layer will flatten the output from the previous layer to a vector.
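To see how long that vector is, we can trace the shape of the data through the five layers defined so far. The following sketch is plain JavaScript with hypothetical helper names (afterConv, afterPool). It assumes that neither the convolutional nor the pooling layers use padding, which is the default when no padding option is set as in the code above.

```javascript
// Illustration only: compute [height, width, channels] after each layer.
// Assumes no padding for both layer types.
function afterConv([h, w], kernelSize, stride, filters) {
  const size = (n) => Math.floor((n - kernelSize) / stride) + 1;
  return [size(h), size(w), filters];
}

function afterPool([h, w, c], poolSize, stride) {
  const size = (n) => Math.floor((n - poolSize) / stride) + 1;
  return [size(h), size(w), c];
}

const s0 = [28, 28, 1];              // input image
const s1 = afterConv(s0, 5, 1, 8);   // first conv2d layer
const s2 = afterPool(s1, 2, 2);      // first max pooling layer
const s3 = afterConv(s2, 5, 1, 16);  // second conv2d layer
const s4 = afterPool(s3, 2, 2);      // second max pooling layer
const flattenedLength = s4[0] * s4[1] * s4[2];

console.log(s1, s2, s3, s4, flattenedLength);
// [24, 24, 8] [12, 12, 8] [8, 8, 16] [4, 4, 16] 256
```

Under these assumptions the flatten layer hands a vector of 256 values to the next layer.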

Adding A Dense Layer (Fully Connected Layer) For Performing The Final Classification

The final layer which is added to our model is a dense layer (fully connected layer). This layer will perform the final classification:

model.add(tf.layers.dense({
  units: 10,
  kernelInitializer: 'VarianceScaling',
  activation: 'softmax'
}));
createLogEntry('Layers created');

The dense layer configuration consists of the following properties:

  • units: The size of the output. As we’d like to do a 10-class classification to predict digits between zero and nine (0–9), we’re setting the value to 10.
  • kernelInitializer: Set to VarianceScaling, as for the convolutional layers.
  • activation: The activation function which is used for classification. The softmax activation function creates a probability distribution over the 10 classes.
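What “probability distribution” means here can be shown with a small plain-JavaScript sketch of the softmax function. The ten input scores are made up for illustration, and the code is not taken from TensorFlow.js.

```javascript
// Illustration only: softmax turns arbitrary scores into values
// between 0 and 1 that sum up to 1.
function softmax(scores) {
  const maxScore = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((v) => Math.exp(v - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / total);
}

// Hypothetical raw outputs for the digits 0 to 9.
const probabilities = softmax([1.0, 2.0, 0.5, 0.1, 0.0, 3.0, 0.2, 0.3, 0.4, 0.6]);
const sumOfProbabilities = probabilities.reduce((a, b) => a + b, 0);

console.log(probabilities);
console.log(sumOfProbabilities); // approximately 1
```

The digit with the highest score (index 5 in this made-up example) also receives the highest probability.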

Compiling The Model

All needed layers have been added to the model. Before we’re going to train the model with MNIST data sets we need to make sure that the model is compiled:

createLogEntry('Start compiling ...');
model.compile({
    optimizer: tf.train.sgd(0.15),
    loss: 'categoricalCrossentropy'
});
createLogEntry('Compiled');

The object which is passed to the call of model.compile is containing two properties:

  • optimizer: The convolutional neural network model will make use of an SGD (Stochastic Gradient Descent) optimizer with a learning rate of 0.15.
  • loss: As loss function we choose categoricalCrossentropy which is often used for classification tasks.
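To get a feeling for these two settings, here is a plain-JavaScript sketch of what a categorical cross-entropy loss and a single gradient descent update compute. The function names, the three-class example and the gradient values are made up for illustration. TensorFlow.js calculates the gradients automatically, which is not shown here.

```javascript
// Illustration only: loss for one sample with a one-hot label.
function categoricalCrossentropy(oneHotLabel, probabilities) {
  const epsilon = 1e-7; // avoids Math.log(0)
  return -oneHotLabel.reduce(
    (sum, y, i) => sum + y * Math.log(Math.max(probabilities[i], epsilon)),
    0
  );
}

// Illustration only: one SGD step moves each weight against its gradient.
function sgdStep(weights, gradients, learningRate) {
  return weights.map((w, i) => w - learningRate * gradients[i]);
}

// The correct class (index 1) was predicted with probability 0.5.
const loss = categoricalCrossentropy([0, 1, 0], [0.25, 0.5, 0.25]);
console.log(loss); // -ln(0.5), roughly 0.693

// Hypothetical weights and gradients, learning rate 0.15 as in the model.
const updatedWeights = sgdStep([1.0, -2.0], [0.5, -1.0], 0.15);
console.log(updatedWeights); // roughly [0.925, -1.85]
```

The more probability the model assigns to the correct class, the smaller the loss becomes.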

Loading The Data And Training The Model

Let’s start training the model with MNIST data sets of handwritten digits. To access the MNIST data from a remote server we’re using the MnistData class from the project https://github.com/tensorflow/tfjs-examples/tree/master/mnist. To make that class available just download the file data.js from that repository and insert that file in our project directory. In index.js use the following import statements to make the tf namespace (used in all code snippets of this tutorial) and the MnistData class available:

import * as tf from '@tensorflow/tfjs';
import {MnistData} from './data';

The data should be kept in a variable named data. A load function is added to our application to load the data by calling the MnistData method load:

let data;
async function load() {
    createLogEntry('Loading MNIST data ...');
    data = new MnistData();
    await data.load();
    createLogEntry('Data loaded successfully');
}

With the MNIST data records available we’re now ready to prepare for training. Let’s first define two constants:

const BATCH_SIZE = 64;
const TRAIN_BATCHES = 150;

The training will not be performed in one operation. Instead we’ll perform the training in batches of data. The size of a batch and the number of batches to be trained are defined by those constants. The training logic is encapsulated in function train:

async function train() {
    createLogEntry('Start training ...');
    for (let i = 0; i < TRAIN_BATCHES; i++) {
        const batch = tf.tidy(() => {
            const batch = data.nextTrainBatch(BATCH_SIZE);
            batch.xs = batch.xs.reshape([BATCH_SIZE, 28, 28, 1]);
            return batch;
        }); 
        
        await model.fit(
            batch.xs, batch.labels, {batchSize: BATCH_SIZE, epochs: 1}
        );

        tf.dispose(batch);

        await tf.nextFrame();
    }
    createLogEntry('Training complete');
}
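A bit of arithmetic helps to understand the numbers used in the train function above. The following plain-JavaScript sketch uses hypothetical names and assumes that each image arrives as a flat list of 28 × 28 values which reshape arranges row by row.

```javascript
// Illustration only: the numbers behind the training loop.
const IMAGE_SIZE = 28;

// Images the model sees in total: 150 batches of 64 images each.
const samplesSeen = 64 * 150;

// Values in one batch of shape [64, 28, 28, 1].
const valuesPerBatch = 64 * IMAGE_SIZE * IMAGE_SIZE * 1;

// Where a value of a flat image ends up after reshaping (row by row).
function pixelPosition(flatIndex) {
  return {
    row: Math.floor(flatIndex / IMAGE_SIZE),
    col: flatIndex % IMAGE_SIZE
  };
}

const position = pixelPosition(30);
console.log(samplesSeen);    // 9600
console.log(valuesPerBatch); // 50176
console.log(position);       // { row: 1, col: 2 }
```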

Implementing the User Interface

In the next step let’s add the HTML / CSS code which is needed to implement the user interface of our application in index.html:

<html>
<body>
    <style>
        .prediction-canvas{
            width: 100px;
            margin: 20px;
        }
        .prediction-div{
            display: inline-block;
            margin: 10px;
        }
    </style>
    <div class="container" style="padding-top: 20px">
        <div class="card">
            <div class="card-header">
                <strong>TensorFlow.js Demo - Handwriting Recognition</strong>
            </div>
            <div class="card-body">
                <div class="card">
                    <div class="card-body">
                        <h5 class="card-title">Log Output:</h5>
                        <div id="log"></div>
                    </div>
                </div>
                <br>
                <div class="card">
                    <div class="card-body">
                        <h5 class="card-title">Predict</h5>
                        <button type="button" class="btn btn-primary" id="selectTestDataButton" disabled>Please wait until model is ready ...</button>
                        <div id="predictionResult"></div>
                    </div>
                </div>
            </div>
        </div>
    </div>

    <script src="./index.js"></script>
</body>
</html>

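Here we’re making use of various Bootstrap CSS classes. Note that the markup above does not load the Bootstrap stylesheet itself. One way to include it, assuming Parcel bundles CSS that is imported from JavaScript, is to add import 'bootstrap/dist/css/bootstrap.css'; at the top of index.js.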

For the output which is written to the log output area a <div> element with ID log is added as a placeholder. The button which the user can use to perform a prediction based on a randomly selected MNIST image is assigned the ID selectTestDataButton.

The output area for the prediction result is the <div> element with ID predictionResult.

The createLogEntry function has already been used several times to output messages in the log area. Now let’s add the missing implementation of that function in index.js as well:

function createLogEntry(entry) {
    document.getElementById('log').innerHTML += '<br>' + entry;
}

Finally let’s put everything together and implement function main to call createModel, load and train:

async function main() {
    createModel();
    await load();
    await train();
    document.getElementById('selectTestDataButton').disabled = false;
    document.getElementById('selectTestDataButton').innerText = "Randomly Select Test Data And Predict";
}

main();

Furthermore we’re making sure that the button is enabled after the training procedure has been performed successfully.

Prediction

Let’s move on to the final task and add the code which is needed to perform the prediction based on our trained convolutional neural network. To do so, we’re adding the predict function in the following way:

async function predict(batch) {
    tf.tidy(() => {
        const input_value = Array.from(batch.labels.argMax(1).dataSync());
        
        const div = document.createElement('div');
        div.className = 'prediction-div';

        const output = model.predict(batch.xs.reshape([-1, 28, 28, 1]));
        const prediction_value = Array.from(output.argMax(1).dataSync());

        const image = batch.xs.slice([0,0], [1, batch.xs.shape[1]]);
        
        const canvas = document.createElement('canvas');
        canvas.className = 'prediction-canvas';
        draw(image.flatten(), canvas);

        const label = document.createElement('div');
        label.innerHTML = 'Original Value: ' + input_value;
        label.innerHTML += '<br>Prediction Value: ' + prediction_value;
        console.log(prediction_value + '-' + input_value);
        if (prediction_value[0] === input_value[0]) {
            label.innerHTML += '<br>Value recognized successfully!';
        } else {
            label.innerHTML += '<br>Recognition failed!';
        }
            
        div.appendChild(canvas);
        div.appendChild(label);
        document.getElementById('predictionResult').appendChild(div);
    });
}
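The predict function calls argMax on both the labels and the model output. The following plain-JavaScript sketch shows what that step does for a single image. It assumes that the label is encoded as a vector with a 1 at the position of the correct digit (one-hot), which the use of argMax suggests. The function name and the example values are made up.

```javascript
// Illustration only: index of the largest value in an array.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Hypothetical one-hot label for the digit 7.
const oneHotLabel = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0];
// Hypothetical model output: one probability per digit.
const modelOutput = [0.01, 0.02, 0.05, 0.02, 0.1, 0.05, 0.03, 0.6, 0.02, 0.1];

const originalValue = argMax(oneHotLabel);
const predictedValue = argMax(modelOutput);
const recognized = originalValue === predictedValue;

console.log(originalValue, predictedValue, recognized); // 7 7 true
```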

Part of the output is the image of the handwritten digit. To draw the image we’re making use of the custom draw function. The implementation of that function needs to be added to index.js as well:

function draw(image, canvas) {
    const [width, height] = [28, 28];
    canvas.width = width;
    canvas.height = height;
    const ctx = canvas.getContext('2d');
    const imageData = new ImageData(width, height);
    const data = image.dataSync();
    for (let i = 0; i < height * width; ++i) {
      const j = i * 4;
      imageData.data[j + 0] = data[i] * 255;
      imageData.data[j + 1] = data[i] * 255;
      imageData.data[j + 2] = data[i] * 255;
      imageData.data[j + 3] = 255;
    }
    ctx.putImageData(imageData, 0, 0);
}
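The pixel conversion inside draw can also be looked at in isolation, without a canvas. The sketch below uses a hypothetical helper named grayscaleToRgba and assumes, as the multiplication by 255 suggests, that the pixel values of the image lie between 0 and 1.

```javascript
// Illustration only: turn grayscale values (0 to 1) into RGBA bytes,
// the same layout that ImageData.data uses (4 bytes per pixel).
function grayscaleToRgba(pixels) {
  const rgba = new Uint8ClampedArray(pixels.length * 4);
  for (let i = 0; i < pixels.length; i++) {
    const value = pixels[i] * 255;
    rgba[i * 4 + 0] = value; // red
    rgba[i * 4 + 1] = value; // green
    rgba[i * 4 + 2] = value; // blue
    rgba[i * 4 + 3] = 255;   // alpha: fully opaque
  }
  return rgba;
}

// Three hypothetical pixels: black, dark gray and white.
const rgbaExample = grayscaleToRgba([0, 0.2, 1]);
console.log(Array.from(rgbaExample));
// [0, 0, 0, 255, 51, 51, 51, 255, 255, 255, 255, 255]
```

Because red, green and blue always receive the same value, every pixel is drawn as a shade of gray.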

Finally we need to add the click event handler function for the selectTestDataButton:

document.getElementById('selectTestDataButton').addEventListener('click', async () => {
    const batch = data.nextTestBatch(1);
    await predict(batch);
});

Inside this function we’re using method nextTestBatch from the MnistData class to retrieve a batch of test data of size 1 (which means that only one image and its label are included). Next we’re calling the asynchronous predict function by using the keyword await and passing in the test batch.

