Getting through Deep Learning – CNNs (part 1)

The number of open source libraries that make deep learning easier to use is growing fast as the hype continues to build. However, without understanding the underlying principles, using them just feels like poking around a black box.

In this post (or several, most likely) I will try to give an introduction to Convolutional Neural Networks (CNNs). Note that, for the sake of brevity, I assume that you already know the basics of neural networks. If not, I would suggest you go through the following introduction.

This post is part of a tutorial series:

  1. Getting through Deep Learning – CNNs (part 1)
  2. Getting through Deep Learning – TensorFlow intro (part 2)
  3. Getting through Deep Learning – TensorFlow intro (part 3)

Disclaimer: this post uses images and formulas from distinct sources. As usual, I suggest having a look at the complete list of sources at the end of the post.


In 1958 and 1959 David H. Hubel and Torsten Wiesel performed a series of experiments from which they concluded that many neurons in the visual cortex respond only to a limited region of the visual field.

This insight provided the notion of a local receptive field – a narrow sub-region of the whole visual field which serves as input – thus giving rise to a different architecture than the earlier fully connected neural networks.

Basics – Convolution Layer

The first thing to realize is that convolutional networks are simply the application of “mini-neural networks” to segments of the input space. In the case of images, this means that neurons in the first convolutional layer are not connected to every single pixel of the input image, but only to the pixels in their Receptive Field (RF). The following image (source) illustrates how a convolution layer is built, using an image from the famous MNIST dataset, where the goal is to identify the digits in pictures of handwritten numbers.
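To make the local connectivity concrete in code as well, here is a minimal NumPy sketch of a single convolution filter sliding over one image. It is only an illustration under stated assumptions: the 28x28 input is random data standing in for an MNIST digit, the 5x5 kernel size is chosen arbitrarily, and the helper name `conv2d_valid` is made up for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output neuron only sees its own kh x kw receptive field,
            # and all neurons share the same kernel weights.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # assumption: stand-in for one MNIST digit
kernel = rng.standard_normal((5, 5))   # assumption: one 5x5 filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (24, 24), since 28 - 5 + 1 = 24
```

Each entry of `feature_map` depends on only 25 input pixels, whereas a neuron in a fully connected layer would be wired to all 784.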




Check out OpenFace

If you’re interested in using machine learning (ML) on image and video datasets, then you might be interested in having a look at a relatively new project called OpenFace (first released in October 2015), with Brandon Amos, Bartosz Ludwiczuk and Mahadev Satyanarayanan as authors.

TL;DR: For the impatient

  • Pitch me: Open source project (aka free for you to use) developed at Carnegie Mellon University for face recognition with deep neural networks, with a Python API
  • What do I get from it: improved accuracy and reduced training time
  • Need to see to believe (and so one should)? You can start playing with it with Docker, and check the provided demos

What about it

Even though face recognition research started back in the 1970s, it is still far from stagnant. The usual strategy for solving the problem has been divided into three main steps: given an image with a set of faces, first run a face detection algorithm to isolate the faces from the rest, then preprocess the cropped part to reduce its high dimensionality, and finally run classification. What makes this whole process so challenging is that many factors introduce noise: images can be taken from different angles and under different lighting conditions, and the face itself changes over time (for example due to age or style).
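The three classical steps can be sketched on toy data as follows. This is only an illustration, not how OpenFace works: detection is replaced by cropping a known bounding box, dimensionality reduction is done with PCA, classification with a nearest-centroid rule, and the "faces" are random patterns. All function names here are hypothetical.

```python
import numpy as np

def detect_face(image, box):
    """Placeholder for face detection: crop a known square bounding box."""
    top, left, size = box
    return image[top:top + size, left:left + size]

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Map flattened faces to the low-dimensional PCA space."""
    return (X - mean) @ components.T

def nearest_centroid(train_Z, train_y, z):
    """Classify z by the closest per-class mean in PCA space."""
    labels = np.unique(train_y)
    centroids = np.array([train_Z[train_y == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - z, axis=1))]

rng = np.random.default_rng(0)
box = (8, 8, 32)                        # assumption: face location is known
prototypes = rng.random((2, 32, 32))    # assumption: two synthetic "people"

def make_image(person):
    """Build a 48x48 image: faint background plus a noisy face pattern."""
    image = rng.random((48, 48)) * 0.1
    image[8:40, 8:40] = prototypes[person] + 0.05 * rng.standard_normal((32, 32))
    return image

# Step 1 (detect) and flatten, for 10 training images per person.
train_y = np.array([0, 1] * 10)
train_X = np.array([detect_face(make_image(p), box).ravel() for p in train_y])

# Step 2: reduce 1024 pixel values to 5 features.
mean, components = fit_pca(train_X, n_components=5)
train_Z = project(train_X, mean, components)

# Step 3: classify a new image of person 1.
test_face = detect_face(make_image(1), box).ravel()
prediction = nearest_centroid(train_Z, train_y, project(test_face, mean, components))
print(prediction)
```

On real photos each stage is much harder, which is exactly where the pose, lighting and ageing variations mentioned above come in.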

Now one important fact to point out is that state-of-the-art, top-performing algorithms use convolutional neural networks. OpenFace is inspired by Facebook’s DeepFace and (mainly) Google’s FaceNet systems. The authors present a performance comparison using the “Labeled Faces in the Wild” (LFW) dataset for evaluation, and report some interesting results.

Another interesting point is that, as the authors state, the implementation is tuned for using the model on mobile devices, so the “[…] key design consideration is a system that gives high accuracy with low training and prediction times“.

Note: In case you are wondering how this differs from OpenBiometrics (OpenBR): as stated by the authors of OpenFace on HackerNews, the main difference lies in the approach taken, deep convolutional networks, and OpenFace could potentially be integrated into OpenBR’s pipeline.

Internal Guts

As you might imagine (as with any image/video processing package), the dependencies are complex and time consuming to set up, so prepare yourself for some dependency troubleshooting in case your machine is still new to this world.

The project’s API is written in Python 2 – entry point here – given its dependencies on OpenCV and DLib. OpenCV provides the computer vision base, DLib enhances OpenCV’s face detection ability, numpy handles matrix algebra operations and scikit-learn handles classification operations.

For training the convolutional network, OpenFace uses Torch, a scientific computing framework built on Lua and LuaJIT. Torch allows the neural networks to be executed either on the CPU or on CUDA-enabled GPUs.

The following illustration was extracted from the recent technical report “OpenFace: A general-purpose face recognition library with mobile applications“, by the authors, and provides interesting insight:


It is important to note that you have the option to use pretrained models (trained on the CASIA-WebFace and FaceScrub databases) to help with face detection; you can find them in the models directory. The provided bash script downloads them.

Where to get started

To set it up either locally or with Docker, you can check the provided documentation.

Finally, you might also be interested in having a look at other projects using deep neural networks for face recognition: Visual Geometry Group (VGG) Face Descriptor, and Lightened Convolutional Neural Networks (CNNs).