An Introduction to Convolutional Neural Networks and Deep Learning with Caffe


Neural network (NN) technology is one of the most widely used approaches in modern artificial intelligence (AI). It has been applied successfully to problems such as forecasting, adaptive control, recognition, classification, and many others.

An artificial NN is a simple model of a biological brain. It consists of elements called neurons, and an artificial neuron is just a simple mathematical model of a biological neuron. Because an artificial NN is modeled after the biological brain, it shares some of the brain’s conceptual properties, such as the ability to learn.

Convolutional Neural Networks (CNN) and Deep Learning (DL) are related branches of NN computing that have advanced rapidly in recent years. A CNN is a neural network with a special structure that was designed as a model of the human visual system (HVS). Thus, CNNs are best suited to computer vision problems such as object recognition and the classification of images and video data. They have also been used successfully for speech recognition and text translation.

The growing popularity of DL technology has spurred the development of many new CNN programming frameworks. Among the most popular are Caffe, TensorFlow, Theano, Torch, and Keras.

This article provides an introduction to using CNN and DL technology with the Caffe framework. It describes how to create a simple CNN, train it to recognize digits in images, and use the trained CNN for digit recognition. We’ll also show an example application that automatically launches the Caffe training process and recognizes images with the trained CNN.

Setting up the Caffe framework

Caffe is a free, open-source framework for CNN and DL. The latest version can be downloaded from the Caffe project site. By following the instructions on the community page, you can build the framework from the provided source code.

Only the built binaries are required for training a CNN with Caffe. The main file is caffe.exe, the executable that launches the training and testing of CNNs. To simplify working with Caffe, we recommend copying all the built binaries to a working directory, for example, D:\Caffe(working)\bin.

Another tool required for using a trained CNN in the example application is OpenCvSharp, a computer vision framework for the .NET Framework based on OpenCV. The latest release of OpenCvSharp can be downloaded from the project’s website; it contains a simple installer that sets up all the required binaries on the machine. I’m using the version for .NET Framework 4.6.1 in the example application.

The Problem of Object Recognition

Object recognition is a common task in computer vision. Because CNN and DL technology is particularly well suited to such problems, we will use object recognition as the example for demonstrating a Caffe application. The goal of object recognition is to identify an object in an image (a photo or a single frame from a video). This example considers the problem of recognizing digits in images. Suppose we have many images of digits, written in different styles and fonts, some even handwritten.

The recognition task is to identify the digit in any such image. To solve it, we will follow a common scheme:

  • design the special structure of the CNN
  • collect the set of training images with digits
  • train the CNN using the set of collected training images
  • test the CNN to check its accuracy

Creating the CNN

Designing the CNN’s structure is often the most complicated part of using DL technology, because the structure of the network directly affects recognition accuracy.

A CNN consists of several layers. Each layer is, in effect, a filter that processes its input data and extracts specific features of objects. Several layer types are used in CNNs; the most common are:

  • Convolutional
  • Pooling
  • Normalizing
  • Fully connected

Convolutional layers are mainly responsible for feature extraction. A common CNN structure consists of several successive convolutional layers, combined with pooling and normalization layers, followed by a couple of fully connected layers (a perceptron).
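To make the idea of a layer as a filter more concrete, here is a minimal C# sketch of what a single convolutional filter computes: it slides a small kernel over a grayscale image and, at each position, sums the element-wise products. It assumes one input channel, a square kernel, no padding and no bias, and, like CNN frameworks generally, it does not flip the kernel. The Convolve2D helper is a hypothetical name for illustration, not part of Caffe.

```csharp
using System;

// Slides a square kernel over a single-channel image (no padding, no bias) and
// returns the resulting feature map: each cell is the sum of element-wise products.
static double[,] Convolve2D(double[,] image, double[,] kernel, int stride)
{
    int inH = image.GetLength(0), inW = image.GetLength(1);
    int k = kernel.GetLength(0);               // kernel is assumed to be k x k
    int outH = (inH - k) / stride + 1;
    int outW = (inW - k) / stride + 1;
    var output = new double[outH, outW];
    for (int y = 0; y < outH; y++)
    {
        for (int x = 0; x < outW; x++)
        {
            double sum = 0;
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    sum += image[y * stride + i, x * stride + j] * kernel[i, j];
            output[y, x] = sum;
        }
    }
    return output;
}

// Usage: a vertical-edge kernel responds where brightness changes from left to right
double[,] img =
{
    { 0, 0, 1, 1 },
    { 0, 0, 1, 1 },
    { 0, 0, 1, 1 },
};
double[,] edge = { { -1, 1 }, { -1, 1 } };
double[,] map = Convolve2D(img, edge, 1);
for (int y = 0; y < map.GetLength(0); y++)
{
    for (int x = 0; x < map.GetLength(1); x++)
        Console.Write($"{map[y, x]} ");
    Console.WriteLine(); // prints "0 2 0" for both rows: the filter fires only at the edge
}
```

In a real convolutional layer the kernel values are learned during training, and num_output such filters run side by side, each producing its own feature map.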

The Caffe framework uses text files in a predefined format (the Protocol Buffers text format, usually with the .prototxt extension) to define a CNN’s structure. Each layer must be described in the file under a unique name, and, depending on the layer type, specific values must be assigned to the layer’s properties. For example, here is the description of a convolutional layer:

  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    convolution_param {
      num_output: 96
      kernel_size: 11
      stride: 4
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }

Other layers of the CNN are specified with the same kind of description, differing only in their type and parameters.
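A quick way to sanity-check parameters such as kernel_size and stride is to compute the size of the feature map each layer produces. The sketch below assumes the standard convolution arithmetic with no dilation, out = floor((in + 2*pad - kernel) / stride) + 1. The helper names are hypothetical, and the 227x227 input is only an illustrative size: the conv1 description above matches the first layer of the well-known AlexNet model, which is fed 227x227 crops, while 28x28 is the digit image size used later in this article.

```csharp
using System;

// Spatial output size of a convolution with no dilation:
// out = floor((in + 2 * pad - kernel) / stride) + 1   (assumes in + 2 * pad >= kernel)
static int ConvOutputSize(int input, int kernel, int stride, int pad = 0) =>
    (input + 2 * pad - kernel) / stride + 1;

// Learnable parameters of a convolutional layer: one kernel per output channel
// spanning all input channels, plus one bias per output channel.
static int ConvParamCount(int numOutput, int kernel, int inChannels) =>
    numOutput * kernel * kernel * inChannels + numOutput;

// conv1 as described above: num_output 96, kernel_size 11, stride 4, no padding
Console.WriteLine(ConvOutputSize(227, 11, 4)); // 55, i.e. a 55x55x96 output for a 227x227 input
Console.WriteLine(ConvOutputSize(28, 11, 4));  // 5 for a 28x28 digit image
Console.WriteLine(ConvParamCount(96, 11, 1));  // 11712 weights and biases for a grayscale input
```

Caffe’s pooling layers, by contrast, round the division up rather than down, so the same formula can be off by one when applied to pooling.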

The design of the CNN’s layers and their parameters is out of the scope of this article. For clarity, we’ll simply use a predefined CNN structure suited to the stated problem of digit recognition. This structure is used in the example application, and the full source of the network is available with the download.

Training and Testing the CNN

The next step is training the CNN we just defined. We will use a simple utility application written in C# and WPF to launch Caffe and provide it with all the required information.

We assume that we have a set of images of digits. The set must be divided into two parts: one for training the NN and another for testing it. In our example, we use about one hundred images per digit for training and twenty images per digit for testing. The images are organized in folders: the training folder contains ten subfolders, one for each digit, and the testing folder is organized in the same way.
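If your images are not already separated into training and testing folders, a reproducible split per digit is straightforward. This is a minimal sketch, assuming the images of one digit are given as a list of file names; SplitSet is a hypothetical helper, and the fixed seed makes every run produce the same split.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Shuffles the images of one digit with a fixed seed (so every run gives the same split)
// and returns trainCount items for training and up to testCount different items for testing.
static (List<T> Train, List<T> Test) SplitSet<T>(IEnumerable<T> items, int trainCount, int testCount, int seed)
{
    List<T> shuffled = items.ToList();
    var rng = new Random(seed);
    for (int i = shuffled.Count - 1; i > 0; i--) // Fisher-Yates shuffle
    {
        int j = rng.Next(i + 1);
        (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
    }
    List<T> trainPart = shuffled.Take(trainCount).ToList();
    List<T> testPart = shuffled.Skip(trainCount).Take(testCount).ToList();
    return (trainPart, testPart);
}

// Usage with hypothetical file names: 120 images of one digit, 100 for training and 20 for testing
IEnumerable<string> files = Enumerable.Range(0, 120).Select(n => $"{n}.png");
var (train, test) = SplitSet(files, 100, 20, seed: 42);
Console.WriteLine($"{train.Count} training, {test.Count} testing");
```

Shuffling before splitting keeps both parts representative when the files were collected in some systematic order, for example all handwritten samples last.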

To launch training, Caffe requires text files that list the full path of each image together with the digit it shows. The utility application creates these files automatically and passes the data to Caffe in the following form:

  D:\Digits\Learning\1\98.png 1
  D:\Digits\Learning\1\99.png 1
  D:\Digits\Learning\2\0.png 2
  D:\Digits\Learning\2\1.png 2
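The utility application’s own code for creating these files isn’t shown here, so the following is only a minimal sketch of the same idea. It assumes the folder layout described above (one subfolder per digit, named 0 to 9), PNG images, and a network that reads the lists through Caffe’s ImageData layer, whose source file uses this path-and-label format. BuildImageList and the demo folder name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Builds the lines of an image list file: "<full path> <label>", where the label is the
// name of the digit subfolder ("0" to "9") that contains the image.
static List<string> BuildImageList(string rootFolder)
{
    var lines = new List<string>();
    foreach (string dir in Directory.GetDirectories(rootFolder).OrderBy(d => d, StringComparer.Ordinal))
    {
        if (!int.TryParse(Path.GetFileName(dir), out int label))
            continue; // ignore folders that are not named after a digit
        foreach (string file in Directory.GetFiles(dir, "*.png").OrderBy(f => f, StringComparer.Ordinal))
            lines.Add($"{Path.GetFullPath(file)} {label}");
    }
    return lines;
}

// Usage: create a tiny dataset in a temporary folder (empty stand-in files) and write its list file
string root = Path.Combine(Path.GetTempPath(), "digits_list_demo");
foreach (var (digit, name) in new[] { (1, "98.png"), (1, "99.png"), (2, "0.png") })
{
    string digitFolder = Path.Combine(root, digit.ToString());
    Directory.CreateDirectory(digitFolder);
    File.WriteAllBytes(Path.Combine(digitFolder, name), Array.Empty<byte>());
}
List<string> list = BuildImageList(root);
File.WriteAllLines(Path.Combine(root, "train.txt"), list);
list.ForEach(Console.WriteLine);
```

Pointing the same function at the testing folder produces the second list file.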

The last requirement for launching Caffe is the solver description: a text file with the parameters of the training process, including training and testing information:

  net: "digits_learn_test.prototxt"
  test_iter: 1000
  test_interval: 1000
  test_initialization: false
  display: 1000
  average_loss: 200
  base_lr: 0.001
  lr_policy: "inv"
  power: 0.85
  gamma: 0.0001
  max_iter: 10000
  momentum: 0.9
  weight_decay: 0.0002
  snapshot: 1000
  snapshot_prefix: "digits"
  solver_mode: CPU
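Most of these values control the optimizer. With lr_policy: "inv", Caffe decays the learning rate as base_lr * (1 + gamma * iter)^(-power). The sketch below, with a hypothetical InvLearningRate helper, simply evaluates that formula for the values in this solver so you can see how the step size shrinks over the 10,000 iterations.

```csharp
using System;

// Caffe's "inv" learning-rate policy: lr = base_lr * (1 + gamma * iter)^(-power)
static double InvLearningRate(double baseLr, double gamma, double power, int iter) =>
    baseLr * Math.Pow(1 + gamma * iter, -power);

// Values taken from the solver description above
const double solverBaseLr = 0.001, solverGamma = 0.0001, solverPower = 0.85;
const int solverMaxIter = 10000, solverDisplay = 1000;

for (int it = 0; it <= solverMaxIter; it += solverDisplay)
{
    double lr = InvLearningRate(solverBaseLr, solverGamma, solverPower, it);
    Console.WriteLine($"iter {it,5}: lr = {lr:F6}");
}
// The rate decays smoothly from 0.001 at iteration 0 to about 0.000555 at iteration 10000.
```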

We can now launch the training process. Here is the C# code used in the utility application:

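  using System.Diagnostics;
  using System.IO;
  // Caffe parameters (CaffeWorkingFolder, launcherFolder and DigitsSolverFile are set elsewhere in the application)
  string CaffeFile = Path.Combine(CaffeWorkingFolder, "caffe.exe");
  string CaffeSolver = Path.Combine(launcherFolder, DigitsSolverFile);
  string CaffeParam = "train --solver=\"" + CaffeSolver + "\""; // quoted in case the path contains spaces
  // Launching Caffe for training
  Process caffe = new Process()
  {
      StartInfo = new ProcessStartInfo()
      {
          FileName = CaffeFile,
          Arguments = CaffeParam
      }
  };
  caffe.Start();
  // Block until training has completed
  caffe.WaitForExit();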

This code sets up the parameters for launching Caffe, starts the training process, and waits until the process has completed. While training, Caffe reports its progress to the console.

The main output value to watch is the accuracy. If it increases with the iterations toward 1.0, the learning process is converging and the CNN should provide accurate recognition results. Every thousand iterations (the snapshot setting in the solver), the trained CNN is saved to the application’s directory. When the process has finished, the saved models can be used for recognizing digits.
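Caffe names each snapshot after snapshot_prefix and the iteration number, for example digits_iter_1000.caffemodel (with a matching .solverstate file for resuming training). When picking the model to load for recognition, compare iteration numbers numerically, since a plain string sort would place digits_iter_9000 after digits_iter_10000. A minimal sketch, with hypothetical helper names and empty stand-in files instead of real snapshots:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Snapshot file names Caffe produces for a given snapshot_prefix, snapshot interval and max_iter
static IEnumerable<string> ExpectedSnapshots(string prefix, int snapshot, int maxIter) =>
    Enumerable.Range(1, maxIter / snapshot).Select(k => $"{prefix}_iter_{k * snapshot}.caffemodel");

// Returns the .caffemodel file with the highest iteration number in the folder, or null if none.
static string? LatestSnapshot(string folder, string prefix)
{
    var pattern = new Regex("^" + Regex.Escape(prefix) + @"_iter_(\d+)\.caffemodel$");
    string? best = null;
    long bestIter = -1;
    foreach (string file in Directory.GetFiles(folder))
    {
        Match m = pattern.Match(Path.GetFileName(file));
        if (!m.Success) continue;
        long iter = long.Parse(m.Groups[1].Value); // numeric, not string, comparison
        if (iter > bestIter)
        {
            bestIter = iter;
            best = file;
        }
    }
    return best;
}

// Usage: empty stand-in files named like the snapshots of the solver above
string demoFolder = Path.Combine(Path.GetTempPath(), "digits_snapshot_demo");
Directory.CreateDirectory(demoFolder);
foreach (string name in ExpectedSnapshots("digits", 1000, 10000))
    File.WriteAllBytes(Path.Combine(demoFolder, name), Array.Empty<byte>());
Console.WriteLine(Path.GetFileName(LatestSnapshot(demoFolder, "digits"))); // digits_iter_10000.caffemodel
```

The latest snapshot is not automatically the best one; if the test accuracy peaked earlier, an earlier snapshot may recognize digits better.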

Digits can be recognized using the OpenCvSharp library. Here is the C# code from the utility application that recognizes the digit in a single image:

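  using OpenCvSharp;
  using OpenCvSharp.Dnn;

  int digit = -1;
  // Recognizing the digit on the image: load the CNN structure (caffeProto) and the trained model (caffeModel)
  Net dNet = Net.ReadNetFromCaffe(caffeProto, caffeModel);
  Mat img = Cv2.ImRead(digitImage, ImreadModes.Grayscale);
  // Resize to the 28x28 network input and pack the image into a blob
  using (var inputBlob = CvDnn.BlobFromImage(img, 1, new OpenCvSharp.Size(28, 28)))
  {
      dNet.SetInput(inputBlob, "Data");
      // Output of the "Prob" layer: one value per digit class
      Mat prob = dNet.Forward("Prob");
      Mat probMat = prob.Reshape(1, 1);
      OpenCvSharp.Point classNumber;
      double classProb;
      // The position of the maximum value is the recognized digit
      Cv2.MinMaxLoc(probMat, out _, out classProb, out _, out classNumber);
      digit = classNumber.X;
  }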

The code creates a CNN from the provided structure and the trained Caffe model, then uses the network to calculate the output for the specified image; the position of the highest output value is the recognized digit. Testing the CNN on many images gave a digit recognition accuracy of about 95%.
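Cv2.MinMaxLoc is used here only to find the position of the largest value in the output row. The same step can be written in plain C#. The sketch also shows a softmax, which is what the "Prob" layer is assumed to compute from the raw class scores (assuming it is a Softmax layer in the network definition). The scores are made-up values and the helper names are hypothetical.

```csharp
using System;
using System.Linq;

// Softmax turns raw class scores into probabilities that sum to 1.
static double[] Softmax(double[] scores)
{
    double max = scores.Max(); // subtracting the max avoids overflow in Math.Exp
    double[] exp = scores.Select(s => Math.Exp(s - max)).ToArray();
    double sum = exp.Sum();
    return exp.Select(e => e / sum).ToArray();
}

// Position of the largest probability = recognized digit (the role of maxLoc.X from Cv2.MinMaxLoc).
static (int Digit, double Probability) ArgMax(double[] probs)
{
    int best = 0;
    for (int i = 1; i < probs.Length; i++)
        if (probs[i] > probs[best]) best = i;
    return (best, probs[best]);
}

// Made-up raw scores for the classes 0 to 9
double[] rawScores = { 0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 1.1, 0.0, -0.5 };
var (digit, probability) = ArgMax(Softmax(rawScores));
Console.WriteLine($"digit {digit}, probability {probability:F3}"); // digit 2
```

The probability of the winning class is also a useful confidence signal: a low value suggests the image may not contain a clear digit.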

The test results show that the trained CNN can successfully solve the stated problem, and an accuracy of 95% is good enough for many applications. Accuracy can be improved in several ways. The simplest is to increase the number of training images. Another is to design a different CNN structure that better suits the problem.
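Before trying either improvement, it helps to know where the errors come from. Here is a minimal sketch of how the accuracy figure can be computed from test results, together with a confusion matrix that shows which digits are mistaken for which. The predictions below are made-up values, not results from this article, and Evaluate is a hypothetical helper.

```csharp
using System;

// Accuracy: share of test images whose recognized digit equals the true digit.
// Confusion matrix: counts[trueDigit, recognizedDigit], so off-diagonal cells are the errors.
static (double Accuracy, int[,] Confusion) Evaluate(int[] actual, int[] predicted, int classes = 10)
{
    if (actual.Length != predicted.Length)
        throw new ArgumentException("actual and predicted must have the same length");
    var counts = new int[classes, classes];
    int correct = 0;
    for (int i = 0; i < actual.Length; i++)
    {
        counts[actual[i], predicted[i]]++;
        if (actual[i] == predicted[i]) correct++;
    }
    return ((double)correct / actual.Length, counts);
}

// Made-up results for eight test images
int[] trueDigits = { 0, 1, 2, 3, 7, 7, 9, 4 };
int[] recognized = { 0, 1, 2, 3, 7, 1, 9, 9 };
var (acc, matrix) = Evaluate(trueDigits, recognized);
Console.WriteLine($"accuracy: {acc:P1}");                // 6 of 8 correct
Console.WriteLine($"7 recognized as 1: {matrix[7, 1]}");  // 1
```

If most errors fall on a few pairs of digits, adding training images of exactly those digits is a cheap first step.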


This article gave you a brief introduction to CNNs and DL, technologies used successfully for problems such as speech recognition, text translation, and visual object recognition and classification. There are many frameworks for creating, training, and using CNNs in software applications. I demonstrated Caffe here as a good example of using CNNs to solve a relatively simple computer vision problem: we defined the structure of the CNN, trained it with a simple utility application, and tested its accuracy.

To learn more about CNNs and Caffe, take a look at the Caffe tutorials and the related Caffe presentations available on the Caffe community site.
