Big Data and Machine Learning (ML) technologies give companies the ability to develop sophisticated, optimized AI models and build new applications and services, such as real-time language translators (which learn various languages from their inputs) and facial recognition programs trained on huge databases of image data.

That’s all well and good for big companies, but how can smaller businesses — or even individuals like you and me — take advantage of ML? The barrier to entry has included knowledge of Neural Networks, Decision Trees, and other ML algorithms. Until recently, that meant a lot of effort to learn, and even more effort to put together the development tools, code and trained algorithms.

Well you’re in for some good news! Google recently introduced a new Firebase SDK called ML Kit that provides an easy way for you to use ML technology in your app. In this article we’re going to create an example app that allows users to take a picture, hand it off to ML Kit to do face detection, and play with the data that ML Kit will provide, in order to outline the features of the face. You’ll see how easy it is to harness ML technologies in your own apps.

Introducing ML Kit

Firebase is a development platform for mobile and web applications that provides easy access to Google infrastructure to help build applications quickly. ML Kit (currently in beta) provides code and APIs that help you build ML capabilities into mobile applications.

Say you want to create your own ML-powered application. You would have to create your own ML model to process your inputs (images in this case), which is a long and involved process. For example, let’s say you want to create an ML model to recognize an image of an apple:

  1. Figure out how to represent an image. Images can be thought of as a 2D array of pixels, so potentially, we could convert our image to a 2D array of RGB values that we can later feed into our ML algorithms.
  2. Gather data, which in this case is images of apples that we will use to train and test our ML model.
  3. Prepare our data. We go through all of the images that we have collected and make sure that they are clear, include different types of apples, and show the apples in different positions, so that there is variance in our data set.
  4. For simplicity’s sake, we’ll split our data set in half, one half for training our model and the other half for testing the accuracy of our model.
  5. Now the hardest part. We need to select (or invent) an ML model that will process our images using an algorithm to identify features of the image that might help identify it as an object. For example, detecting red pixels and a round object can be a good indicator of whether an image contains an apple.
  6. Start training our model with our images, so that the model gradually builds up an “equation” based on our data that we can use to determine whether any image is an apple.

A lot of time is spent on steps 5 and 6, where we try to fine tune our model to identify better features and then test to see how it affects our accuracy. What we talked about above is a very simple, high-level list of the work involved in creating our model. There are a lot of other steps involved in creating an accurate and reliable ML model.
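To make steps 1, 4, and 5 a little more concrete, here is a minimal Kotlin sketch. It is not how ML Kit works internally. The names (Rgb, Image, redFraction, splitInHalf) and the thresholds used to call a pixel "red" are assumptions made up for illustration, and a real model would learn far richer features than a single color ratio.

```kotlin
// Illustrative only: a toy version of steps 1, 4, and 5 above.
// All names and thresholds here are hypothetical.

/** One pixel as red, green, and blue channel values in 0..255. */
data class Rgb(val r: Int, val g: Int, val b: Int)

/** Step 1: an image is a 2D grid of pixels (a list of rows). */
typealias Image = List<List<Rgb>>

/** Step 5: a hand-written "feature", the fraction of pixels that look red. */
fun redFraction(image: Image): Double {
    val pixels = image.flatten()
    if (pixels.isEmpty()) return 0.0
    // Assumed threshold: strong red channel, weak green and blue channels.
    val redCount = pixels.count { it.r > 150 && it.g < 100 && it.b < 100 }
    return redCount.toDouble() / pixels.size
}

/** Step 4: split a data set in half, first half for training, second half for testing. */
fun <T> splitInHalf(data: List<T>): Pair<List<T>, List<T>> {
    val mid = data.size / 2
    return data.take(mid) to data.drop(mid)
}
```

For a 2x2 image with three red pixels and one green pixel, redFraction returns 0.75, and splitInHalf on a list of four images yields two lists of two. Steps 5 and 6 are where the real work lies: a trained model replaces the hand-picked threshold with parameters learned from the training half.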

ML Kit comes pre-packaged with access to ML functionality including text recognition within images, face detection, barcode and QR code scanning, object detection, labeling and tracking, language identification and translation, and Smart Reply (you’ve probably seen this in Gmail). It also provides AutoML model inference and custom model inference, which allow you to run ML models that you have trained yourself.

For most features, the SDK uses your phone to apply image recognition. Some features, however, also have the option to make a request to Google’s Cloud API to handle more complex processing.

Understanding Face Recognition

In this article we’re going to use the ML Kit face detection features to implement a face detection app. Before we begin, however, we should differentiate between face detection and facial recognition.

Face detection is pretty straightforward. It just means that our ML model identifies something in an image as a human face.

Facial recognition takes that to the next level and allows us to identify individual faces and facial features. Facial recognition is far more complex, as we now have to create an ML model that not only recognizes faces but also recognizes enough detail in specific facial features to identify an individual person. There is a lot of work involved in creating the training data, and more work in implementing all of the feature detail recognition.

ML Kit provides a broad selection of face detection capabilities. In this article we’ll specifically explore the face contour detection features. Aside from just identifying the existence of faces in an image, ML Kit’s face detection also includes the ability to detect the locations of specific facial features (eyes, ears, cheeks, nose, and mouth), recognize expressions, track faces between frames in video, and do real-time detection in video streams. To learn more see the Face Detection guide in the Firebase documentation.

Hardware and Software Requirements

Now that we know a bit about machine learning, let’s look into our hardware and software. In the app that we’re building we’ll just be doing face detection. However, we’ll be doing it completely on the device, an Arm CPU-powered smartphone. Since we’re doing it locally, no data will be sent to Google Cloud’s API services.

Running the code directly on your phone provides significant performance gains. If the code were run on Google Cloud, it could bottleneck due to network bandwidth and latency limitations. Another reason to run it locally is that it is better for user privacy, since no details about the faces are transmitted off of the device.

We’ll be creating our app in Android Studio using Kotlin, and we’ll be testing it on a Samsung Galaxy S10. We chose Kotlin because it allows us to leverage underlying Java capabilities with much more concise syntax.

While this technology should work with any processor, for the best performance you’ll want a device with at least a Cortex-A75 CPU or, even better, the Cortex-A76 CPU. While the Cortex-A75 is already a great performer, the Cortex-A76 is more efficient and up to four times faster for machine learning workloads.

In addition to creating ML-optimized hardware, Arm has also been working with Google to optimize ML performance on Arm-powered Android devices. Arm’s software engineers have been hard at work integrating the Arm Compute Library and Arm NN backbone with the Android NN API. ML Kit is built on TensorFlow Lite, which uses Android NN under the hood.

Create an Android Studio Project

Now that you know a bit more about the hardware optimization that’s being used, let’s get back to the software side of things, and finally start creating our app! To use ML Kit, we need to install the SDK.

First, let’s create our Android Studio project. I’ll be calling the new project ML Kit Tutorial, and it’ll be using Kotlin, not Java.

Now create a Firebase project that your app will connect to in order to access ML Kit.

Register your app with your Firebase project. You need to set your application ID, which can be found in the build.gradle file located in the app module.

Next you’ll be asked to download a google-services.json file and add it to the root directory of your app module in Project view. This JSON file is part of the security process, in order to ensure that your app has permission to access your Firebase project.

Now we can start adding dependencies to the build.gradle files located in the root of our application and in the app module.

First, we add the google-services dependency to the build.gradle file in the root directory, as shown here:

dependencies {
    …
    classpath 'com.google.gms:google-services:4.2.0'
    // NOTE: Do not place your application dependencies here; 
    // they belong in the individual module build.gradle files
}

Next, in the build.gradle at the app level, we need to add the google-services plugin and the ML Kit SDK dependencies that we want to use. In this case, those are the vision library and the face model.

dependencies {
    …
    implementation 'com.google.firebase:firebase-core:16.0.9'
    implementation 'com.google.firebase:firebase-ml-vision:20.0.0'
    implementation 'com.google.firebase:firebase-ml-vision-face-model:17.0.2'
}

apply plugin: 'com.google.gms.google-services'

Finally, after editing both of our Gradle files, we need to re-sync so that Gradle downloads all of the dependencies we just added.

After this, we now have everything we need installed to create a face detection app!

Creating the App

When in use, the app goes through five steps:

  1. Click a button to enable our app to take a picture to use.
  2. Add the picture into the app.
  3. Click another button to use ML Kit to detect where the face is located in the image.
  4. Show the image with a graphic overlay of dots showing where ML Kit thinks the face is located.
  5. Display a message showing the likelihood that ML Kit thinks the person is smiling.
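Step 5 boils down to turning a probability between 0 and 1 into a readable message. A minimal sketch of that conversion follows. Note that smileMessage is a hypothetical helper, not part of ML Kit, and it assumes that a negative value means the probability was not computed (ML Kit reports that case with a sentinel value of -1).

```kotlin
import kotlin.math.roundToInt

/**
 * Hypothetical helper: format a smile probability (0.0..1.0) as a percentage.
 * Assumption: a negative value means the classifier did not compute a probability.
 */
fun smileMessage(probability: Float): String {
    if (probability < 0f) return "Smile Probability: unknown"
    // Clamp defensively, then round to a whole percent for display.
    val percent = (probability.coerceIn(0f, 1f) * 100).roundToInt()
    return "Smile Probability: $percent%"
}
```

Rounding to a whole percent keeps the on-screen text short. Whether to show decimals is purely a presentation choice.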

At the end, we’ll have an app that displays the photo with dots outlining the face, along with the smile probability.

For this app we want to get images from the phone’s camera. For the sake of simplicity, instead of building our own camera app with Google’s Camera API, I opted to use the built-in Android camera app to take the picture.

This sample app is composed of four files: activity_main.xml, MainActivity.kt, GraphicOverlay.java and FaceContourGraphic.kt. For simplicity, the latter two files are borrowed from the Firebase Quickstart Samples for Android repository on GitHub. (Note that the code versions I’m using for this example were updated in the repository on March 14, 2019. Since it’s possible future updates to the repository could introduce breaking changes, I’ll provide links to the exact versions below.)

activity_main.xml defines the layout of the application.

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        tools:context=".MainActivity">

    <TextView
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_centerHorizontal="true"
            android:id="@+id/happiness"
            android:layout_alignParentBottom="true"
            android:layout_marginBottom="82dp"
    />
    <ImageView
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            tools:layout_editor_absoluteY="27dp"
            tools:layout_editor_absoluteX="78dp"
            android:id="@+id/imageView"/>
    <com.example.mlkittutorial.GraphicOverlay
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:id="@+id/graphicOverlay"
            android:layout_alignParentStart="true"
            android:layout_alignParentTop="true"
            android:layout_marginStart="0dp"
            android:layout_marginTop="0dp"/>
    <Button
            android:text="Take Picture"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            tools:layout_editor_absoluteY="421dp"
            android:onClick="takePicture"
            tools:layout_editor_absoluteX="10dp"
            android:id="@+id/takePicture"
            android:visibility="visible"
            android:enabled="true"
            android:layout_marginBottom="16dp"
            android:layout_alignParentStart="true"
            android:layout_marginStart="61dp"
            android:layout_alignParentBottom="true" />
    <Button
            android:text="Detect Face"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_alignParentEnd="true"
            android:layout_marginEnd="61dp"
            android:layout_alignParentBottom="true"
            android:id="@+id/detectFace"
            android:layout_marginBottom="16dp"
            android:onClick="detectFace"
            android:visibility="visible"
            android:enabled="false"/>
</RelativeLayout>

No need to worry too much about the numbers. I mostly eyeballed where to put the UI elements, so you’re free to modify them if you want.

One thing to note is that we created our own custom view called GraphicOverlay. We’ll talk more about this view later, but it is responsible for drawing the coordinates of the user’s facial features.

MainActivity.kt provides the code for the application.

package com.example.mlkittutorial

import android.content.Intent
import android.content.res.Resources
import android.graphics.Bitmap
import android.support.v7.app.AppCompatActivity
import android.os.Bundle
import android.provider.MediaStore
import android.view.View
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFace
import com.google.firebase.ml.vision.face.FirebaseVisionFaceDetectorOptions
import kotlinx.android.synthetic.main.activity_main.*

class MainActivity : AppCompatActivity() {
    private val requestImageCapture = 1
    private var cameraImage: Bitmap? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
    }

    /** Receive the result from the camera app */
    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        // The camera app returns a small thumbnail under the "data" extra
        val imageBitmap = data?.extras?.get("data") as? Bitmap
        if (requestCode == requestImageCapture && resultCode == RESULT_OK && imageBitmap != null) {
            // Instead of creating a new file on the user's device to get a full-scale image,
            // resize our smaller imageBitmap to fit the screen width.
            // Multiply before dividing so integer division does not distort the aspect ratio.
            val width = Resources.getSystem().displayMetrics.widthPixels
            val height = width * imageBitmap.height / imageBitmap.width
            cameraImage = Bitmap.createScaledBitmap(imageBitmap, width, height, false)

            // Display the image and enable our ML facial detection button
            imageView.setImageBitmap(cameraImage)
            detectFace.isEnabled = true
        }
    }

    /** Callback for the take picture button */
    fun takePicture(view: View) {
        // Take an image using an existing camera app
        Intent(MediaStore.ACTION_IMAGE_CAPTURE).also { takePictureIntent ->
            takePictureIntent.resolveActivity(packageManager)?.also {
                startActivityForResult(takePictureIntent, requestImageCapture)
                happiness.text = ""
                graphicOverlay.clear()
            }
        }
    }

    /** Callback for the detect face button */
    fun detectFace(view: View) {
        // Build the options for face detector SDK
        if (cameraImage != null) {
            val image = FirebaseVisionImage.fromBitmap(cameraImage as Bitmap)
            val builder = FirebaseVisionFaceDetectorOptions.Builder()
            builder.setContourMode(FirebaseVisionFaceDetectorOptions.ALL_CONTOURS)
            builder.setClassificationMode(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
            val options = builder.build()

            // Send our image to be detected by the SDK
            val detector = FirebaseVision.getInstance().getVisionFaceDetector(options)
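            detector.detectInImage(image)
                .addOnSuccessListener { faces -> displayImage(faces) }
                .addOnFailureListener { e ->
                    // Surface detection errors instead of failing silently
                    happiness.text = "Face detection failed: " + e.message
                }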
        }
    }

    /** Draw a graphic overlay on top of our image */
    private fun displayImage(faces: List<FirebaseVisionFace>) {
        graphicOverlay.clear()
        if (faces.isNotEmpty()) {
            // We will only draw an overlay on the first face
            val face = faces[0]
            val faceGraphic = FaceContourGraphic(graphicOverlay, face)
            graphicOverlay.add(faceGraphic)
            happiness.text = "Smile Probability: " + (face.smilingProbability * 100) + "%"
        } else {
            happiness.text = "No face detected"
        }
    }
}

The comments should be self-explanatory. The core logic of the code starts with takePicture():

  1. When the user clicks the Take Picture button, we run the code in takePicture(), where we fire an intent to launch Android’s camera app. This will take a picture and send the data back to us.
  2. We get the image back in onActivityResult(), where we resize the image to fit on our screen and enable the Detect Face button.
  3. When the user clicks on the Detect Face button, we run detectFace(), where we take our previous image and then we send it to ML Kit to detect the user’s face. When the face detection succeeds, we get a list of face information that we process in displayImage().
  4. In displayImage(), things get a little bit tricky. We only take the first face in our image and create a Graphic using a class called FaceContourGraphic that we add to our GraphicOverlay (more on both of these later) to draw where our facial features are located.
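The resize in step 2 is plain arithmetic: scale the thumbnail to the screen width and keep the aspect ratio. The sketch below isolates that calculation in a hypothetical scaledHeight helper (it is not part of the app's source files) to show one detail worth watching with integer math: multiply before dividing, otherwise the scale factor is truncated first.

```kotlin
/**
 * Hypothetical helper: the height that preserves the aspect ratio when an image
 * of srcWidth x srcHeight pixels is scaled to targetWidth pixels wide.
 */
fun scaledHeight(srcWidth: Int, srcHeight: Int, targetWidth: Int): Int {
    require(srcWidth > 0) { "srcWidth must be positive" }
    // Multiply first: targetWidth / srcWidth on its own would truncate the scale factor.
    return targetWidth * srcHeight / srcWidth
}
```

For a 200x300 thumbnail on a 1080-pixel-wide screen, scaledHeight(200, 300, 1080) gives 1620, whereas 1080 / 200 * 300 evaluates to 1500 because 1080 / 200 truncates to 5.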

To finish this app, we need to draw an overlay on top of our camera image. ML Kit returns a Face object that gives the contour points as 2D coordinates, which tell us where to draw the user’s facial features.

While we now know where the user’s facial features are located, ML Kit does not offer a way to actually draw these points. Luckily for us, the ML Kit samples include some example code that we can leverage to draw these contour points on our image.

The first key to solving the puzzle is GraphicOverlay.java from the Firebase Android sample repository. This class is just a view container that will draw the graphics that we can add on top of our camera image.

The actual coordinate drawing is done in FaceContourGraphic.kt, also from the Firebase sample repository. This class takes in the Face object that ML Kit provides and then, using the coordinate points, draws the contour points on the canvas of GraphicOverlay.
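Conceptually, drawing the contour comes down to mapping each point from image coordinates to canvas coordinates and then drawing a dot there. The sketch below shows only the mapping step in plain Kotlin. Point2D and toCanvas are hypothetical names, and the single uniform scale factor with an offset is an assumption for illustration rather than a copy of what FaceContourGraphic.kt does.

```kotlin
/** Hypothetical stand-in for a 2D contour point returned by the face detector. */
data class Point2D(val x: Float, val y: Float)

/**
 * Map contour points from image coordinates to canvas coordinates.
 * Assumption: the canvas shows the image scaled uniformly by `scale`
 * and shifted by (offsetX, offsetY).
 */
fun toCanvas(
    points: List<Point2D>,
    scale: Float,
    offsetX: Float = 0f,
    offsetY: Float = 0f
): List<Point2D> =
    points.map { Point2D(it.x * scale + offsetX, it.y * scale + offsetY) }
```

In an Android view, each mapped point could then be drawn in onDraw with a call such as canvas.drawCircle(x, y, radius, paint).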

With this, we now have a fully functioning app that takes images from a camera and then uses the data provided by ML Kit to draw the contour points of our facial features on top of the image!

You might also notice that the execution of this in the app is super quick, taking only a few seconds at most. Impressive!

Conclusion

In this day and age, being able to leverage ML capabilities in apps can give you a great competitive edge. Unfortunately, not everyone has the skills needed to incorporate trained ML models into their AI-powered apps.

Google is helping smaller companies fill in this gap by offering a suite of tools that enables developers to use ML for common scenarios, such as image and text recognition.

By using ML Kit, small companies and individual developers save massive amounts of time and money that would otherwise be spent on making their own ML model. The process of gathering data, implementing an algorithm, and then training your model can take months, but by using ML Kit, you can get results in as little as 20 minutes!
