Tuesday 28 June 2022

Creating Custom Obstacle Detection System Using TensorFlow and TensorFlow Lite for Mobile Robot


This post is a continuation of a project I started in 2020, 'Implementing custom CNN with DIY machine vision module'. In that post I created a small convolutional neural network (CNN) that analyzes a grayscale image and determines whether there is an object in the lower half of the image. The system serves as vision-based obstacle detection for small robots. At that time (2020), small single-board computers such as the Raspberry Pi Zero did not have sufficient computing power to run TensorFlow Lite and the popular machine vision package OpenCV at a usable frame rate. However, with the arrival of the Raspberry Pi Zero 2 in October 2021, which uses a quad-core ARM Cortex-A53 processor (as opposed to the single core in the Raspberry Pi Zero), it is now possible to run a convolutional neural network on the Pi Zero 2 at 5 frames per second or higher. This article describes the steps to export the CNN from my 2020 post to TensorFlow Lite (TF Lite for short from here onward) and deploy it on a Raspberry Pi Zero 2 or similar embedded computer (Pi 4B, Jetson etc.). Note that everything you need can be found on tensorflow.org and other online resources; I am just summarizing the essential steps in this post.

There are a number of steps we need to take care of to make our CNN run on an embedded computer:

1. Convert the CNN model into TF Lite format so that it uses fewer resources and runs faster, at the expense of a slight impact on accuracy. Save the TF Lite model on the computer hard disk.

2. Install the TF Lite runtime (a bare-minimum library for performing inference with the neural network).

3. Set up a camera and code to acquire image frames from it. For this we can use either the OpenCV library or the Pygame library.

4. Perform the necessary pre-processing on each image frame, for instance rescaling, cropping, normalizing and converting it into a tensor or NumPy array.

5. Feed the input to the TF Lite runtime and read back the output.

The details of these steps are described below. Here is a video showing how the system performs. In this demo I am using a Raspberry Pi 3A+, which is already hard to find in 2022 and has slightly higher computing power than the Raspberry Pi Zero 2.

Video 1 - Demonstration of the system.



1. Converting the TensorFlow Model to TF Lite Format

Suppose we have already trained our neural network to sufficient accuracy. Before we can convert our TensorFlow model to TF Lite format, we need to save the model onto the computer hard disk. At the time of writing we can save our CNN in three formats:

  • The older H5 format.
  • High-level TensorFlow SavedModel format (used by Keras).
  • Low-level TensorFlow SavedModel format (used by the TensorFlow API).

Things like the model architecture, weights, compilation info, and optimizer settings and state will be saved (we can also select a subset of these). Here we will use the SavedModel formats (both low and high level), as recommended by TensorFlow. The Python code below shows how this is done for the SavedModel format, assuming model is our TensorFlow model.

# Save model in Keras (high-level) SavedModel format:
model.save("./Exported_model_keras/")

# Save model in TensorFlow (low-level) SavedModel format:
tf.saved_model.save(model,"./Exported_model_tf")
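
For completeness, the older H5 format is also a one-liner; Keras infers the format from the .h5 extension. The filename below is just an example, and this format is not used in the rest of this post:

# Save model in the older Keras H5 format (optional, example filename):
model.save("./Exported_model.h5")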

Once we have saved the TensorFlow model to the hard disk, the following Python script converts the model from TensorFlow to TF Lite. At the time of writing, the TF Lite converter requires that the original TensorFlow model be saved in the low-level SavedModel format.

import os
import tensorflow as tf

SAVED_MODEL_DIR2 = './Exported_model_keras'    # Point to the Keras (high-level) SavedModel folder.
SAVED_MODEL_DIR = './Exported_model_tf'        # Point to the TensorFlow (low-level) SavedModel folder.
EXPORT_MODEL_DIR = './TfLite_model'            # Directory to store the TF Lite model.
# Path and filename of the exported model.
PATH_TO_EXPORT_MODEL = os.path.join(EXPORT_MODEL_DIR,'model.tflite')
 
mymodel = tf.keras.models.load_model(SAVED_MODEL_DIR2) # NOTE (14/6/2022): Somehow I kept getting an
                                                        # error from the Python interpreter when this
                                                        # line is not executed.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
 
tflite_model = converter.convert()

# Save the model.
with open(PATH_TO_EXPORT_MODEL, 'wb') as f:
  f.write(tflite_model)

In the code above, once the conversion completes the TF Lite model is saved as a file named model.tflite at the following path:

./TfLite_model/model.tflite
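
If model size or speed is a concern, the TF Lite converter also supports post-training quantization. The snippet below is a minimal sketch of the simplest option, not used in the rest of this post; it only sets the default optimization flag before calling convert(), and may cost a small amount of accuracy. The output filename is just an example.

# Optional: enable default post-training quantization before converting.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# Save the quantized model (example filename).
with open(os.path.join(EXPORT_MODEL_DIR,'model_quant.tflite'), 'wb') as f:
    f.write(tflite_quant_model)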

2. Installing TF Lite Runtime on Raspberry Pi

I have tested this on Raspberry Pi OS Buster and Bullseye. The official TensorFlow guides for TF Lite can be found at:
https://www.tensorflow.org/lite
https://www.tensorflow.org/lite/guide/python

Before installing the TF Lite runtime on the Raspberry Pi, we should first update Raspberry Pi OS to the latest version via the terminal:

sudo apt update

sudo apt full-upgrade 

After that we just type the following command in the terminal:

python3 -m pip install tflite-runtime

There is a difference between the typical sudo apt update and sudo apt full-upgrade, which is explained here. In the case of Raspbian Buster I discovered that an older version of the TF Lite runtime would be installed if I only ran sudo apt update. Unfortunately, at the time of writing there is no straightforward way to check the installed version of the TF Lite runtime except by uninstalling it and observing the message generated!
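
As a quick sanity check that the runtime is at least installed and importable (this does not tell us the version), we can run a one-liner in the terminal:

python3 -c "import tflite_runtime.interpreter as tflite; print('tflite-runtime OK')"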

3. Loading the TF Lite Model

The Python code below illustrates how to load the TF Lite model. Here we assume that the TF Lite model has been successfully converted from the TensorFlow model and saved as model.tflite in the folder ./TfLite_model. The input and output specifications of the TF Lite model are contained in the object my_signature; thus, after loading the TF Lite interpreter, we get the signature of the model.

import os
import tflite_runtime.interpreter as tflite       # Use the TF Lite runtime instead of TensorFlow.

TFLITE_MODEL_DIR = './TfLite_model'
PATH_TO_TFLITE_MODEL = os.path.join(TFLITE_MODEL_DIR,'model.tflite')

interpreter = tflite.Interpreter(PATH_TO_TFLITE_MODEL)

# There is only 1 signature defined in the model, so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# Optional, show the format for input.
input_details = interpreter.get_input_details()
# input_details is a list of dictionaries, one for each input tensor of this neural network.
print(input_details[0])
print(input_details[0]['shape'])
# Now print the signature input and output names so that we can call these later.
print(interpreter.get_signature_list())
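
Before connecting the camera, it can be useful to run one dummy inference to confirm the expected input shape and the signature input/output names. The sketch below assumes the shape (1, 37, 100, 1) and the names conv2d_input and dense_1 that my model uses in the later examples; your model may report different names, so check the output of get_signature_list() first.

import numpy as np

# Dummy inference with an all-zero input of the expected shape.
# The shape and the names 'conv2d_input'/'dense_1' are taken from the
# model used later in this post; adjust them for your own model.
dummy_input = np.zeros((1, 37, 100, 1), dtype=np.float32)
dummy_output = my_signature(conv2d_input=dummy_input)
print(dummy_output['dense_1'])   # Raw class scores for the dummy frame.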

4. Full Example With OpenCV

The code below shows the full implementation using the TF Lite model. Here we use the OpenCV library to manage the interface to the camera. This code can be used either on a computer with the TensorFlow framework or on a Raspberry Pi with the TF Lite runtime; just uncomment the relevant code sections. Of course, both the computer and the Raspberry Pi must have the OpenCV library installed in the Python environment. OpenCV version 4.1.0 or newer is suitable.

import os
import numpy as np
import cv2

TFLITE_MODEL_DIR = './TfLite_model'
PATH_TO_TFLITE_MODEL = os.path.join(TFLITE_MODEL_DIR,'model.tflite')

_SHOW_COLOR_IMAGE = False

#Set the width and height of the input image in pixels.
_imgwidth = 160
_imgheight = 120

#Set the region of interest start point and size.
#Note: The coordinate (0,0) starts at top left hand corner of the image frame.
_roi_startx = 30
_roi_starty = 71
_roi_width = 100
_roi_height = 37


# --- For PC --- [Uncomment as necessary]
import tensorflow as tf
interpreter = tf.lite.Interpreter(PATH_TO_TFLITE_MODEL) # Load the TFLite model in TFLite Interpreter
 

# --- For Raspberry Pi ---
#import tflite_runtime.interpreter as tflite # Use tflite runtime instead of TensorFlow.
#interpreter = tflite.Interpreter(PATH_TO_TFLITE_MODEL)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# Optional, show the format for input.
input_details = interpreter.get_input_details()
# input_details is a list of dictionaries, one for each input tensor
# of this neural network.
print(input_details[0])
print(input_details[0]['shape'])
# Now print the signature input and output names.
print(interpreter.get_signature_list())

video = cv2.VideoCapture(0) # Open a camera connected to the computer.
video.set(3,2*_imgwidth)   # Set the camera capture width (property 3 = cv2.CAP_PROP_FRAME_WIDTH).
video.set(4,2*_imgheight)  # Set the camera capture height (property 4 = cv2.CAP_PROP_FRAME_HEIGHT).

# Calculate the corners for all rectangles that we are going to draw on the image.
pointROIrec1 = (2*_roi_startx,2*_roi_starty)
pointROIrec2 = (2*(_roi_startx + _roi_width),2*(_roi_starty + _roi_height))

interval = np.floor(_roi_width/3)
interval2 = np.floor(2*_roi_width/3)
# Rectangle for label1 (object on left)
pointL1rec1 = (2*(_roi_startx+4),2*(_roi_starty+4))
pointL1rec2 = (2*(_roi_startx +int(interval)-4),2*(_roi_starty + _roi_height-4))
# Rectangle for label2 (object on right)
pointL2rec1 = (2*(_roi_startx+4+int(interval2)),2*(_roi_starty+4))
pointL2rec2 = (2*(_roi_startx+_roi_width-4),2*(_roi_starty+_roi_height-4))
# Rectangle for label3 (object in front)
pointL3rec1 = (2*(_roi_startx+4+int(interval)),2*(_roi_starty+4))
pointL3rec2 = (2*(_roi_startx+int(interval2)-4),2*(_roi_starty+_roi_height-4))
# Rectangle for label4 (object blocking front)
pointL4rec1 = (2*(_roi_startx+4),2*(_roi_starty+4))
pointL4rec2 = (2*(_roi_startx + _roi_width-4),2*(_roi_starty + _roi_height-4))

print(pointL1rec1,pointL1rec2)      # Debug: show the corners of the label-1 rectangle.
if not video.isOpened():            # Check if video source is available.
    print("Cannot open camera or file")
    exit()
    
while True:                         # This is the same as while(1) in C.
    successFlag, img = video.read() # Read 1 image frame from video.
    
    if not successFlag:             # Check if image frame is correctly read.
        print("Can't receive frame (stream end?). Exiting ...")    
        break
    
    imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # Convert to grayscale.
    imggrayresize = cv2.resize(imggray,None,fx=0.5,fy=0.5)  # Resize to 160x120 pixels
    
    # Crop out region-of-interest (ROI)
    imggrayresizecrop = imggrayresize[_roi_starty:_roi_starty+_roi_height,_roi_startx:_roi_startx+_roi_width]

    # Normalize each pixel value to floating point, between 0.0 and +1.0.
    # NOTE: This must follow the original normalization (mean and scaling)
    # used when training the TF model; refer to the model pipeline.
    # In TensorFlow the normalization is done by the detection_model.preprocess(image)
    # method. In TF Lite we have to do this explicitly.
    imggrayresizecropnorm = imggrayresizecrop/256.0                 # Normalize to floating point in [0.0, 1.0).


    #test = np.expand_dims(imggrayresizecropnorm,(0,-1)) # change the shape from (37,100) to (1,37,100,1),
                                              # to meet the requirement of the TF Lite interpreter
                                              # input format. Also the datatype is float32, see
                                              # the output of print(input_details[0])
    # --- Method 1 using tf.convert_to_tensor to make a tensor from the numpy array ---
    #input_tensor = tf.convert_to_tensor(test, dtype=tf.float32)

    # --- Method 2 to prepare the input, only using numpy ---
    input_tensor = np.asarray(np.expand_dims(imggrayresizecropnorm,(0,-1)), dtype = np.float32)

    output = my_signature(conv2d_input = input_tensor)  # Perform inference on the input. The input and
                                                    # output names can
                                                    # be obtained from interpreter.get_signature_list()

    output1 = np.squeeze(output['dense_1'])         # Remove one dimension from the output. The output
                                                    # tensors are packed into a dictionary; the key
                                                    # 'dense_1' accesses the output layer.
    result = np.argmax(output1)

    if _SHOW_COLOR_IMAGE == True:
         # Draw ROI border on image
        cv2.rectangle(img,pointROIrec1,pointROIrec2,(255,0,0), thickness=2)    
        # Draw rectangle for Label 1 to 4 in ROI    
        if result == 1:
            cv2.rectangle(img,pointL1rec1,pointL1rec2,(255,255,0), thickness=2)
        elif result == 2:
            cv2.rectangle(img,pointL2rec1,pointL2rec2,(255,255,0), thickness=2)
        elif result == 3:
            cv2.rectangle(img,pointL3rec1,pointL3rec2,(255,255,0), thickness=2)
        elif result == 4:
            cv2.rectangle(img,pointL4rec1,pointL4rec2,(255,255,0), thickness=2)       
            
        cv2.imshow("Video",img)           # Display the image frame.
    else:
        # Draw ROI border on image
        cv2.rectangle(imggray,pointROIrec1,pointROIrec2,255, thickness=2)    
        # Draw rectangle for Label 1 to 4 in ROI    
        if result == 1:
            cv2.rectangle(imggray,pointL1rec1,pointL1rec2,255, thickness=2)
        elif result == 2:
            cv2.rectangle(imggray,pointL2rec1,pointL2rec2,255, thickness=2)
        elif result == 3:
            cv2.rectangle(imggray,pointL3rec1,pointL3rec2,255, thickness=2)
        elif result == 4:
            cv2.rectangle(imggray,pointL4rec1,pointL4rec2,255, thickness=2)
    
        cv2.imshow("Video",imggray)           # Display the image frame.

    if cv2.waitKey(1) & 0xFF == ord('q'):  # waitKey(1) polls the keyboard for 1 msec; ord('q')
        break                              # returns the integer Unicode value of the character 'q'.
                                           # Here we exit the loop when the user presses 'q'.
# When everything is done, release the capture resources.
video.release()                                          
cv2.destroyAllWindows()

Below is a demonstration of the code in action (Video 2). Note that the camera is configured to capture color frames at twice the resolution used by the neural network (see the video.set() calls in the code). The algorithm converts each frame to grayscale and resizes it to 160x120 pixels. A subset of this grayscale image, delineated by the variables

_roi_startx = 30
_roi_starty = 71
_roi_width = 100
_roi_height = 37

sets the region-of-interest (ROI) where analysis is carried out by the neural network. The ROI location and size can be adjusted, provided we retrain the network every time the ROI parameters change. We can display the image captured by the camera in color or in grayscale by setting the variable _SHOW_COLOR_IMAGE in the code to True or False. Figure 1 below shows the file structure of the system as set up on the Raspberry Pi.

Figure 1 - Our TF Lite model is stored in the sub-folder "TfLite_model", while the Python code that uses the TF Lite model to perform inference on the camera images resides in the current folder.

Video 2 - Using OpenCV as the frontend interface to capture camera images.

5. Full Example With Pygame

In addition to using the OpenCV library to interface with the Raspberry Pi on-board camera, we can also use other Python libraries for this purpose. An alternative that I found suitable to replace OpenCV is Pygame. Installing Pygame is a single command in the terminal, as shown below. Here is another version of the TF Lite implementation with Pygame as the camera interface.
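
For example, Pygame can typically be installed with pip (on recent Raspberry Pi OS images it may already be present):

python3 -m pip install pygame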

 
import os
import numpy as np
import pygame
from pygame import camera
from pygame import display

TFLITE_MODEL_DIR = './TfLite_model'
PATH_TO_TFLITE_MODEL = os.path.join(TFLITE_MODEL_DIR,'model.tflite')

# Original image size
_imgwidth_ori = 640
_imgheight_ori = 480
#_imgwidth_ori = 320
#_imgheight_ori = 240

# Set the width and height of the input image in pixels for tensorflow pipeline.
_imgwidth = 160
_imgheight = 120

# Set the region of interest start point and size.
# Note: The coordinate (0,0) starts at top left hand corner of the image frame.
_roi_startx = 30
_roi_starty = 71
_roi_width = 100
_roi_height = 37


pygame.init() # This initializes pygame, including the display module.
camera.init()
mycam = camera.Camera(camera.list_cameras()[0],(_imgwidth_ori,_imgheight_ori),'HSV')
mycam.start()
screen = display.set_mode((_imgwidth_ori,_imgheight_ori))
display.set_caption("cam")

rescale_level = int(_imgwidth_ori/_imgwidth) # Scale to reduce the image size.
nrow = int(_imgheight_ori/rescale_level)
ncol = int(_imgwidth_ori/rescale_level)
averaging_coeff = 1.0/rescale_level

Ave_row = np.zeros((nrow,_imgheight_ori),dtype=float)
Ave_col = np.zeros((_imgwidth_ori,ncol),dtype=float)
for row in range(nrow):
    for index in range(rescale_level):
        Ave_row[row,rescale_level*row+index] = averaging_coeff
   
for col in range(ncol):
    for index in range(rescale_level):
        Ave_col[rescale_level*col+index,col] = averaging_coeff

# Calculate the coordinates for the rectangles and other structures
# that will be superimposed on the display screen as user feedback.
pointROIstart = (rescale_level*_roi_startx,rescale_level*_roi_starty)
pointROIsize = (rescale_level*_roi_width,rescale_level*_roi_height)
pgrectROI = pygame.Rect(pointROIstart,pointROIsize)

interval = np.floor(_roi_width/3)
interval2 = np.floor(2*_roi_width/3)
# Rectangle for label1 (object on left)
pointL1start = (rescale_level*(_roi_startx+4),rescale_level*(_roi_starty+4))
pointL1size = (rescale_level*(int(interval)-8),rescale_level*(_roi_height-8))
pgrectL1 = pygame.Rect(pointL1start,pointL1size)
# Rectangle for label2 (object on right)
pointL2start = (rescale_level*(_roi_startx+4+int(interval2)),rescale_level*(_roi_starty+4))
pointL2size = (rescale_level*(int(interval)-8),rescale_level*(_roi_height-8))
pgrectL2 = pygame.Rect(pointL2start,pointL2size)
# Rectangle for label3 (object in front)
pointL3start = (rescale_level*(_roi_startx+4+int(interval)),rescale_level*(_roi_starty+4))
pointL3size = (rescale_level*(int(interval)-8),rescale_level*(_roi_height-8))
pgrectL3 = pygame.Rect(pointL3start,pointL3size)
# Rectangle for label4 (object blocking front)
pointL4start = (rescale_level*(_roi_startx+4),rescale_level*(_roi_starty+4))
pointL4size = (rescale_level*(_roi_width-8),rescale_level*(_roi_height-8))
pgrectL4 = pygame.Rect(pointL4start,pointL4size)

# --- For PC ---
import tensorflow as tf
interpreter = tf.lite.Interpreter(PATH_TO_TFLITE_MODEL) # Load the TFLite model in TFLite Interpreter
# --- For Raspberry Pi ---
#import tflite_runtime.interpreter as tflite # Use tflite runtime instead of TensorFlow.
#interpreter = tflite.Interpreter(PATH_TO_TFLITE_MODEL)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# Optional, show the format for input.
input_details = interpreter.get_input_details()
# input_details is a list of dictionaries, one for each input tensor
# of this neural network.
print(input_details[0])
print(input_details[0]['shape'])
# Now print the signature input and output names.
print(interpreter.get_signature_list())

        
    
is_running = True
while is_running:
    img = mycam.get_image()                             # Note: get_image() returns a pygame Surface object.
                                                        # Conceptually this is a 2D array where each element is
                                                        # a 32-bit unsigned integer for one pixel, with the
                                                        # 8-bit colour components (RGB in the default format)
                                                        # packed as:
                                                        # pixel_value = (R x 65536) + (G x 256) + B
                                                        # Each factor of 256 corresponds to a left shift of 8 bits.
    imgnp = np.asarray(pygame.surfarray.array3d(img),dtype=np.uint32)   # Convert the 2D surface into a 3D array, with the
                                                        # last index selecting the colour component.
    imgI = imgnp[:,:,2]                                 # Extract the V (brightness) component of the HSV image.
    
    imgIt = np.transpose(imgI)                          # Transpose the image array to the correct orientation.
    imgIresize = np.matmul(imgIt,Ave_col)               # Perform image resizing using an averaging method.
                                                        # To speed up the process, instead of using a double for-loop,
                                                        # we use numpy matrix multiplication. Here we
    imgIresize = np.matmul(Ave_row,imgIresize)          # multiply the image matrix on the left and the right by the
                                                        # averaging matrices. This averages along the rows and columns
                                                        # while reducing the width and height of the original image.
    # Crop out region-of-interest (ROI)
    imggrayresizecrop = imgIresize[_roi_starty:_roi_starty+_roi_height,_roi_startx:_roi_startx+_roi_width]
    # Normalize each pixel value to floating point, between 0.0 and +1.0.
    # NOTE: This must follow the original normalization (mean and scaling)
    # used when training the TF model; refer to the model pipeline.
    # In TensorFlow the normalization is done by the detection_model.preprocess(image)
    # method. In TF Lite we have to do this explicitly.
    imggrayresizecropnorm = imggrayresizecrop/256.0                 # Normalize to floating point in [0.0, 1.0).
    # --- Method 1 using tf.convert_to_tensor to make a tensor from the numpy array ---
    #input_tensor = tf.convert_to_tensor(test, dtype=tf.float32)

    # --- Method 2 to prepare the input, only using numpy ---
    input_tensor = np.asarray(np.expand_dims(imggrayresizecropnorm,(0,-1)), dtype = np.float32)

    output = my_signature(conv2d_input = input_tensor)  # Perform inference on the input. The input and
                                                    # output names can
                                                    # be obtained from interpreter.get_signature_list()

    output1 = np.squeeze(output['dense_1'])         # Remove one dimension from the output. The output
                                                    # tensors are packed into a dictionary; the key
                                                    # 'dense_1' accesses the output layer.
    result = np.argmax(output1)    
    print(result)
    
    imgnp[:,:,0] = imgI                             # Create a gray-scale image array by duplicating the luminance V
    imgnp[:,:,1] = imgI                             # values on channel 0 and channel 1 of the 3D image array.
    pygame.surfarray.blit_array(screen,imgnp)       # Copy 3D image array to display surface using block transfer.
                              
    # Draw the ROI border on the screen.
    pygame.draw.rect(screen,(0,0,255),pgrectROI,width=rescale_level)
    # Draw rectangle for Label 1 to 4 in ROI    
    if result == 1:
        pygame.draw.rect(screen,(255,255,0),pgrectL1,width=rescale_level)
    elif result == 2:
        pygame.draw.rect(screen,(255,255,0),pgrectL2,width=rescale_level)
    elif result == 3:
        pygame.draw.rect(screen,(255,255,0),pgrectL3,width=rescale_level)
    elif result == 4:
        pygame.draw.rect(screen,(255,255,0),pgrectL4,width=rescale_level)
        
    display.update()                                    # This will create a window and display the image.
    #display.flip()
    for event in pygame.event.get():  # Closing the window generates a QUIT event,
        if event.type == pygame.QUIT:
            is_running = False
mycam.stop()
pygame.quit()

A demonstration of the Pygame version in action. Here the camera captures color frames at 640x480 pixels; each frame is reduced to a 160x120 grayscale image (using the V channel of the HSV frame and the averaging matrices above), and the same ROI as in the OpenCV example is cropped out for the neural network.

Video 3 - Using Pygame as the frontend interface to capture the camera images. The region-of-interest (ROI) where analysis is carried out by the neural network is also drawn on the frame.