Face mask detector using FaceNet — Live streaming

Make it live, have more fun!

Zishen Li
4 min read · Feb 5, 2021

In this final post, we will walk you through putting everything we have built so far into a live-streaming version. After all this hard work, you can finally see yourself being recognized on your webcam!

Preparation

The main new package we will use is imutils. Here are the essential imports for this section.

from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import imutils
import pickle
import time
import cv2
import os
# project-specific helper modules built in the earlier posts of this series
# (import paths may differ slightly in the repository)
import nn
import resize
import embedding

The basic idea is to use VideoStream and FPS to open a pointer to the live stream and start the FPS timer. In the loop over each frame of the live stream, we extract faces, classify them as masked or bare (and whether the mask is worn correctly), and recognize them if they are in our database. Given this project outline, we need to save our previous models to disk (with the label encoder as a pickle file). Here are the models we need:

  • Face detection/alignment: to extract the faces in an image. Previously, we used the MTCNN bundled with FaceNet. However, that TensorFlow-based MTCNN is not compatible with imutils's FPS, possibly due to a threading issue. So here I simply downloaded another pre-trained face detector from OpenCV [2] to serve the same purpose. I am happy to hear any possible solution for using MTCNN in a live stream.
  • Embedding model: 20180402-114759.pb, the pre-trained model we used in the face mask detection section.
  • Recognizer: the classification model that tells whether a face is masked or bare, whether the mask is worn correctly, and who the person is, based on our database.
  • Label encoder: a pickle file that maps each predicted label back to the actual name of its class, for example 1: masked face, 0: bare face, etc. A minimal sketch of how such a file can be written is shown right after this list.
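
As a reference, here is a minimal, hypothetical sketch of how the label encoder could be written to disk at training time. It assumes the labels were encoded with scikit-learn's LabelEncoder and uses made-up class names; the real encoder is produced in the earlier posts of this series.

from sklearn.preprocessing import LabelEncoder
import pickle

# hypothetical class names; the real project has its own label set
labels = ["bare face", "masked face", "incorrectly masked face"]

# fit the encoder so integer labels map back to readable names
le = LabelEncoder()
le.fit(labels)

# persist the encoder so the live-stream script can reload it later
with open("./model/le.pickle", "wb") as f:
    f.write(pickle.dumps(le))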

Load all models we need

args = {}
args["detector"] = "./model/face_detection_model"
args["embedding_model"] = "./model/20180402-114759.pb"
args["recognizer"] = "./model/NN_model.json"
args["le"] = "./model/le.pickle"
args["confidence"] = 0.5

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
    "res10_300x300_ssd_iter_140000.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# load our serialized recognizer and label encoder from disk
print("[INFO] loading face recognizer...")
recognizer = nn.TFNN.load(args["recognizer"])
le = pickle.loads(open(args["le"], "rb").read())
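
As a quick sanity check (assuming, as in the earlier posts, that le is a scikit-learn LabelEncoder), you can print the class names that the recognizer's integer predictions will be mapped to:

# the order of le.classes_ matches the recognizer's integer labels
print("[INFO] classes:", le.classes_)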

This is all you need to begin this live-stream journey!

Open a video stream

First, initialize the video stream and allow the camera sensor to warm up. Then start the FPS throughput estimator.

print("[INFO] starting video stream...")vs = VideoStream(src=0).start()
time.sleep(2.0)
fps = FPS().start()

Next, we loop over the captured frames, treating each one as a single image just as we did in the previous sections.

Extract Faces

To capture the images and extract faces from them:

cnt = 0
while True:
    cnt += 1
    # grab the frame from the threaded video stream
    frame = vs.read()
    # resize the frame to have a width of 600 pixels (while maintaining
    # the aspect ratio), and then grab the image dimensions
    frame = imutils.resize(frame, width=600)
    (h, w) = frame.shape[:2]
    # construct a blob from the image
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)
    # apply OpenCV's deep learning-based face detector to localize
    # faces in the input image
    detector.setInput(imageBlob)
    detections = detector.forward()
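
For reference, detections is a 4-D array of shape (1, 1, N, 7): one row per candidate face, where index 2 holds the confidence score and indices 3 to 6 hold the bounding-box coordinates normalized to [0, 1]. A standalone check on a single image (assuming a local file named test.jpg, used here purely for illustration) looks like this:

# run the face detector once on a still image to inspect the output
image = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
    (104.0, 177.0, 123.0), swapRB=False, crop=False)
detector.setInput(blob)
detections = detector.forward()
print(detections.shape)        # e.g. (1, 1, 200, 7)
print(detections[0, 0, 0, 2])  # confidence of the first candidate face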

Add labels

Loop over the detections to filter out unqualified faces, then apply the embedding and classification models.

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with the prediction
        confidence = detections[0, 0, i, 2]
        # filter out weak detections
        if confidence > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for the face
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # extract the face ROI
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            # ensure the face width and height are sufficiently large
            if fW < 20 or fH < 20:
                continue
            # resize/pad the face to 160x160 and pass it through our face
            # embedding model to obtain its embedding vector
            img = resize.resize_addframe(face, 160, 160)
            cv2.imwrite("./image/" + str(cnt) + ".jpg", img)
            print("embedding start: ", time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
            vec = embedding.main(img, args["embedding_model"], 160)
            # perform classification to recognize the face
            proba, pred_label = recognizer.predict_label(vec)
            name = le.classes_[int(pred_label)]
            # draw the bounding box of the face along with the associated probability
            text = "{}: {:.2f}%".format(name, float(proba) * 100)
            y = startY - 10 if startY - 10 > 10 else startY + 10
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 0, 255), 2)
            cv2.putText(frame, text, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

Update and Output

Finally, we draw the label on the output frame and display it. Now you can see yourself being recognized in the live video!

    # update the FPS counter
    fps.update()
    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
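
If you save everything above as a single script (say, live_stream.py, a name used here purely for illustration), running it with Python opens a webcam window; press q to close it and print the elapsed time and approximate FPS.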

Find the original code in the GitHub repository [1].

The demo video shows how it works: the detector can also capture multiple faces in a single shot.

Have fun with it!

[1] https://github.com/Tonyz4516/detector-for-masked-faces

[2] https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/
