
Face mask detector using FaceNet — Live streaming

Make it live, have more fun!

In this final post of the series, we walk you through putting everything we have built into a live-streaming version. After all this hard work, you can see yourself being recognized on your webcam!


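from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import imutils
import pickle
import time
import cv2
import os
# nn, resize and embedding are helper modules from the original repository;
# their imports are not shown in this post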

The basic idea is to use VideoStream and FPS to open a pointer to the live stream and start the FPS timer. Inside the loop over the frames of the live stream, we extract the faces, classify whether each face is masked and whether the mask is worn correctly, and recognize the person if they are in our database. Given this outline, we need our previously trained models saved to disk (the label encoder, for example, as a pickle file) so the live script can load them. Here are the models we need:

  • Face alignment/face detection: extracts the faces in an image. Previously, we used the MTCNN bundled with FaceNet. However, this TensorFlow-based MTCNN was not compatible with imutils' FPS, possibly because of a threading issue. So here I downloaded another pre-trained model from OpenCV to serve the same purpose instead. I am happy to hear about any possible solution for using MTCNN in a live stream.
  • Embedding model: 20180402-114759.pb, the pre-trained FaceNet model we used in the face mask detection section.
  • Recognizer: the classification model that tells whether the face is masked or bare, whether the mask is worn correctly, and who the person is, based on our database.
  • Encoder: a pickle file that maps each predicted label to the actual name of its class, for example 1: masked face, 0: bare face, and so on.
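For every detected face, the four components above run as a chain: detector, then embedding model, then recognizer, then encoder. Here is a minimal sketch of that chain in plain Python. `label_faces`, `detect_faces`, `embed_face` and `classify` are hypothetical names standing in for the OpenCV detector, the FaceNet embedding model and the trained recognizer; the toy lambdas at the bottom return made-up values, not real model output.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (startX, startY, endX, endY)


def label_faces(
    frame,
    detect_faces: Callable[[object], List[Box]],
    embed_face: Callable[[object, Box], Sequence[float]],
    classify: Callable[[Sequence[float]], Tuple[float, int]],
    class_names: Sequence[str],
) -> List[Tuple[Box, str, float]]:
    """Run detect -> embed -> classify -> decode for every face in one frame."""
    results = []
    for box in detect_faces(frame):        # face detection model
        vec = embed_face(frame, box)       # embedding model
        proba, label = classify(vec)       # recognizer
        name = class_names[int(label)]     # encoder: label index -> class name
        results.append((box, name, proba))
    return results


# toy stand-ins, purely illustrative
faces = label_faces(
    frame=None,
    detect_faces=lambda f: [(10, 20, 110, 140)],
    embed_face=lambda f, b: [0.1, 0.2],
    classify=lambda v: (0.93, 1),
    class_names=["bare face", "masked face"],
)
print(faces)  # [((10, 20, 110, 140), 'masked face', 0.93)]
```

Keeping each model behind a plain function makes it easy to swap the detector (as this post does with MTCNN) without touching the rest of the loop.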

Load all models we need

args = {}
args["detector"] = "./model/face_detection_model"
args["embedding_model"] = "./model/20180402-114759.pb"
args["recognizer"] = "./model/NN_model.json"
args["le"] = "./model/le.pickle"
args["confidence"] = 0.5

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
# the .caffemodel weights filename is cut off in the original listing;
# point this at the weights file that ships with the OpenCV face detector
modelPath = os.path.sep.join([args["detector"],
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# load the trained recognizer and the label encoder from disk
# (the FaceNet embedding model path is passed to embedding.main in the loop)
print("[INFO] loading face recognizer...")
recognizer = nn.TFNN.load(args["recognizer"])
with open(args["le"], "rb") as f:
    le = pickle.load(f)

This is all you need to begin this live-streaming journey!
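The label encoder only has to map integer labels back to class names. A minimal sketch of saving and reloading one with `pickle`, assuming the encoder exposes a `classes_` list the way `le.classes_[int(pred_label)]` in the loop expects (scikit-learn's `LabelEncoder` has such an attribute, but the post does not say what produced `le.pickle`). `fit_label_encoder` is a hypothetical helper, and a `SimpleNamespace` stands in for the real encoder object.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def fit_label_encoder(labels):
    # sorted unique names; a name's position in `classes_` is its integer label
    return SimpleNamespace(classes_=sorted(set(labels)))


le = fit_label_encoder(["masked face", "bare face", "masked face"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "le.pickle")
    with open(path, "wb") as f:      # training script: save the encoder
        pickle.dump(le, f)
    with open(path, "rb") as f:      # live-stream script: load it back
        loaded = pickle.load(f)

pred_label = 1.0  # e.g. a float label coming back from the recognizer
print(loaded.classes_[int(pred_label)])  # masked face
```

Sorting the names gives 0: bare face and 1: masked face, matching the encoder example in the list above.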

Open a video stream

print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
fps = FPS().start()
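`FPS().start()` starts a timer that the loop advances once per frame. The sketch below is a hypothetical `SimpleFPS` class with the same start/update/stop/elapsed/fps methods this post calls on imutils' `FPS`, written with `time.perf_counter` so it runs without imutils; it is not the library's code. In this sketch `elapsed()` is only meaningful after `stop()`, which is why the timer should be stopped before the final FPS report.

```python
import time


class SimpleFPS:
    """Hypothetical frames-per-second timer with a start/update/stop API."""

    def __init__(self):
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self  # allows SimpleFPS().start(), like FPS().start()

    def update(self):
        self._frames += 1  # call once per processed frame

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        return self._end - self._start  # seconds between start() and stop()

    def fps(self):
        return self._frames / self.elapsed()


fps = SimpleFPS().start()
for _ in range(30):
    time.sleep(0.01)  # stand-in for processing one frame
    fps.update()
fps.stop()
print("elapsed: {:.2f}s, approx. FPS: {:.1f}".format(fps.elapsed(), fps.fps()))
```

The reported FPS therefore covers the whole pipeline (detection, embedding, classification and drawing), not just the camera.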

Next, we loop over the captured frames and treat each one as a single image, just as we did in the previous sections.

Extract Faces

cnt = 0
while True:
    cnt += 1
    # grab the frame from the threaded video stream
    frame = vs.read()
    # resize the frame to have a width of 600 pixels (while maintaining
    # the aspect ratio), and then grab the image dimensions
    frame = imutils.resize(frame, width=600)
    (h, w) = frame.shape[:2]
    # construct a blob from the image
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)
    # apply OpenCV's deep learning-based face detector to localize
    # faces in the input image
    detector.setInput(imageBlob)
    detections = detector.forward()
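The two preprocessing steps above boil down to simple arithmetic. A sketch, assuming the documented behavior of `imutils.resize` (the height is scaled by the same ratio as the width) and `cv2.dnn.blobFromImage` (subtract the per-channel mean, then multiply by the scale factor); `resized_dims` and `subtract_mean` are hypothetical helpers. Because `swapRB=False`, the frame stays in OpenCV's BGR order, so the mean values 104, 177 and 123 are subtracted from the B, G and R channels respectively.

```python
def resized_dims(w, h, width=600):
    """(width, height) after resizing to a fixed width, keeping the aspect ratio."""
    r = width / float(w)
    return width, int(h * r)


def subtract_mean(pixel_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Per-channel mean subtraction applied to every pixel of the 300x300 blob."""
    return tuple((p - m) * scale for p, m in zip(pixel_bgr, mean))


print(resized_dims(1280, 720))         # (600, 337)
print(subtract_mean((120, 180, 130)))  # (16.0, 3.0, 7.0)
```

Note that the blob itself is a fixed 300x300 regardless of aspect ratio; the 600-pixel-wide `w` and `h` are kept for scaling the detected boxes back onto the displayed frame.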

Add labels

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with the prediction
        confidence = detections[0, 0, i, 2]
        # filter out weak detections
        if confidence > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for the face
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # extract the face ROI
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            # ensure the face width and height are sufficiently large
            if fW < 20 or fH < 20:
                continue
            # resize the face crop to 160x160, then pass it through the
            # face embedding model to obtain its embedding vector
            img = resize.resize_addframe(face, 160, 160)
            # debugging output: save the face crop and log a timestamp
            cv2.imwrite("./image/" + str(cnt) + ".jpg", img)
            print("embedding start: ", time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
            vec = embedding.main(img, args["embedding_model"], 160)
            # perform classification to recognize the face
            proba, pred_label = recognizer.predict_label(vec)
            name = le.classes_[int(pred_label)]
            # draw the bounding box of the face along with the associated probability
            text = "{}: {:.2f}%".format(name, float(proba) * 100)
            y = startY - 10 if startY - 10 > 10 else startY + 10
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 0, 255), 2)
            cv2.putText(frame, text, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
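The detector returns its results as a 4-D array of shape (1, 1, N, 7). Below is a pure-Python sketch of the filtering and box-scaling logic, assuming each of the N rows is laid out as [image_id, class_id, confidence, x1, y1, x2, y2] with corner coordinates normalized to [0, 1] (which is why the loop multiplies by [w, h, w, h]). `faces_from_detections` and `label_y` are hypothetical helpers, the rows are made-up toy values, and, unlike the NumPy slice in the loop, the sketch clamps boxes to the frame before the size check.

```python
def faces_from_detections(rows, w, h, min_conf=0.5, min_size=20):
    """Turn raw detection rows into pixel boxes, dropping weak or tiny faces."""
    faces = []
    for row in rows:
        confidence = row[2]
        if confidence <= min_conf:
            continue  # filter out weak detections
        # scale normalized corners back to the resized frame's pixel grid
        startX, startY, endX, endY = (
            int(v * s) for v, s in zip(row[3:7], (w, h, w, h))
        )
        # clamp to the frame so the size check matches what can be cropped
        startX, startY = max(0, startX), max(0, startY)
        endX, endY = min(w, endX), min(h, endY)
        if endX - startX < min_size or endY - startY < min_size:
            continue  # face too small
        faces.append((startX, startY, endX, endY, confidence))
    return faces


def label_y(startY, offset=10):
    """Place the label above the box unless that would push it off the top."""
    return startY - offset if startY - offset > offset else startY + offset


rows = [
    [0, 1, 0.98, 0.125, 0.25, 0.5, 0.75],            # confident and large: kept
    [0, 1, 0.30, 0.5, 0.5, 0.75, 0.75],              # weak detection: dropped
    [0, 1, 0.90, 0.875, 0.875, 0.890625, 0.890625],  # confident but tiny: dropped
]
print(faces_from_detections(rows, w=600, h=400))  # [(75, 100, 300, 300, 0.98)]
print(label_y(5), label_y(100))                   # 15 90
```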

Update and Output

    # update the FPS counter
    fps.update()
    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
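If any call inside the loop raises an exception, the cleanup lines at the end are skipped, so the stream is never explicitly stopped; this matters most in a long-lived session such as a notebook. One option, sketched here as a suggestion rather than the post's code, is to wrap the loop in `try`/`finally`. `FakeStream` and `run` are hypothetical stand-ins for `VideoStream` and the main loop, and the list of integers plays the role of successive `cv2.waitKey(1)` return values. Since `waitKey` returns -1 when no key is pressed, `-1 & 0xFF` is 255, which never matches `ord("q")`.

```python
class FakeStream:
    """Hypothetical stand-in for a threaded video stream (read/stop only)."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.stopped = False

    def read(self):
        return self._frames.pop(0) if self._frames else None

    def stop(self):
        self.stopped = True


def run(stream, process, key_codes):
    """Process frames until `q` is pressed, always stopping the stream."""
    processed = 0
    try:
        for raw_key in key_codes:  # stand-in for cv2.waitKey(1) results
            frame = stream.read()
            if frame is None:
                break
            process(frame)
            processed += 1
            if (raw_key & 0xFF) == ord("q"):
                break
    finally:
        stream.stop()  # real loop: fps.stop(), cv2.destroyAllWindows(), vs.stop()
    return processed


stream = FakeStream(range(5))
print(run(stream, lambda frame: None, [-1, -1, ord("q"), -1]), stream.stopped)  # 3 True
```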

Find the original code here.

The above demo video shows how it works. It can also capture multiple faces in a single shot.

Have fun with it!