Face mask detector using FaceNet

Build a CNN model to classify whether a person is wearing a mask

Living in the post-pandemic era, people are required to wear masks when they go shopping, visit hospitals, and even go to work. Each of us carries a responsibility to society, and businesses should also monitor mask-wearing behavior more closely. An accurate face mask detector with wearing guidance can not only strengthen entrance control in office buildings, shopping malls, and grocery stores, but also raise people's awareness of the severity of this deadly virus.

This project will walk you through the process of building a face mask detector with FaceNet. Based on the detector's output, we then run either bare-face recognition or a mask-wearing-correctness check. The whole pipeline works on batch processing as well as on a live stream, which is more practical in real-world scenarios.

The whole project workflow is shown below:

Face mask detector

Dataset

Since the project contains several sections, we will use different datasets to serve different purposes. To keep things simple, I will only introduce the dataset used for the face mask detector here and leave the rest to the next blog post.

The dataset[1] we are using comes from the GitHub repository Real-World-Masked-Face-Dataset, which contains 459 subjects with about 6K masked face images and 90K unmasked face images.

Step 1: deal with imbalanced data

As the data is highly imbalanced, we augment the masked faces by mirroring, gray-scaling, and color-shifting the original images, increasing the minority class to roughly 10 times its original size. After augmentation, there are 150,907 photos in total: 60% bare-face photos and 40% masked ones.

import os
import cv2 as cv
import tensorflow as tf
from tqdm import tqdm

# create gray-scaled, red, green, and yellow variants of the image and of its mirrored copy
def augment(img, path, file):
    flip = tf.image.flip_left_right(img)
    gray = tf.image.rgb_to_grayscale(img)
    red = tf.image.adjust_hue(img, 0.9)
    green = tf.image.adjust_hue(img, 0.3)
    yellow = tf.image.adjust_hue(img, 0.1)
    flip_gray = tf.image.rgb_to_grayscale(flip)
    flip_red = tf.image.adjust_hue(flip, 0.9)
    flip_green = tf.image.adjust_hue(flip, 0.3)
    flip_yellow = tf.image.adjust_hue(flip, 0.1)
    au_img = [flip, gray, red, green, yellow, flip_gray, flip_red, flip_green, flip_yellow]
    au_name = ["flip", "gray", "red", "green", "yellow",
               "flip_gray", "flip_red", "flip_green", "flip_yellow"]
    # save each augmented variant next to the original image
    for idx, image in enumerate(au_img):
        tf.keras.preprocessing.image.save_img(path + file + "_%s.jpg" % au_name[idx],
                                              image, data_format="channels_last")

# apply the function to every subfolder of the masked-face dataset
folder = "drive/My Drive/5500_data/self-built-masked-face-recognition-dataset/"
subfolders = os.listdir(folder + "AFDB_masked_face_dataset")
for subfolder in tqdm(subfolders):
    path = folder + "AFDB_masked_face_dataset/" + subfolder + "/"
    for f in os.listdir(path):
        img = cv.imread(path + f)                 # OpenCV loads images as BGR
        img = cv.cvtColor(img, cv.COLOR_BGR2RGB)  # convert to RGB for tf.image
        augment(img, path, f)

Step 2: add labels and split into training and test sets

The final step before we jump into the modeling part is to add labels to our data and split them into test and training sets.

import pandas as pd

df = {"label": [], "image_path": []}

# collect the label and file path of every image in a dataset folder
def image_info2df(folder_path, label):
    for subf in tqdm(os.listdir(folder_path)):
        for f in os.listdir(folder_path + subf):
            df["label"].append(label)
            df["image_path"].append(folder_path + subf + "/" + f)

# apply the function to the masked and bare-face datasets
image_info2df(outputpath_1, "mask")
image_info2df(outputpath_2, "face")
label_path = pd.DataFrame.from_dict(df)

Now we have a label_path DataFrame that stores the label and path for each image. Then, we can simply shuffle the row indices of label_path to get the training and test split (75% and 25%).

import random

data = label_path                 # the label/path DataFrame built above
loc = list(range(data.shape[0]))  # one index per image
random.seed(42)
random.shuffle(loc)
train_loc = loc[:113250]          # first 75%
test_loc = loc[113250:]           # remaining 25%
train_set = data.iloc[train_loc]
test_set = data.iloc[test_loc]

Step 3: modeling

Taking advantage of an existing CNN architecture, we adopt the FaceNet[2] implementation in TensorFlow. The package requires input images of size 160×160, which is why we use that value as the output image size in the alignment step below.

MTCNN

This open-source package includes an MTCNN module that aligns the face in an image if one is present and also handles multi-face detection. The details can be found here[3]. To be compatible with FaceNet, the MTCNN step resizes the aligned faces to 160×160.

Taken from the original paper
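
To get a feel for what MTCNN does before running the full alignment script, here is a minimal sketch using the standalone mtcnn pip package (not the repo's align code, so treat it only as an illustration; the sample file name is hypothetical). It detects a face in a single image and crops it to 160×160:

import cv2 as cv
from mtcnn import MTCNN  # pip install mtcnn

detector = MTCNN()
img = cv.cvtColor(cv.imread("sample_face.jpg"), cv.COLOR_BGR2RGB)  # hypothetical test image

# detect_faces returns one dict per detected face, with a bounding 'box' and facial 'keypoints'
faces = detector.detect_faces(img)
if faces:
    x, y, w, h = faces[0]["box"]
    face = img[max(y, 0):y + h, max(x, 0):x + w]   # clamp in case the box starts slightly off-image
    face = cv.resize(face, (160, 160))             # match FaceNet's expected input size
    cv.imwrite("aligned_sample.jpg", cv.cvtColor(face, cv.COLOR_RGB2BGR))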

FaceNet

The original FaceNet architecture can be found here[4]. The core idea of the algorithm is the triplet loss, which takes three samples at a time: an anchor, a positive sample from the same identity, and a negative sample from a different identity. Training pushes the anchor-negative pair as far apart as possible while pulling the anchor-positive pair as close together as possible, and the triplet loss the model trains on sums these terms over all triplets.

Taken from the original paper
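
For reference, the triplet loss from the paper can be written as follows, where f(·) is the learned embedding, x_i^a, x_i^p, x_i^n are the anchor, positive, and negative images of the i-th triplet, and α is the margin enforced between positive and negative pairs:

L = \sum_{i=1}^{N} \left[ \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \right]_+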

To use the package, we first need to align the faces in the images. Run the command below in the terminal after installing the package and the other required dependencies (see the details here), and the aligned faces will be saved in the target folder.

python path_to_align_code\align_dataset_mtcnn.py path_to_input_data\lfw path_to_target_folder\aligned_lfw --image_size 160 --margin 32 --random_order --gpu_memory_fraction 0.25

Replace the three directories in the command with:

  1. Your align_dataset_mtcnn.py location
  2. Your original data location
  3. Target location for processed images

The above command gives you the aligned faces from each image, and we can pass all of those faces to the FaceNet model. Run the command below in the terminal to get the embeddings. (see the details here)

python path_to_facenetpackage/facenet/src/classifier.py --test_data_dir "path_to_test_data" --output_path "output_path" TRAIN "path_to_original_data" "code/facenet/src/models/20180402-114759.pb" 
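
If you prefer to compute the embeddings directly in Python instead of going through classifier.py, a minimal sketch could look like the one below. It assumes the TensorFlow 1.x environment the repo targets, the facenet helper module from facenet/src, and the repo's usual input:0 / embeddings:0 / phase_train:0 tensor names; the aligned image path is hypothetical.

import facenet            # helper module from facenet/src in the repo
import tensorflow as tf   # assuming the TF 1.x environment the repo targets

with tf.Graph().as_default(), tf.Session() as sess:
    # load the same frozen model used in the command above
    facenet.load_model("code/facenet/src/models/20180402-114759.pb")
    images_in = tf.get_default_graph().get_tensor_by_name("input:0")
    embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
    phase_train = tf.get_default_graph().get_tensor_by_name("phase_train:0")

    # load and prewhiten a batch of aligned 160x160 faces (path is a placeholder)
    batch = facenet.load_data(["aligned_lfw/person_1/img_0001.jpg"],
                              do_random_crop=False, do_random_flip=False, image_size=160)
    emb = sess.run(embeddings, feed_dict={images_in: batch, phase_train: False})
    print(emb.shape)  # (1, 512) for the 20180402-114759 model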

Now we have the image embeddings, and we can pass them to any classifier you are familiar with to get the wheels turning.

Classifier

Here I will use a very basic and simple one, logistic regression, to get a benchmark result for this project.

from sklearn.linear_model import LogisticRegression
import sklearn.metrics as metrics

X = train_embeddings.values
y_train = train_labels.values
X_test = test_embeddings.values
y_test = test_labels.values

logreg = LogisticRegression(random_state=0).fit(X, y_train)
y_hat = logreg.predict(X_test)
y_prob = logreg.predict_proba(X_test)
acc = logreg.score(X_test, y_test)
print('training accuracy: ', logreg.score(X, y_train))
print('test accuracy: ', acc)

— the output:

training accuracy:  0.952119107052438
test accuracy: 0.9499305332905846

Logistic regression reaches about 95% accuracy on the test set. Since the training and test accuracies are nearly identical, the main issue here is bias rather than variance, which means we can further increase model complexity by adopting other commonly used classifiers such as an SVM or a neural network.
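
As one example of such a swap, a minimal sketch with scikit-learn's SVC on the same embeddings might look like this (the RBF kernel and default hyperparameters are assumptions, not tuned choices):

from sklearn.svm import SVC

# reuse the embedding matrices and labels prepared for logistic regression above
svm = SVC(kernel="rbf", random_state=0).fit(X, y_train)
print('training accuracy: ', svm.score(X, y_train))
print('test accuracy: ', svm.score(X_test, y_test))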

[1] https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset

[2] https://github.com/davidsandberg/facenet

[3] https://kpzhang93.github.io/MTCNN_face_detection_alignment/paper/spl.pdf

[4] https://arxiv.org/abs/1503.03832