Head Pose Estimation using Python


Implementation

Install the libraries

Before we get into the implementation, the first step is to install the libraries. In this case, we will install OpenCV and Mediapipe using pip. In your terminal, run these commands:

pip install opencv-python
pip install mediapipe

Load the libraries

After we install the libraries, the next step is to load them into our code. We will import the NumPy, OpenCV, and Mediapipe libraries. Please add these lines of code:

Initialize objects

After we load the libraries, the next step is to initialize several objects. There are two objects that we initialize. They are:

  • The FaceMesh object from the Mediapipe library. This object detects faces and extracts keypoints from one or more of them.
  • The VideoCapture object from the OpenCV library. This object retrieves images from the webcam; we pass 0 as its argument to select the default camera.

Please add these lines of code:

Capture the image

Now we have initialized the objects. The next step is to capture the image from the webcam. Please add this line of code for doing that:

Process the image

After we capture the image, the next step is to process it. Note that OpenCV and Mediapipe represent images differently: OpenCV uses the BGR color space, while Mediapipe expects RGB.

Therefore, we convert the image to RGB first, apply the face landmark detection, then convert it back to BGR.

Please add these lines of code (Be careful with the indentation):

Retrieve the 2D and the 3D coordinates

After we process the image, the next step is to retrieve the keypoint coordinates. For your information, Mediapipe's face landmark detection model produces 468 keypoints per face, each with 3D coordinates.

For head pose estimation, we don’t need all of them. Instead, we choose six points that are enough to represent the face: the outer corners of the eyes, the tip of the nose, the chin, and the corners of the mouth.

To access those points, we refer to the indices used by the BlazeFace model. I’ve already marked them. Here is the picture of it:

The image is retrieved from TensorFlow’s GitHub repository and has been edited by the author.

Now let’s extract those keypoints. For the 2D coordinates, we take only the x and y axes; for the 3D coordinates, we keep all three axes.

Because the landmark coordinates are normalized to [0, 1], we first multiply the x values by the image’s width and the y values by the image’s height to get pixel coordinates.

We also keep the nose coordinates separately, so we can display the projection of the nose in image space.

Now let’s add these lines of code, and be careful with the indentation:
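The article's snippet is missing; here is a sketch of the extraction as a helper function, assuming the six BlazeFace indices commonly used for this purpose (33 and 263 for the eye corners, 1 for the nose tip, 61 and 291 for the mouth corners, 199 for the chin):

```python
import numpy as np

# Assumed landmark indices: eye corners, nose tip, mouth corners, chin.
POSE_IDXS = (33, 263, 1, 61, 291, 199)

def extract_points(face_landmarks, img_w, img_h):
    """Collect the 2D and 3D keypoints plus the nose position.

    Landmark x/y are normalized to [0, 1], so they are scaled by the
    image width and height to get pixel coordinates.
    """
    face_2d, face_3d, nose_2d = [], [], None
    for idx, lm in enumerate(face_landmarks.landmark):
        if idx in POSE_IDXS:
            x, y = int(lm.x * img_w), int(lm.y * img_h)
            if idx == 1:            # nose tip, kept for the projection line
                nose_2d = (x, y)
            face_2d.append([x, y])
            face_3d.append([x, y, lm.z])
    return (np.array(face_2d, dtype=np.float64),
            np.array(face_3d, dtype=np.float64),
            nose_2d)
```

In the main loop this would be called once per entry in `results.multi_face_landmarks`, with `img_h, img_w` taken from `image.shape`.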

Get the camera matrix

Now we have the 2D and 3D coordinates of our face keypoints. The next step is to get the camera matrix. Let’s recall the camera matrix once again.

As you can see from above, we need several parameters. The first is the focal length (fx and fy), which we approximate with the width of the image.

The second parameter that we will take is the skew parameter. The parameter has the symbol of gamma. For this parameter, we set the value to 0.

The third parameter is the principal point, the center coordinate of our image: we set u0 to half the image’s width and v0 to half the image’s height.

Now let’s create the matrix by generating a NumPy array. Now let’s add these lines of code:
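With the original snippet missing, a sketch of the matrix construction, assuming a 640×480 frame for illustration:

```python
import numpy as np

img_h, img_w = 480, 640  # would come from image.shape in the loop

focal_length = img_w  # common approximation: focal length ≈ image width

# Intrinsic camera matrix: fx, fy on the diagonal, skew gamma = 0,
# principal point (u0, v0) at the image center.
cam_matrix = np.array([[focal_length, 0,            img_w / 2],
                       [0,            focal_length, img_h / 2],
                       [0,            0,            1        ]],
                      dtype=np.float64)
```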

Apply the PnP problem

Before we apply the PnP solver, we need one more matrix: the distortion coefficients. This matrix contains only zeros (we assume no lens distortion), and it has a shape of 4×1. Now let’s create the matrix by adding this line of code:
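A sketch of that line, since the original is not included:

```python
import numpy as np

# Distortion coefficients, all zero: we assume an undistorted lens.
dist_matrix = np.zeros((4, 1), dtype=np.float64)
```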

We now have all the inputs: the 2D coordinates, the 3D coordinates, the camera matrix, and the all-zero distortion matrix. Let’s solve the PnP problem by adding this line of code:

Convert the rotational vector into a matrix

From this process we now have the translation vector and the rotation vector. But wait, the rotation is not yet in matrix form, so we cannot read the rotation angles from it directly.

Don’t worry. We can convert the vector into the matrix by using the cv2.Rodrigues function.

Now let’s add this line of code:

Get the angles

Now we have the rotation matrix. Let’s retrieve the rotation angle around each axis. To do that, we can use the cv2.RQDecomp3x3 function to extract the angles.

Now let’s add this line of code:

Catch the head’s direction and display the result

The last step is to determine where the head is pointing. We display the direction and draw a line showing the projection of the nose in image space.

Now let’s add these lines of code:
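The original snippet is missing; here is a sketch of the direction logic as a small function. The 10-degree threshold and the drawing calls in the comments are illustrative choices, not values taken from the article:

```python
def head_direction(x_angle, y_angle, threshold=10):
    """Map pitch (x) and yaw (y) angles in degrees to a direction label.

    The threshold is an assumed value, not one fixed by the article.
    """
    if y_angle < -threshold:
        return "Looking Left"
    if y_angle > threshold:
        return "Looking Right"
    if x_angle < -threshold:
        return "Looking Down"
    if x_angle > threshold:
        return "Looking Up"
    return "Forward"

# In the main loop, the label and the nose projection line would then be
# drawn on the frame, for example:
#   p1 = (int(nose_2d[0]), int(nose_2d[1]))
#   p2 = (int(nose_2d[0] + y_angle * 10), int(nose_2d[1] - x_angle * 10))
#   cv2.line(image, p1, p2, (255, 0, 0), 3)
#   cv2.putText(image, head_direction(x_angle, y_angle), (20, 50),
#               cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 2)
#   cv2.imshow("Head Pose", image)
```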

By combining all of those lines of code, the result will look like this:

The GIF is captured by the author.
