Boy or Girl? A Machine Learning Web App to Detect Gender from Name

Find out a name’s likely gender using Natural Language Processing in TensorFlow, Plotly Dash, and Heroku.

Photo by Dainis Graveris on Unsplash

Choosing a name for your child is one of the most stressful decisions you’ll have to make as a new parent. Especially for a data-driven guy like me, having to decide on a name without any prior data about my child’s character and preferences is a nightmare come true!

Since my first name starts with “Marie,” I’ve gone through countless experiences of people addressing me as “Miss” over emails and text only to be disappointed to realize that I’m actually a guy when we finally meet or talk 😜. So, when my wife and I were researching names for our baby girl, an important question we asked ourselves was:

Will people be able to identify that the name refers to a girl and not a boy?

It turns out we can use Machine Learning to help us check if potential names would be associated more with boys or girls! To check out the app I’ve built to do exactly this, please head over to

Gif video of the “boyorgirl” app. Image by Author

The rest of this post talks about the technical details, including

  1. Obtaining a name-to-gender training dataset
  2. Preprocessing the names to make them compatible with Machine Learning (ML) models
  3. Developing a Natural Language Processing (NLP) ML model to read in a name and output if it’s a boy’s name or a girl’s name
  4. Building a simple web app for people to interact with the model
  5. Publishing the app on the internet

The Architecture of the Solution

Application architecture. Image by Author

Obtaining a Name-to-Gender Training Dataset

To train any Machine Learning model, we need a large quantity of labeled data. In this case, we need a large number of names and the gender associated with each name. Luckily, Google Cloud’s BigQuery has a free open dataset called USA_NAMES [Link] that “contains all names from Social Security card applications for births that occurred in the United States.” The dataset contains roughly 35,000 names and the associated gender, which works very well for our model.

Dataset snippet. Image by Author

Data Preprocessing

Human names are textual data, while ML models can only work with numeric data. To convert our text into a numeric representation, we’ll do the following steps.

Name encoding. Image by Author

  1. Lowercase the name since each character’s case doesn’t convey any information about a person’s gender.
  2. Split each character: The basic idea of the ML model we’re building is to read characters in a name to identify patterns that could indicate masculinity or femininity. Thus we split the name into each character.
  3. Pad names with spaces up to a maximum of 50 characters so that the ML model sees the same length for every name.
  4. Encode each character to a unique number since ML models can only work with numbers. In this case, we encode ‘ ’ (space) to 0, ‘a’ to 1, ‘b’ to 2, and so on.
  5. Encode each gender to a unique number since ML models can only work with numbers. In this case, we encode ‘F’ to 0 and ‘M’ to 1.

Dataset snippet after preprocessing. Image by Author
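
The five preprocessing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the exact code from the repository: the function names encode_name and encode_gender are hypothetical, and mapping every non-letter character (hyphens, accents, digits) to 0 is an assumption, since the post only defines the encoding for the space and the letters a to z.

```python
NAME_LENGTH = 50  # fixed input length described in step 3


def encode_name(name, name_length=NAME_LENGTH):
    """Lowercase, split, pad or truncate, and encode one name as integers."""
    chars = list(name.lower())                            # steps 1 and 2
    chars = (chars + [" "] * name_length)[:name_length]   # step 3
    # Step 4: ' ' -> 0, 'a' -> 1, ..., 'z' -> 26.
    # Assumption: any other character also maps to 0.
    return [ord(c) - 96 if "a" <= c <= "z" else 0 for c in chars]


def encode_gender(gender):
    """Step 5: 'F' -> 0, 'M' -> 1."""
    return {"F": 0, "M": 1}[gender]


print(encode_name("Mary")[:6])  # [13, 1, 18, 25, 0, 0]
print(encode_gender("F"))       # 0
```

Names longer than 50 characters are simply truncated here, which is one reasonable way to keep every input the same length.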

NLP ML Model

When we read a name, we identify the probable gender of that name by the sequence in which characters appear in that name. For example, “Stephen” is most likely a boy’s name but “Stephanie” is likely a girl’s name. To mimic the way we humans identify the gender of a name, we construct a simple Bidirectional LSTM model using the tensorflow.keras API.

Model Architecture

  1. Embedding layer: to “embed” each input character’s encoded number into a dense 256 dimension vector. The choice of embedding_dim is a hyperparameter that can be tuned to get the desired accuracy.
  2. Bidirectional LSTM layer: read the sequence of character embeddings from the previous step and output a single vector representing that sequence. The values for units and dropouts are hyperparameters as well.
  3. Final Dense layer: to output a single value close to 0 for ‘F’ or close to 1 for ‘M’ since this is the encoding we used in the preprocessing step.
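
A model with these three layers might look roughly like the sketch below. Only the 256-dimension embedding and the layer types come from the description above; the function name build_model, the 27-symbol vocabulary (space plus a to z), the 128 LSTM units, the 0.2 dropout rates, and the Adam learning rate are all assumed values chosen for illustration.

```python
import tensorflow as tf


def build_model(num_chars=27, name_length=50, embedding_dim=256,
                lstm_units=128, dropout=0.2):
    """Character-level Bidirectional LSTM that outputs P(name is 'M')."""
    model = tf.keras.Sequential([
        # One integer per character position (0 = space, 1..26 = a..z).
        tf.keras.Input(shape=(name_length,), dtype="int32"),
        # Map each character code to a dense embedding_dim-sized vector.
        tf.keras.layers.Embedding(num_chars, embedding_dim),
        # Read the sequence in both directions and emit a single vector.
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(lstm_units, dropout=dropout,
                                 recurrent_dropout=dropout)
        ),
        # Sigmoid output: close to 0 for 'F', close to 1 for 'M'.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        metrics=["accuracy"],
    )
    return model
```

Because the final layer is a single sigmoid unit, binary cross-entropy is the natural loss for the 0/1 gender encoding used in preprocessing.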

Training the Model

We’ll use the standard tensorflow.keras training pipeline as below

  1. Instantiate the model using the function we wrote in the model architecture step.
  2. Split the data into 80% training and 20% validation.
  3. Call model.fit with an EarlyStopping callback to stop training once the model starts to overfit.
  4. Save the trained model to reuse while serving the web app.
  5. Plot the training and validation accuracies to visually check the model performance.

Training Accuracies. Image by Author
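
The training steps above could be wired together along the lines of the following sketch. It is an illustration under assumptions, not the repository's code: train_model and build_small_model are hypothetical names, the split is done with a NumPy permutation rather than any particular library helper, and the patience, batch size, and epoch limit are placeholder values. The small model exists only so the example runs on its own; in practice the model from the architecture step would be passed in.

```python
import numpy as np
import tensorflow as tf


def build_small_model(num_chars=27, name_length=50):
    """Tiny stand-in for the real model so this example is self-contained."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(name_length,), dtype="int32"),
        tf.keras.layers.Embedding(num_chars, 8),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(4)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


def train_model(model, x, y, save_path=None, max_epochs=50, patience=5, seed=0):
    """Train on an 80/20 split with early stopping; optionally save the model."""
    x = np.asarray(x, dtype="int32")
    y = np.asarray(y, dtype="float32")

    # Shuffle the rows, then hold out the last 20% for validation.
    order = np.random.default_rng(seed).permutation(len(x))
    split = int(0.8 * len(x))
    train_idx, val_idx = order[:split], order[split:]

    # Stop once validation accuracy has not improved for `patience` epochs
    # and roll back to the best weights seen so far.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy", patience=patience, restore_best_weights=True
    )
    history = model.fit(
        x[train_idx], y[train_idx],
        validation_data=(x[val_idx], y[val_idx]),
        epochs=max_epochs,
        batch_size=64,
        callbacks=[early_stop],
        verbose=0,
    )
    if save_path is not None:
        model.save(save_path)  # reused later when serving the web app
    return history
```

The returned history object holds per-epoch values under history.history["accuracy"] and history.history["val_accuracy"], which are the two curves to plot in the last step.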

Web App

Now that we have the trained model with good accuracy, we can create a Plotly Dash [Link] web app to get input names from a user, load the model (only once during app startup), use the model to predict the gender of the input names, and visualize the results back on the web app. The code snippet below shows only the model inference part of the web app. The full Plotly Dash web app code, including the model load, text box input, table output, and interactive bar plot output, is available on my GitHub repository.
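
As a minimal sketch of that inference step (not the exact code from the repository), the function below encodes the input names, asks the model for its sigmoid scores, and turns each score into a label and a confidence. The names predict_genders and encode_name, the 0.5 threshold, and the output row format are assumptions made for illustration; the only requirement on model is that it has a Keras-style predict method.

```python
import numpy as np


def encode_name(name, name_length=50):
    """Same encoding as in preprocessing: ' ' -> 0, 'a' -> 1, ..., 'z' -> 26."""
    chars = (list(name.lower()) + [" "] * name_length)[:name_length]
    return [ord(c) - 96 if "a" <= c <= "z" else 0 for c in chars]


def predict_genders(model, names):
    """Return one {'name', 'gender', 'probability'} row per input name."""
    x = np.asarray([encode_name(n) for n in names], dtype="int32")
    # The sigmoid output is close to 1 for 'M' and close to 0 for 'F'.
    scores = np.asarray(model.predict(x)).reshape(-1)
    rows = []
    for name, score in zip(names, scores):
        is_boy = score > 0.5  # assumed decision threshold
        rows.append({
            "name": name,
            "gender": "Boy" if is_boy else "Girl",
            # Report the probability of the predicted class, not of 'M'.
            "probability": float(score if is_boy else 1.0 - score),
        })
    return rows
```

In the app, model would be the saved Keras model, loaded once at startup (for example with tf.keras.models.load_model), and the returned rows would feed the table and bar plot.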

Sample inference results. Image by Author

Publish on the Internet

The final step is to publish our new app on the internet for everyone worldwide to interact with. After a little bit of research, I decided to use Heroku to deploy the app for the following reasons.

  1. Free!!!
  2. A straightforward deployment process
  3. Max 500MB memory is sufficient for my small custom model.

Deploy a mono-repo

The steps to deploy an app to Heroku are well documented on the Heroku website [Link]. I made custom changes to support my mono-repo format, which I’ve documented in my GitHub repo [Link]. The main changes to support mono-repos are

  • Add the following buildpack
heroku buildpacks:add -a <app> -i 1
  • Add the following configs
heroku config:set -a <app> PROCFILE=relative/path/to/app/Procfile

heroku config:set -a <app> APP_BASE=relative/path/to/app

Specify the number of workers

Important! One major gotcha that took me some time to figure out is that Heroku starts two workers by default. So if your app’s size is more than 250MB, the two workers combined would exceed the 500MB limit that Heroku sets for free apps. Since I don’t expect the traffic to my app to be large enough to need two workers, I easily solved this issue by specifying only one worker by using the -w 1 flag in the Procfile.

web: gunicorn -w 1 serve:server

Setting up a ping service

Free tier Heroku apps go to sleep after 30 minutes of inactivity. Thus I used a free ping service called Kaffeine [Link] to ping my app once every 10 minutes. This keeps the app awake and ensures zero downtime.

Subsequently, I upgraded from Free tier to Hobby tier (mainly for Free SSL on my custom domain), ensuring the app never sleeps, so the ping service is no longer relevant for my app.

Adding a custom domain

Finally, I purchased a cheap domain from Namecheap [Link] and pointed the domain to my Heroku app following instructions from here [Link].

