We need problems for learning
Working as a backend software engineer at a big company, I noticed that one of the busiest departments is human resources. The company is growing, which keeps the HR team engaged in the hiring process: reviewing CVs and deciding whether candidates match the job.
Reviewing a CV is not a hard task for experienced HR team members, since they know the job specifications and the CTO's expectations and can find a good candidate in a short time. It can be challenging, however, for new team members who are not yet familiar with the company's expectations: they need more time and effort to find the right person, and their decisions might not always be accurate. The question is, can we use AI and machine learning to automate this process and help new members with their decisions? Can we build a solution to our problem using these techniques?
Fortunately, the answer is YES! We can use AI and deep learning to solve this problem if we create a dataset from CVs and design an algorithm to review them and help us make decisions. I asked the HR team to hand me as many CVs as they could, asked them about the important skills for each development department, and prioritized the skills into three categories: required, essential, and additional. Experience and age are examples of factors that matter for all departments. For example, a candidate who has experience in an area has a better chance than one who merely knows many different skills, and many companies prefer to work with younger employees. For other companies, different factors might be important, such as military service or visa status.
The machine learning process consists of two important parts: gathering data and structuring it for the algorithm. The first part, collecting data, might be easy; creating a structure for the dataset and cleaning the data for the algorithm, however, is trickier.
As can be seen in the picture, three critical factors determine whether a CV is marked as good according to our technical department's requirements: age, experience, and a total skill point. The dataset also contains an accept field, which tells us whether the candidate has a chance or not.
Find The Best AI algorithm
There are many algorithms in the AI world, each suitable for a specific kind of problem. Since my problem is a binary classification (a candidate is either accepted or not), I decided to use artificial neural networks (ANNs), also known as neural networks (NNs), which are computing systems inspired by the biological neural networks that constitute animal brains.
Dive into codes
Importing the required AI libraries and building the skeleton of the project is the first step in almost all AI projects.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with an extensive collection of high-level mathematical functions to operate on these arrays.
After importing the libraries, we load the dataset so that we can analyze the data and build an AI model for prediction.
We don’t need first and last names; we only need the 3rd to 6th columns of the data. According to the dataset, the last column shows the employment status of the candidate, and we store it in the Y value for AI training.
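The column selection described above could be sketched as follows. This is only an illustration: it assumes pandas is used for loading, and the column names and the four rows are invented stand-ins for the real HR dataset, which is not part of this article.

```python
import io

import pandas as pd

# Hypothetical stand-in for the HR dataset; names and values are invented.
CSV_TEXT = """name,last_name,age,experience,skill_points,accept
Ann,Lee,24,2,14,1
Bob,Roy,41,1,6,0
Cara,Diaz,29,5,18,1
Dan,Kim,35,0,4,0
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Skip the two name columns: columns 3 to 5 are the features,
# and column 6 (the last one) is the label.
X = dataset.iloc[:, 2:5].values  # age, experience, skill points
y = dataset.iloc[:, 5].values    # accept flag (1 = accepted, 0 = not accepted)

print(X.shape, y.shape)
```

With a real file, `pd.read_csv` would be given the path of the CSV instead of the in-memory text.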
Splitting the dataset
To evaluate an AI model we need test data, and a part of the dataset can be used for this purpose. Accordingly, I decided to use 20% of the dataset as test data and train the model on the remaining 80%.
The data should be standardized, which means removing the mean and scaling to unit variance. The Sklearn library can help us standardize the train and test values: fit_transform fits the scaler to the training data and transforms it, while transform standardizes the test data by centering and scaling it with the statistics learned from the training data.
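A minimal sketch of the 80/20 split, assuming scikit-learn's `train_test_split` is the tool used. The arrays are placeholder data, and `random_state=0` is an arbitrary choice that only makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 10 candidates with 3 features each.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the rows for testing; the remaining 80% is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))
```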
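The standardization step could look like the sketch below, assuming scikit-learn's `StandardScaler`. The numbers are invented; the point is that the scaler learns the mean and variance from the training rows only and reuses them for the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows: age, experience, skill points.
X_train = np.array([
    [24.0, 2.0, 14.0],
    [41.0, 1.0, 6.0],
    [29.0, 5.0, 18.0],
    [35.0, 0.0, 4.0],
])
X_test = np.array([[30.0, 3.0, 10.0]])

scaler = StandardScaler()
# Learn mean and variance from the training data, then scale it.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics for the test data; do not refit here.
X_test_scaled = scaler.transform(X_test)
```

Fitting only on the training chunk keeps information about the test rows from leaking into the model.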
Initializing the ANN
Initialization starts with the first hidden layer of the ANN, which I add with the add function and a class named Dense. Three parameters are passed to this class. The first parameter is units, which sets the dimensionality of the output space. I passed 9 for this parameter. If you are wondering why 9 and not 3 or 6, this comes down to experience: although there are three critical inputs and the best value for units might be 3 or 6, you should test different values to get the best accuracy.
The second parameter is activation. I chose ReLU (the rectified linear activation function), a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. The third parameter is input_dim, and according to my dataset the value should be 3.
The second hidden layer is the same as the first one, with one slight difference: we don’t need to pass input_dim for this layer.
For the output layer, I use 1 as the value for units, since the classification is binary and a candidate is either accepted (1) or not accepted (0). I used the sigmoid function because its output lies between 0 and 1; it is commonly used for models that predict a probability, and since a probability always lies between 0 and 1, sigmoid is the right choice.
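To make the ReLU behaviour concrete, here is a small NumPy sketch. The function name `relu` and the sample values are only for illustration; Keras applies the same rule internally when `activation` is set to ReLU.

```python
import numpy as np


def relu(x):
    """Return x where x is positive and 0 elsewhere."""
    return np.maximum(0, x)


values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)
print(activated)
```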
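The three layers described above could be assembled roughly as follows. This is a sketch rather than the original code: it assumes the Keras API bundled with TensorFlow, the helper name `build_classifier` is made up, and it declares the input size with `keras.Input(shape=(3,))`, which plays the role of `input_dim=3` on the first Dense layer.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_classifier(n_features=3, hidden_units=9):
    """Two ReLU hidden layers followed by a single sigmoid output unit."""
    model = keras.Sequential()
    # Declares that each sample has n_features inputs (age, experience, skill).
    model.add(keras.Input(shape=(n_features,)))
    model.add(layers.Dense(units=hidden_units, activation="relu"))
    model.add(layers.Dense(units=hidden_units, activation="relu"))
    # One unit with sigmoid gives a value between 0 and 1 per candidate.
    model.add(layers.Dense(units=1, activation="sigmoid"))
    return model


classifier = build_classifier()
classifier.summary()
```

Keeping `hidden_units` as a parameter makes it easy to try 3, 6, or 9 and compare accuracy.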
Compiling the ANN
Now it is time to compile the ANN with the compile method. I used the adam optimizer and binary_crossentropy for the loss, since the target is binary. I also chose accuracy as the value for the metrics parameter.
The next step is to fit the model to the training chunk of the dataset and start training. I set batch_size to 32 and epochs to 1000 to get the best accuracy from the trained model.
As can be seen, I get 97% accuracy for the ANN model, so if the human resources department uses this model, it can decide about new CVs much more easily. Like any other ANN model, this model needs to be retrained with new data in order to match new department requirements. For instance, a department may change its requirements so that a technology becomes less important; that technology should then be left out when entering data from new CVs.
Life gets boring when we are forced to do routine tasks over and over, and at some point it becomes annoying, especially if the work is data related and requires our attention to review data and make a decision. Computer science, and specifically artificial intelligence and machine learning, can solve this problem for us and make our lives much easier. If we know how to create a proper dataset and choose the right functions and methods, we can model almost any real-life problem and let the machine help us make the best decision.
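Compiling and fitting could be sketched as below, again assuming the Keras API bundled with TensorFlow. The helper `train_classifier` and the random demo data are hypothetical. The defaults mirror the settings in the text (batch size 32, 1000 epochs), but the demo call uses only 5 epochs so it finishes quickly.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def train_classifier(X_train, y_train, epochs=1000, batch_size=32):
    """Build, compile, and fit the 9-9-1 network on the training chunk."""
    model = keras.Sequential([
        keras.Input(shape=(X_train.shape[1],)),
        layers.Dense(9, activation="relu"),
        layers.Dense(9, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(
        X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0
    )
    return model, history


# Hypothetical, already-standardized data: 64 candidates, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 3)).astype("float32")
y_demo = (X_demo.sum(axis=1) > 0).astype("float32")

model, history = train_classifier(X_demo, y_demo, epochs=5)
probabilities = model.predict(X_demo[:2], verbose=0)
```

The held-out test chunk would then be passed to `model.evaluate` to check accuracy on data the network has not seen.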