Build a Toxic Comments Classifier in Python




We will use Python for this task and run the code snippets in a Jupyter Notebook.

Step 1: Importing Python Libraries

We use NLTK for text processing and cleaning. Machine learning models work with numbers rather than raw text, so we use pretrained word embeddings (GloVe vectors) to convert the text into numeric vectors.

Finally, to train our text model, we use a Long Short-Term Memory (LSTM) network, a type of recurrent neural network suited to sequential data such as text.
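As a rough illustration of what an LSTM layer computes, here is a minimal pure-Python sketch of one time step, with the standard input, forget, and output gates and a candidate cell update. The function names and the params layout are hypothetical; Keras's LSTM layer (with its default activations) computes the same gate equations, but packs the weights of all four gates into shared matrices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # multiply a matrix (list of rows) by a vector (list of numbers)
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step (hypothetical helper for illustration).

    params maps each gate name to (W, U, b): W multiplies the current input x,
    U multiplies the previous hidden state h_prev, and b is the bias vector.
    """
    def gate(name, activation):
        W, U, b = params[name]
        pre = [a + r + bias for a, r, bias in zip(matvec(W, x), matvec(U, h_prev), b)]
        return [activation(p) for p in pre]

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate values for the cell state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Usage with all-zero weights: every sigmoid gate outputs 0.5 and the candidate is 0,
# so the new cell state is simply half of the previous one.
hidden, inputs = 2, 3
zero = ([[0.0] * inputs for _ in range(hidden)], [[0.0] * hidden for _ in range(hidden)], [0.0] * hidden)
params = {name: zero for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
print(c)  # -> [0.5, -1.0]
```

In a typical setup, applying this step to a comment's word vectors one word at a time and keeping the final h yields a fixed-size summary of the whole comment, which the dense output layer then works from.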

Step 2: Loading Train and Test Data

The pandas library loads the training and test data from CSV files into data frames:

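import pandas as pd
train = pd.read_csv("train.csv")
# the test file is read as ISO-8859-1 (Latin-1) instead of the default UTF-8
test = pd.read_csv("test.csv", encoding="ISO-8859-1")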
[Figure: data screenshot]

Step 3: Defining Text Cleaning and Processing Steps

The cleaning step removes special characters and duplicate words, converts all words to lowercase, and strips out links and other unwanted text.
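The CleanText function in this step calls a helper named processQues whose code is not shown. As a rough sketch of the cleaning steps just listed, here is a hypothetical stand-in, process_text, built on Python's re module. The regular expressions, and the reading of "duplicate words" as repeats of a word already seen in the same comment, are assumptions.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def process_text(comment: str) -> str:
    """Hypothetical stand-in for the article's processQues helper."""
    # convert everything to lowercase
    comment = comment.lower()
    # remove links first, so URL fragments do not survive as words
    comment = URL_PATTERN.sub(" ", comment)
    # replace anything that is not a lowercase letter or whitespace (digits are dropped too)
    comment = NON_LETTER_PATTERN.sub(" ", comment)
    # drop repeated words, keeping the first occurrence of each
    seen = set()
    words = []
    for word in comment.split():
        if word not in seen:
            seen.add(word)
            words.append(word)
    return " ".join(words)


print(process_text("Visit https://example.com NOW!!! This is is SPAM, spam."))
# -> visit now this is spam
```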

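NLTK provides a list of English stop words (very common words such as “the” and “is” that carry little meaning on their own). After downloading it once with nltk.download("stopwords"), we can load it as follows: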

from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

Let’s now define a function that cleans each comment and then removes the stop words.

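import nltk

def CleanText(comment):
    # processQues is the author's helper for the basic cleaning described above; its code is not shown
    comment = processQues(comment)
    # split the comment into word tokens (needs NLTK's punkt tokenizer data)
    comment = nltk.word_tokenize(comment)
    # keep only the tokens that are not stop words
    w = [wd for wd in comment if wd not in stop_words]
    return " ".join(w)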

With the function defined, we apply it to the comment_text column of both data frames.

train["comment_text"]=train["comment_text"].apply(CleanText)
test["comment_text"]=test["comment_text"].apply(CleanText)
[Figure: text cleaning result]

Step 4: Defining the Word Embedding Model

We use pretrained word vectors to convert our text into numeric vectors. The first step is to build a word index that maps each word in the comments to an integer ID.

[Figure: word index]
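The code for building the word index is not shown. Keras's Tokenizer and pad_sequences are the usual tools for it; the plain-Python sketch below imitates their default behavior as an assumption: words are ranked by frequency starting at index 1, index 0 is reserved for padding, and sequences are padded and truncated at the front. The function names are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer ID: the most frequent word gets 1; 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so words with equal counts keep their first-appearance order
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return {word: index for index, (word, _) in enumerate(ranked, start=1)}


def texts_to_padded_sequences(texts, word_index, maxlen):
    """Turn each text into exactly maxlen IDs, padding and truncating at the front."""
    padded = []
    for text in texts:
        ids = [word_index[word] for word in text.split() if word in word_index]
        if len(ids) > maxlen:
            ids = ids[len(ids) - maxlen:]  # keep the last maxlen IDs
        padded.append([0] * (maxlen - len(ids)) + ids)
    return padded


comments = ["you are great", "you are rude you"]
word_index = build_word_index(comments)
print(word_index)  # -> {'you': 1, 'are': 2, 'great': 3, 'rude': 4}
print(texts_to_padded_sequences(comments, word_index, maxlen=3))  # -> [[1, 2, 3], [2, 4, 1]]
```

Every comment ends up as a sequence of the same length, which is what an LSTM model expects as a batch of input.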

Step 5: Using the Pretrained GloVe Embeddings

Next, we apply the pretrained GloVe vectors to our own vocabulary by looking up each word from the word index in the GloVe file.

As a result, each word in our vocabulary is represented by a 100-dimensional vector.
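GloVe files are plain text: each line holds a word followed by its vector components, separated by spaces (100 values per line in the 100-dimensional files). The sketch below, with hypothetical function names, parses such lines and builds an embedding matrix whose row i is the vector for the word with ID i. Leaving rows for padding and for words missing from GloVe as zeros is a common choice, but an assumption here.

```python
def load_glove(lines, dim):
    """Parse GloVe text lines ('word v1 v2 ... v_dim') into a word -> vector dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], parts[1:]
        if len(values) == dim:  # skip lines that do not have exactly dim numbers
            vectors[word] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with ID i; padding (row 0) and unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, index in word_index.items():
        if word in vectors:
            matrix[index] = list(vectors[word])
    return matrix


# Tiny 3-dimensional example. With a real 100-d file (the name below is only an example):
#   with open("glove.6B.100d.txt", encoding="utf-8") as f: glove = load_glove(f, 100)
sample_lines = ["you 0.1 0.2 0.3\n", "great 1.0 -1.0 0.5\n"]
glove = load_glove(sample_lines, dim=3)
matrix = build_embedding_matrix({"you": 1, "are": 2, "great": 3}, glove, dim=3)
print(matrix)  # -> [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3], [0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
```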

Step 6: Defining the Model Architecture

We will use a sequential model in Keras with LSTM layers followed by a dense layer for our final output.
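The layer sizes are not listed, so the sketch below uses hypothetical values (a 20,000-word vocabulary plus the padding row, 100-dimensional GloVe vectors feeding an Embedding layer, one LSTM layer with 64 units, and one output unit) to show how the parameter counts reported by model.summary() for Embedding, LSTM, and Dense layers are computed. All sizes here are assumptions; only the formulas carry over.

```python
def embedding_params(input_dim, output_dim):
    # one vector of size output_dim per row of the vocabulary
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # four gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return (input_dim + 1) * units


# Hypothetical layer sizes; the article does not state them.
VOCAB_ROWS = 20000 + 1  # vocabulary plus the padding index 0
EMBED_DIM = 100         # GloVe vector size
LSTM_UNITS = 64
OUTPUT_UNITS = 1

layers = {
    "embedding": embedding_params(VOCAB_ROWS, EMBED_DIM),
    "lstm": lstm_params(EMBED_DIM, LSTM_UNITS),
    "dense": dense_params(LSTM_UNITS, OUTPUT_UNITS),
}
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(layers.values()):,}")  # -> total: 2,042,405
```

If the GloVe-initialized embedding layer is frozen (trainable=False), its parameters still appear in the summary, but under non-trainable parameters.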

Step 7: Training the Model

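We train for 10 epochs with a batch size of 32 and hold out 20% of the training data for validation:

model.fit(X_train, y, batch_size=32, epochs=10, validation_split=0.2, verbose=2)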
[Figure: model training]
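With validation_split=0.2, Keras sets aside the last 20% of the samples as validation data, selected before any shuffling. The sketch below, with a hypothetical function name, computes the resulting split and the number of batches per epoch; the floor rounding mirrors recent Keras source but should be treated as an assumption.

```python
import math


def split_sizes(num_samples, validation_split, batch_size):
    """Approximate how fit(validation_split=..., batch_size=...) divides the data."""
    # rows [0, split_at) are used for training; the remaining tail is validation
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    num_val = num_samples - split_at
    # the last batch of an epoch may be smaller than batch_size
    steps_per_epoch = math.ceil(split_at / batch_size)
    return split_at, num_val, steps_per_epoch


print(split_sizes(100, 0.2, 32))  # -> (80, 20, 3)
```

Because the held-out rows are simply the tail of the data, it is worth checking that train.csv is not ordered by label before relying on the validation metrics.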

Step 8: Saving and Loading the Model for Inference

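We can save the trained model (architecture, weights, and optimizer state) with a single line of Python. This uses the HDF5 (.h5) format; recent Keras versions recommend the native .keras format instead.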

model.save("model_final.h5")
print("Saved model to disk")

Now, to load the saved model, we can use the load_model function in Keras.

# Load the saved model
from tensorflow.keras.models import load_model
model_loaded = load_model('model_final.h5')
[Figure: loaded model summary]

Conclusion

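That’s all for this article. We have walked through building a Keras model for classifying toxic comments: cleaning the text with NLTK, converting words to GloVe vectors, training an LSTM network, and saving the model so it can be loaded for inference.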

I hope you liked the article. Stay tuned for upcoming articles.

Thanks for reading!


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot
