We will write the code in Python and run the snippets in a Jupyter Notebook.
Step 1: Defining Python Libraries
We are using nltk for text processing and cleaning. Our data is text, but a machine only understands numbers, so we will use a word2vec model to convert the text into numeric vectors.
Finally, to train our text model, we will use a Long Short-Term Memory (LSTM) network.
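The imports for these libraries might look like the following sketch (the exact modules depend on your setup; the NLTK resource downloads are one-time steps):

```python
# Core libraries used throughout this article
import nltk                                   # text processing and cleaning
import numpy as np                            # numeric vectors
import pandas as pd                           # loading the train/test CSVs
from nltk.corpus import stopwords             # English stop-word list
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Embedding, LSTM, Dense

# One-time NLTK downloads (uncomment on first run):
# nltk.download('punkt')
# nltk.download('stopwords')
```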
Step 2: Loading Train and Test Data
The pandas library can load our dataset for model training.
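A minimal loading step might look like this. In the real workflow you would call `pd.read_csv('train.csv')` and `pd.read_csv('test.csv')` (the file names are assumptions); a tiny inline sample stands in here so the snippet runs anywhere:

```python
from io import StringIO
import pandas as pd

# Inline sample standing in for train.csv; the column names mirror a
# typical toxic-comment dataset and are assumptions.
sample_csv = StringIO(
    "id,comment_text,toxic\n"
    "1,You are wonderful,0\n"
    "2,I hate you,1\n"
)
train = pd.read_csv(sample_csv)
print(train.shape)               # (2, 3)
print(train.columns.tolist())    # ['id', 'comment_text', 'toxic']
```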
Step 3: Defining Text Cleaning and Processing Steps
Our task is to remove all special characters, duplicate words, links, and other unwanted text, and to convert every word to lowercase.
NLTK provides us the list of stop words that we can import as follows:
stop_words = set(stopwords.words('english'))
Let’s now make a function to remove all the stop words.
def remove_stopwords(comment):
    comment = nltk.word_tokenize(comment)
    w = [wd for wd in comment if wd not in stop_words]
    return " ".join(w)
After defining the function, it’s time to apply it to our data frame.
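One possible sketch of the full cleaning step applied to a data frame column is shown below. The helper name `clean_comment`, the regexes, and the reduced stop-word set are illustrative; the article's version uses `nltk.word_tokenize` and NLTK's full English stop-word list:

```python
import re
import pandas as pd

# A small illustrative stop-word set; the article uses the full list
# from stopwords.words('english').
stop_words = {'the', 'a', 'an', 'is', 'are', 'to', 'of', 'and'}

def clean_comment(text):
    text = text.lower()                      # lowercase everything
    text = re.sub(r'http\S+', ' ', text)     # remove links
    text = re.sub(r'[^a-z\s]', ' ', text)    # remove special characters
    words = [w for w in text.split() if w not in stop_words]
    return ' '.join(words)

df = pd.DataFrame({'comment_text': ['This is a LINK http://x.co to an article!']})
df['comment_text'] = df['comment_text'].apply(clean_comment)
print(df['comment_text'].iloc[0])            # this link article
```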
Step 4: Defining word2vec Model
We will be using a pretrained word2vec model to convert our text to numeric vectors.
Step 5: Utilizing GloVe word2vec Model
After loading the pretrained GloVe embeddings, we can look up a vector for each word that appears in our text data.
This gives us a 100-dimensional vector for every word in our dataset.
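The GloVe file format itself is plain text, one word per line followed by its vector components. A parsing sketch might look like this (a tiny 3-dimensional inline sample stands in for the real 100-dimensional glove.6B.100d.txt file):

```python
import numpy as np

# Inline sample in GloVe's text format; real files have 100 components
# per line and hundreds of thousands of words.
glove_sample = """the 0.1 0.2 0.3
hate 0.9 -0.4 0.5
love -0.2 0.8 0.1"""

embeddings = {}
for line in glove_sample.splitlines():
    parts = line.split()
    word = parts[0]
    embeddings[word] = np.asarray(parts[1:], dtype='float32')

print(len(embeddings))        # 3
print(embeddings['hate'])     # [ 0.9 -0.4  0.5]
```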
Step 6: Defining Our Model for Data Training
We will use a sequential model in Keras with LSTM layers followed by a dense layer for our final output.
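A sketch of such a model is shown below. The vocabulary size, sequence length, LSTM units, and single sigmoid output are all illustrative assumptions; only the 100-dimensional embedding size is fixed by the GloVe vectors above:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 20000   # assumption: size of the tokenizer vocabulary
MAX_LEN = 200        # assumption: padded comment length in tokens
EMBED_DIM = 100      # matches the 100-dimensional GloVe vectors

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),
    LSTM(64),
    Dense(1, activation='sigmoid'),   # toxic / non-toxic probability
])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Run a dummy batch of token ids through the model to check shapes
dummy = np.zeros((2, MAX_LEN), dtype='int32')
out = model(dummy)
print(out.shape)   # (2, 1)
```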
Step 7: Training Our Defined Model
model.fit(X_train, y, batch_size=32, epochs=10, validation_split=0.2, verbose=2)
Step 8: Saving and Loading Our Defined Model for Inference
We can save our model using a single line of Python code as shown below:
model.save('model_final.h5')
print("Saved model to disk")
Now, to load the saved model, we can use the
load_model function in Keras.
from tensorflow.keras.models import load_model
model_loaded = load_model('model_final.h5')
Well, that’s all for this article. We have built a ready-to-deploy Keras model for classifying toxic comments.
I hope you liked the article. Stay tuned for upcoming articles.
Thanks for reading!