Building an All-In-One Audio Analysis Toolkit in Python
Language forms the basis of every conversation between humans. Due to this, the field of Natural Language Processing (or NLP for short) undoubtedly holds immense potential in assisting humans with their day-to-day lives.
In simple words, the domain of NLP comprises a set of techniques that aim to comprehend human language data and accomplish a downstream task.
NLP techniques encompass numerous areas such as Question Answering (QA), Named Entity Recognition (NER), Text Summarization, Natural Language Generation (NLG), and many more.
While most prior research and development in NLP has focused on applying these techniques to textual data, the community has recently witnessed a tremendous adoption of speech-based interaction, prompting machine learning engineers to experiment and innovate in the speech space as well.
Therefore, in this blog, I will demonstrate an all-encompassing audio analysis application in Streamlit that takes an audio file as input and:
1. Transcribes the audio
2. Performs sentiment analysis on the audio
3. Summarizes the audio
4. Identifies named entities mentioned in the audio
5. Extracts broad ideas from the audio
To achieve this, we will use the AssemblyAI API to transcribe the audio file and Streamlit to build the web application in Python.
The image below depicts what this application will look like once it is ready.
Let’s begin 🚀!
Before building the application, it will be better to highlight the workflow of our application and how it will function.
A high-level diagrammatic overview of the application is depicted in the diagram below:
The Streamlit web application will first take an audio file as input, as described above.
Next, we will upload it to AssemblyAI’s server to obtain a URL for the audio file. Once the URL is available, we shall create a POST request to the transcription endpoint of AssemblyAI and specify the downstream task we wish to perform on the input audio.
Lastly, we will create a GET request to retrieve the transcription results from AssemblyAI and display them on our Streamlit application.
This section will highlight some prerequisites/dependencies for building the audio toolkit.
#1 Install Streamlit
Building web applications in Streamlit requires installing the Streamlit python package locally.
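Assuming a standard Python environment, Streamlit (along with the `requests` library used later for the API calls) can be installed from PyPI:

```shell
pip install streamlit requests
```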
#2 Get the AssemblyAI API Access Token
To access the transcription services of AssemblyAI, you should obtain an API access token from their website and define it in your project.
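For illustration, assume the token is stored in a variable like the one below (the string is a placeholder, not a real key):

```python
# Placeholder: replace with the API token from your AssemblyAI dashboard.
auth_token = "<your-assemblyai-api-token>"
```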
#3 Import Dependencies
Lastly, we will import the Python libraries required for this project.
With this, we are ready to build our audio analysis web application.
Building the Streamlit Application
Next, let’s proceed with building the web application in Streamlit.
Our application, as discussed above, will comprise four steps. These are:
1. Uploading the file to AssemblyAI
2. Sending the Audio for transcription through a POST request
3. Retrieving the transcription results with a GET request
4. Displaying the results in the web application
To achieve this, we shall define four different methods, each dedicated to one of the four objectives above.
However, before we proceed, we should declare the headers for our request and define the transcription endpoints of AssemblyAI.
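A sketch of these declarations, assuming AssemblyAI's v2 REST endpoints (check the official API reference in case these change) and a placeholder token:

```python
# AssemblyAI v2 endpoints.
upload_endpoint = "https://api.assemblyai.com/v2/upload"
transcription_endpoint = "https://api.assemblyai.com/v2/transcript"

# Placeholder: replace with your own AssemblyAI API token.
auth_token = "<your-assemblyai-api-token>"

# Every request to AssemblyAI is authenticated via this header.
headers = {
    "authorization": auth_token,
    "content-type": "application/json",
}
```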
- Method 1:
The objective of this method is to accept the audio file obtained from the user and upload it to AssemblyAI to obtain a URL for the file.
Note that it is not necessary to upload the audio file to AssemblyAI as long as you can access it via a URL. Therefore, if the audio file is already accessible with a URL, you can skip implementing this method.
The implementation of the `upload_audio()` method is shown below:
The function accepts `audio_file` as an argument and creates a POST request at the `upload_endpoint` of AssemblyAI. We fetch the `upload_url` from the JSON response returned by AssemblyAI.
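A minimal sketch of this method, assuming `audio_file` is the file-like object returned by Streamlit's uploader and that the endpoint and token are defined as above:

```python
import requests

upload_endpoint = "https://api.assemblyai.com/v2/upload"
headers = {"authorization": "<your-assemblyai-api-token>"}

def upload_audio(audio_file):
    """Upload the raw audio bytes to AssemblyAI and return its hosted URL."""
    response = requests.post(
        upload_endpoint,
        headers=headers,
        data=audio_file,  # requests streams the file-like object as the body
    )
    return response.json()["upload_url"]
```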
- Method 2:
As the name suggests, this method accepts the URL of the audio file obtained from the `upload_audio()` method above and sends it for transcription to AssemblyAI.
In the JSON object above, we specify the URL of the audio and the downstream services we wish to invoke at AssemblyAI’s transcription endpoint.
For this project, these services include sentiment analysis, topic detection, summarization, entity recognition, and identifying all the speakers in the file.
After creating a POST request at the `transcription_endpoint`, we return the `transcription_id` returned by AssemblyAI, which we can later use to fetch the transcription results.
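A sketch of the `transcribe()` method under the same assumptions; the flags in the payload correspond to AssemblyAI's sentiment analysis, topic detection (`iab_categories`), summarization (`auto_chapters`), entity detection, and speaker label features:

```python
import requests

transcription_endpoint = "https://api.assemblyai.com/v2/transcript"
headers = {"authorization": "<your-assemblyai-api-token>"}

def transcribe(upload_url):
    """Submit the hosted audio for transcription and return the job id."""
    payload = {
        "audio_url": upload_url,
        "sentiment_analysis": True,  # sentence-wise sentiment
        "iab_categories": True,      # broad topic detection
        "auto_chapters": True,       # chapter-wise summarization
        "entity_detection": True,    # named entities
        "speaker_labels": True,      # identify the speakers
    }
    response = requests.post(transcription_endpoint, json=payload, headers=headers)
    return response.json()["id"]
```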
- Method 3:
The penultimate step is to retrieve the transcription results from AssemblyAI. To achieve this, we must create a GET request this time and provide the unique identifier (`transcription_id`) received from AssemblyAI in the previous step.
The implementation is demonstrated below:
As the transcription time depends on the duration of the input audio file, we have defined a while loop that creates repeated GET requests until the `status` of our request changes to `completed` or the request returns an `error`.
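A sketch of this polling loop, assuming the endpoint and headers defined earlier; the `status` field of the response moves from `queued`/`processing` to `completed` (or `error`):

```python
import time
import requests

transcription_endpoint = "https://api.assemblyai.com/v2/transcript"
headers = {"authorization": "<your-assemblyai-api-token>"}

def get_transcription_result(transcription_id):
    """Poll AssemblyAI until the transcription job finishes, then return it."""
    polling_endpoint = f"{transcription_endpoint}/{transcription_id}"
    while True:
        response = requests.get(polling_endpoint, headers=headers).json()
        if response["status"] in ("completed", "error"):
            return response
        time.sleep(5)  # wait a few seconds before polling again
```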
The transcription response received for a particular audio file is shown below:
- Method 4:
The final method in this application is to print the results obtained from AssemblyAI on the Streamlit application.
To avoid clutter and textual chaos on the application’s front-end, we shall encapsulate each of the services within a Streamlit expander.
The keys from the transcription response that are pertinent to this project are:
- `text`: This contains the transcription text of the audio.
- `iab_categories_result`: The value corresponding to this key is a list of topics identified in the audio file.
- `chapters`: This key contains the summary of the audio file as different chapters.
- `sentiment_analysis_results`: As the name suggests, this key holds the sentence-wise sentiment of the audio file.
- `entities`: Lastly, this key stores the entities identified in the audio file.
Integrating the Functions in Main Method
As the final step in building our Streamlit application, we integrate the functions defined above in the main method.
First, we create a file uploader for the user to upload the audio file.
Once the audio file is available, we send it to Method 1 (`upload_audio`), followed by transcribing the audio (`transcribe`) and retrieving the results (`get_transcription_result`), and we finally display the results (`print_results`) to the user on the Streamlit application.
Executing the Application
Our audio analysis application is ready, and now it’s time to run it!
To do so, open a new terminal session. Next, navigate to your working directory and execute the following command after replacing `file-name.py` with the name of your Python file:

```shell
streamlit run file-name.py
```
The uploader above asks you to upload an audio file. Once you do that, the functions defined above will be executed sequentially to generate the final results.
The transcription results on the uploaded file are shown below:
In this section, we will discuss the results obtained from the transcription models of AssemblyAI.
A part of the transcription of the input audio is shown in the image below.
The broad topics discussed in the entire audio by the speaker(s) are shown in the image below.
To generate a summary, AssemblyAI’s transcription services first break the audio into different chapters and then summarize each chapter individually.
The summary of the input audio file is shown below.
AssemblyAI classifies each sentence into one of three sentiment categories: `Positive`, `Negative`, or `Neutral`.
The sentiment of the first three sentences in the audio is shown below. They were precisely classified as `Neutral` by the transcription module.
Finally, the entities identified in the audio and their corresponding entity tags are shown below.
To conclude, in this post, we built a comprehensive audio application to analyze audio files using the AssemblyAI API and Streamlit.
Specifically, I demonstrated how to perform various downstream NLP tasks on the input audio, such as transcription, summarization, sentiment analysis, entity detection, and topic classification.
Thanks for reading!