4 Amazing Python Libraries That You Should Try Right Now
These are some of the coolest lesser-known Python libraries that you probably don’t know about, but should
I’m a big fan of Python libraries, and I really enjoy them. In fact, some of my most popular blogs are about Python libraries. They have the power to turn hours of work and countless lines of code into two or three, and the best part: they are free. However, most people focus on the most popular libraries and forget that tens of thousands of other Python libraries can accomplish results that would take you hours to produce manually. While I have deep respect for all the brave people out there who like to do things manually, which has its benefits, sometimes we need to be more efficient.
The problem is that trying new libraries can be time-consuming. You need to install them, test them out, read the documentation, etc., and you don’t always get what you wanted. As I mentioned in the beginning, I love writing about new Python libraries and sharing what I learned about them. Today I will go over some of the most remarkable libraries that I have discovered recently. Without further ado, let’s jump into these super cool libraries.
Bamboolib
What it does: Data analysis and exploration library
How easy it is to use: Very Easy
Who should use it: Everyone on this planet!
Short story: I was writing a blog that would show 15 Python libraries I had learned about recently, and Bamboolib would be the first one I wrote about. The focus of that blog was to introduce some libraries quickly, but Bamboolib impressed me so much that it felt wrong not to give it the attention it deserves.
Its creators say that Bamboolib is a data analysis tool where you can explore data and get insights without having to code, which allows people without any coding experience to enjoy the wonders that Python can do with data analysis. On top of that, it can save the time of data analysts and data scientists out there, since no prior coding is needed for most tasks.
Looking at their website, it seems like there are a free and a paid version, where the paid version focuses on companies. The good news is that I tested the free version and didn’t find any limitations. The other thing is that Bamboolib was created by the same folks who made PyForest, a library I talked about here. Now, let’s check all the things that we can accomplish with Bamboolib.
I recommend using an environment to test Bamboolib. Still, if you don’t care about that, you can install Bamboolib just by typing
pip install --upgrade bamboolib --user in your terminal. Then, you will need to install the Jupyter Notebook extension by typing
python -m bamboolib install_nbextensions, and you should be good to go. If you prefer creating an environment, copy and paste the following code into your Jupyter Notebook. After a few moments, you will be ready to start exploring Bamboolib.
# Create conda environment
!conda create -n bamboolib python=3.7 -y

# Activate the environment
!conda activate bamboolib

# Add the IPython kernel to Jupyter
!conda install jupyter -y
!conda install ipykernel -y
!python -m ipykernel install --user --name bamboolib

# Run this if you use JupyterLab
!conda install jupyterlab -y

# Install bamboolib
!pip install --upgrade bamboolib --user

# Install the Jupyter Notebook extensions
!python -m bamboolib install_nbextensions

# Run this if you use JupyterLab: JupyterLab extensions
!python -m bamboolib install_labextensions
Now, let’s import Bamboolib, Pandas, and the famous Titanic dataset to explore Bamboolib a bit.
# Import bamboolib, Pandas, and the Titanic dataset
import bamboolib as bam
import pandas as pd
df = pd.read_csv(bam.titanic_csv)
Ok, now we are ready to start using Bamboolib. All you need to do is type df, and you will see multiple options to begin exploring the dataset.
From here, Bamboolib has a lot of similarities with other low-code data exploration libraries. However, two features are game-changing. The first one is that you can easily filter, join, concatenate, group by, change data types, rename columns, and much more.
The second is that you can see the code and reuse it however you want. This will be a lifesaver if you are learning Python or just working on a project where you need to use the code but want to save some time.
And how does it work? All you need to do is choose the transformation you want, select the columns and, if necessary, the aggregations, and Bamboolib will do all the rest. I’m doing a group by transformation on the Age column, as you can see below. Then, I will check the count of the values. You will also see that the generated code shows up in the cell at the top. You can use that code anywhere you want, even if you don’t have Bamboolib installed. Pretty awesome, right?
Take a moment to look at the code below. There is some advanced code here; even advanced Pandas users can learn a few things from it and improve their skills.
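To give an idea of what the exported code looks like, here is a sketch of the kind of plain Pandas that a group-by on Age with a count aggregation produces (a small hypothetical DataFrame stands in for the Titanic data; this is an illustration, not Bamboolib’s exact output):

```python
import pandas as pd

# Toy stand-in for the Titanic dataset
df = pd.DataFrame({
    "Age": [22, 22, 35, 35, 35],
    "Survived": [0, 1, 1, 0, 1],
})

# Group by Age and count the rows in each group,
# the kind of code Bamboolib exports for this transformation
df_grouped = df.groupby(["Age"]).agg(count=("Age", "count")).reset_index()
print(df_grouped)
```

Because the export is ordinary Pandas, you can paste it into any notebook, with or without Bamboolib installed.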
You can also create visualizations using Bamboolib, and again, you can see the code. There are multiple ways to edit your plots, and you can create very sophisticated data visualizations with it. Another cool thing is that it uses Plotly Express, which has some outstanding-looking charts.
And finally, the Explore DataFrame function allows you to explore the dataset with a few clicks. You can even explore the features individually. Why would we type countless lines of code to get simple information when we can click a few times and get everything we need?
I could spend hours talking about Bamboolib, but we need to move to the next library. I highly recommend you explore this library. Bamboolib deserves its own blog, and I will work on it soon, but for now, enjoy exploring it.
TensorFlow Data Validation
What it does: Makes data exploration for machine learning more intuitive
How easy it is to use: Easy
Who should use it: Those who need a quick look at a dataset’s stats
TensorFlow Data Validation (TFDV) is a library that makes data exploration easier when creating machine learning models. Although it is not as powerful as Bamboolib, it’s worth knowing about. It’s the solution TensorFlow found to make exploring data less stressful and save users from typing a few lines of code. You can check for missing data, outliers, data anomalies, redundant features, features with little or no unique predictive information, labels treated as features, and more. This is what TensorFlow says on its website:
TensorFlow Data Validation identifies anomalies in training and serving data, and can automatically create a schema by examining the data. The component can be configured to detect different classes of anomalies in the data.
If I have convinced you to give it a try, you can install it by typing
pip install tensorflow-data-validation in your terminal. Once the installation is done, we are ready to start using it. We will need Pandas to import the data. I will be using (again!) the Titanic dataset.
# Import TensorFlow Data Validation
import tensorflow_data_validation as tfdv

# Import Pandas
import pandas as pd

# Import the Titanic dataset
df = pd.read_csv('train.csv')
Ok, now we are good to go. To see TFDV in action, type the following code, and once you run the cell, you will see that TFDV will return a nice-looking descriptive statistics table.
stats = tfdv.generate_statistics_from_dataframe(df)
tfdv.visualize_statistics(stats)
We can see in the image above that, with a couple of lines of code, we got a table with a lot of information. Let’s now focus on the image below to see what kind of information we got.
We can see here that we got descriptive statistics for each of the features. TFDV separates numeric features from categorical features. We can also see that it highlights some important information, such as a high number of missing data. In this case, it seems like Age has 19.87% missing values. We can also see features with a high number of zeros. For example, it makes sense that Survived has a high number since 0 means that the passenger did not survive. We can also see mean, standard deviation, minimum, median, and maximum values. On the right, some charts make it very easy to see the data distribution, which I will discuss a bit more.
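For intuition, the missing-value and zero-count percentages that TFDV reports can be reproduced with plain Pandas. A minimal sketch, using a small hypothetical DataFrame in place of the Titanic data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data: Age has missing values,
# Survived uses 0 to mean the passenger did not survive
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan, 54.0],
    "Survived": [0, 1, 1, 0, 0],
})

# Percentage of missing values per column (TFDV's "missing" figure)
missing_pct = df.isna().mean() * 100

# Percentage of zeros in a numeric feature (TFDV's "zeros" figure)
zeros_pct = (df["Survived"] == 0).mean() * 100

print(missing_pct["Age"], zeros_pct)
```

TFDV computes all of these at once for every feature, which is exactly the typing it saves you.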
Looking at the categorical features, we can see missing, unique, and most frequent values. There is also the average string length.
As I promised, let’s talk a bit more about data visualization. If you click to expand, you will open the data visualizations and can explore the dataset further. It’s important to note that the main focus of TFDV is to analyze a dataset in preparation for running a machine learning model, and for this, TFDV works well.
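As the TensorFlow quote above mentions, TFDV can also infer a schema from the data and flag anomalies against it. A minimal sketch, assuming tensorflow-data-validation is installed and using the same train.csv as before:

```python
import pandas as pd
import tensorflow_data_validation as tfdv

df = pd.read_csv('train.csv')
stats = tfdv.generate_statistics_from_dataframe(df)

# Automatically infer a schema by examining the computed statistics
schema = tfdv.infer_schema(stats)
tfdv.display_schema(schema)

# Check the statistics against the schema and report any anomalies
anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
tfdv.display_anomalies(anomalies)
```

Validating new batches of data against the inferred schema is how TFDV detects the different classes of anomalies mentioned in the quote.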
There are some more cool features that you can explore, and TFDV has excellent documentation. As I mentioned before, it’s not as powerful as Bamboolib, but it’s more straightforward to use.
Faker
What it does: Creates fake data
How easy it is to use: Easy
Who should use it: Those who need to create fake data, like fake names and addresses, for their projects
Faker has a very straightforward name. It creates fake data for projects, such as names, addresses, phone numbers, job titles, IP addresses, Social Security numbers, and text. If you are working on a project that only has IDs, you can make it a little more personal and intuitive by creating fake names. As we worry more about privacy, this is a nice way to work with realistic data without exposing people’s privacy in our projects. Even for a school project, a dataset looks much more realistic with this kind of information.
To install it, you can type pip install Faker in your terminal. Here is a demonstration of some cool features:
# Import Faker
from faker import Faker
from faker.providers import internet

fake = Faker()
fake.add_provider(internet)

# Create fake name
print(fake.name())
# Create fake address
print(fake.address())
# Create fake job title
print(fake.job())
# Create fake SSN
print(fake.ssn())
# Create fake phone number
print(fake.phone_number())
# Create fake time
print(fake.time())
# Create fake text
print(fake.text())
There are some other cool features, such as changing the results’ language, which lets you do all the cool stuff I just showed you in other languages. I speak Portuguese, and the names and addresses seem real enough.
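The language switch is just a locale argument to the Faker constructor. A quick sketch, assuming Faker is installed and using Brazilian Portuguese as the example locale:

```python
from faker import Faker

# Pass a locale code to get localized fake data
fake_pt = Faker('pt_BR')
print(fake_pt.name())
print(fake_pt.address())
```

The generated names and addresses follow the conventions of the chosen locale.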
Faker has well-done documentation and, if you are interested, you should check it out. There is so much more fake information it can create, so keep it in mind for your next project.
OpenDataSets
What it does: Downloads Kaggle datasets to the same folder as your Jupyter Notebook
How easy it is to use: Easy
Who is this for?: Everyone.
Let’s say you are starting a project to practice your data analysis and machine learning skills. Where do you start? Most people go to Kaggle, find an exciting dataset, download the file, find the file in the Downloads folder, and drag it to the folder where the notebook they are working on lives. Quite a few steps, right? What if there were a better way? Well, that’s the problem OpenDataSets solves.
OpenDataSets allows us to download a dataset from inside a notebook. It will create a folder with the dataset in the same folder as your notebook, saving you some time. Cool, right?
To use it, you need to type pip install opendatasets in your terminal, and you should be good to go. Kaggle will ask for your credentials, which you can quickly get on your Kaggle profile page. In the example below, I want to download the famous heart attack dataset. First, import OpenDataSets in your notebook:
import opendatasets as od
Now, we can pass the dataset’s Kaggle URL.
od.download("https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset")
The folder on the left of the image didn’t contain the heart attack dataset. However, as soon as I ran the code, it downloaded the dataset for me. You can see that the dataset comes unzipped. It couldn’t be any easier.
As we saw in this blog, libraries have the power to turn time-consuming work into simple tasks. It’s amazing how much we can accomplish with one or two lines of code. Even people without much coding experience can perform complex data analysis. Of course, these libraries can’t substitute for the work of a professional who studies Python, but they can improve the skills of even long-term professionals.
It’s nice to remember that Python libraries should be seen as a way to improve our work. I don’t recommend relying only on these libraries; use them as an add-on. Let me know if you decide to test any of these libraries or if you have any recommendations that I could try. Happy coding!