The DataFrame Type Conversions You Should Know as a Pandas User
Pandas DataFrames are widely popular among Machine Learning Engineers and Data Scientists for tabular data analysis, management, and processing tasks.
While the Pandas library is sufficient on its own for numerous use cases, and other Python libraries such as Matplotlib, Plotly, and Sklearn provide built-in support for Pandas, there may be situations where you need to convert your Pandas DataFrame to another data type (or data structure) in Python.
Even if not, I firmly believe that awareness of these conversions can be helpful in the general usage of Pandas DataFrames.
Therefore, in this post, I will demonstrate different ways to convert a Pandas DataFrame to other data types widely used by developers in the Python community.
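The article covers the following topics: understanding the Pandas DataFrame, converting to a NumPy array, converting to a Python list, and converting to a dictionary.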
Let’s begin 🚀!
Understanding the Pandas DataFrame
Before we proceed with various type-conversions of a Pandas DataFrame, let’s briefly understand this data structure.
Simply put, a Pandas DataFrame is a tabular data structure residing inside a Python environment.
It can proficiently perform a wide variety of tabular data operations such as filtering operations, I/O operations, data grouping and aggregation, table joins, column distribution methods, rolling window analysis, and many more.
Of course, one can perform the above operations only when they have a Pandas DataFrame loaded in an existing Python environment/session.
One of the most rudimentary techniques to create a DataFrame is using the pd.DataFrame() method as demonstrated below.
First, we import the required libraries.
Next, we create a DataFrame df from a list of lists data using the pd.DataFrame() method as follows:
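A minimal sketch of these two steps is given here. The values in data and the column names col1, col2, and col3 are illustrative assumptions, not the article's original data.

```python
# Import the libraries used throughout this post.
import numpy as np  # used in the later conversion sections
import pandas as pd

# Illustrative list of lists (hypothetical values).
data = [[1, "A", 10.5],
        [2, "B", 20.5],
        [3, "C", 30.5]]

# Column names are likewise hypothetical.
df = pd.DataFrame(data, columns=["col1", "col2", "col3"])
print(df)
```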
We can verify the class of the DataFrame using Python's built-in type() function:
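For instance, with a small DataFrame (the data here is a hypothetical stand-in), the check might look like this:

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only to demonstrate the check.
df = pd.DataFrame([[1, "A"], [2, "B"]], columns=["col1", "col2"])

print(type(df))  # <class 'pandas.core.frame.DataFrame'>
```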
You can read about various techniques to create a Pandas DataFrame in my previous blog.
Converting to NumPy Array
First and foremost, let’s understand how you can convert a Pandas data object to a NumPy array.
Here, we shall consider the following DataFrame:
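The original DataFrame is not reproduced here, so the examples in this section assume a small illustrative one with hypothetical column names and values:

```python
import pandas as pd

# Illustrative DataFrame (an assumption, not the article's original data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)
print(df)
```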
You can use the values attribute to convert a Pandas DataFrame to a NumPy array.
We can verify the data type of the result object, which is indeed a NumPy array.
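A short sketch of this approach, assuming an illustrative DataFrame with hypothetical columns:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# The values attribute returns the underlying data as a NumPy array.
result = df.values

print(type(result))  # <class 'numpy.ndarray'>
print(result.shape)  # (3, 3)
```

Because the illustrative columns mix integers, strings, and floats, the resulting array holds generic Python objects rather than a single numeric type.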
Another function available in Pandas is the to_numpy() method.
Note: The official Pandas documentation recommends using the to_numpy() method instead of the values attribute discussed above. (Source: here)
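As a sketch, the DataFrame.to_numpy() method can be called as follows. The DataFrame contents are again an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# to_numpy() returns the DataFrame's data as a NumPy array.
result = df.to_numpy()

print(type(result))  # <class 'numpy.ndarray'>
```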
Lastly, you can also use NumPy's basic np.array() function to convert a Pandas DataFrame to a NumPy array as follows:
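A minimal example, assuming the same kind of illustrative DataFrame:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# Passing the DataFrame to np.array() builds an array from its data.
result = np.array(df)

print(type(result))  # <class 'numpy.ndarray'>
```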
If you want to learn about various methods to create NumPy arrays, you can find my blog below:
Converting to Python List
Next, we shall learn some methods to convert a Pandas DataFrame to a Python list.
Unfortunately, Pandas does not offer a direct method to convert a Pandas DataFrame to a Python list.
Therefore, to achieve this, we first convert the DataFrame to a NumPy array and then convert the array to a list using NumPy's tolist() method.
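A brief sketch of this two-step conversion, using an illustrative DataFrame with hypothetical values:

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# Step 1: DataFrame -> NumPy array; Step 2: NumPy array -> nested Python list.
result = df.values.tolist()

print(type(result))  # <class 'list'>
print(result)        # one inner list per row of the DataFrame
```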
As demonstrated above, the approach first converts the DataFrame to a NumPy array using the values attribute discussed in the previous section, after which we call NumPy's tolist() method.
Converting to Dictionary
Another popular conversion of the Pandas DataFrame is generating a Python dictionary from it.
As a quick recap, we are using the following DataFrame in this blog:
In Pandas, we can convert a DataFrame to a dictionary using the to_dict() method. Below, we'll discuss the various formats of the Python dictionary that we can generate using this method.
These formats primarily vary in the type of key-value pairs returned by the method. The structure of the dictionary is determined by the orient parameter of the to_dict() method.
With orient='dict' (which is also the default value of the parameter), the method returns a nested dictionary, in which the keys of the outer dictionary are the names of the columns, and the keys of the inner dictionaries are the index values.
A diagrammatic illustration of the default behavior (orient='dict') is shown below:
The code block below demonstrates the output of the to_dict() method:
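A minimal sketch of the default orientation, assuming an illustrative DataFrame with hypothetical columns and the default integer index:

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# orient='dict' is the default: {column name -> {index value -> cell value}}.
result = df.to_dict()

print(result["col2"])  # inner dictionary for column 'col2', keyed by index
```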
In contrast to the nested dictionaries of the default orientation, you can generate a dictionary from a DataFrame in which each key is a column name and each value is the corresponding column represented as a list.
You can achieve this by passing orient='list' to the to_dict() method.
This is depicted in the diagram below:
The corresponding implementation is shown below:
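One way this could look, assuming an illustrative DataFrame with hypothetical columns:

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# orient='list': {column name -> list of that column's values}.
result = df.to_dict(orient="list")

print(result["col1"])  # the values of column 'col1' as a Python list
```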
Another interesting way of generating a dictionary using the to_dict() method is by specifying the parameter orient='split'.
The dictionary returned has three key-value pairs. These are:
1. 'index': The value holds the index of the DataFrame as a Python list.
2. 'columns': This is also a list, which specifies the names of the columns.
3. 'data': The value of this key is a list of lists representing the rows of the DataFrame. It is the same as the result discussed in the 'Converting to Python List' section.
The output of this conversion is shown below:
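A short sketch of the three-key structure described above, assuming an illustrative DataFrame with hypothetical columns and the default integer index:

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# orient='split' returns the keys 'index', 'columns', and 'data'.
result = df.to_dict(orient="split")

print(list(result.keys()))
print(result["data"])  # rows of the DataFrame as a list of lists
```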
Additionally, this method provides four more representations to obtain a dictionary from a DataFrame: orient='series', orient='records', orient='tight', and orient='index'.
You can read about them in the official documentation here. Additionally, this answer on StackOverflow is an excellent resource for learning about them.
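As a quick, non-exhaustive sketch, two of these orientations (orient='records' and orient='index') behave as follows on an illustrative DataFrame with hypothetical data:

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    [[1, "A", 10.5], [2, "B", 20.5], [3, "C", 30.5]],
    columns=["col1", "col2", "col3"],
)

# orient='records': a list with one {column name -> value} dictionary per row.
records = df.to_dict(orient="records")

# orient='index': {index value -> {column name -> value}}.
by_index = df.to_dict(orient="index")

print(records[0])
print(by_index[0])
```

Note that orient='records' returns a list of dictionaries rather than a single dictionary.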
To conclude, in this post, I demonstrated various ways to convert a Pandas DataFrame to different Python data objects.
More specifically, I discussed the conversion of a Pandas DataFrame to a NumPy array, a Python list, and a dictionary.
Note that various other data classes (or data types, frameworks, etc.) support conversion to and from a Pandas DataFrame, such as the DataTable DataFrame, Dask DataFrame, and Spark DataFrame, which I will demonstrate in another post.
Thanks for reading!