The DataFrame Type Conversions You Should Know as a Pandas User

Photo by Chris Lawton on Unsplash

Pandas DataFrames are widely popular among Machine Learning Engineers and Data Scientists for tabular data analysis, management, and processing tasks.

Tabular data with three columns and three rows (Image by Author)

Pandas is sufficient on its own for many use cases, and other Python libraries such as Matplotlib, Plotly, and Sklearn support it natively. Still, there may be situations where you need to convert a Pandas DataFrame to another data type (or data structure) supported in Python.

Even if you never need them, I firmly believe that knowing these conversions helps in the general usage of Pandas DataFrames.

Therefore, in this post, I will demonstrate different ways to convert a Pandas DataFrame to other data types widely used by developers in the Python community.

The highlights of the article are listed below:

· Understanding the Pandas DataFrame
· Converting to NumPy Array
· Converting to Python List
· Converting to Dictionary
· Conclusion

Let’s begin 🚀!

Understanding the Pandas DataFrame

Before we proceed with various type-conversions of a Pandas DataFrame, let’s briefly understand this data structure.

Simply put, a Pandas DataFrame is a tabular data structure residing inside a Python environment.

It efficiently supports a wide variety of tabular data operations, such as filtering, I/O, grouping and aggregation, table joins, column distribution methods, rolling window analysis, and many more.

Of course, one can perform the above operations only when they have a Pandas DataFrame loaded in an existing Python environment/session.

One of the most basic ways to create a DataFrame is to call the pd.DataFrame() constructor, as demonstrated below.

First, we import the required libraries.

Next, we create a DataFrame df from a list of lists, data, using pd.DataFrame():

We can verify the class of the DataFrame using Python's built-in type() function:
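A minimal sketch of these three steps, assuming a hypothetical three-by-three DataFrame. The values and the column names col1, col2, and col3 are placeholders chosen for illustration, not the data from the original figure.

```python
import pandas as pd

# Hypothetical data: three rows and three columns (placeholder values).
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Build the DataFrame from the list of lists; the column names are assumed.
df = pd.DataFrame(data, columns=["col1", "col2", "col3"])

# Check the class of the resulting object.
print(type(df))
```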

You can read about various techniques to create a Pandas DataFrame in my previous blog.

Converting to NumPy Array

First and foremost, let’s understand how you can convert a Pandas data object to a NumPy array.

Here, we shall consider the following DataFrame:
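As a stand-in for that DataFrame, the sketch below builds a hypothetical three-by-three table (placeholder values and assumed column names) and prints it.

```python
import pandas as pd

# Hypothetical DataFrame used for illustration; values and names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

print(df)
# Prints:
#    col1  col2  col3
# 0     1     2     3
# 1     4     5     6
# 2     7     8     9
```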

Method 1:

You can use the values attribute to convert a Pandas DataFrame to a NumPy array.

We can verify the data type of the result object, which, indeed, is a NumPy array.
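A short sketch of Method 1, assuming the same hypothetical three-by-three DataFrame with placeholder values:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# The values attribute exposes the data as a NumPy array.
arr = df.values
print(arr)
# Prints:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# Confirm the type of the result.
print(type(arr))  # <class 'numpy.ndarray'>
```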

Method 2:

Alternatively, you can call the DataFrame's to_numpy() method.

Note: The official Pandas documentation recommends using df.to_numpy() over the values attribute discussed in Method 1.
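A short sketch of Method 2 on the same hypothetical DataFrame (placeholder values, assumed column names):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# to_numpy() returns the DataFrame's data as a NumPy array.
arr = df.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.shape)  # (3, 3)
```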

Method 3:

Lastly, you can pass the DataFrame to NumPy's basic array constructor, np.array(), as follows:
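A short sketch of Method 3, again on the hypothetical DataFrame with placeholder values:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# np.array() accepts the DataFrame directly and returns a NumPy array.
arr = np.array(df)
print(type(arr))  # <class 'numpy.ndarray'>
```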

If you want to learn about various methods to create NumPy arrays, you can find my blog below:

Converting to Python List

Next, we shall learn some methods to convert a Pandas DataFrame to a Python list.

Unfortunately, a Pandas DataFrame has no direct method for converting it to a Python list.

Therefore, to achieve this, we should first convert the DataFrame to a NumPy array, followed by the conversion to a list using the tolist() method in NumPy.
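A minimal sketch of this two-step conversion, assuming the same hypothetical DataFrame (placeholder values, assumed column names):

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# Step 1: DataFrame -> NumPy array; Step 2: NumPy array -> nested Python list.
rows = df.values.tolist()
print(rows)        # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(type(rows))  # <class 'list'>
```

Each inner list corresponds to one row of the DataFrame; the column names and the index are not part of the result.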

As demonstrated above, the approach first converts the DataFrame to a NumPy array using the values attribute discussed in the previous section, and then calls NumPy's tolist() method on the result.

Converting to Dictionary

Another popular conversion of the Pandas DataFrame is generating a Python dictionary from it.

As a quick recap, we are using the following DataFrame in this blog:
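For the dictionary examples, the sketch below uses the same hypothetical three-by-three DataFrame (placeholder values, assumed column names). It also prints the column names and the index values, since these end up as dictionary keys.

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# Column names and index values of the DataFrame.
print(list(df.columns))  # ['col1', 'col2', 'col3']
print(list(df.index))    # [0, 1, 2]
```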

In Pandas, we can convert a DataFrame to a dictionary using the to_dict() method. Below, we’ll discuss the various formats of the Python dictionary that we can generate using this method.

These formats differ primarily in the type of key-value pairs returned by the method. The structure of the dictionary is determined by the orient parameter of the to_dict() method.

Method 1:

With orient='dict' (which is also the default value of the parameter), the method returns a nested dictionary in which the keys of the outer dictionary are the column names and the keys of each inner dictionary are the index values.

A diagrammatic illustration of the default behavior (orient='dict') is shown below:

Pandas DataFrame to a Python Dictionary (Image by Author)

The code block below demonstrates the output of the to_dict() method.
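A minimal sketch of the default behavior, assuming the hypothetical DataFrame with placeholder values and a default integer index:

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# Default orient='dict': {column name -> {index value -> cell value}}.
result = df.to_dict()
print(result)
# {'col1': {0: 1, 1: 4, 2: 7},
#  'col2': {0: 2, 1: 5, 2: 8},
#  'col3': {0: 3, 1: 6, 2: 9}}  (shown wrapped for readability)
```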

Method 2:

Instead of the nested dictionaries of Method 1, you can generate a dictionary in which each key is a column name and each value is that column represented as a list.

You can achieve this by passing orient='list' to the to_dict() method.

This is depicted in the diagram below:

Pandas DataFrame to a Python Dictionary (Image by Author)

The corresponding implementation is shown below:
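A minimal sketch with orient='list', on the same hypothetical DataFrame (placeholder values, assumed column names):

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# orient='list': {column name -> list of that column's values}.
result = df.to_dict(orient="list")
print(result)
# {'col1': [1, 4, 7], 'col2': [2, 5, 8], 'col3': [3, 6, 9]}
```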

Method 3:

Another interesting way of generating a dictionary using the to_dict() method is by specifying the parameter orient='split'.

The dictionary returned has three key-value pairs. These are:

1. 'index': The value holds the index of the DataFrame as a Python list.
2. 'columns': The value is a list of the column names.
3. 'data': The value is a list of lists representing the rows of the DataFrame. It is the same as the result discussed in the 'Converting to Python List' section.

Pandas DataFrame to a Python Dictionary (Image by Author)

The output of this conversion is shown below:
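A minimal sketch with orient='split', assuming the same hypothetical DataFrame with placeholder values:

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# orient='split': separate entries for the index, the columns, and the rows.
result = df.to_dict(orient="split")
print(result["index"])    # [0, 1, 2]
print(result["columns"])  # ['col1', 'col2', 'col3']
print(result["data"])     # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```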

Additionally, this method provides four more representations to obtain a dictionary from a DataFrame.

These are orient='series', orient='tight', orient='records', and orient='index'. You can read about them in the official documentation. Additionally, this answer on StackOverflow is an excellent resource for learning about them.
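As a rough illustration, the sketch below tries three of these options on the hypothetical DataFrame (placeholder values, assumed column names). It leaves out orient='tight', which, to my understanding, is only available in newer Pandas releases (1.4 and later).

```python
import pandas as pd

# Hypothetical DataFrame; values and column names are assumed.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# orient='records': a list with one {column name -> value} dict per row.
records = df.to_dict(orient="records")
print(records[0])  # {'col1': 1, 'col2': 2, 'col3': 3}

# orient='index': {index value -> {column name -> value}}.
by_index = df.to_dict(orient="index")
print(by_index[0])  # {'col1': 1, 'col2': 2, 'col3': 3}

# orient='series': {column name -> that column as a Pandas Series}.
as_series = df.to_dict(orient="series")
print(type(as_series["col1"]))
```

Note that orient='records' returns a list of dictionaries rather than a single dictionary.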

Conclusion

To conclude, in this post, I demonstrated various ways to convert a Pandas DataFrame to different Python data objects.

More specifically, I discussed the conversion of a Pandas DataFrame to a NumPy array, Python List, and a Dictionary.

Note that various other data classes (or data types, frameworks, etc.) support conversion to and from a Pandas DataFrame, such as a DataTable DataFrame, Dask DataFrame, and Spark DataFrame, which I will demonstrate in another post.

Thanks for reading!

AI/ML

Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot
