Stop Using CSVs for Storage — This File Format Is 150 Times Faster


How to work with Feather in Python?

Let’s start simple by importing the libraries and creating a relatively large dataset. You’ll need Feather, NumPy, and Pandas to follow along. The dataset will have five columns and 10M rows of random numbers:

import feather
import numpy as np
import pandas as pd

# Seed NumPy's global random generator so the data is reproducible
np.random.seed(42)
df_size = 10_000_000

# Five columns of uniform random numbers in [0, 1)
df = pd.DataFrame({
    'a': np.random.rand(df_size),
    'b': np.random.rand(df_size),
    'c': np.random.rand(df_size),
    'd': np.random.rand(df_size),
    'e': np.random.rand(df_size)
})

Here’s what the dataset looks like:

Image 1 — Random dummy dataset (image by author)
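If the image does not render, the same inspection can be done in code. The snippet below is a minimal sketch that uses a much smaller frame (1,000 rows instead of 10M) so it runs instantly; `preview_size` and `preview` are illustrative names, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the 10M-row dataset, kept small on purpose
np.random.seed(42)
preview_size = 1_000

preview = pd.DataFrame({
    col: np.random.rand(preview_size) for col in ['a', 'b', 'c', 'd', 'e']
})

print(preview.shape)   # (rows, columns)
print(preview.dtypes)  # dtype of each column
print(preview.head())  # first five rows
```

The same three calls work on the full `df`; only the row count changes.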

Let’s save it locally next. You can use the following command to save the DataFrame in the Feather format with Pandas:

df.to_feather('1M.feather')

And here’s how to do the same with the Feather library:

feather.write_dataframe(df, '1M.feather')

Not much of a difference. Either call saves the DataFrame to a local file, which you can read back with Pandas or with the dedicated library. Here’s the syntax for Pandas first:

df = pd.read_feather('1M.feather')

Change it to the following if you’re using the Feather library:

df = feather.read_dataframe('1M.feather')

That covers the basics. The following section compares Feather with the CSV file format in file size, read time, and write time.

