3 Tricky Data Analysis Questions and How to Solve Them with Pandas




Question 1

The price column contains some missing values. How can we replace these missing values with the average price of the product? Please note that we should not fill the missing values with the average value of the price column. The missing values for apples need to be filled with the average price of apples, and so on.

Solution 1

There are multiple ways of doing this operation. One of the most practical options is to use the groupby function inside the fillna function.

Let’s first check the average price of each product.

grocery.groupby('product_description')['price'].mean()

# output
product_description
apple           2.077778
butter-0.25    11.400000
cucumber        4.532857
grape           4.400000
milk-1.5        6.078571
onion           2.150714
orange          2.714286
plum            4.389655
tomato          3.121034
yogurt-1        6.693103
Name: price, dtype: float64

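We cannot pass this result directly to the fillna function, though. It contains one value per product and is indexed by product name, so it does not line up with the rows of the DataFrame. Instead, we will change it slightly by adding the transform method, which returns the group average for every row while keeping the original index.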

grocery["price_new"] = grocery['price'].fillna(
    grocery.groupby('product_description')['price'].transform("mean")
)
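To make the difference between mean and transform concrete, here is a minimal sketch on a tiny made-up DataFrame. The article does not show the real grocery data, so the rows below, and the helper names group_means and row_means, are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the article's actual grocery DataFrame is not shown.
grocery = pd.DataFrame({
    "product_description": ["apple", "apple", "apple", "onion", "onion"],
    "price": [2.0, np.nan, 3.0, 1.5, np.nan],
})

# mean() collapses each group to a single value, indexed by product name.
group_means = grocery.groupby("product_description")["price"].mean()

# transform("mean") broadcasts each group's mean back to every row,
# keeping the original row index, so fillna can align it with the price column.
row_means = grocery.groupby("product_description")["price"].transform("mean")

# Only the missing prices are replaced; existing prices are left untouched.
grocery["price_new"] = grocery["price"].fillna(row_means)
print(grocery)
```

In this toy data the missing apple price is filled with 2.5 (the mean of 2.0 and 3.0) and the missing onion price with 1.5. Note that if every price of a product is missing, its group mean is also missing, so those rows stay unfilled.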

Let’s check the results by comparing the missing values in the price column with the values in the new column.

grocery[grocery["price"].isna()].head()
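Eyeballing a few rows works, but the same check can also be automated. The sketch below is one possible way to do it, again on a made-up DataFrame; the data and the names was_missing and expected are hypothetical and not part of the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the article's grocery DataFrame.
grocery = pd.DataFrame({
    "product_description": ["apple", "apple", "apple", "onion", "onion"],
    "price": [2.0, np.nan, 3.0, 1.5, np.nan],
})
grocery["price_new"] = grocery["price"].fillna(
    grocery.groupby("product_description")["price"].transform("mean")
)

# Rows whose original price was missing.
was_missing = grocery["price"].isna()

# Look up each row's product average from the per-product means.
product_means = grocery.groupby("product_description")["price"].mean()
expected = grocery["product_description"].map(product_means)

# Filled rows should equal their product's average price.
filled_ok = np.allclose(
    grocery.loc[was_missing, "price_new"], expected[was_missing]
)

# Rows that already had a price should be unchanged.
unchanged_ok = (
    grocery.loc[~was_missing, "price_new"] == grocery.loc[~was_missing, "price"]
).all()

print(filled_ok, unchanged_ok)
```

Both flags should be True when the fill behaved as intended and every product has at least one known price.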

