Original Source Here
The last example is about one of the most frequently used Pandas functions, the groupby. We use it to compare the distinct values or categories in a column based on the values in a different column.
SQL provides the group by statement for this task. Just like with Pandas, we can create nested groups with SQL. We just need to pass the column names to be used for grouping.
Once the groups are created, we can apply aggregate functions on the columns we are interested in.
Let’s group the houses according to the categories in the type and rooms columns. Then, we can calculate the following values for each category:
- Maximum price
- Average distance
- Number of houses in that group
max(price/1000000) as max_price,
cast(avg(distance) as decimal(5,2)) as avg_distance,
count(1) as house_count FROM melb WHERE rooms <= 4 GROUP BY type, rooms
ORDER BY type, rooms;
In order to simplify the result set, I have only included the houses that have 4 or less rooms. The results are ordered by the type and rooms columns.
The Pandas code that performs this task is as follows:
melb[melb.Rooms <= 4]\
.groupby(["Type", "Rooms"], as_index=False).agg(
max_price = ("Price", "max"),
avg_distance = ("Distance", "mean"),
house_count = ("Distance", "count")
Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot