I have a very large dataset named bin_df.
Using pandas and the following code, I assigned a "Total" subtotal to each group:
bin_df = df[df["category"].isin(model.BINARY_CATEGORY_VALUES)]
bin_category_mime_type_count_df = (
    bin_df.groupby(["category", "mime_type"])["mime_type"]
    .count()
    .reset_index(name="Count")
)
Output of bin_category_mime_type_count_df:
category mime_type Count
1 application/x-executable 19
1 application/x-pie-executable 395
1 application/x-sharedlib 1
2 application/x-sharedlib 755
3 application/x-sharedlib 1
6 application/x-object 129
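To make the example reproducible without the real data, here is a minimal sketch that builds a stand-in bin_df whose group sizes match the counts above. The `rows` list and the repeated-row construction are assumptions for illustration only; the real bin_df comes from filtering df with model.BINARY_CATEGORY_VALUES.

```python
import pandas as pd

# Hypothetical stand-in data: (category, mime_type, number of files),
# taken from the counts shown in the output above
rows = [
    (1, "application/x-executable", 19),
    (1, "application/x-pie-executable", 395),
    (1, "application/x-sharedlib", 1),
    (2, "application/x-sharedlib", 755),
    (3, "application/x-sharedlib", 1),
    (6, "application/x-object", 129),
]

# One row per file, so that counting rows reproduces the table
bin_df = pd.DataFrame(
    [(cat, mime) for cat, mime, n in rows for _ in range(n)],
    columns=["category", "mime_type"],
)

# Same counting step as in the question
bin_category_mime_type_count_df = (
    bin_df.groupby(["category", "mime_type"])["mime_type"]
    .count()
    .reset_index(name="Count")
)
print(bin_category_mime_type_count_df)
```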
Then:
bin_category_total_count_df = (
    bin_category_mime_type_count_df.groupby(["category", "mime_type"])[
        "Count"
    ]
    .sum()
    .unstack()
)
bin_category_total_count_df = (
    bin_category_total_count_df.assign(
        Total=bin_category_total_count_df.sum(1)
    )
    .stack()
    .to_frame("Count")
)
bin_category_total_count_df["Count"] = (
    bin_category_total_count_df["Count"].astype("Int64").fillna(0)
)
This produces the following (sorted by category by default):
Count
category mime_type
1 application/x-executable 19
application/x-pie-executable 395
application/x-sharedlib 1
Total 415
2 application/x-sharedlib 755
Total 755
3 application/x-sharedlib 1
Total 1
6 application/x-object 129
Total 129
I would like it sorted by "Total" in descending order, and within each category sorted by the mime_type count:
Count
category mime_type
2 application/x-sharedlib 755
Total 755
1 application/x-pie-executable 395
application/x-executable 19
application/x-sharedlib 1
Total 415
6 application/x-object 129
Total 129
3 application/x-sharedlib 1
Total 1
Which function should I use to get the desired result?
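You can get this ordering with sort_values, but not by passing a row-wise function as key: the key callable receives each sort column as a whole Series, so it cannot inspect row.name. Instead, add two helper columns (each category's Total, and a flag marking the Total rows) and sort on them:
# Flatten the MultiIndex so the sort keys can be ordinary columns
out = bin_category_total_count_df.reset_index()
# Flag the subtotal rows so they can be placed last within each category
out["is_total"] = out["mime_type"].eq("Total")
# Copy each category's Total onto all of its rows
out["cat_total"] = out["Count"].where(out["is_total"]).groupby(out["category"]).transform("max")
# Largest Total first; within a category, largest Count first and Total last
bin_category_total_count_df = (
    out.sort_values(
        ["cat_total", "category", "is_total", "Count"],
        ascending=[False, True, True, False],
    )
    .drop(columns=["is_total", "cat_total"])
    .set_index(["category", "mime_type"])
)
This orders the categories by their Total in descending order. Within each category, the mime types are sorted by Count in descending order, and the is_total flag keeps the Total row last. Including category in the sort keys keeps categories with equal Totals from interleaving.
Applied to the sample DataFrame, this code produces the following output:
Count
category mime_type
2 application/x-sharedlib 755
Total 755
1 application/x-pie-executable 395
application/x-executable 19
application/x-sharedlib 1
Total 415
6 application/x-object 129
Total 129
3 application/x-sharedlib 1
Total 1
The output is now ordered by each category's Total and, within each category, by Count.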
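As a self-contained check, the sketch below starts from the flat counts table in the question and builds the subtotals and the ordering in one pass, without the unstack/stack round trip. The helper name `sort_with_subtotals` and the temporary columns `cat_total` and `is_total` are hypothetical names chosen for illustration, and the sample data is copied from the question.

```python
import pandas as pd

# Sample counts copied from the question's output
counts = pd.DataFrame(
    {
        "category": [1, 1, 1, 2, 3, 6],
        "mime_type": [
            "application/x-executable",
            "application/x-pie-executable",
            "application/x-sharedlib",
            "application/x-sharedlib",
            "application/x-sharedlib",
            "application/x-object",
        ],
        "Count": [19, 395, 1, 755, 1, 129],
    }
)


def sort_with_subtotals(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append a 'Total' row per category and sort."""
    # One subtotal row per category
    totals = counts.groupby("category", as_index=False)["Count"].sum()
    totals["mime_type"] = "Total"
    out = pd.concat([counts, totals], ignore_index=True)
    # Helper keys: the category's total, and a flag that puts 'Total' last
    out["cat_total"] = out["category"].map(totals.set_index("category")["Count"])
    out["is_total"] = out["mime_type"].eq("Total")
    out = out.sort_values(
        ["cat_total", "category", "is_total", "Count"],
        ascending=[False, True, True, False],
    )
    return out.drop(columns=["cat_total", "is_total"]).set_index(
        ["category", "mime_type"]
    )


result = sort_with_subtotals(counts)
print(result)
```

Keeping `category` among the sort keys is a deliberate choice: if two categories happen to have the same total, their rows stay grouped instead of interleaving.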
Tags: python, pandas, group-by. From: 78795740