
Sorting a Pandas DataFrame by subtotal and count

Date: 2024-07-26 08:14:49
Tags: python pandas group-by

I have a very large dataset called bin_df.

Using pandas and the following code, I add a "Total" subtotal row to each group:

        bin_df = df[df["category"].isin(model.BINARY_CATEGORY_VALUES)]

        bin_category_mime_type_count_df = (
            bin_df.groupby(["category", "mime_type"])["mime_type"]
            .count()
            .reset_index(name="Count")
        )

Output of bin_category_mime_type_count_df:

      category                   mime_type  Count
          1      application/x-executable     19
          1  application/x-pie-executable    395
          1       application/x-sharedlib      1
          2       application/x-sharedlib    755
          3       application/x-sharedlib      1
          6          application/x-object    129

Then:

        bin_category_total_count_df = (
            bin_category_mime_type_count_df.groupby(["category", "mime_type"])[
                "Count"
            ]
            .sum()
            .unstack()
        )

        bin_category_total_count_df = (
            bin_category_total_count_df.assign(
                Total=bin_category_total_count_df.sum(1)
            )
            .stack()
            .to_frame("Count")
        )

        bin_category_total_count_df["Count"] = (
            bin_category_total_count_df["Count"].astype("Int64").fillna(0)
        )

This produces the following (sorted by category by default):

                                        Count
 category mime_type
 1        application/x-executable         19
          application/x-pie-executable    395
          application/x-sharedlib           1
          Total                           415
 2        application/x-sharedlib         755
          Total                           755
 3        application/x-sharedlib           1
          Total                             1
 6        application/x-object            129
          Total                           129

I want it sorted by "Total" in descending order, and within each category I want the rows sorted by mime_type count:

                                        Count
 category mime_type
 2        application/x-sharedlib         755
          Total                           755
 1        application/x-pie-executable    395
          application/x-executable         19
          application/x-sharedlib           1
          Total                           415
 6        application/x-object            129
          Total                           129
 3        application/x-sharedlib           1
          Total                             1

Which functions should I look at to get the desired result?


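You can do this with sort_values by adding two helper columns as sort keys: each category's total, and a flag marking the "Total" rows. A row-wise function cannot be passed as key here, because sort_values calls key with an entire column (a Series), not with individual rows. Here is one way to do it:

# Flatten the MultiIndex so the sort keys are ordinary columns
out = bin_category_total_count_df.reset_index()

# Helper keys: the category total (the "Total" row holds the largest Count
# in its group) and a flag that marks the "Total" row
out["cat_total"] = out.groupby("category")["Count"].transform("max")
out["is_total"] = out["mime_type"].eq("Total")

# Categories by total (descending); inside a category, Count descending, "Total" last
bin_category_total_count_df = (
    out.sort_values(
        by=["cat_total", "category", "is_total", "Count"],
        ascending=[False, True, True, False],
    )
    .drop(columns=["cat_total", "is_total"])
    .set_index(["category", "mime_type"])
)

This first orders the categories by their total in descending order, with category as a tie-breaker so that the rows of two categories with equal totals stay grouped together. Within each category, the mime types are sorted by Count in descending order, and the is_total flag keeps the "Total" row at the end of its group, even when it ties with a mime type (as in category 2).

Applied to the sample DataFrame, this produces the following output:

                                        Count
 category mime_type
 2        application/x-sharedlib         755
          Total                           755
 1        application/x-pie-executable    395
          application/x-executable         19
          application/x-sharedlib           1
          Total                           415
 6        application/x-object            129
          Total                           129
 3        application/x-sharedlib           1
          Total                             1

The output is now sorted by each category's total and, within each category, by Count.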
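
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the count table from the question and appends the subtotal rows with pd.concat rather than the unstack/stack round trip, which is a swapped-in technique and not the asker's code. The helper name add_sorted_subtotals, its total_label parameter, and the temporary columns cat_total and is_total are all hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample counts copied from the question's first output table
counts = pd.DataFrame(
    {
        "category": [1, 1, 1, 2, 3, 6],
        "mime_type": [
            "application/x-executable",
            "application/x-pie-executable",
            "application/x-sharedlib",
            "application/x-sharedlib",
            "application/x-sharedlib",
            "application/x-object",
        ],
        "Count": [19, 395, 1, 755, 1, 129],
    }
)


def add_sorted_subtotals(counts, total_label="Total"):
    """Hypothetical helper: append one subtotal row per category, then sort."""
    # One subtotal row per category, labelled with total_label
    totals = counts.groupby("category", as_index=False)["Count"].sum()
    totals["mime_type"] = total_label
    out = pd.concat([counts, totals], ignore_index=True)

    # Temporary sort keys: the category's subtotal and a subtotal-row flag
    out["cat_total"] = out["category"].map(totals.set_index("category")["Count"])
    out["is_total"] = out["mime_type"].eq(total_label)

    # Categories by subtotal (descending), category as tie-breaker,
    # subtotal row last, remaining rows by Count (descending)
    out = out.sort_values(
        by=["cat_total", "category", "is_total", "Count"],
        ascending=[False, True, True, False],
    )
    return out.drop(columns=["cat_total", "is_total"]).set_index(
        ["category", "mime_type"]
    )


result = add_sorted_subtotals(counts)
print(result)
```

Keeping the sort keys as ordinary columns means a single sort_values call handles both levels of ordering, and the keys are dropped again before the MultiIndex is restored. If a mime type could itself be named "Total", a different total_label would be needed.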

From: 78795740
