Syntax (pandas 1.5 signature; the squeeze parameter was removed in pandas 2.0)
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=_NoDefault.no_default, squeeze=_NoDefault.no_default, observed=False, dropna=True)
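Commonly used parameters:
- by: a mapping, function, label, or list of labels that determines the groups. A function is called on each index label, a mapping (dict or Series) maps index labels to group keys, and labels name columns.
- axis: 0 ('index') or 1 ('columns'), i.e. whether to split the rows or the columns into groups. Default 0 (rows). Deprecated since pandas 2.1.
- level: an int, level name, or sequence of these; default None. Groups by one or more levels of the index (mainly useful with a MultiIndex). Do not combine it with by.
- as_index: bool, default True. For aggregated output, True returns an object indexed by the group labels; False keeps the group labels as ordinary columns.
- dropna: bool, default True. True drops rows whose group key is NA; False treats NA as a group key of its own.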
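The code below only groups by column labels and by the index level, so here is a minimal sketch of the function and mapping forms of by. The DataFrame, its index labels a1/a2/b1/b2 and the mapping dict are hypothetical, made up for illustration:

```python
import pandas as pd

# Hypothetical data: each index label is a group letter plus a number
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a1", "a2", "b1", "b2"])

# by=function: called on each index label; its return value is the group key
by_func = df.groupby(lambda label: label[0]).sum()
print(by_func)  # group "a" sums 1 + 2, group "b" sums 3 + 4

# by=mapping: a dict from index label to group key (hypothetical keys)
mapping = {"a1": "odd", "a2": "even", "b1": "odd", "b2": "even"}
by_map = df.groupby(mapping).sum()
print(by_map)  # group "even" sums 2 + 4, group "odd" sums 1 + 3
```

Both forms look at the row labels, not at column values, which is why no column name appears in either groupby() call.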
Code example
import pandas as pd
# DataFrame
d1 = [[3,"negative",2,1],[4,None,1,2],[5,"positive",0,2],[6,"positive",2,3],[3,"positive",6,4]]
df1 = pd.DataFrame(d1, columns=["xuhao","result","value1","value2"], index=["a","b","c","a","b"])
print(df1)
# Group by one column with groupby()
groups1 = df1.groupby(['result']).mean()
print(groups1)
groups1_1 = df1.groupby(['result'],dropna=False).mean()
print(groups1_1)
# Group by two columns
groups2 = df1.groupby(["xuhao",'result']).mean()
print(groups2)
# Group by two columns and compute the mean of only one column
groups3 = df1.groupby(["xuhao",'result'])["value1"].mean()
print(groups3)
# With as_index=False, the group labels are returned as columns instead of the index
groups4 = df1.groupby(["xuhao",'result'], as_index=False).mean()
print(groups4)
# Group by the row index (level 0)
groups5 = df1.groupby(level=0).mean(numeric_only=True)  # the text column "result" cannot be averaged
print(groups5)
# With .apply(), group_keys defaults to True (pandas 2.0+), so the group keys are added to the index of the result
Output
#df1
   xuhao    result  value1  value2
a      3  negative       2       1
b      4      None       1       2
c      5  positive       0       2
a      6  positive       2       3
b      3  positive       6       4
#groups1
             xuhao    value1  value2
result
negative  3.000000  2.000000     1.0
positive  4.666667  2.666667     3.0
#groups1_1
             xuhao    value1  value2
result
negative  3.000000  2.000000     1.0
positive  4.666667  2.666667     3.0
NaN       4.000000  1.000000     2.0
#groups2
                value1  value2
xuhao result
3     negative     2.0     1.0
      positive     6.0     4.0
5     positive     0.0     2.0
6     positive     2.0     3.0
#groups3
xuhao  result
3      negative    2.0
       positive    6.0
5      positive    0.0
6      positive    2.0
Name: value1, dtype: float64
#groups4
   xuhao    result  value1  value2
0      3  negative     2.0     1.0
1      3  positive     6.0     4.0
2      5  positive     0.0     2.0
3      6  positive     2.0     3.0
#groups5
   xuhao  value1  value2
a    4.5     2.0     2.0
b    3.5     3.5     3.0
c    5.0     0.0     2.0
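The code above ends with a comment about group_keys but never shows its effect. A minimal sketch, assuming pandas 2.x and a hypothetical DataFrame with a key column k and a value column v: when the function passed to .apply() returns a subset of each group (here head(1)), group_keys=True prepends the group key to the result's index, while group_keys=False keeps only the original row labels.

```python
import pandas as pd

# Hypothetical data: "k" is the grouping key, "v" a value column
df = pd.DataFrame({"k": ["x", "x", "y"], "v": [1, 2, 3]}, index=[10, 11, 12])

# head(1) keeps the first row of each group; selecting [["v"]] leaves the key column out of apply
with_keys = df.groupby("k", group_keys=True)[["v"]].apply(lambda g: g.head(1))
without_keys = df.groupby("k", group_keys=False)[["v"]].apply(lambda g: g.head(1))

print(with_keys.index.tolist())     # group key + original label: [('x', 10), ('y', 12)]
print(without_keys.index.tolist())  # original labels only: [10, 12]
```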
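Note: df.groupby() returns a GroupBy object, not a computed result. Printing it with print() only shows the object's type; to see the actual groups, convert it to a list of (key, sub-DataFrame) pairs with list() or iterate over it with a for loop.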
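A minimal sketch of both ways to look inside a GroupBy object, plus get_group() for fetching a single group. The DataFrame is a trimmed, hypothetical version of the article's data:

```python
import pandas as pd

df = pd.DataFrame({"result": ["negative", "negative", "positive"], "value": [2, 6, 0]})
grouped = df.groupby("result")  # a single string key yields scalar group keys

print(grouped)  # only the object, e.g. <pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>

# Iterating yields (group key, sub-DataFrame) pairs
for key, sub_df in grouped:
    print(key, sub_df["value"].tolist())  # negative [2, 6], then positive [0]

# list() materialises the same pairs
pairs = list(grouped)

# get_group() returns the rows of one group
positive_rows = grouped.get_group("positive")
print(positive_rows)
```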
Operations on GroupBy objects
Script
import pandas as pd
import numpy as np
# DataFrame
d1 = [[3,"negative",2],[4,"negative",6],[11,"positive",0],[12,"positive",2]]
df1 = pd.DataFrame(d1, columns=["xuhao","result","value"])
print(df1)
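# describe() shows per-group summary statistics of each numeric column: count, mean, std, min, the 25%/50%/75% quantiles (50% is the median) and max
group1 = df1.groupby("result").describe()
# group1 = df1.groupby("result").describe()["value"]  # only the value column
print(group1)
# agg() aggregates each group with one or more functions, such as min, max, sum, mean, median, std, var and count
# group2 = df1.groupby("result").agg("mean")
# group2 = df1.groupby("result").agg("mean")["value"]  # only the value column
group2 = df1.groupby("result")["value"].agg("mean")  # select the column first, then aggregate
print(group2)
group3 = df1.groupby("result").agg({"xuhao": "sum", "value": "mean"})  # a different statistic for each column
print(group3)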
print(group3)
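# transform() returns one value per original row (here the group mean), aligned on df1's index, so it can be assigned as a new column
df1["mean_value"] = df1.groupby("result")["value"].transform("mean")
print(df1)
# apply() calls a function on each group (a sub-DataFrame); custom functions work too
# Select the numeric columns and pass axis=0 so each group reduces to per-column means on current pandas
group4 = df1.groupby("result")[["xuhao", "value", "mean_value"]].apply(lambda g: np.mean(g, axis=0))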
print(group4)
Output
#df1
   xuhao    result  value
0      3  negative      2
1      4  negative      6
2     11  positive      0
3     12  positive      2
#group1
          xuhao                                                  value
          count  mean       std   min    25%   50%    75%   max  count  mean       std  min  25%  50%  75%  max
result
negative    2.0   3.5  0.707107   3.0   3.25   3.5   3.75   4.0    2.0   4.0  2.828427  2.0  3.0  4.0  5.0  6.0
positive    2.0  11.5  0.707107  11.0  11.25  11.5  11.75  12.0    2.0   1.0  1.414214  0.0  0.5  1.0  1.5  2.0
#group2
result
negative    4.0
positive    1.0
Name: value, dtype: float64
#group3
          xuhao  value
result
negative      7    4.0
positive     23    1.0
#df1
   xuhao    result  value  mean_value
0      3  negative      2         4.0
1      4  negative      6         4.0
2     11  positive      0         1.0
3     12  positive      2         1.0
#group4
          xuhao  value  mean_value
result
negative    3.5    4.0         4.0
positive   11.5    1.0         1.0
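To contrast the output shapes of agg, transform and apply, here is a minimal sketch on the same result/value data. The column name value_minus_mean and the helper value_range are hypothetical. agg returns one row per group (named aggregation is a keyword form of the dict-style agg used above), transform returns one value per original row, and apply runs an arbitrary function on each group.

```python
import pandas as pd

df = pd.DataFrame({"result": ["negative", "negative", "positive", "positive"],
                   "value": [2, 6, 0, 2]})
g = df.groupby("result")["value"]

# agg: one row per group, one column per function
per_group = g.agg(["min", "max", "mean"])
print(per_group)

# Named aggregation: new_column=(input column, function)
named = df.groupby("result").agg(total=("value", "sum"), avg=("value", "mean"))
print(named)

# transform: one value per original row, aligned on df's index
df["value_minus_mean"] = df["value"] - g.transform("mean")
print(df)

# apply with a custom function (hypothetical helper): max - min of each group
def value_range(s: pd.Series) -> int:
    return int(s.max() - s.min())

ranges = g.apply(value_range)
print(ranges)
```

Because transform keeps the original row count, its result can be combined with other columns element-wise, as in the value_minus_mean line; an agg result would first have to be mapped back onto the rows.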
From: https://www.cnblogs.com/chaimy/p/17352337.html