Pandas 的透视表函数主要为 pivot()
和 pivot_table()
,主要的功能为对 DataFrame 的行和列进行重新组合来重塑数据。
一、pivot 函数
pivot 函数只能对数据进行重塑,不能进行聚合
1. 数据准备
import pandas as pd
df1 = pd.DataFrame({
'department_id': [1, 2, 3, 3, 1, 1],
'month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar'],
'revenue': [8000, 9000, 10000, 7000, 8000, 6000,]
})
df2 = pd.DataFrame({
'department_id': [1, 1, 2, 3, 3, 1, 2],
'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Mar', 'Feb'],
'revenue': [8000, 9000, 10000, 7000, 8000, 6000, 10000]
})
print(df1)
print(df2)
df1
department_id month revenue
0 1 Jan 8000
1 2 Jan 9000
2 3 Jan 10000
3 3 Feb 7000
4 1 Feb 8000
5 1 Mar 6000
df2
department_id month revenue
0 1 Jan 8000
1 1 Jan 9000
2 2 Jan 10000
3 3 Jan 7000
4 3 Feb 8000
5 1 Mar 6000
6 2 Feb 10000
2. 基本语法
"""
调用方式一:pd.pivot(df, index=, columns=, values=)
调用方式二:df.pivot(index=, columns=, values=)
参数:
1. index:指定作为行索引的列
2. columns:指定作为列索引的列
3. values:指定作为数据值的列
"""
dp = df1.pivot(index="department_id", columns="month", values="revenue")
# dp = pd.pivot(df1, index="department_id", columns="month", values="revenue")
print(dp)
# dp2 = df2.pivot(index="department_id", columns="month", values="revenue")
# error,Index contains duplicate entries, cannot reshape,存在重复数据,pivot 不能聚合,所以报错
month Feb Jan Mar
department_id
1 8000.0 8000.0 6000.0
2 NaN 9000.0 NaN
3 7000.0 10000.0 NaN
二、pivot_table 函数
pivot_table 函数既能对数据进行重塑,也可以进行聚合
1. 数据准备
import pandas as pd
df1 = pd.DataFrame({
'department_id': [1, 2, 3, 3, 1, 1],
'month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar'],
'revenue': [8000, 9000, 10000, 7000, 8000, 6000,]
})
df2 = pd.DataFrame({
'department_id': [1, 1, 2, 3, 3, 1, 2],
'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Mar', 'Feb'],
'revenue': [8000, 9000, 10000, 7000, 8000, 6000, 10000]
})
print(df1)
print(df2)
df1
department_id month revenue
0 1 Jan 8000
1 2 Jan 9000
2 3 Jan 10000
3 3 Feb 7000
4 1 Feb 8000
5 1 Mar 6000
df2
department_id month revenue
0 1 Jan 8000
1 1 Jan 9000
2 2 Jan 10000
3 3 Jan 7000
4 3 Feb 8000
5 1 Mar 6000
6 2 Feb 10000
2. 基本语法
"""
调用方式一:pd.pivot_table(df, index=, columns=, values=, aggfunc=)
调用方式二:df.pivot_table(index=, columns=, values=, aggfunc=)
参数:
1. index:指定作为行索引的列
2. columns:指定作为列索引的列
3. values:指定作为数据值的列
4. aggfunc:指定聚合的函数
"""
dp = pd.pivot_table(df1, index="department_id", columns="month", values="revenue")
# dp = df1.pivot_table(index="department_id", columns="month", values="revenue")
print(dp)
month Feb Jan Mar
department_id
1 8000.0 8000.0 6000.0
2 NaN 9000.0 NaN
3 7000.0 10000.0 NaN
# pivot_table 函数会对重复数据进行聚合,默认是 mean 函数
dp2 = pd.pivot_table(df2, index="department_id", columns="month", values="revenue")
print(dp2)
month Feb Jan Mar
department_id
1 NaN 8500.0 6000.0
2 10000.0 10000.0 NaN
3 8000.0 7000.0 NaN
# 使用 aggfunc 参数指定聚合函数
dp3 = pd.pivot_table(df2, index="department_id", columns="month", values="revenue", aggfunc="sum")
print(dp3)
month Feb Jan Mar
department_id
1 NaN 17000.0 6000.0
2 10000.0 10000.0 NaN
3 8000.0 7000.0 NaN
标签:Feb,透视,id,Jan,详解,department,pivot,month,Pandas
From: https://blog.csdn.net/weixin_44480009/article/details/142306055