- 学习:知识的初次邂逅
- 复习:知识的温故知新
- 练习:知识的实践应用
目录
一,原题力扣链接
二,题干
表:
Players
+----------------+---------+ | Column Name | Type | +----------------+---------+ | player_id | int | | player_name | varchar | +----------------+---------+ player_id 是这个表的主键(具有唯一值的列) 这个表的每一行给出一个网球运动员的 ID 和 姓名表:
Championships
+---------------+---------+ | Column Name | Type | +---------------+---------+ | year | int | | Wimbledon | int | | Fr_open | int | | US_open | int | | Au_open | int | +---------------+---------+ year 是这个表的主键(具有唯一值的列) 该表的每一行都包含在每场大满贯网球比赛中赢得比赛的球员的 ID编写解决方案,找出每一个球员赢得大满贯比赛的次数。结果不包含没有赢得比赛的球员的ID 。
结果集 无顺序要求 。
结果的格式,如下所示。
示例 1:
输入: Players 表: +-----------+-------------+ | player_id | player_name | +-----------+-------------+ | 1 | Nadal | | 2 | Federer | | 3 | Novak | +-----------+-------------+ Championships 表: +------+-----------+---------+---------+---------+ | year | Wimbledon | Fr_open | US_open | Au_open | +------+-----------+---------+---------+---------+ | 2018 | 1 | 1 | 1 | 1 | | 2019 | 1 | 1 | 2 | 2 | | 2020 | 2 | 1 | 2 | 2 | +------+-----------+---------+---------+---------+ 输出: +-----------+-------------+-------------------+ | player_id | player_name | grand_slams_count | +-----------+-------------+-------------------+ | 2 | Federer | 5 | | 1 | Nadal | 7 | +-----------+-------------+-------------------+ 解释: Player 1 (Nadal) 获得了 7 次大满贯:其中温网 2 次(2018, 2019), 法国公开赛 3 次 (2018, 2019, 2020), 美国公开赛 1 次 (2018)以及澳网公开赛 1 次 (2018) 。 Player 2 (Federer) 获得了 5 次大满贯:其中温网 1 次 (2020), 美国公开赛 2 次 (2019, 2020) 以及澳网公开赛 2 次 (2019, 2020) 。 Player 3 (Novak) 没有赢得,因此不包含在结果集中。
三,建表语句
import pandas as pd
data = [[1, 'Nadal'], [2, 'Federer'], [3, 'Novak']]
players = pd.DataFrame(data, columns=['player_id', 'player_name']).astype({'player_id':'Int64', 'player_name':'object'})
data = [[2018, 1, 1, 1, 1], [2019, 1, 1, 2, 2], [2020, 2, 1, 2, 2]]
championships = pd.DataFrame(data, columns=['year', 'Wimbledon', 'Fr_open', 'US_open', 'Au_open']).astype({'year':'Int64', 'Wimbledon':'Int64', 'Fr_open':'Int64', 'US_open':'Int64', 'Au_open':'Int64'})
四,分析
第一步,分别以赛事聚合,求每个赛事的次数 最后union合并
df1 = championships.groupby('Wimbledon')['Wimbledon'].count()
df2 = championships.groupby('Fr_open')['Fr_open'].count()
df3 = championships.groupby('US_open')['US_open'].count()
df4 = championships.groupby('Au_open')['Au_open'].count()
df5 = pd.concat([df1,df2,df3,df4])
第二步,对这个表 以球员id分组 sum 次数
期间把 series对象转为 datafarme对象
df6 = df5.to_frame()
df6['player_id'] = df6.index
df6['次数'] = df6[0]
df7 = df6.groupby(['player_id'])['次数'].sum()
第三步: 合并2个表 并且修改表名 使用marge 内连接
df8 = pd.merge(players,df7,how='inner',on='player_id')
df9 = df8.rename(columns={'次数':'grand_slams_count'})
四,Pandas解答:
import pandas as pd
def grand_slam_titles(players: pd.DataFrame, championships: pd.DataFrame) -> pd.DataFrame:
df1 = championships.groupby('Wimbledon')['Wimbledon'].count()
df2 = championships.groupby('Fr_open')['Fr_open'].count()
df3 = championships.groupby('US_open')['US_open'].count()
df4 = championships.groupby('Au_open')['Au_open'].count()
df5 = pd.concat([df1,df2,df3,df4])
df6 = df5.to_frame()
df6['player_id'] = df6.index
df6['次数'] = df6[0]
df7 = df6.groupby(['player_id'])['次数'].sum()
df8 = pd.merge(players,df7,how='inner',on='player_id')
df9 = df8.rename(columns={'次数':'grand_slams_count'})
return df9
五,验证
六,总结
标签:count,df6,Python,1783,player,数分,pd,open,id From: https://blog.csdn.net/qq_55006020/article/details/142392584
- Pandas中分组聚合的运用 API:groupby
- Pandas中纵向合并的运用 API:concat
- Pandas中series转frame对象的运用 API: to_frame
- Pandas中合并两个表的运用 API:merge
- Pandas中修改列名的运用 API: rename
- python中函数的运用