开窗函数
开窗函数对一组值进行操作,它不像普通聚合函数那样需要使用GROUP BY子句对数据进行分组,能够在同一行中同时返回基础行的列和聚合列
开窗函数的语法形式为:函数 + over(partition by <分组用列> order by <排序用列>),表示对数据集按照分组用列进行分区,并且并且对每个分区按照函数聚合计算,最终将计算结果按照排序用列排序后返回到该行。括号中的两个关键词partition by 和order by 可以只出现一个
排序开窗函数
① row_number() -- 相同值排名顺延,返回结果1、2、3、4
② rank() -- 相同结果排名相同,后续排名不连续,返回结果为 1、2、2、4
③ dense_rank() -- 相同结果排名相同,后续排名顺延,返回结果为 1、2、2、3
④ ntile(n) -- 分组排名,将数据分为n组并返回对应组号1、2......n
应用场景:
1、获取排名
2、根据排名获取top榜单
3、环比排名变化
1选品库,需要每个品牌销量top3的商品榜单
select yw_goods_id,brand_id,pay_amt_1m,pay_goods_num_1m,
row_number() over (partition by brand_id order by pay_amt_1m Desc) as rk
from dw.goods_item_extends_df
where audit_status in ("1","4")
根据 brand_id 分组,依据pay_amt_1m倒序,取rk<=3
2、选品库,品牌榜销量排行榜
select brand_id ,
max(brand_name),
sum(pay_amt_1m) as gmv,
sum(pay_goods_num_1m) as pay_goods_num,
rank() over (order by sum(pay_amt_1m) Desc ) as rk
from dw.goods_item_extends_df
GROUP BY brand_id
order by 可以是字段也可以是
3、品牌榜销量排行榜,每个品牌取销量 top3 商品
select *from
(select brand_id ,
max(brand_name),
sum(pay_amt_1m) as gmv,
sum(pay_goods_num_1m) as pay_goods_num,
rank() over (order by sum(pay_amt_1m) Desc ) as rk
from dw.goods_item_extends_df
GROUP BY brand_id
order by gmv Desc) t1
left join
(select *from
(select brand_id,yw_goods_id,pay_amt_1m,pay_goods_num_1m,
row_number() over (partition by brand_id order by pay_amt_1m Desc) as od
from dw.goods_item_extends_df
where audit_status in ("1","4")) tt
where tt.od <=3 )t2
on t1.brand_id=t2.brand_id
order by t1.rk
① sum() -- 分组求和
② count() -- 分组求总数
③ min() -- 分组求最小值
④ max() -- 分组求最大值
⑤ avg() --分组求均值
select goods_id, cate1_id, cate1_name,pay_goods_num,
sum(pay_goods_num) over(partition by cate1_id,cate1_name order by pay_goods_num desc) as sum1
,sum(pay_goods_num) over(partition by cate1_id,cate1_name ) as sum2
,sum(pay_goods_num) over(partition by Null ) as sum3
from app.kudu_dhu_replay_live_room_goods_merge_mii
where room_id='419629909341880320';
聚合1:
sum(pay_goods_num) over(partition by cate1_id,cate1_name order by pay_goods_num desc)
根据cate1_id,cate1_name分组,将分组中的pay_goods_num依次聚合
聚合2:
sum(pay_goods_num) over(partition by cate1_id,cate1_name )
根据cate1_id,cate1_name分组,将分组中的pay_goods_num整体聚合
聚合1与聚合2两者的差异:
order by 默认统计范围是 rows between unbounded preceding and current row
也就是取当前行数据与当前行之前的数据运算
sum(pay_goods_num) over(partition by cate1_id,cate1_name order by pay_goods_num desc rows between unbounded preceding and unbounded following) as sum
聚合3:
sum(pay_goods_num) over(partition by Null ) as sum3
聚合4:
sum(pay_goods_num) over() as sum3
不分组,将pay_goods_num整体聚合
select goods_id, cate1_id, cate1_name,pay_goods_num
, sum(pay_goods_num) over(partition by cate1_id,cate1_name order by pay_goods_num desc) as sum
, sum(pay_goods_num) over(partition by cate1_id,cate1_name ) as sum2
, sum(pay_goods_num) over(partition by Null ) as sum3
--分组求总数
, count(goods_id) over(partition by cate1_id,cate1_name order by pay_goods_num desc) as count2
, count(goods_id) over(partition by cate1_id,cate1_name ) as count2
, count(goods_id) over(partition by null) as count3
--分组求最小
, min(pay_gmv) over(partition by cate1_id,cate1_name order by pay_gmv desc) as min_pay_gmv1
, min(pay_gmv) over(partition by cate1_id,cate1_name ) as min_pay_gmv2
, min(pay_gmv) over(partition by null ) as min_pay_gmv3
--分组求最大
, max(pay_gmv) over(partition by cate1_id,cate1_name order by pay_gmv desc) as max_pay_gmv1
, max(pay_gmv) over(partition by cate1_id,cate1_name ) as max_pay_gmv2
, max(pay_gmv) over(partition by null) as max_pay_gmv3
--分组求平均
, avg(pay_gmv) over(partition by cate1_id,cate1_name order by pay_gmv desc) as avg1
, avg(pay_gmv) over(partition by cate1_id,cate1_name ) as avg2
, avg(pay_gmv) over(partition by null ) as avg3
from app.kudu_dhu_replay_live_room_goods_merge_mii
where room_id='419629909341880320';
应用场景:
1、数据占比
2、每一条数据较平均值比较
电虎-商品有效GMV占比
lag(字段名,n,0) -- 移位开窗函数,表示返回向上第n行指定字段对应数据。其中n代表向上偏移n行,0代表若偏移行数超出表范围则返回0也可以改成其他值,若不写则默认null
lead(字段名,n,0) -- 移位开窗函数,与lag()相反,表示返回向下第n行指定字段对应数据
电虎直播复盘付款GMV 计算时间范围是:本场次的开播时间到下一场次的开播时间(没有下一场取当前时间):
1.根据 live_begin_time 倒序,获取目标场次的排名rk、开播时间live_begin_time
2.判断目标场次的排名rk,若rk=1,取当前时间,
3.判断目标场次的排名rk,若rk>1,取当前时间,取 live_begin_time 倒序 排名rk=rk-1 的数据live_begin_time(下一场的开播时间)
移位开窗函数 Plain Text 复制代码 select *from ( select live_custom_room_info_id,live_begin_time,live_end_time, lag(live_begin_time,1,now()) over(partition by null order by live_begin_time Desc) as next_begin_time from ods.kudu_eshop_live_custom_room_info_rt where user_no='417857046838104064' ) t1 where t1.live_custom_room_info_id='419388740100018176'; 首尾开窗函数
First_Value(),用法是根据Partition By对数据进行分区,如果忽略Partition By ,那么默认整块数据一个区域,然后根据Order By 进行排序,取出第一个值。
last_value() -- 取分组内排序后,截止到当前行,最后一个值 首尾开窗函数
select goods_id, cate1_id, cate1_name,pay_goods_num,pay_gmv ,first_value(pay_gmv) over(partition by cate1_id,cate1_name order by pay_gmv desc ) as fisrt1 ,first_value(pay_gmv) over() as fisrt2 ,last_value(pay_gmv) over(partition by cate1_id,cate1_name order by pay_gmv desc ) as last1 ,last_value(pay_gmv) over() as last2 from app.kudu_dhu_replay_live_room_goods_merge_mii where room_id='419629909341880320'; 标签:goods,函数,pay,over,partition,开窗,cate1,id From: https://www.cnblogs.com/chenjulin/p/17510773.html