首页 > 数据库 >MySQL窗口函数介绍详情

MySQL窗口函数介绍详情

时间:2022-12-05 18:00:10浏览次数:57  
标签:... 窗口 name pay over MySQL 详情 time user

一、窗口函数基本点

(一)定义

窗口的概念非常重要,它可以理解为记录集合,窗口函数也就是在满足某种条件的记录集 合上执行的特殊函数。对于每条记录都要在此窗口内执行函数,窗口大小都是固定的,这种属于静态窗口;不同的记录对应着不同的窗口,这种动态变化的窗口叫滑动窗口。

(二)基本用法

函数名([expr]) over(partition by <要分列的组> order by <要排序的列> rows between <数据范围>)

sum(...A...) over(partition by ...B... order by ...C... rows between ...D1... and ...D2...)

avg(...A...) over(partition by ...B... order by ...C... rows between ...D1... and ...D2...)

A: 需要被加工的字段名称 B: 分组的字段名称 C: 排序的字段名称 D: 计算的行数范围

窗口从句的说明

rows between 2 preceding and current row # 取当前行和前面两行
rows between unbounded preceding and current row # 包括本行和之前所有的行
rows between current row and unbounded following # 包括本行和之后所有的行
rows between 3 preceding and current row # 包括本行和前面三行
rows between 3 preceding and 1 following # 从前面三行和下面一行,总共五行
# 当缺少窗口从句条件,窗口规范默认是rows between unbounded preceding and current row.
# 当order by和窗口从句都缺失, 窗口规范默认是 rows between unbounded preceding and unbounded following

(三)分类

二、窗口函数需求实现

数据说明

数据表格部分明细

(一)累计计算函数:sum(...) avg(...) max(...) min(...)

1、累计求和:sum(...) over(...)

(1)累计求和-1年数据求和:需求- 查询出2019年每月的支付总额和当年累积支付总额

SELECT
	a.MONTH,
	sum_pay,
	sum( a.sum_pay ) over ( ORDER BY a.MONTH)   #ORDER BY MONTH
FROM
	( SELECT MONTH ( pay_time ) MONTH, sum( pay_amount ) sum_pay FROM trade_data 
         WHERE YEAR ( pay_time ) = 2019 GROUP BY MONTH) a
#子查询的聚合函数重命名
#group by分组:统计每个月支付总额
# 当缺少窗口从句条件,窗口规范默认是rows between unbounded preceding and current row.

(2)累计求和-多年数据求和:需求-查询出2018-2019年每月的支付总额和当年累积支付总额

SELECT
	a.YEAR,
	a.MONTH,
	pay_amount,
	sum( a.pay_amount ) over ( PARTITION BY a.YEAR ORDER BY a.MONTH ) 
FROM
	(
	SELECT YEAR
		( pay_time ) YEAR,
		MONTH ( pay_time ) MONTH,
		sum( pay_amount ) pay_amount 
	FROM
		trade_data 
	WHERE
		YEAR ( pay_time ) = 2018 
		OR YEAR ( pay_time ) = 2019 
	GROUP BY
		YEAR,
MONTH 
	) a
#where里不能用别名,group by里可用别名

2、移动平均:avg(...) over(...)

需求:查询出2019年每个月的近三月移动平均支付金额

SELECT
	a.MONTH,
	a.pay_amount,
	avg( pay_amount ) over ( ORDER BY a.MONTH rows BETWEEN 2 preceding AND current ROW ) 
FROM
	(
	SELECT YEAR
		( pay_time ) YEAR,
		MONTH ( pay_time ) MONTH,
		sum( pay_amount ) pay_amount 
	FROM
		trade_data 
	WHERE
		YEAR ( pay_time ) = 2019 
	GROUP BY
MONTH 
	) a

3、最大最小值:min(...) over / max(...)over()

需求:查询出每四个月的最大月总支付金额

SELECT
	a.MONTH,
	a.pay_amount,
	max( a.pay_amount ) over ( ORDER BY a.MONTH rows BETWEEN 3 preceding AND current ROW ) max_amount 
FROM
	( SELECT substr( pay_time, 1, 7 ) MONTH, sum( pay_amount ) pay_amount FROM trade_data GROUP BY MONTH ) a
#substr

4、percent_rank( )over(...)累计百分比

计算分区或结果集中行的百分位数排名

需求:查询年份交易额的累计百分比,按照交易额升序排序

SELECT
	user_name,
	YEAR ( pay_time ),
	sum( pay_amount ) pay_amount,
	round( percent_rank ( ) over ( PARTITION BY YEAR ( pay_time ) ORDER BY sum( pay_amount ) ), 2 ) percent_rank1 
FROM
	trade_data 
GROUP BY
	user_name

需求:30%的累计和

SELECT
	a.time1,
	a.percent_rank1,
	a.pay_amount,
	sum( a.pay_amount ) over ( rows BETWEEN unbounded preceding AND current ROW ) AS sum_sum 
FROM
	(
	SELECT
		SUBSTR( pay_time, 1, 7 ) time1,
		sum( pay_amount ) pay_amount,
		round( percent_rank ( ) over ( ORDER BY sum( pay_amount ) ), 2 ) percent_rank1 
	FROM
		trade_data 
	GROUP BY
		time1 
	) a 
WHERE
	percent_rank1 = 0.3
#注意:percent_rank1 的数值需要在数据结果表中有,不是特别灵活

5、cume_dist( ) over(...)累计分布值

CUME_DIST()是一个窗口函数,它返回一组值中值的累积分布。它表示值小于或等于行的值除以总行数的行数。

需求:查询2018年的累计分布值,按照交易额降序排序

SELECT
	substr( pay_time, 1, 7 ) MONTH,
	sum( pay_amount ) sum_amount,
	row_number ( ) over ( ORDER BY sum( pay_amount ) ) rk,
	round( cume_dist ( ) over ( ORDER BY sum( pay_amount ) ), 2 ) cume_dis 
FROM
	trade_data 
WHERE
	YEAR ( pay_time ) = '2018' 
GROUP BY
MONTH

(二)排序:row_number() over (...) rank() over(...) dense_rank() over(...) ntile(n)over(...)

row_number() over (...) rank() over(...) dense_rank() over(...)

row_number() over(partition by ...A... order by ...B... )

rank() over(partition by ...A... order by ...B... )

dense_rank() over(partition by ...A... order by ...B... )

A:分组的字段名称 B:排序的字段名称

注意:row_number()、rank() 和dense_rank()紧邻的括号内是不加任何字段名称的

row_number:它会为查询出来的每一行记录生成一个序号,依次排序且不会重复

dense_rank函数在生成序号时是连续的

rank函数生成的序号有可能不连续

需求:2020年1月,购买商品品类数的用户排名

select
user_name,
	count( DISTINCT goods_category ) count_sum,
	row_number ( ) over ( ORDER BY count( DISTINCT goods_category ) ) rn1,
	dense_rank ( ) over ( ORDER BY count( DISTINCT goods_category ) ) rn2,
	rank ( ) over ( ORDER BY count( DISTINCT goods_category ) ) rn3 
FROM
	trade_data 
WHERE
	SUBSTR( pay_time, 1, 7 ) = "2020-01" 
GROUP BY
	user_name

#count(Distinct XX)
#Substr日期加上引号;

(三)取值函数

ntile(n) over (...) ;first_value(...) ;last_value(...) over(...);nth_value(XX,n)

1、ntile(n) over (...)

ntile(n) over(partition by ..A..order by..B..) n:切分的片数 A:分组的字段名称 B:排序的字段名称

ntile(n),用于将分组数据按照顺序切分成n片,返回当前切片值

NTILE不支持ROWS BETWEEN

需求:查询出将2020年2月的支付用户,按照支付金额分成5组后的结果

SELECT
	a.user_name,
	a.sum,
	ntile ( 5 ) over ( ORDER BY a.sum DESC ) Level
FROM
	(
	SELECT
		user_name,
		sum( pay_amount ) sum 
	FROM
		trade_data 
	WHERE
		SUBSTR( pay_time, 1, 7 ) = "2020-02" 
	GROUP BY
		user_name 
	ORDER BY
		sum 
	) a

需求:查询出2020年支付金额排名前30%的所有用户

SELECT
	b.user_name,
	b.sum,
	b.LEVEL 
FROM
	(
	SELECT
		a.user_name,
		a.sum,
		ntile ( 10 ) over ( ORDER BY a.sum DESC ) LEVEL 
	FROM
		( SELECT user_name, sum( pay_amount ) sum FROM trade_data 
                 WHERE YEAR ( pay_time ) = "2020" GROUP BY user_name ORDER BY sum ) a 
	) b 
WHERE
	b.LEVEL IN ( 1, 2, 3 )

2、first_value(...) over(...)返回分组内第1个数值

需求:查询每年支付金额最少的客户

SELECT
	user_name,
	YEAR ( pay_time ),
	sum( pay_amount ) sum_amount,
	first_value ( user_name ) over ( PARTITION BY YEAR ( pay_time ) ORDER BY sum( pay_amount ) ) Fvalue 
FROM
	trade_data 
GROUP BY
	user_name

3、last_value(...) over(...)返回分组内最后1个数值

需求:查询每年支付金额最多的客户

SELECT
	user_name,
	YEAR ( pay_time ),
	sum( pay_amount ) sum_amount,
	last_value ( user_name ) over ( PARTITION BY YEAR ( pay_time ) ORDER BY sum( pay_amount ) RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) lvalue 
FROM
	trade_data 
GROUP BY
	user_name

4、nth_value(XX,n) over(...)返回分组内第N行

需求:查询每年支付金额第2多的客户

SELECT
	user_name,
	YEAR ( pay_time ),
	sum( pay_amount ) sum_amount,
	nth_value ( user_name, 2 ) over ( PARTITION BY YEAR ( pay_time ) ORDER BY sum( pay_amount ) RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) lvalue 
FROM
	trade_data 
GROUP BY
	user_name

(四)偏移分析函数:lag(...) over(...) lead(...)over(...)

Lag和Lead函数可以在同一次查询中取出同一字段的前N行的数据(Lag)后N行的数据(Lead) 作为独立的列

在实际应用当中,若要用到取今天和昨天的某字段差值时,Lag和Lead函数的应用就显得尤为重要: lag(exp_str,offset,defval) over(partion by ......order by ......)

lead(exp_str,offset,defval) over(partion by ......order by ......)

需求: 查询出King和West的时间偏移(前N行)

SELECT
	user_name,
	date( pay_time ),
	lag ( date( pay_time ), 1, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lag1,
	lag ( date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lag1,
	lag ( date( pay_time ), 2, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lag2,
	lag ( date( pay_time ), 2 ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lag2 
FROM
	trade_data 
WHERE
	user_name IN ( "King", "West" );

需求: King和West的时间偏移(后N行)

SELECT
	user_name,
	date( pay_time ),
	lead ( date( pay_time ), 1, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lead1,
	lead ( date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lead1,
	lead ( date( pay_time ), 2, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lead2,
	lead ( date( pay_time ), 2 ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) lead2 
FROM
	trade_data 
WHERE
	user_name IN ( "King", "West" );

需求:查询出支付时间间隔超过100天的用户数

SELECT
	count( DISTINCT a.user_name ) 
FROM
	(
	SELECT
		user_name,
		date( pay_time ) time1,
	lag ( date( pay_time ), 1, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) time2 
	FROM
		trade_data 
	) a 
WHERE
	a.time2 IS NOT NULL 
	AND DATEDIFF( a.time1, a.time2 ) > 100;

运行结果:180

需求:日期按照用户倒序排序

SELECT
       user_name,
       date( pay_time ) time1,
       lag ( date( pay_time ), 1, date( pay_time ) ) over ( PARTITION BY user_name ORDER BY date( pay_time ) desc) time2 
FROM
    trade_data 
#Time1与time2同时倒序

需求:查询出每年支付时间间隔最长的用户

第1步:把时间偏移

SELECT
	user_name,
	date( pay_time ) t1,
	lead ( date( pay_time ), 1 ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) t2 
FROM
	trade_data

第2步:时间差

SELECT
	count( DISTINCT user_name ) 
FROM
	(
	SELECT
		user_name,
		date( pay_time ) t1,
		lead ( date( pay_time ), 1 ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) t2 
	FROM
		trade_data 
	) a 
WHERE
	DATEDIFF( t2, t1 ) > 100
        and t2 is not null

需求:查询出每年支付时间间隔最长的用户

select year(b.pay_time),b.user_name,b.GAP
from
(
SELECT
	a.pay_time,year(a.pay_time),a.user_name,
        DATEDIFF( t2, t1 ) GAP,
        rank() over (PARTITION by year(a.pay_time) order by DATEDIFF( t2, t1 ) desc) as rn
FROM
	(
	SELECT
		user_name,pay_time,
		date( pay_time ) t1,
		lead ( date( pay_time ), 1 ) over ( PARTITION BY user_name ORDER BY date( pay_time ) ) t2 
	FROM
		trade_data 
	) a
	where a.t2 is not null
)b	
	where rn=1
#XX is not null
#where 不用同一个层级语句的别名
#order根据partition by的字段内部分组与排序

三、窗口函数多样化

(一)与case when 结合使用

(二)与随机函数rand结合使用

 

转载自:https://zhuanlan.zhihu.com/p/509380543

标签:...,窗口,name,pay,over,MySQL,详情,time,user
From: https://www.cnblogs.com/zhuozige/p/16953037.html

相关文章

  • 1.1初始化配置MYSQL服务器
    /*初始配置服务器,需先根据操作系统下载MicrosoftAccess2010数据库引擎AccessDatabaseEngine_X64.exe(https://www.microsoft.com/zh-cn/download/confirmation.aspx?id......
  • 【转载】详解mysql插入数据后返回自增ID的七种方法_java
    引言mysql和oracle插入的时候有一个很大的区别是:oracle支持序列做id;mysql本身有一个列可以做自增长字段。mysql在插入一条数据后,如何能获得到这个自增id的......
  • MySQL的零拷贝技术
    1.需要了解Buffer与cache的区别Bbuffer与Cache非常类似,因为它们都用于存储数据数据,被应用层读取字节数据。在很多场合它们有着相同的概念:首先从翻译上,Buffer应该翻译......
  • mysql8 postgresql 递归查询一条SQL支持帝国查询
    创建表点击查看代码createtabletb(idvarchar(3),pidvarchar(3),namevarchar(10));insertintotbvalues('002',0,'浙江省');insertintotbvalues......
  • mysql中游标的运用
    【背景】  最近项目中应对实际情况的需求,又修改了主业务,导致表结构设计不合理,之后又修改了表结构,但是表结构修改之后历史数据需要升级,来适应新的表结构的一系列增删改......
  • docker 安装mysql8
    1.环境准备1.centos7  2.安装包下载1.百度网盘下载链接:https://pan.baidu.com/s/1AfRUkiXD9Oee1hA5az9SXg提取码:gs7b 2.版本说明此安装包为dockermysql8......
  • MySQL
    一、ACID1.1原子性(Atomicity)原子性是指一个事务是一个不可分割的工作单位,其中的操作要么都做,要么都不做,如果事务中一个sql语句执行失败,则已执行的语句也必须回滚,数......
  • mysql的varchar最大能存储多少个字符?
    mysql中varchar能够存储可变长度的字符串。过去我做的诸多业务中,一般存储短字符串的需求,都会使用varchar类型,并且定义长度为255,也就是varchar(255)。不过为了探究varchar......
  • mysql8 添加用户,赋予表操作权限
    1.mysql8修改了安全规则,不能像mysql5.7一次性创建用户并授权,需要分批创建1.创建用户createuser'username'@'host'identifiedby'password'创建用户说明:1.use......
  • CodeSmith部署安装以及编写MySQL模板的时候无法获取description
    最近需要使用VS2015开发新项目,由于考虑到后期的扩展性,打算使用EF的codefirst模式来编写,但是前期又设计好了数据库结构,不想大动干戈的去重新写代码,网上搜索了说用“EntityF......