目录
1.rows和range关键字的区别
rows和range关键字,都可以用来定义window frame范围:
rows between 上限 and 下线
range between 上限 and 下线
区别:
(1)rows是根据分区数据排序之后,每一行的row_number确定每行关联的window frame范围
(2)range是根据分区数据排序之后,每一行的排序列的值确定每行关联的window frame范围
举例说明:
求4月份最近3个月(即2,3,4月)的销售额,给出的月份为1,2,4,5,6。
用rows会按行向上求两行,求得是(1,2,4)份的销售额之和
用range会根据行值去求,取(2,3,4)月份,因为没有3月份的销售额,所以只求(2,4)的销售额之和
2.例题:
2.1 求最近三个月(前两个月和当前月)的累计销量
思路:求最近三个月的累计销量,用窗口函数,窗口大小是前两个月与当前月,
前两个月:2 preceding
当前月:current row
#建表
CREATE TABLE tb_sales (
month int,
sales int
)
;
#加载数据
insert into tb_sales
value
(1,10),
(2,20),
(2,23),
(4,5),
(5,32),
(6,22);
用rows实现:
# 需求:计算最近3个月的累计销量
select
month, sales,
sum(sales) over(
order by month
rows between 2 preceding and current row
) total
from tb_sales
;
用range实现:
# 需求:计算最近3个月的累计销量
select
month, sales,
sum(sales) over(
order by month
range between 2 preceding and current row
) total
from tb_sales
;
通过实践可以看出,求最近3个月的累计销售额时,如果月份之间有间隔,应该用range
2.2 计算最近3个月(前两个月和当前月)的累计销量
#建表
CREATE TABLE tb_sales (
sale_date DATE,
quantity INT
);
#插入数据
-- [0, 1) * 100 ==> [0, 100) + 1 ==> [1, 101)
INSERT INTO tb_sales (sale_date, quantity)
VALUES
('2024-07-01', FLOOR(RAND() * 100) + 1),
('2024-07-02', FLOOR(RAND() * 100) + 1),
('2024-07-03', FLOOR(RAND() * 100) + 1),
('2024-07-04', FLOOR(RAND() * 100) + 1),
('2024-07-05', FLOOR(RAND() * 100) + 1),
('2024-07-08', FLOOR(RAND() * 100) + 1),
('2024-07-09', FLOOR(RAND() * 100) + 1),
('2024-07-10', FLOOR(RAND() * 100) + 1),
('2024-07-11', FLOOR(RAND() * 100) + 1),
('2024-07-12', FLOOR(RAND() * 100) + 1),
('2024-07-13', FLOOR(RAND() * 100) + 1),
('2024-07-14', FLOOR(RAND() * 100) + 1),
('2024-07-15', FLOOR(RAND() * 100) + 1),
('2024-07-18', FLOOR(RAND() * 100) + 1),
('2024-07-19', FLOOR(RAND() * 100) + 1),
('2024-07-20', FLOOR(RAND() * 100) + 1),
('2024-07-21', FLOOR(RAND() * 100) + 1),
('2024-07-22', FLOOR(RAND() * 100) + 1),
('2024-07-23', FLOOR(RAND() * 100) + 1),
('2024-07-24', FLOOR(RAND() * 100) + 1),
('2024-07-25', FLOOR(RAND() * 100) + 1),
('2024-07-28', FLOOR(RAND() * 100) + 1),
('2024-07-29', FLOOR(RAND() * 100) + 1),
('2024-07-30', FLOOR(RAND() * 100) + 1),
('2024-07-31', FLOOR(RAND() * 100) + 1);
;
注意:当窗口函数通过日期进行排序时,设置窗口大小时,mysql的写法和hive有所不同
mysql: range between interval 2 day preceding and current row
hive: range between 2 preceding and current row
mysql的写法:
# 目标: 求最近3天的销售量 (mysql的写法)
select
current_date(),
date_add(current_date(), interval 3 day)
;
select
sale_date, quantity,
sum(quantity) over(
order by sale_date
range between interval 2 day preceding and current row
) total
from db_1.tb_sales
;
hive的写法:
# hive的写法
select
sale_date, quantity,
sum(quantity) over(
order by sale_date
range between 2 preceding and current row
) total
from db_1.tb_sales
;
标签:RAND,rows,07,FLOOR,--,2024,range,100
From: https://blog.csdn.net/weixin_58468790/article/details/140876967