
Notes on Using SQL Navigation Functions

Posted: 2022-12-09 14:34:08
Tags: function 10 34 TIMESTAMP 18 SQL 2016 navigation SELECT

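Navigation functions come from the SQL:2011 enhancements to window functions. They form a new category that further extends what windowed computations can do.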

This post records several commonly used navigation functions:

  • FIRST_VALUE
  • LAST_VALUE
  • NTH_VALUE
  • LEAD
  • LAG

Below I walk through each of them with examples.

 

FIRST_VALUE/LAST_VALUE/NTH_VALUE

First, we create a test dataset; all of the following examples use it.

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')

Navigation functions are, at bottom, an extension of window functions. Here is how they are used:

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
  division,
  FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
  TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds
FROM (
  SELECT name,
  finish_time,
  division,
  FIRST_VALUE(finish_time)
    OVER (PARTITION BY division ORDER BY finish_time ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS fastest_time
  FROM finishers);

+-----------------+-------------+----------+--------------+------------------+
| name            | finish_time | division | fastest_time | delta_in_seconds |
+-----------------+-------------+----------+--------------+------------------+
| Carly Forte     | 03:08:58    | F25-29   | 03:08:58     | 0                |
| Sophia Liu      | 02:51:45    | F30-34   | 02:51:45     | 0                |
| Nikki Leith     | 02:59:01    | F30-34   | 02:51:45     | 436              |
| Jen Edwards     | 03:06:36    | F30-34   | 02:51:45     | 891              |
| Meghan Lederer  | 03:07:41    | F30-34   | 02:51:45     | 956              |
| Lauren Reasoner | 03:10:14    | F30-34   | 02:51:45     | 1109             |
| Lisa Stelzner   | 02:54:11    | F35-39   | 02:54:11     | 0                |
| Lauren Matthews | 03:01:17    | F35-39   | 02:54:11     | 426              |
| Desiree Berry   | 03:05:42    | F35-39   | 02:54:11     | 691              |
| Suzy Slane      | 03:06:24    | F35-39   | 02:54:11     | 733              |
+-----------------+-------------+----------+--------------+------------------+
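If BigQuery is not at hand, the same query can be tried locally. The sketch below is a port to Python's built-in sqlite3 module, which needs a reasonably recent SQLite (window functions arrived in 3.25). It makes a few assumptions of its own: SQLite has no TIMESTAMP type, so finish times are stored as zero-padded 'HH:MM:SS' text, and strftime('%s', ...) stands in for TIMESTAMP_DIFF.

```python
import sqlite3

# The ten finishers from the example above. Times are zero-padded
# 'HH:MM:SS' text (an SQLite adaptation), so text order equals time order.
FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

# FIRST_VALUE over the whole partition gives the division winner's time;
# strftime('%s', ...) turns 'HH:MM:SS' into seconds so we can subtract.
rows = conn.execute("""
    SELECT name, finish_time, division, fastest_time,
           CAST(strftime('%s', finish_time) AS INTEGER)
             - CAST(strftime('%s', fastest_time) AS INTEGER) AS delta_in_seconds
    FROM (
      SELECT name, finish_time, division,
             FIRST_VALUE(finish_time) OVER (
               PARTITION BY division ORDER BY finish_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
             ) AS fastest_time
      FROM finishers
    )
    ORDER BY division, finish_time
""").fetchall()

for row in rows:
    print(row)

# name -> seconds behind the division winner
delta_by_name = {name: delta for name, _, _, _, delta in rows}
```

The deltas should match the BigQuery table above (for example 436 for Nikki Leith).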

Everything else is straightforward, so let's focus on what the FIRST_VALUE expression does. It uses a few less familiar keywords, which I list and explain here:

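OVER: used together with window functions. The OVER clause groups (partitions) the data and sorts the rows within each group. Numbering functions such as ROW_NUMBER use it to assign a position to each row in a group, while navigation functions use it to read values directly from other rows in the group.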

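PARTITION BY: specifies the partition key, which can be one or more columns. PARTITION BY splits the table by that key; each partition is a separate window, and window and navigation functions are evaluated within each partition independently.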

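ORDER BY: specifies the sort order within each partition. If ORDER BY is not followed by a ROWS/RANGE clause, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. (Without ORDER BY, the default frame is the whole partition.)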

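ROWS BETWEEN: applied after ORDER BY has sorted the rows, together with frame-boundary keywords. For example, combined with UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING (below) it reads ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, meaning the frame runs from the first row of the partition to the last row of the partition, whichever row is current.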

UNBOUNDED PRECEDING: the frame starts at the first row of the partition.

UNBOUNDED FOLLOWING: the frame ends at the last row of the partition.
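The default-frame rule in the ORDER BY entry is easy to trip over with LAST_VALUE, which returns the last row of the frame. A minimal sketch, again assuming Python's sqlite3 module as a stand-in for BigQuery (SQLite uses the same default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, when ORDER BY is present): with the default frame LAST_VALUE simply returns the current row, while the explicit full frame returns the slowest runner in each division.

```python
import sqlite3

# Same finishers; times stored as zero-padded 'HH:MM:SS' text for SQLite.
FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

rows = conn.execute("""
    SELECT name,
           -- default frame: first row of the partition up to the current row
           LAST_VALUE(name) OVER (
             PARTITION BY division ORDER BY finish_time
           ) AS last_default_frame,
           -- explicit frame: the whole partition
           LAST_VALUE(name) OVER (
             PARTITION BY division ORDER BY finish_time
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full_frame
    FROM finishers
    ORDER BY division, finish_time
""").fetchall()

for row in rows:
    print(row)
```

Because no two runners share a finish_time, the default frame always ends exactly at the current row; with ties, it would extend to the last tied row.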

 

Important keywords not shown in this example:

CURRENT ROW: the current row.

n PRECEDING: a frame boundary n rows before the current row. A typical use is ROWS BETWEEN 10 PRECEDING AND 9 FOLLOWING, which means from 10 rows before the current row to 9 rows after it.

n FOLLOWING: a frame boundary n rows after the current row. In ROWS BETWEEN 10 PRECEDING AND 9 FOLLOWING, 9 FOLLOWING is the end of the frame.

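RANGE BETWEEN: alongside ROWS BETWEEN there is also RANGE BETWEEN. ROWS defines a physical frame and RANGE a logical one. ROWS behaves the way we usually expect: it counts rows and ignores the values of the ORDER BY key. RANGE is the one that differs: its boundaries are computed from the current row's value of the ORDER BY key, and the frame contains every row whose key value falls within that range (so rows with equal keys always land in the same frame).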

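This can be hard to grasp, so here is an example.

Suppose a table tmp has a single column id containing the values 1, 1, 3, 6, 6, 6, 7, 8, 9, and we run the following query: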

select id,
    sum(id) over (order by id range between 1 preceding and 2 following) range_sum1,
    sum(id) over (order by id rows between 1 preceding and 2 following) rows_sum1
from tmp
order by id
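+----+------------+-----------+
| id | range_sum1 | rows_sum1 |
+----+------------+-----------+
| 1  | 5          | 5         |
| 1  | 5          | 11        |
| 3  | 3          | 16        |
| 6  | 33         | 21        |
| 6  | 33         | 25        |
| 6  | 33         | 27        |
| 7  | 42         | 30        |
| 8  | 24         | 24        |
| 9  | 17         | 17        |
+----+------------+-----------+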

The results already differ from the second row on. Here is what happens.

RANGE: as noted above, RANGE is tied logically to the ORDER BY column, which here is id.

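between 1 preceding and 2 following here means from id - 1 to id + 2. For the second row (id = 1) that gives [1 - 1, 1 + 2] = [0, 3]; note that the interval is closed. The rows whose id falls in that interval are the first three, so range_sum1 = 1 + 1 + 3 = 5.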

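ROWS: rows between ignores the values entirely, so we simply take one row above and two rows below the current row. For the second row that covers the ids [1, 1, 3, 6], so rows_sum1 = 1 + 1 + 3 + 6 = 11.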
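The two frame types can be mimicked in a few lines of plain Python. This is a sketch of the semantics only (not how any database evaluates frames), assuming the ids are already sorted, as ORDER BY id guarantees. As in the query above, the ORDER BY key and the summed column are the same id column.

```python
# ids from table tmp, already sorted by the ORDER BY key
ids = [1, 1, 3, 6, 6, 6, 7, 8, 9]


def rows_sum(values, preceding, following):
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    The frame is a slice of positions around the current row, clipped
    to the partition; the values themselves play no role.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - preceding)
        end = min(len(values), i + following + 1)  # slice end is exclusive
        out.append(sum(values[start:end]))
    return out


def range_sum(values, preceding, following):
    """SUM over RANGE BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    The frame is every row whose key lies in the closed interval
    [v - preceding, v + following], where v is the current row's key.
    """
    return [
        sum(x for x in values if v - preceding <= x <= v + following)
        for v in values
    ]


range_sum1 = range_sum(ids, 1, 2)
rows_sum1 = rows_sum(ids, 1, 2)

for row in zip(ids, range_sum1, rows_sum1):
    print(row)
```

Note that the three rows with id = 6 get the same RANGE result (they are in each other's frames) but different ROWS results.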

 

With the main keywords covered, what the navigation functions do becomes much easier to follow.

FIRST_VALUE returns the value from the first row of the window frame, after the rows are partitioned and sorted.

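LAST_VALUE returns the value from the last row of the frame. With only ORDER BY and the default frame, that is the current row (plus any rows tied with it), not the last row of the partition.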

NTH_VALUE returns the value from the Nth row of the frame, or NULL if the frame has fewer than N rows. Here is an additional NTH_VALUE example.
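To make "Nth row of the frame" concrete, here is a pure-Python sketch (not BigQuery's implementation) in which None stands in for SQL NULL. It compares the full-partition frame with a growing frame that ends at the current row, which is what the default frame amounts to when the ORDER BY key has no ties. The helper names first_value, last_value and nth_value are just illustrative.

```python
from itertools import groupby

# (name, finish_time, division); zero-padded times sort correctly as text
FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]


def first_value(frame):
    """Value from the first row of the frame, or None if the frame is empty."""
    return frame[0] if frame else None


def last_value(frame):
    """Value from the last row of the frame, or None if the frame is empty."""
    return frame[-1] if frame else None


def nth_value(frame, n):
    """Value from the n-th (1-based) row of the frame, or None if it is shorter."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return frame[n - 1] if len(frame) >= n else None


results = {}
# PARTITION BY division ORDER BY finish_time
ordered = sorted(FINISHERS, key=lambda r: (r[2], r[1]))
for _, group in groupby(ordered, key=lambda r: r[2]):
    members = list(group)
    times = [t for _, t, _ in members]
    for i, (name, _, _) in enumerate(members):
        full = times              # UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING
        growing = times[: i + 1]  # UNBOUNDED PRECEDING .. CURRENT ROW (no ties)
        results[name] = {
            "fastest": first_value(full),
            "slowest": last_value(full),
            "second_fastest": nth_value(full, 2),
            "second_so_far": nth_value(growing, 2),
        }

print(results["Sophia Liu"])
```

With the growing frame, the division winner has no second row yet, so NTH_VALUE(..., 2) is NULL on that row; this is why the query below spells out the full frame.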

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
  division,
  FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
  FORMAT_TIMESTAMP('%X', second_fastest) AS second_fastest
FROM (
  SELECT name,
  finish_time,
  division,
  FIRST_VALUE(finish_time)
    OVER w1 AS fastest_time,
  NTH_VALUE(finish_time, 2)
    OVER w1 as second_fastest
  FROM finishers
  WINDOW w1 AS (
    PARTITION BY division ORDER BY finish_time ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING));

+-----------------+-------------+----------+--------------+----------------+
| name            | finish_time | division | fastest_time | second_fastest |
+-----------------+-------------+----------+--------------+----------------+
| Carly Forte     | 03:08:58    | F25-29   | 03:08:58     | NULL           |
| Sophia Liu      | 02:51:45    | F30-34   | 02:51:45     | 02:59:01       |
| Nikki Leith     | 02:59:01    | F30-34   | 02:51:45     | 02:59:01       |
| Jen Edwards     | 03:06:36    | F30-34   | 02:51:45     | 02:59:01       |
| Meghan Lederer  | 03:07:41    | F30-34   | 02:51:45     | 02:59:01       |
| Lauren Reasoner | 03:10:14    | F30-34   | 02:51:45     | 02:59:01       |
| Lisa Stelzner   | 02:54:11    | F35-39   | 02:54:11     | 03:01:17       |
| Lauren Matthews | 03:01:17    | F35-39   | 02:54:11     | 03:01:17       |
| Desiree Berry   | 03:05:42    | F35-39   | 02:54:11     | 03:01:17       |
| Suzy Slane      | 03:06:24    | F35-39   | 02:54:11     | 03:01:17       |
+-----------------+-------------+----------+--------------+----------------+
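This query also uses a named window (WINDOW w1 AS ...) so that FIRST_VALUE and NTH_VALUE share one definition. SQLite accepts the same WINDOW clause, so a local reproduction with Python's sqlite3 module might look like the sketch below (times stored as zero-padded text, an SQLite adaptation; FORMAT_TIMESTAMP is dropped because the times are already text).

```python
import sqlite3

FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

# One named window, reused by both navigation functions.
rows = conn.execute("""
    SELECT name, finish_time, division,
           FIRST_VALUE(finish_time) OVER w1 AS fastest_time,
           NTH_VALUE(finish_time, 2) OVER w1 AS second_fastest
    FROM finishers
    WINDOW w1 AS (
      PARTITION BY division ORDER BY finish_time
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
    ORDER BY division, finish_time
""").fetchall()

for row in rows:
    print(row)

second_by_name = {name: second for name, _, _, _, second in rows}
```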

 

LEAD/LAG

LEAD returns the value of a column from a following row in the sorted window. That description is a bit abstract, so let's look at an example:

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  finish_time,
  division,
  LEAD(name)
    OVER (PARTITION BY division ORDER BY finish_time ASC) AS followed_by
FROM finishers;

+-----------------+-------------+----------+-----------------+
| name            | finish_time | division | followed_by     |
+-----------------+-------------+----------+-----------------+
| Carly Forte     | 03:08:58    | F25-29   | NULL            |
| Sophia Liu      | 02:51:45    | F30-34   | Nikki Leith     |
| Nikki Leith     | 02:59:01    | F30-34   | Jen Edwards     |
| Jen Edwards     | 03:06:36    | F30-34   | Meghan Lederer  |
| Meghan Lederer  | 03:07:41    | F30-34   | Lauren Reasoner |
| Lauren Reasoner | 03:10:14    | F30-34   | NULL            |
| Lisa Stelzner   | 02:54:11    | F35-39   | Lauren Matthews |
| Lauren Matthews | 03:01:17    | F35-39   | Desiree Berry   |
| Desiree Berry   | 03:05:42    | F35-39   | Suzy Slane      |
| Suzy Slane      | 03:06:24    | F35-39   | NULL            |
+-----------------+-------------+----------+-----------------+

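Again, focus on the window part: for each row we take the next row in its partition as the value of followed_by. Because each partition is sorted by finish_time, followed_by is always the name of the next finisher in the same division. Carly Forte in F25-29 gets NULL because she is the only runner in that division, and the last finisher in every other division gets NULL for the same reason: there is no next row.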
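A local sketch of the same LEAD query with Python's sqlite3 module as a stand-in for BigQuery (times stored as zero-padded text). LEAD needs no frame clause here: it is defined by position within the ordered partition.

```python
import sqlite3

FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

# LEAD(name) = name from the next row of the same division (NULL if none).
rows = conn.execute("""
    SELECT name, finish_time, division,
           LEAD(name) OVER (PARTITION BY division ORDER BY finish_time) AS followed_by
    FROM finishers
    ORDER BY division, finish_time
""").fetchall()

for row in rows:
    print(row)

followed_by = {name: nxt for name, _, _, nxt in rows}
```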

LEAD and LAG also accept an offset argument, so you can look more than one row ahead (or behind), for example:

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  finish_time,
  division,
  LEAD(name, 2)
    OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back
FROM finishers;

+-----------------+-------------+----------+------------------+
| name            | finish_time | division | two_runners_back |
+-----------------+-------------+----------+------------------+
| Carly Forte     | 03:08:58    | F25-29   | NULL             |
| Sophia Liu      | 02:51:45    | F30-34   | Jen Edwards      |
| Nikki Leith     | 02:59:01    | F30-34   | Meghan Lederer   |
| Jen Edwards     | 03:06:36    | F30-34   | Lauren Reasoner  |
| Meghan Lederer  | 03:07:41    | F30-34   | NULL             |
| Lauren Reasoner | 03:10:14    | F30-34   | NULL             |
| Lisa Stelzner   | 02:54:11    | F35-39   | Desiree Berry    |
| Lauren Matthews | 03:01:17    | F35-39   | Suzy Slane       |
| Desiree Berry   | 03:05:42    | F35-39   | NULL             |
| Suzy Slane      | 03:06:24    | F35-39   | NULL             |
+-----------------+-------------+----------+------------------+

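This example takes the second row ahead instead of the default first row; again, NULL is returned when no such row exists. Both functions also accept a third argument: a default value returned instead of NULL when the offset points past the edge of the partition.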

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
  UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
  UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
  finish_time,
  division,
  LEAD(name, 2, 'Nobody')
    OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back
FROM finishers;

+-----------------+-------------+----------+------------------+
| name            | finish_time | division | two_runners_back |
+-----------------+-------------+----------+------------------+
| Carly Forte     | 03:08:58    | F25-29   | Nobody           |
| Sophia Liu      | 02:51:45    | F30-34   | Jen Edwards      |
| Nikki Leith     | 02:59:01    | F30-34   | Meghan Lederer   |
| Jen Edwards     | 03:06:36    | F30-34   | Lauren Reasoner  |
| Meghan Lederer  | 03:07:41    | F30-34   | Nobody           |
| Lauren Reasoner | 03:10:14    | F30-34   | Nobody           |
| Lisa Stelzner   | 02:54:11    | F35-39   | Desiree Berry    |
| Lauren Matthews | 03:01:17    | F35-39   | Suzy Slane       |
| Desiree Berry   | 03:05:42    | F35-39   | Nobody           |
| Suzy Slane      | 03:06:24    | F35-39   | Nobody           |
+-----------------+-------------+----------+------------------+
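One subtlety worth checking: the default replaces NULL only when the row at the offset does not exist. If the row exists but its value is NULL, LEAD returns that NULL. The sketch below uses Python's sqlite3 module as a stand-in for BigQuery to show both the 'Nobody' query above and this edge case; the table t and its values are hypothetical, made up for the demonstration.

```python
import sqlite3

FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

# Part 1: offset 2 with a default, as in the query above.
two_back = dict(conn.execute("""
    SELECT name,
           LEAD(name, 2, 'Nobody') OVER (PARTITION BY division ORDER BY finish_time)
    FROM finishers
""").fetchall())

# Part 2: hypothetical table t where row 2 exists but holds NULL.
conn.execute("CREATE TABLE t (pos INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])
null_demo = conn.execute("""
    SELECT pos, LEAD(v, 1, 'default') OVER (ORDER BY pos) FROM t ORDER BY pos
""").fetchall()
print(null_demo)  # [(1, None), (2, 'c'), (3, 'default')]
```

Row 1 gets NULL (its next row exists and holds NULL), while only row 3, which has no next row at all, gets the default.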

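LAG works the same way, so I won't repeat the details: LAG looks at rows before the current row and accepts the same arguments, including the offset and the default value.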
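For completeness, a small LAG sketch mirroring the LEAD examples, again with Python's sqlite3 module standing in for BigQuery. The output names preceding_runner and gap_seconds are my own, purely illustrative: LAG(name, 1, 'Nobody') names the runner just ahead in the same division, and LAG(finish_time) gives the gap to them in seconds (strftime('%s', ...) replaces TIMESTAMP_DIFF, since times are stored as text).

```python
import sqlite3

FINISHERS = [
    ("Sophia Liu", "02:51:45", "F30-34"),
    ("Lisa Stelzner", "02:54:11", "F35-39"),
    ("Nikki Leith", "02:59:01", "F30-34"),
    ("Lauren Matthews", "03:01:17", "F35-39"),
    ("Desiree Berry", "03:05:42", "F35-39"),
    ("Suzy Slane", "03:06:24", "F35-39"),
    ("Jen Edwards", "03:06:36", "F30-34"),
    ("Meghan Lederer", "03:07:41", "F30-34"),
    ("Carly Forte", "03:08:58", "F25-29"),
    ("Lauren Reasoner", "03:10:14", "F30-34"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finishers (name TEXT, finish_time TEXT, division TEXT)")
conn.executemany("INSERT INTO finishers VALUES (?, ?, ?)", FINISHERS)

# LAG looks backwards; the first row of each division has no previous row,
# so it gets the default 'Nobody', and its gap is NULL (None in Python).
rows = conn.execute("""
    SELECT name,
           LAG(name, 1, 'Nobody') OVER w AS preceding_runner,
           CAST(strftime('%s', finish_time) AS INTEGER)
             - CAST(strftime('%s', LAG(finish_time) OVER w) AS INTEGER) AS gap_seconds
    FROM finishers
    WINDOW w AS (PARTITION BY division ORDER BY finish_time)
    ORDER BY division, finish_time
""").fetchall()

for row in rows:
    print(row)

by_name = {name: (prev, gap) for name, prev, gap in rows}
```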

 

 

Reference:

https://stackoverflow.com/questions/30861919/what-is-rows-unbounded-preceding-used-for-in-teradata

https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions#first_value

https://blog.csdn.net/weixin_42307036/article/details/112381387

 

From: https://www.cnblogs.com/piperck/p/16968865.html
