首页 > 其他分享 >有序行转列

有序行转列

时间:2024-09-03 22:50:57浏览次数:4  
标签:customer 00 01 09 2024 转列 有序 id

一、基础数据

有配送订单表记录骑手配送的物品类型、送达时间、顾客id、配送举例及配送费。

+-----------+-----------+-------------+----------------------+--------------+-----------+----------+
| rider_id  | order_id  | goods_type  |    delivery_time     | customer_id  | distance  | payment  |
+-----------+-----------+-------------+----------------------+--------------+-----------+----------+
| r001      | 0001      | 食品          | 2024-01-01 08:30:00  | c001         | 8.05      | 7.50     |
| r002      | 0002      | 酒水          | 2024-01-01 08:43:00  | c003         | 3.01      | 13.00    |
| r003      | 0003      | 食品          | 2024-01-01 09:15:00  | c004         | 4.12      | 3.50     |
| r002      | 0004      | 文件          | 2024-01-01 09:21:00  | c005         | 10.98     | 15.00    |
| r001      | 0005      | 食品          | 2024-01-01 09:30:00  | c007         | 2.32      | 5.00     |
| r002      | 0006      | 食品          | 2024-01-01 09:47:00  | c002         | 0.78      | 5.00     |
| r003      | 0007      | 文件          | 2024-01-01 10:21:00  | c010         | 8.11      | 8.00     |
| r001      | 0008      | 酒水          | 2024-01-01 11:56:00  | c023         | 4.35      | 15.00    |
| r002      | 0009      | 电子产品        | 2024-01-01 12:30:00  | c031         | 5.05      | 9.50     |
| r002      | 0010      | 文件          | 2024-01-01 13:30:00  | c001         | 6.05      | 7.00     |
+-----------+-----------+-------------+----------------------+--------------+-----------+----------+

二、函数介绍

1.transform介绍

transform(expr, func) - 使用函数对数组中的元素进行转换。

Examples:

> SELECT transform(array(1, 2, 3), x -> x + 1);
 [2,3,4]
> SELECT transform(array(1, 2, 3), (x, i) -> x + i);
 [1,3,5]

Since: 2.4.0

2.其他函数介绍

三、有序行转列

根据配送订单记录表,查询出骑手id,顾客id列表,要求顾客id列表中的顺序按照送达时间早晚排序。

分析

这里要求按照订单送达时间,对顾客id进行排序。直接考虑是使用开窗函数,根据时间进行排序(这种方法不可行,原因在于collect_list和collect_set那句提示“该函数是非确定性的,因为收集结果的顺序取决于行的顺序,这在经过shuffle之后可能是不确定的”。collect_list函数很难测试,但是collect_set因为有去重操作,所以比较好测试。另外一种解法是将时间和用户id拼接在一起,时间在前,用户id在后,这样对整个字符串拼接为数组,进行排序后再拆分,如此得到的数据能确定保证按照顺序完成。

1.不确定解法(不建议使用)

collect_list执行SQL

select rider_id,
       delivery_time,
       customer_id,
       collect_list(customer_id)over(partition by rider_id order by delivery_time) as customer_id_list
from t_delivery_orders

collect_list执行结果

+-----------+----------------------+--------------+---------------------------------------+
| rider_id  |    delivery_time     | customer_id  |           customer_id_list            |
+-----------+----------------------+--------------+---------------------------------------+
| r001      | 2024-01-01 08:30:00  | c001         | ["c001"]                              |
| r001      | 2024-01-01 09:30:00  | c007         | ["c001","c007"]                       |
| r001      | 2024-01-01 11:56:00  | c023         | ["c001","c007","c023"]                |
| r002      | 2024-01-01 08:43:00  | c003         | ["c003"]                              |
| r002      | 2024-01-01 09:21:00  | c005         | ["c003","c005"]                       |
| r002      | 2024-01-01 09:47:00  | c002         | ["c003","c005","c002"]                |
| r002      | 2024-01-01 12:30:00  | c031         | ["c003","c005","c002","c031"]         |
| r002      | 2024-01-01 13:30:00  | c001         | ["c003","c005","c002","c031","c001"]  |
| r003      | 2024-01-01 09:15:00  | c004         | ["c004"]                              |
| r003      | 2024-01-01 10:21:00  | c010         | ["c004","c010"]                       |
+-----------+----------------------+--------------+---------------------------------------+

collect_set执行SQL

select rider_id,
       delivery_time,
       customer_id,
       collect_set(customer_id)over(partition by rider_id order by delivery_time) as customer_id_list
from t_delivery_orders

collect_set执行结果

+-----------+----------------------+--------------+---------------------------------------+
| rider_id  |    delivery_time     | customer_id  |           customer_id_list            |
+-----------+----------------------+--------------+---------------------------------------+
| r001      | 2024-01-01 08:30:00  | c001         | ["c001"]                              |
| r001      | 2024-01-01 09:30:00  | c007         | ["c001","c007"]                       |
| r001      | 2024-01-01 11:56:00  | c023         | ["c001","c007","c023"]                |
| r002      | 2024-01-01 08:43:00  | c003         | ["c003"]                              |
| r002      | 2024-01-01 09:21:00  | c005         | ["c005","c003"]                       |
| r002      | 2024-01-01 09:47:00  | c002         | ["c005","c003","c002"]                |
| r002      | 2024-01-01 12:30:00  | c031         | ["c005","c003","c002","c031"]         |
| r002      | 2024-01-01 13:30:00  | c001         | ["c001","c005","c003","c002","c031"]  |
| r003      | 2024-01-01 09:15:00  | c004         | ["c004"]                              |
| r003      | 2024-01-01 10:21:00  | c010         | ["c010","c004"]                       |
+-----------+----------------------+--------------+---------------------------------------+

我们可以观察最后一行结果,发现顺序是不一样。 可以看到collect_list结果是正确的(这里属于巧合,但是大多数情况都是对的),而collect_set的结果是错误的。两个函数中均有提示,不保证结果顺序,从严谨出发,这两个函数均不可因为开窗函数来保证有序。
如果忽略该问题,使用collect_list进行行转列开窗,大部分情况得到的结果是正确的。我们只需要取出每个骑手最后一行数据即可。

执行SQL

select
    rider_id,
    concat_ws(',',customer_id_list) as customer_id_list
from
(select rider_id,
       delivery_time,
       customer_id,
       collect_list(customer_id)over(partition by rider_id order by delivery_time) as customer_id_list,
       row_number() over (partition by rider_id order by delivery_time desc) as rn
from t_delivery_orders
    ) t
where rn = 1

执行结果

+-----------+---------------------------+
| rider_id  |     customer_id_list      |
+-----------+---------------------------+
| r001      | c001,c007,c023            |
| r002      | c003,c005,c002,c031,c001  |
| r003      | c004,c010                 |
+-----------+---------------------------+

2.拼接排序后拆分

2.1先把时间和顾客id进行拼接,拼接后进行行转列

执行SQL

select rider_id,
       delivery_time,
       customer_id,
       concat(delivery_time, customer_id) as time_customer
from t_delivery_orders

执行结果

+-----------+----------------------+--------------+--------------------------+
| rider_id  |    delivery_time     | customer_id  |      time_customer       |
+-----------+----------------------+--------------+--------------------------+
| r001      | 2024-01-01 08:30:00  | c001         | 2024-01-01 08:30:00c001  |
| r002      | 2024-01-01 08:43:00  | c003         | 2024-01-01 08:43:00c003  |
| r003      | 2024-01-01 09:15:00  | c004         | 2024-01-01 09:15:00c004  |
| r002      | 2024-01-01 09:21:00  | c005         | 2024-01-01 09:21:00c005  |
| r001      | 2024-01-01 09:30:00  | c007         | 2024-01-01 09:30:00c007  |
| r002      | 2024-01-01 09:47:00  | c002         | 2024-01-01 09:47:00c002  |
| r003      | 2024-01-01 10:21:00  | c010         | 2024-01-01 10:21:00c010  |
| r001      | 2024-01-01 11:56:00  | c023         | 2024-01-01 11:56:00c023  |
| r002      | 2024-01-01 12:30:00  | c031         | 2024-01-01 12:30:00c031  |
| r002      | 2024-01-01 13:30:00  | c001         | 2024-01-01 13:30:00c001  |
+-----------+----------------------+--------------+--------------------------+
2.2 对time_customer 按照骑手id分组行转列,并使用sort_array排序

执行SQL

select rider_id,
       sort_array(collect_list(time_customer)) as sorted_time_customer
from (select rider_id,
             delivery_time,
             customer_id,
             concat(delivery_time, customer_id) as time_customer
      from t_delivery_orders) t
group by rider_id

执行结果

+-----------+----------------------------------------------------+
| rider_id  |                sorted_time_customer                |
+-----------+----------------------------------------------------+
| r001      | ["2024-01-01 08:30:00c001","2024-01-01 09:30:00c007","2024-01-01 11:56:00c023"] |
| r002      | ["2024-01-01 08:43:00c003","2024-01-01 09:21:00c005","2024-01-01 09:47:00c002","2024-01-01 12:30:00c031","2024-01-01 13:30:00c001"] |
| r003      | ["2024-01-01 09:15:00c004","2024-01-01 10:21:00c010"] |
+-----------+----------------------------------------------------+
2.3 去掉时间部分后,转换成字符串

执行SQL

select rider_id,
       concat_ws(',', transform(sort_array(collect_list(time_customer)), x->substr(x, 20))) as customer_list
from (select rider_id,
             delivery_time,
             customer_id,
             concat(delivery_time, customer_id) as time_customer
      from t_delivery_orders) t
group by rider_id

执行结果

+-----------+---------------------------+
| rider_id  |       customer_list       |
+-----------+---------------------------+
| r001      | c001,c007,c023            |
| r002      | c003,c005,c002,c031,c001  |
| r003      | c004,c010                 |
+-----------+---------------------------+

四、数据准备

--建表语句
CREATE TABLE IF NOT EXISTS t_delivery_orders
(
    rider_id      string,         -- 骑手ID
    order_id      string,         -- 订单ID
    goods_type    STRING,         -- 物品类型
    delivery_time STRING,         -- 送达时间
    customer_id   STRING,         -- 客户id
    distance      decimal(10, 2), -- 配送距离,单位可以是公里或英里
    payment       decimal(10, 2)  -- 支付金额,骑手的配送费用
)
    COMMENT '骑手配送订单表';
--插入数据
INSERT INTO t_delivery_orders VALUES
('r001', '0001', '食品', '2024-01-01 08:30:00','c001',8.05,7.50),
('r002', '0002', '酒水', '2024-01-01 08:43:00','c003',3.01,13.00),
('r003', '0003', '食品', '2024-01-01 09:15:00','c004',4.12,3.50),
('r002', '0004', '文件', '2024-01-01 09:21:00','c005',10.98,15.00),
('r001', '0005', '食品', '2024-01-01 09:30:00','c007',2.32,5.00),
('r002', '0006', '食品', '2024-01-01 09:47:00','c002',0.78,5.00),
('r003', '0007', '文件', '2024-01-01 10:21:00','c010',8.11,8.00),
('r001', '0008', '酒水', '2024-01-01 11:56:00','c023',4.35,15.00),
('r002', '0009', '电子产品', '2024-01-01 12:30:00','c031',5.05,9.50),
('r002', '0010', '文件', '2024-01-01 13:30:00','c001',6.05,7.00);

相关推荐

  1. 行转列-collect_list,collect_set进行简单行转列
  2. 行转列-使用transform进行有序行转列
  3. 行转列-使用transform进行有序行转列-多列一一对应
  4. 行转列-多行转多列(竖表转横表)
  5. 列转行-多列转多行(横表变竖表)
  6. 列转行-lateral view explode列转行
  7. 列转行-explode_outer和lateral view outer
  8. 列转行-posexplode多列对应转行
  9. 09.列转行-lateral view outer posexplode及posexplode_outer多列对应转行

标签:customer,00,01,09,2024,转列,有序,id
From: https://blog.csdn.net/thenowaiting/article/details/141873979

相关文章

  • 代码随想录算法训练营|Day01 LeetCode 704.二分查找,27.移除元素,977.有序数组的平方
    数组理论基础数组是存放在连续空间上的相同类型数据的集合数组的元素是不能删的,只能覆盖704.二分查找LeetCode:704.有序数组的平方classSolution{public:intsearch(vector<int>&nums,inttarget){intlength=nums.size();inti=0......
  • 行转列,值转换成列
      1.值转换成列操作。值转列操作:[1777题库] 表:Products+-------------+---------+|ColumnName|Type|+-------------+---------+|product_id|int||store|enum||price|int|+-------------+---------+在SQL中,(prod......
  • 数据结构与算法——符号表API设计及有序符号表设计
    Java学习手册+面试指南:https://javaxiaobear.cn符号表最主要的目的就是将一个键和一个值联系起来,符号表能够将存储的数据元素是一个键和一个值共同组成的键值对数据,我们可以根据键来查找对应的值。符号表中,键具有唯一性。符号表的应用:应用查找目的键值字典找出单词的释义单词释义图......
  • Leetcode——1.合并有序数组
    给你两个按非递减顺序排列的整数数组nums1_和nums2,另有两个整数m和n,分别表示nums1和nums2中的元素数目。请你合并nums2_到nums1中,使合并后的数组同样按非递减顺序排列。注意:最终,合并后数组不应由函数返回,而是存储在数组nums1中。为了应对这种情况,nums1的初......
  • 个人信息保护专业人员(CCRC-PIPP)助力企业数据要素合规有序流通
    在当今数字化时代,个人信息作为企业不可或缺的数据资产,其合规管理与高效流通变得尤为重要。组织须采纳必要措施保护用户信息,同时确保遵循相关法规要求。为此,中国网络安全审查认证和市场监管大数据中心推出了个人信息保护专业人员(CCRC-PIPP)能力认证项目,旨在培养更多专业精英,支......
  • 代码随想录算法训练营,29日 | 704. 二分查找,27. 移除元素,977.有序数组的平方,209.长度最
    数组基础文档讲解︰代码随想录(programmercarl.com)1.连续空间、相同类型元素2.元素只能覆盖3.二维数组的地址连续吗(C++连续,Java不连续)704.二分查找题目链接:704.二分查找文档讲解︰代码随想录(programmercarl.com)视频讲解︰二分查找日期:2024-08-29思路:第一反应是想到二分查......
  • jeecg 月份产值行转列(备份)
    @RequestMapping(params="list")publicModelAndViewlist(HttpServletRequestrequest){List<Map<String,Object>>list=newArrayList<>();for(inti=1;i<13;i++){Map<String,Object>m=......
  • 版本控制与Scratch:创意编程的有序之旅
    版本控制与Scratch:创意编程的有序之旅在编程的世界里,版本控制是一种记录和协调代码变更的系统,它对于团队协作和项目管理至关重要。Scratch,作为一款面向儿童和青少年的图形化编程工具,其设计初衷是简化编程学习过程,让编程变得生动有趣。本文将探讨Scratch的编程环境是否具备......
  • 代码随想录算法训练营第一天 | 数组part01:数组理论基础,704. 二分查找,27. 移除元素 97
    数组理论基础数组是存放在连续内存空间上的相同类型数据的集合数组徐璈注意的是:数组的下标都是从0开始的数组内存空间是的地址是连续的正因为舒适的内存空间是连续的,所以在删除和增添元素的时候,需要移动其他元素的地址。在c++中,vector的底层实现是array,严格来说,vector是容......
  • 从混乱到有序:10款建筑项目管理软件推荐
    国内外主流的10款建筑企业项目管理系统对比:PingCode、Worktile、广联达、泛普软件、建文软件、Asana、Trello、Basecamp、Jira、Monday.com。在建筑行业,找到一个能够高效管理时间、成本和资源的项目管理系统常常是一项挑战。这种系统的选择不仅影响项目的流程和效率,还直接关......