首页 > 数据库 >【MySQL】记一次复杂数据查询排重

【MySQL】记一次复杂数据查询排重

时间:2022-11-27 11:04:13浏览次数:51  
标签:status cust MySQL 查询 procurement time invitation 排重 id

事原

在某一次仿真环境测试过程中发现数据列表中出现重复条目,如下图:

【MySQL】记一次复杂数据查询排重_sql脚本

经过排查后发现,用户登录后cust_id(企业id)字段做了mybatis的动态语句判断。其中出现问题的SQL脚本如下:

SELECT  *
FROM
(SELECT
p.procurement_id,p.display_type,p.publish_type,p.valid_time,p.pay_type,p.cust_id,p.add_user,
t.trade_name,p.add_time,p.oper_user,p.oper_time,p.platform_audit_status,p.platform_back_reason,
p.platform_audit_user,p.platform_audit_time,p.status,p.procurement_title,p.alive_flag,c.is_gsp,c.is_gmp,
c.customer_service_user,
IFNULL(IF(p.display_type = 2, sui.CONTACT_NAME, fc.CONTACT_NAME), '暂无') AS CONTACT_NAME,
IFNULL(IF(p.display_type = 2, sui.CELLPHONE, fc.cell_phone), '暂无') AS cellphone,
IF(fc.SEX = 1, '先生', '女士') AS sex,
IF(INSTR(GROUP_CONCAT(t.TRADE_PUBLISH_STATE), '0') > 0, 0, 1) AS TRADE_PUBLISH_STATE,
IF(p.display_type = 2, '*******', c.CUST_NAME) AS CUST_NAME,
pn.status AS inviteStatus,pn.invitation_id,pn.send_time,
(SELECT IF(p.valid_time >= DATE_FORMAT(NOW(), '%Y-%m-%d'), 1, 2)) AS info_status,
IF(p.status = 1, 1, IF(p.status = 6, 1.5, 2)) AS proc_status,p.top_type AS topType,p.top_time AS topTime
FROM
(SELECT p.*, pd.trade_name_id, pd.procurement_detail_id
FROM spot_procurement p
LEFT JOIN (SELECT b.procurement_detail_id,
CAST(b.procurement_id AS UNSIGNED INTEGER) AS procurement_id,
b.trade_name_id,b.is_split
FROM spot_procurement_details b) pd ON pd.procurement_id = p.procurement_id
WHERE
p.platform_audit_status = 1 AND p.alive_flag = 1 AND p.status >= 1 AND (pd.is_split IS NULL OR pd.is_split != 'Y')
ORDER BY pd.trade_name_id ASC) p
LEFT JOIN (SELECT a.receive_cust_id,a.status,a.invitation_id,a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM spot_procurement_invitation a) pn ON pn.procurement_id = p.procurement_id
LEFT JOIN spot_trade_name t ON t.trade_name_id = p.trade_name_id
LEFT JOIN spot_frequent_contacts fc ON p.cust_id = fc.CUST_ID AND fc.ALIVE_FLAG = 1 AND fc.IS_FREQUENT = 1
LEFT JOIN spot_company c ON c.cust_id = p.cust_id
LEFT JOIN spot_user_info sui ON c.CUSTOMER_SERVICE_USER = sui.USER_ID
WHERE 1 = 1 AND (pn.receive_cust_id = '100000000000000' OR p.publish_type = 2)
GROUP BY p.procurement_id

UNION

(SELECT
p.procurement_id,p.display_type,p.publish_type,p.valid_time,p.pay_type,p.cust_id,p.add_user,t.trade_name,p.add_time,
p.oper_user,p.oper_time,p.platform_audit_status,p.platform_back_reason,p.platform_audit_user,
p.platform_audit_time,p.status,p.procurement_title,p.alive_flag,c.is_gsp,c.is_gmp,c.customer_service_user,
IFNULL(IF(p.display_type = 2, sui.CONTACT_NAME, fc.CONTACT_NAME), '暂无') AS CONTACT_NAME,
IFNULL(IF(p.display_type = 2, sui.CELLPHONE, fc.cell_phone), '暂无') AS cellphone,
IF(fc.SEX = 1, '先生', '女士') AS sex,
IF(INSTR(GROUP_CONCAT(t.TRADE_PUBLISH_STATE), '0') > 0, 0, 1) AS TRADE_PUBLISH_STATE,
IF(p.display_type = 2, '*******', c.CUST_NAME) AS CUST_NAME,
pn.status AS inviteStatus,pn.invitation_id,pn.send_time,
(SELECT IF(p.valid_time >= DATE_FORMAT(NOW(), '%Y-%m-%d'), 1, 2)) AS info_status,
IF(p.status = 1, 1, IF(p.status = 6, 1.5, 2)) AS proc_status,p.top_type AS topType,p.top_time AS topTime
FROM
(SELECT p.*, pd.trade_name_id, pd.procurement_detail_id
FROM spot_procurement p
LEFT JOIN (SELECT
b.procurement_detail_id,CAST(b.procurement_id AS UNSIGNED INTEGER) AS procurement_id,b.trade_name_id,b.is_split
FROM spot_procurement_details b) pd ON pd.procurement_id = p.procurement_id
WHERE
p.platform_audit_status = 1 AND p.alive_flag = 1 AND p.status >= 1 AND (pd.is_split IS NULL OR pd.is_split != 'Y')
ORDER BY pd.trade_name_id ASC) p
LEFT JOIN (SELECT a.receive_cust_id,a.status,a.invitation_id,a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM spot_procurement_invitation a) pn ON pn.procurement_id = p.procurement_id
LEFT JOIN spot_trade_name t ON t.trade_name_id = p.trade_name_id
LEFT JOIN spot_frequent_contacts fc ON p.cust_id = fc.CUST_ID AND fc.ALIVE_FLAG = 1 AND fc.IS_FREQUENT = 1
LEFT JOIN spot_company c ON c.cust_id = p.cust_id
LEFT JOIN spot_user_info sui ON c.CUSTOMER_SERVICE_USER = sui.USER_ID
WHERE 1 = 1 AND (pn.receive_cust_id = '100000000000000' OR p.publish_type = 2)
GROUP BY p.procurement_id)) sss
WHERE
sss.TRADE_PUBLISH_STATE = 1
ORDER BY sss.info_status ASC , sss.add_time DESC
LIMIT 0 , 8

解决

这是什么“祖传代码”,从上面的脚本不难发现当时开发人员几乎将所有业务逻辑都放在这里了。其他的我什么都看不懂,啥也别说了先执行看看吧:

【MySQL】记一次复杂数据查询排重_sql脚本_02

将结果集中全部字段都对比了一遍(由于脚本是带分页操作的,所以全部对比还可以接受),发现就是“invitation_id”和“send_time”字段是存在差异,而这种差异在UNION关键字加持下无法将同一个“procurement_id”字段的记录合并成1条。

那为什么会出现这种情况呢?

是因为在登录后cust_id字段的加入导致SQL被动态调整了,“inviteStatus”、“invitation_id”和“send_time”三个字段是存在“cust_id”字段的时候才会出现的。

因此,破局的方法就是先找出“inviteStatus”、“invitation_id”和“send_time”三个字段是通过那个地方取数的,如下图:

LEFT JOIN (SELECT a.receive_cust_id,a.status,a.invitation_id,a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM spot_procurement_invitation a) pn ON pn.procurement_id = p.procurement_id

从上面的脚本可知上述的三个字段都通过这段脚本使用“LEFT JOIN”到主脚本获取到的。

那解决的办法就是先缩小子集的数据范围做到从一对多变成一对一就可以了,而幸运的是只要随机在“spot_procurement_invitation”表中指定一条数据即可。

这里可能会有人说直接用“limit”去解决不就行了。说实用“limit”去解决是可以的,但是要在主结构上做大量的调整才能够完成,这个并不是我想做的(毕竟从第一个脚本的SQL复杂度可知,真正代码中的SQL脚本绝对比现在拿出来调试的要多,现在拿出来调试的脚本是经过动态SQL过滤后的内容,为了不过度改变原有逻辑这种大改的方式被否决)。

第二个想法就是就直接修改LEFT JOIN的数据集内容,用“group by”的方式,使用“min”函数提取最小的“invitation_id”字段值就可以确定在一个“procurement_id”字段中只保留一条最小id的记录了。

SELECT 
a.receive_cust_id,
a.status,
MIN(a.invitation_id) AS invitation_id,
a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM
spot_procurement_invitation a
GROUP BY a.receive_cust_id , a.status , a.procurement_id , a.send_time;

但是这样不行。虽然想法很好,但是由于我们需要返回的是所有字段,如果直接通过group by去分组我们会将“send_time”也要放在group by里面去分组,“send_time”精确到秒级,所以通过group by出来的数据永远都是多条的。

最终,我选择使用另一个种方式去做这种唯一性的数据获取

SELECT 
a.receive_cust_id,
a.status,
a.invitation_id,
a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM
spot_procurement_invitation a
WHERE
(a.invitation_id , a.procurement_id) IN (SELECT
MAX(b.invitation_id) AS invitation_id, b.procurement_id
FROM
spot_procurement_invitation b
GROUP BY b.procurement_id);

这里通过一层嵌套查询将关键字联合查询出来,之后再列举具体需要的字段。

譬如“invitation_id”字段是需要选中最大的一个,而“procurement_id”字段是用来定位究竟是那个操作的,因此在保留这两个关键字的情况下进行了一次group by。

这个时候就可以只获唯一记录,再根据上述两个关键字做查询条件联合查询就可以得到这条唯一记录对应的全部信息了。

结论

因此当查询需要展示主表内容而有部分内容需要连接子表才可以展示的时候(子表内容无需规定选取的情况下)按照一下查询方式即可:

SELECT 
a.receive_cust_id,
a.status,
a.invitation_id,
a.send_time,
CAST(a.procurement_id AS UNSIGNED INTEGER) AS procurement_id
FROM
spot_procurement_invitation a
WHERE
(a.invitation_id , a.procurement_id) IN (SELECT
MAX(b.invitation_id) AS invitation_id, b.procurement_id
FROM
spot_procurement_invitation b
GROUP BY b.procurement_id);

标签:status,cust,MySQL,查询,procurement,time,invitation,排重,id
From: https://blog.51cto.com/u_15761576/5889835

相关文章