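For data processing, Spark SQL ran faster than Hive SQL in the test below.
Scenario 1: on a table of 491 rows, the same query was run on the data middle platform with Spark SQL and with Hive SQL; it took 9 s and 55 s respectively.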
// SQL statement used
SELECT
    YEAR(update_time) AS year,
    MONTH(update_time) AS month
FROM
    dwd_tb_customer_store_appraise
WHERE
    YEAR(update_time) = 2023
GROUP BY
    YEAR(update_time),
    MONTH(update_time)
ORDER BY
    year,
    month
Spark SQL result:
2024-08-23 09:47:28 get jobid:614507662721155072
2024-08-23 09:47:37 INFO Cost time is: +9.000+s
2024-08-23 09:47:37 INFO Current task status: SUCCESS
Hive SQL result:
2024-08-23 09:47:47 get jobid:614752808616329216
2024-08-23 09:47:47 INFO Current task status:RUNNING
2024-08-23 09:47:47 sql:
--Hive
SELECT
YEAR(update_time) AS year,
month(update_time) as month
FROM
dwd_tb_customer_store_appraise
WHERE
YEAR(update_time) = 2023
GROUP BY
YEAR(update_time),
month(update_time)
ORDER BY
year,
month
2024-08-23 09:48:42 INFO Cost time is: +55.000+s
2024-08-23 09:48:42 INFO Current task status: SUCCESS
Conclusion: for this data-processing task, Spark SQL was more efficient than Hive SQL, finishing the query in 9 s against 55 s.
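As a quick cross-check of the reported figures, the following minimal sketch recomputes each run's wall-clock time from the first and last timestamps in the two logs and derives the Hive-to-Spark ratio. The names `RUNS` and `elapsed_seconds` are illustrative only and are not part of the platform; the timestamps are copied from the logs above.

```python
from datetime import datetime

# Start (job id obtained) and end (SUCCESS) timestamps copied from the logs.
RUNS = {
    "spark": ("2024-08-23 09:47:28", "2024-08-23 09:47:37"),
    "hive": ("2024-08-23 09:47:47", "2024-08-23 09:48:42"),
}


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Wall-clock duration per engine, then the ratio of Hive time to Spark time.
durations = {engine: elapsed_seconds(*stamps) for engine, stamps in RUNS.items()}
speedup = durations["hive"] / durations["spark"]

print(durations)
print(f"Hive / Spark ratio: {speedup:.1f}x")
```

The recomputed durations match the `Cost time` values in the logs (9 s and 55 s), which puts Spark SQL at roughly 6.1 times faster than Hive SQL for this single run on 491 rows.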