最近在做数据查询需求的时候,遇到按照时间分组查询统计指标的需求,比如说查询模块的最近 15 天访问数据量,没有数据的日期补 0,以前对于这种类似的需求都是通过代码来补数据,想试试 sql 实现这种查询,因此查询了不少文章,对于类似实现方法的文章网上也有很多,差异也很多,因此这篇文章只作为一个参考,提供一个思路.
对于 mysql
比如刚才这个需求是是针对 MySQL 的,查询模块的最近 15 天访问数据量,没有数据的日期补 0。
对于这个 sql 我们主要思路是按照table 种日期字段进行分组,然后用一个日期序列对结果进行 join,没有数据的日期直接补 0:
select
DATE_FORMAT(DATE_SUB(now(),INTERVAL xc day),'%Y-%m-%d') as date_str
from
(
-- 获取0-14 的序列 @xi:=@xi+1 每次执行都加一
select
@xi := @xi +1 as xc
from
(select 1 union select 2 union select 3) xc1,
(select 1 union select 2 union select 3 union select 4 union select 5) xc2,
(select @xi := -1) xc0
)xcxc
执行结果如下:
然后使用结果 left join 查询结果,null 就补0 即可。
对于 clickhouse
clickhouse提供的函数很多,也提供了类似需求的方案。对于 clickhouse 我接触不是很多,或许还有很多更好的方案。
需求按照分钟或者小时聚合数据:
-- 按照分钟统计
select
toStartOfInterval(start_t, INTERVAL 1 MINUTE) AS data_str,
count() AS records
FROM xxxx.xxxxxx
WHERE xxxxx
GROUP BY data_str
ORDER BY date_str
WITH fill
FROM toDateTime('2024-02-22 23:00:00')
TO toDateTime('2024-02-22 23:59:59')
step 1*60
-- 按照小时统计
select
toStartOfInterval(start_t, INTERVAL 1 hour) AS data_str,
count() AS records
FROM xxxx.xxxxxx
WHERE xxxxx
GROUP BY data_str
ORDER BY date_str
WITH fill
FROM toDateTime('2024-02-22 23:00:00')
TO toDateTime('2024-02-21 23:59:59')
step 1*3600
这里不贴图展示查询结果了。
主要是通过 2 个语法实现。
toStartOfInterval()和 with fill 。建议查看官方文档了解
https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#filling-grouped-by-sorting-prefix
注意事项:ck 我的需求是使用 java 连接查询,使用 druid 连接 ck 的时候,需要去掉 wall filter,因为 ck 的语法 有些不太符合标准 sql,druid 的 sqlparser 会报错拦截。
以下来自官方文档
toStartOfInterval(date_or_date_with_time, INTERVAL x unit [, time_zone])
This function generalizes other toStartOf*()
functions. For example,
toStartOfInterval(t, INTERVAL 1 year)
returns the same astoStartOfYear(t)
,toStartOfInterval(t, INTERVAL 1 month)
returns the same astoStartOfMonth(t)
,toStartOfInterval(t, INTERVAL 1 day)
returns the same astoStartOfDay(t)
,toStartOfInterval(t, INTERVAL 15 minute)
returns the same astoStartOfFifteenMinutes(t)
.
The calculation is performed relative to specific points in time:
Interval | Start |
---|---|
year | year 0 |
quarter | 1900 Q1 |
month | 1900 January |
week | 1970, 1st week (01-05) |
day | 1970-01-01 |
hour | (*) |
minute | 1970-01-01 00:00:00 |
second | 1970-01-01 00:00:00 |
millisecond | 1970-01-01 00:00:00 |
microsecond | 1970-01-01 00:00:00 |
nanosecond | 1970-01-01 00:00:00 |
(*) hour intervals are special: the calculation is always performed relative to 00:00:00 (midnight) of the current day. As a result, only hour values between 1 and 23 are useful.
ORDER BY Expr WITH FILL Modifier
This modifier also can be combined with LIMIT … WITH TIES modifier.
WITH FILL
modifier can be set after ORDER BY expr
with optional FROM expr
, TO expr
and STEP expr
parameters. All missed values of expr
column will be filled sequentially and other columns will be filled as defaults.
To fill multiple columns, add WITH FILL
modifier with optional parameters after each field name in ORDER BY
section.
ORDER BY expr [WITH FILL] [FROM const_expr] [TO const_expr] [STEP const_numeric_expr], ... exprN [WITH FILL] [FROM expr] [TO expr] [STEP numeric_expr]
[INTERPOLATE [(col [AS expr], ... colN [AS exprN])]]
WITH FILL
can be applied for fields with Numeric (all kinds of float, decimal, int) or Date/DateTime types. When applied for String
fields, missed values are filled with empty strings. When FROM const_expr
not defined sequence of filling use minimal expr
field value from ORDER BY
. When TO const_expr
not defined sequence of filling use maximum expr
field value from ORDER BY
. When STEP const_numeric_expr
defined then const_numeric_expr
interprets as is
for numeric types, as days
for Date type, as seconds
for DateTime type. It also supports INTERVAL data type representing time and date intervals. When STEP const_numeric_expr
omitted then sequence of filling use 1.0
for numeric type, 1 day
for Date type and 1 second
for DateTime type. INTERPOLATE
can be applied to columns not participating in ORDER BY WITH FILL
. Such columns are filled based on previous fields values by applying expr
. If expr
is not present will repeat previous value. Omitted list will result in including all allowed columns.
Example of a query without WITH FILL
:
SELECT n, source FROM (
SELECT toFloat32(number % 10) AS n, 'original' AS source
FROM numbers(10) WHERE number % 3 = 1
) ORDER BY n;
Result:
┌─n─┬─source───┐
│ 1 │ original │
│ 4 │ original │
│ 7 │ original │
└───┴──────────┘
标签:00,01,expr,INTERVAL,Clickhouse,Mysql,数据,ORDER,select From: https://www.cnblogs.com/lovedudu/p/18065497