描述
用户行为日志表tb_user_log
id | uid | artical_id | in_time | out_time | sign_cin |
---|---|---|---|---|---|
1 | 101 | 9001 | 2021-10-31 10:00:00 | 2021-10-31 10:00:09 | 0 |
2 | 102 | 9001 | 2021-10-31 10:00:00 | 2021-10-31 10:00:09 | 0 |
3 | 101 | 0 | 2021-11-01 10:00:00 | 2021-11-01 10:00:42 | 1 |
4 | 102 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:09 | 0 |
5 | 108 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:00:50 | 0 |
6 | 108 | 9001 | 2021-11-02 10:00:01 | 2021-11-02 10:00:50 | 0 |
7 | 104 | 9001 | 2021-11-02 10:00:28 | 2021-11-02 10:00:50 | 0 |
8 | 106 | 9001 | 2021-11-02 10:00:28 | 2021-11-02 10:00:50 | 0 |
9 | 108 | 9001 | 2021-11-03 10:00:01 | 2021-11-03 10:00:50 | 0 |
10 | 109 | 9002 | 2021-11-03 11:00:55 | 2021-11-03 11:00:59 | 0 |
11 | 104 | 9003 | 2021-11-03 11:00:45 | 2021-11-03 11:00:55 | 0 |
12 | 105 | 9003 | 2021-11-03 11:00:53 | 2021-11-03 11:00:59 | 0 |
13 | 106 | 9003 | 2021-11-03 11:00:45 | 2021-11-03 11:00:55 | 0 |
(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)
问题:统计每天的日活数及新用户占比
注:
新用户占比=当天的新用户数÷当天活跃用户数(日活数)。
如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过。
新用户占比保留2位小数,结果按日期升序排序。
输出示例:
示例数据的输出结果如下
dt | dau | uv_new_ratio |
---|---|---|
2021-10-30 | 2 | 1.00 |
2021-11-01 | 3 | 0.33 |
2021-11-02 | 3 | 0.67 |
2021-11-03 | 5 | 0.40 |
解释:
2021年10月31日有2个用户活跃,都为新用户,新用户占比1.00;
2021年11月1日有3个用户活跃,其中1个新用户,新用户占比0.33;
输入示例1
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
artical_id INT NOT NULL COMMENT '视频ID',
in_time datetime COMMENT '进入时间',
out_time datetime COMMENT '离开时间',
sign_in TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
(101, 9001, '2021-10-31 10:00:00', '2021-10-31 10:00:09', 0),
(102, 9001, '2021-10-31 10:00:00', '2021-10-31 10:00:09', 0),
(101, 0, '2021-11-01 10:00:00', '2021-11-01 10:00:42', 1),
(102, 9001, '2021-11-01 10:00:00', '2021-11-01 10:00:09', 0),
(108, 9001, '2021-11-01 10:00:01', '2021-11-01 10:01:50', 0),
(108, 9001, '2021-11-02 10:00:01', '2021-11-02 10:01:50', 0),
(104, 9001, '2021-11-02 10:00:28', '2021-11-02 10:00:50', 0),
(106, 9001, '2021-11-02 10:00:28', '2021-11-02 10:00:50', 0),
(108, 9001, '2021-11-03 10:00:01', '2021-11-03 10:01:50', 0),
(109, 9002, '2021-11-03 11:00:55', '2021-11-03 11:00:59', 0),
(104, 9003, '2021-11-03 11:00:45', '2021-11-03 11:00:55', 0),
(105, 9003, '2021-11-03 11:00:53', '2021-11-03 11:00:59', 0),
(106, 9003, '2021-11-03 11:00:45', '2021-11-03 11:00:55', 0);
输出:
2021-10-31|2|1.00
2021-11-01|3|0.33
2021-11-02|3|0.67
2021-11-03|5|0.40
我的解题思路:
1.处理in_time-进入时间和out_time-离开时间跨天问题,可以使用union all,后面再去重,这样就能统计到跨天的数据
select uid,
date_format(in_time, '%Y-%m-%d') as c_date
from tb_user_log
union all
select uid,
date_format(out_time, '%Y-%m-%d') as c_date
from tb_user_log
2.使用窗口函数dense_rank()来查找新用户,按照用户分组,日期降序排序,排在第一的说明该用户是第一次访问,即为新用户(可以使用if判断)
select distinct uid, # 记得去重
c_date,
if(dense_rank() over (partition by uid order by c_date) = 1, 1, 0) as num
from (
select uid,
date_format(in_time, '%Y-%m-%d') as c_date
from tb_user_log
union all
select uid,
date_format(out_time, '%Y-%m-%d') as c_date
from tb_user_log
) tt
3.对日期分组求和,得到日活和新用户占比
完整代码 :
select c_date,
count(uid) as dau,
round(sum(num) / count(uid), 2) as uv_new_ratio
from (
select distinct uid,
c_date,
if(dense_rank() over (partition by uid order by c_date) = 1, 1, 0) as num
from (
select uid,
date_format(in_time, '%Y-%m-%d') as c_date
from tb_user_log
union all
select uid,
date_format(out_time, '%Y-%m-%d') as c_date
from tb_user_log
) tt) t
group by c_date
order by c_date
;
其他解题思路:
看了下评价区,很多都是用的left join,但是作为一个Sql Boy,我认为很多计算第一时间都要想到窗口函数能不能解决,不能解决再想其他办法。
做SQL题的核心是理清一步一步的思路,这个思路就是你需要什么数据,然后怎么构造需要的数据
select
t1.dt dt,
count(distinct t1.uid) dau,
round(count(distinct t2.uid)/ count(distinct t1.uid),2) ub_new_ratio
from
( -- 查找每天在线人的信息
select
uid,date(in_time) dt
from tb_user_log
union
select
uid,date(out_time) dt
from tb_user_log
) t1
left join
(
select -- 查找每一天的新用户
uid,min(date(in_time)) dt
from tb_user_log
group by uid
) t2
on t1.uid=t2.uid and t1.dt=t2.dt
group by dt
order by dt
;
标签:11,10,00,SQL166,uid,01,牛客,活数,2021
From: https://www.cnblogs.com/moliyy/p/17678838.html