先说下中位数的概念:
中位数是统计学中的一个重要概念,它代表将一组数据按照大小排序后位于中间位置的数值。如果一组数据有奇数个数值,则中位数是中间那个数;如果数据中有偶数个数值,则中位数是中间两个数的平均值。中位数是一种位置平均数,它是按顺序排列的一组数据中居于中间位置的数,能够将数值集合划分为相等的上下两部分。中位数不受数据中极端值的影响,具有较强的代表性,通常用于描述一组数据的中心趋势
数据demo如下:
no | num |
1 | 1 |
2 | 7 |
3 | 9 |
4 | 15 |
5 | 20 |
6 | 21 |
总共有3种解法
1)解法1
如果是奇数的话,就是(n+1)/2
如果是偶数的话,就是n/2, n/2+1
select avg(num) as median
from
(
select num
,row_number() over(order by num asc) as rn
,count(1) over() as cnt
from tbl
) where rn in ( (cnt+1)/2, cnt/2,cnt/2+1)
2) 解法2
no | num | rn_asc | rn_desc |
1 | 1 | 1 | 6 |
2 | 7 | 2 | 5 |
3 | 9 | 3 | 4 |
4 | 15 | 4 | 3 |
5 | 20 | 5 | 2 |
6 | 21 | 6 | 1 |
正序排列和倒序排列,
偶数的话就是,rn_asc-rn_desc in (1,-1)
奇数的话就是,rn_asc-rn_desc in (0)
select avg(num) as median
from
(
select num
,row_number() over(order by num asc) as rn_asc
,row_number() over(order by num desc) as rn_desc
from tbl
) where rn_asc -rn_desc in ( 1,-1,0)
3) 解法3
偶数时,升序数>=总数/2,降序数>=总数/2
奇数时,升序数>=总数/2,降序数>=总数/2 这两个值相等
select avg(num) as median
from
(
select num
,row_number() over(order by num asc) as rn_asc
,row_number() over(order by num desc) as rn_desc
,count(1) over() as cnt
from tbl
) where rn_asc >=cnt/2
and rn_desc>=2
标签:over,中位数,Hive,asc,num,sql,rn,desc
From: https://blog.csdn.net/2402_86289011/article/details/142306273