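Study Summary
(1) Key point: revisit Exercise 6; the last hard problem is still unfinished.
(2) 猴子's problem-solution e-book
Table of Contents
- Study Summary
- Exercise 1: Department Highest Salary (LeetCode 184, difficulty: Medium)
- Exercise 2: Exchange Seats (LeetCode 626, difficulty: Medium)
- Exercise 3: Rank Scores (LeetCode 178, difficulty: Medium)
- Exercise 4: Consecutive Numbers (LeetCode 180, difficulty: Medium)
- Exercise 5: Tree Node (LeetCode 608, difficulty: Medium)
- Exercise 6: Managers with at Least 5 Direct Reports (LeetCode 570, difficulty: Medium)
- Exercise 7: Get Highest Answer Rate Question (LeetCode 578, difficulty: Medium)
- Exercise 8: Department Top Three Salaries (LeetCode 185, difficulty: Hard)
- Exercise 9: Shortest Distance in a Plane (LeetCode 612, difficulty: Hard)
- Exercise 10: Trips and Users (LeetCode 262, difficulty: Hard)
- Reference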
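Exercise 1: Department Highest Salary (LeetCode 184, difficulty: Medium)
[LeetCode] 184 Department Highest Salary
Create the Employee table, which holds all employee information; each employee has an Id, a salary, and a department Id.
Then insert the data:
# Exercise 1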
USE autumn;
CREATE TABLE Employee
(Id INTEGER NOT NULL,
Name VARCHAR(100) NOT NULL,
Salary INTEGER NOT NULL,
DepartmentId INTEGER NOT NULL,
PRIMARY KEY(Id)
);
# DESC Employee;
INSERT INTO Employee VALUES('1', 'Joe', '70000', '1');
INSERT INTO Employee VALUES('2', 'Henry', '80000', '2');
INSERT INTO Employee VALUES('3', 'Sam', '60000', '2');
INSERT INTO Employee VALUES('4', 'Max', '90000', '1');
SELECT * FROM Employee;
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
+----+-------+--------+--------------+
Create the Department table, which holds information about all departments of the company.
# Create the department table
CREATE TABLE Department(
Id VARCHAR(100) NOT NULL,
Name VARCHAR(100) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO Department VALUES('1', 'IT');
INSERT INTO Department VALUES('2', 'Sales');
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
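Write a SQL query that finds the highest-paid employee in each department. For the tables above, Max has the highest salary in the IT department and Henry has the highest salary in the Sales department.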
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| Sales | Henry | 80000 |
+------------+----------+--------+
(1) If you use a window function, note that the following is wrong, because a window function does not "reduce" the number of rows.
SELECT Department,
Name AS Employee,
MAX(Salary) OVER (PARTITION BY Department) AS Salary
FROM (
SELECT b.Name AS Department, a.Name, a.Salary
FROM Employee AS a
INNER JOIN Department AS b
ON a.DepartmentId = b.Id
) AS F;
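The result of the query above keeps all four employee rows, and every row shows its department's maximum salary instead of the employee's own salary.
So we turn to a subquery instead. The version below looks correct, but its WHERE clause is wrong: the derived-table alias F cannot be referenced in the FROM clause of the subquery, and the GROUP BY subquery returns one row per department, so it cannot be compared with =:
# Exercise 1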
SELECT Department,
Name AS Employee,
Salary
FROM (
SELECT b.Name AS Department, a.Name, a.Salary
FROM Employee AS a
INNER JOIN Department AS b
ON a.DepartmentId = b.Id
) AS F
# GROUP BY Department,
# WHERE Salary = (MAX(F.Salary) OVER (PARTITION BY F.Department));
WHERE Salary = (SELECT MAX(F.Salary)
FROM F
GROUP BY F.Department);
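(2) Let's analyze the problem again:
The employee table has the department id but not the department name; the department table has the department name.
Because we want to look at all employees, the employee table is LEFT JOINed to the department table. The highest salary within each department is then found by a subquery (the WHERE part). Note in particular that the column used in GROUP BY inside that subquery must also appear in its SELECT list, so that each (DepartmentId, Salary) pair can be matched.
PS: here an output column and a table are both named employee, so don't get confused.
# Exercise 1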
SELECT Department.name AS Department,
employee.name AS Employee,
Salary
FROM employee
LEFT JOIN department
ON employee.DepartmentId = department.Id
WHERE (employee.DepartmentId, Salary) in
(SELECT DepartmentId, max(Salary)
FROM employee
GROUP BY DepartmentId);
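To check the final query against the sample data, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite stands in for MySQL here (an assumption; it needs SQLite 3.15 or later for the row-value IN comparison), and the function name top_paid_per_department is made up for illustration.

```python
import sqlite3

def top_paid_per_department():
    """Return (department, employee, salary) for the top earner(s) of each department."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT,
                               Salary INTEGER, DepartmentId INTEGER);
        CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO Employee VALUES
            (1, 'Joe', 70000, 1), (2, 'Henry', 80000, 2),
            (3, 'Sam', 60000, 2), (4, 'Max', 90000, 1);
        INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
    """)
    rows = conn.execute("""
        SELECT d.Name, e.Name, e.Salary
        FROM Employee AS e
        LEFT JOIN Department AS d ON e.DepartmentId = d.Id
        -- keep only rows whose (department, salary) pair is a per-department maximum
        WHERE (e.DepartmentId, e.Salary) IN (
            SELECT DepartmentId, MAX(Salary)
            FROM Employee
            GROUP BY DepartmentId)
        ORDER BY d.Name
    """).fetchall()
    conn.close()
    return rows

print(top_paid_per_department())
```

If two employees tie for the highest salary in a department, both rows are returned, since the filter compares salaries rather than picking a single row.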
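Exercise 2: Exchange Seats (LeetCode 626, difficulty: Medium)
[LeetCode] 626 Exchange Seats
Xiaomei is an information technology teacher at a middle school. She has a seat table that stores students' names and their corresponding seat ids.
The id column increases consecutively.
Xiaomei wants to swap the seats of every two adjacent students.
Can you write a SQL query that outputs the result she wants?
Create the seat table shown below:
Example: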
+---------+---------+
| id | student |
+---------+---------+
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |
+---------+---------+
That is, create the table above and insert its data:
# Exercise 2
# Create the table
USE autumn;
CREATE TABLE seat
(id VARCHAR(100) NOT NULL,
student VARCHAR(100) NOT NULL,
PRIMARY KEY(id)
);
# Insert the data
INSERT INTO seat VALUES('1', 'Abbot');
INSERT INTO seat VALUES('2', 'Doris');
INSERT INTO seat VALUES('3', 'Emerson');
INSERT INTO seat VALUES('4', 'Green');
INSERT INTO seat VALUES('5', 'Jeames');
If the input is the table above, the output is:
+---------+---------+
| id | student |
+---------+---------+
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |
+---------+---------+
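Note:
If the number of students is odd, the last student's seat does not need to change.
Method 1:
Because the problem states that id increases consecutively, the id of the last student equals the total number of seats. There are two ways to count this total:
# Seat count, option 1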
SELECT COUNT(*) AS counts FROM seat;
# Seat count, option 2
SELECT COUNT(distinct id) FROM seat;
Either count can be used as the scalar subquery in the full query below. The full SQL:
SELECT
IF(id%2 = 0,
id - 1,
# IF(id = (SELECT COUNT(*) FROM seat), also works here
IF(id = (SELECT COUNT(DISTINCT id) FROM seat), # if this is the last student
id,
id + 1))
AS id, student
FROM seat
ORDER BY id;
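The seat-swapping rule can be tried out with the sketch below, which runs an equivalent query through Python's sqlite3 module. SQLite is an assumption standing in for MySQL, so the nested IF() calls are written as a CASE expression, and the helper name swap_seats is hypothetical.

```python
import sqlite3

def swap_seats(students):
    """Return (id, student) rows after swapping every adjacent pair of seats."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student TEXT NOT NULL)")
    # ids are assigned consecutively starting from 1, as the problem requires
    conn.executemany("INSERT INTO seat VALUES (?, ?)",
                     list(enumerate(students, start=1)))
    rows = conn.execute("""
        SELECT CASE
                 WHEN id % 2 = 0 THEN id - 1            -- even seat moves down
                 WHEN id = (SELECT COUNT(DISTINCT id) FROM seat)
                      THEN id                           -- odd last seat stays
                 ELSE id + 1                            -- odd seat moves up
               END AS new_id,
               student
        FROM seat
        ORDER BY new_id
    """).fetchall()
    conn.close()
    return rows

print(swap_seats(["Abbot", "Doris", "Emerson", "Green", "Jeames"]))
```

With an even number of students the second branch never fires, so every pair is swapped.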
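Exercise 3: Rank Scores (LeetCode 178, difficulty: Medium)
Suppose that in a final exam the average scores of the four second-grade classes are 93, 93, 93, and 91. How many different rankings can be produced? Which function produces each one, and what does each ranking look like? (Consider descending order only.)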
+-------+-----------+
| class | score_avg |
+-------+-----------+
| 1 | 93 |
| 2 | 93 |
| 3 | 93 |
| 4 | 91 |
+-------+-----------+
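(1) Ranking 1: equal scores share the same rank, and the rank after a tie is the next consecutive integer, so there are no gaps between ranks. This uses a window function; because all rows are ranked together, PARTITION BY is not needed.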
SELECT Score,
dense_rank() OVER (ORDER BY Score desc) AS 'Rank'
FROM Scores;
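To compare the possible rankings for the four class averages side by side, the following sketch computes RANK, DENSE_RANK, and ROW_NUMBER in one query. It assumes Python's sqlite3 module with SQLite 3.25 or later (needed for window functions); the table name class_scores and the function name rank_classes are made up for illustration.

```python
import sqlite3

def rank_classes():
    """Return (class_id, score_avg, rank, dense_rank, row_number) per class."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE class_scores (class_id INTEGER, score_avg INTEGER)")
    conn.executemany("INSERT INTO class_scores VALUES (?, ?)",
                     [(1, 93), (2, 93), (3, 93), (4, 91)])
    rows = conn.execute("""
        SELECT class_id, score_avg,
               RANK()       OVER (ORDER BY score_avg DESC) AS rnk,        -- ties share a rank, gaps follow
               DENSE_RANK() OVER (ORDER BY score_avg DESC) AS dense_rnk,  -- ties share a rank, no gaps
               ROW_NUMBER() OVER (ORDER BY score_avg DESC, class_id) AS row_num  -- always unique
        FROM class_scores
        ORDER BY class_id
    """).fetchall()
    conn.close()
    return rows

for row in rank_classes():
    print(row)
```

For this data RANK gives 1, 1, 1, 4, DENSE_RANK gives 1, 1, 1, 2, and ROW_NUMBER gives 1, 2, 3, 4 (ties broken by class_id in this sketch).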
Exercise 4: Consecutive Numbers (LeetCode 180, difficulty: Medium)
[LeetCode] 180 Consecutive Numbers
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| num | varchar |
+-------------+---------+
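Write a SQL query that finds all numbers that appear at least three times consecutively.
The result format is shown in the following example:
Logs table: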
+----+-----+
| Id | Num |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
+----+-----+
Result table:
+-----------------+
| ConsecutiveNums |
+-----------------+
| 1 |
+-----------------+
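Method 1: self-join
Think of it as making several identical copies of one table. Because we are looking for numbers that appear three or more times in a row, we use three "copies".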
# Write your MySQL query statement below
SELECT DISTINCT a.num as ConsecutiveNums
FROM Logs AS a,
Logs AS b,
Logs AS c
WHERE a.Id = b.Id - 1
AND b.Id = c.Id - 1
AND a.Num = b.Num
AND b.Num = c.Num;
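The self-join above can be run against the sample Logs data with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), writes the join with explicit JOIN ... ON clauses, and the function name consecutive_nums_self_join is hypothetical.

```python
import sqlite3

def consecutive_nums_self_join(nums):
    """Return the numbers that appear in at least three consecutive rows (by Id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Logs (Id INTEGER PRIMARY KEY, Num TEXT)")
    conn.executemany("INSERT INTO Logs VALUES (?, ?)",
                     list(enumerate(nums, start=1)))
    rows = conn.execute("""
        SELECT DISTINCT a.Num
        FROM Logs AS a
        JOIN Logs AS b ON a.Id = b.Id - 1 AND a.Num = b.Num   -- row right after a
        JOIN Logs AS c ON b.Id = c.Id - 1 AND b.Num = c.Num   -- row right after b
        ORDER BY a.Num
    """).fetchall()
    conn.close()
    return [num for (num,) in rows]

print(consecutive_nums_self_join(["1", "1", "1", "2", "1", "2", "2"]))
```

This approach relies on Id values being consecutive with no gaps, as in the sample table.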
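Method 2: window function
For this approach, see 猴子's write-up of the Pinduoduo interview question "How do you find content that appears N times in a row?".
LeetCode 603 (Consecutive Available Seats) is very similar to this problem: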
SELECT DISTINCT a.seat_id
# self-join
FROM cinema a join cinema b
ON abs(a.seat_id - b.seat_id) = 1
AND a.free = true and b.free = true
ORDER BY a.seat_id;
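For the Logs problem, Method 2 can be sketched with the LAG and LEAD window functions as follows. This is one possible window-function formulation, not necessarily the one in 猴子's write-up; it assumes Python's sqlite3 module with SQLite 3.25 or later, and the function name consecutive_nums_window is made up.

```python
import sqlite3

def consecutive_nums_window(nums):
    """Return numbers equal to both their previous and next row (ordered by Id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Logs (Id INTEGER PRIMARY KEY, Num TEXT)")
    conn.executemany("INSERT INTO Logs VALUES (?, ?)",
                     list(enumerate(nums, start=1)))
    rows = conn.execute("""
        SELECT DISTINCT Num
        FROM (
            SELECT Num,
                   LAG(Num)  OVER (ORDER BY Id) AS prev_num,  -- value one row earlier
                   LEAD(Num) OVER (ORDER BY Id) AS next_num   -- value one row later
            FROM Logs
        ) AS neighbours
        WHERE Num = prev_num AND Num = next_num
        ORDER BY Num
    """).fetchall()
    conn.close()
    return [num for (num,) in rows]

print(consecutive_nums_window(["1", "1", "1", "2", "1", "2", "2"]))
```

Unlike the self-join, this version compares neighbouring rows in Id order, so it does not depend on the Id values themselves being gap-free.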
Exercise 5: Tree Node (LeetCode 608, difficulty: Medium)
In the tree table, id identifies a tree node and p_id is the id of its parent node.
+----+------+
| id | p_id |
+----+------+
| 1 | null |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
+----+------+
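Each node is one of the following three types:
- Root: the node is the root node.
- Leaf: the node is a leaf node.
- Inner: the node is neither the root nor a leaf.
Write a query that prints each node's id and its node type, ordered by node id. For the example above the result should be: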
+----+------+
| id | Type |
+----+------+
| 1 | Root |
| 2 | Inner|
| 3 | Leaf |
| 4 | Leaf |
| 5 | Leaf |
+----+------+
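Explanation
- Node '1' is the root because its parent is NULL; it has two children, '2' and '3'.
- Node '2' is an inner node because its parent is '1' and it has children '4' and '5'.
- Nodes '3', '4', and '5' are leaf nodes because they have a parent but no children.
The tree looks like this: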
    1
   / \
  2   3
 / \
4   5
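Note
If the tree has only one node, only the root type needs to be output.
Method 1: use CASE
This problem asks us to determine each node's type:
(1) If p_id is NULL in the tree table, the node must be the root.
(2) If the node's id appears in the p_id column of the tree table, it is the parent of some node, so (not being the root) it must be an Inner node. The test is: WHEN id IN (SELECT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Inner'. Note that the test uses tree.id rather than tree.p_id, and that there are no commas between successive WHEN branches.
(3) All remaining nodes are Leaf nodes.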
SELECT id,
CASE WHEN p_id IS NULL THEN 'Root'
WHEN id IN (SELECT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Inner'
ELSE 'Leaf'
END AS Type
FROM tree
ORDER BY id;
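The three-way classification can be verified on the sample tree with the sketch below, which runs the CASE query through Python's sqlite3 module (SQLite standing in for MySQL is an assumption; the function name classify_nodes is hypothetical).

```python
import sqlite3

def classify_nodes(edges):
    """edges: list of (id, p_id); returns [(id, 'Root' | 'Inner' | 'Leaf'), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, p_id INTEGER)")
    conn.executemany("INSERT INTO tree VALUES (?, ?)", edges)
    rows = conn.execute("""
        SELECT id,
               CASE WHEN p_id IS NULL THEN 'Root'          -- no parent
                    WHEN id IN (SELECT p_id FROM tree
                                WHERE p_id IS NOT NULL)
                         THEN 'Inner'                      -- is some node's parent
                    ELSE 'Leaf'                            -- has a parent, no children
               END AS Type
        FROM tree
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

print(classify_nodes([(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]))
```

Because the Root branch is tested first, a single-node tree is reported as Root even though that node has no children.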
Method 2: use IF
# Method 2: use IF
SELECT id,
IF(isnull(p_id),'Root',
IF(id IN (SELECT p_id FROM tree WHERE p_id IS NOT NULL),'Inner','Leaf'))
AS Type
FROM tree
ORDER BY id;
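Exercise 6: Managers with at Least 5 Direct Reports (LeetCode 570, difficulty: Medium)
The Employee table contains all employees and their managers. Every employee has an Id, and there is also a column holding the Id of the employee's manager (ManagerId).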
+------+----------+-----------+----------+
|Id |Name |Department |ManagerId |
+------+----------+-----------+----------+
|101 |John |A |null |
|102 |Dan |A |101 |
|103 |James |A |101 |
|104 |Amy |A |101 |
|105 |Anne |A |101 |
|106 |Ron |B |101 |
+------+----------+-----------+----------+
Given the Employee table, write a SQL query that finds the managers with at least 5 direct reports. For the table above, the result should be:
+-------+
| Name |
+-------+
| John |
+-------+
Note:
No one reports to themselves.
Method 1: correlated subquery (key point)
Recall the correlated subquery example we used earlier:
SELECT product_type, product_name, sale_price
FROM product AS p1
WHERE sale_price > (SELECT AVG(sale_price)
FROM product AS p2
WHERE p1.product_type = p2.product_type
GROUP BY product_type);
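Our approach here is similar: the inner and outer queries are correlated through the condition b.ManagerId = a.Id, and the outer query keeps a row only when the count of b.Id is at least 5.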
SELECT a.Name
FROM Employee2 AS a
WHERE 5 <= (SELECT COUNT(b.Id)
FROM Employee2 AS b
WHERE b.ManagerId = a.Id);
Method 2:
A faster approach: group by managerId and use HAVING to keep the ids of managers with at least 5 direct reports.
# Write your MySQL query statement below
SELECT name
FROM employee
WHERE id in(
SELECT managerId
FROM employee
GROUP BY managerId
having count(managerId) >= 5
);
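As a quick check that the two methods agree, the sketch below runs both against the sample data using Python's sqlite3 module. SQLite as a stand-in for MySQL is an assumption, both queries read from a single Employee table here, and the function name managers_with_reports and its min_reports parameter are made up for illustration.

```python
import sqlite3

def managers_with_reports(min_reports=5):
    """Return the manager names found by (correlated subquery, GROUP BY/HAVING)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT,
                                           Department TEXT, ManagerId INTEGER)""")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
        (101, "John", "A", None), (102, "Dan", "A", 101), (103, "James", "A", 101),
        (104, "Amy", "A", 101), (105, "Anne", "A", 101), (106, "Ron", "B", 101),
    ])
    # Method 1: count the direct reports of each candidate manager row
    correlated = conn.execute("""
        SELECT a.Name FROM Employee AS a
        WHERE ? <= (SELECT COUNT(b.Id) FROM Employee AS b
                    WHERE b.ManagerId = a.Id)
    """, (min_reports,)).fetchall()
    # Method 2: group once by ManagerId, then look up the surviving ids
    grouped = conn.execute("""
        SELECT Name FROM Employee
        WHERE Id IN (SELECT ManagerId FROM Employee
                     GROUP BY ManagerId
                     HAVING COUNT(ManagerId) >= ?)
    """, (min_reports,)).fetchall()
    conn.close()
    return [n for (n,) in correlated], [n for (n,) in grouped]

print(managers_with_reports())
```

In Method 2 the group of rows with a NULL ManagerId is dropped automatically, because COUNT(ManagerId) ignores NULLs and yields 0 for that group.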
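Exercise 7: Get Highest Answer Rate Question (LeetCode 578, difficulty: Medium)
Find the question with the highest answer rate in the survey_log table, whose columns are: uid, action, question_id, answer_id, q_num, timestamp.
uid is the user id; action takes one of the values "show", "answer", "skip"; answer_id is non-null when action is "answer" and null when action is "show" or "skip"; q_num is the sequence number of the question.
Write a SQL query that finds the question_id with the highest answer rate.
Example:
Input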
uid | action | question_id | answer_id | q_num | timestamp |
5 | show | 285 | null | 1 | 123 |
5 | answer | 285 | 124124 | 1 | 124 |
5 | show | 369 | null | 2 | 125 |
5 | skip | 369 | null | 2 | 126 |
Output
question_id |
285 |
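Explanation
Question 285 has an answer rate of 1/1, while question 369 has an answer rate of 0/1, so the output is 285.
Note:
The highest answer rate means the proportion of a question's appearances in which it was answered.
Method 1:
First, make sure you understand the problem. Think of a quiz: each question presented to a user produces a show record, after which the user can either answer or skip it. We need the answer rate of each question, and then sort by it. As the explanation above states, question 285 has an answer rate of 1/1 while question 369 has 0/1, so the output is 285. We therefore first compute the numbers behind 1/1 and 0/1: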
SELECT question_id,
SUM(case when action = "answer" THEN 1 ELSE 0 END) as num_answers,
SUM(case when action = "show" THEN 1 ELSE 0 END) as num_shows
FROM survey_log
GROUP BY question_id;
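The result has one row per question: num_answers = 1 and num_shows = 1 for question 285, and num_answers = 0 and num_shows = 1 for question 369.
Based on this table, sort by (num_answers / num_shows) and take the largest value: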
SELECT question_id
FROM
(
SELECT question_id,
SUM(case when action = "answer" THEN 1 ELSE 0 END) as num_answers,
SUM(case when action = "show" THEN 1 ELSE 0 END) as num_shows
FROM survey_log
GROUP BY question_id
) as tbl
ORDER BY (num_answers / num_shows) DESC
LIMIT 1;
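Note: the final LIMIT 1 returns only the top row after sorting. As a general optimization, LIMIT 1 lets the engine stop scanning as soon as one matching row is found, which helps for queries that return at most one row but would otherwise scan the whole table. In this query, however, every group still has to be aggregated and sorted before the first row is known.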
Method 2:
SELECT
question_id
FROM
survey_log
GROUP BY question_id
# COUNT skips only NULLs, so rows that are not "show" must map to NULL, not 0
ORDER BY COUNT(answer_id) / COUNT(IF(action = 'show', 1, NULL)) DESC
LIMIT 1;
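The answers-over-shows computation can be exercised with the sketch below, which follows Method 1 using Python's sqlite3 module. SQLite as a stand-in for MySQL is an assumption: its / operator truncates when both operands are integers, so the ratio is multiplied by 1.0 here, which MySQL does not need. The function name highest_answer_rate is hypothetical, and only the columns the query uses are created.

```python
import sqlite3

def highest_answer_rate(log_rows):
    """log_rows: (uid, action, question_id, answer_id); returns the best question_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE survey_log (uid INTEGER, action TEXT,
                                             question_id INTEGER, answer_id INTEGER)""")
    conn.executemany("INSERT INTO survey_log VALUES (?, ?, ?, ?)", log_rows)
    (question_id,) = conn.execute("""
        SELECT question_id
        FROM (
            SELECT question_id,
                   SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) AS num_answers,
                   SUM(CASE WHEN action = 'show'   THEN 1 ELSE 0 END) AS num_shows
            FROM survey_log
            GROUP BY question_id
        ) AS tbl
        ORDER BY 1.0 * num_answers / num_shows DESC  -- 1.0 forces real division in SQLite
        LIMIT 1
    """).fetchone()
    conn.close()
    return question_id

sample = [(5, "show", 285, None), (5, "answer", 285, 124124),
          (5, "show", 369, None), (5, "skip", 369, None)]
print(highest_answer_rate(sample))
```

The sketch assumes the log contains at least one row; with an empty table fetchone() returns None and the unpacking fails.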
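Exercise 8: Department Top Three Salaries (LeetCode 185, difficulty: Hard)
Empty the employee table from Exercise 1 and insert the following data (alternatively, copy the employee table from Exercise 1 and insert rows 5 and 6):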
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
+----+-------+--------+--------------+
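As in Exercise 1, there is also a Department table that holds information about all departments of the company.
# Create the department table
CREATE TABLE Department(
Id VARCHAR(100) NOT NULL,
Name VARCHAR(100) NOT NULL,
PRIMARY KEY(Id)
);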
INSERT INTO Department VALUES('1', 'IT');
INSERT INTO Department VALUES('2', 'Sales');
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
[Task] Write a SQL query that finds the employees with the top three salaries in each department. For the tables above, the query should return:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Randy | 85000 |
| IT | Joe | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
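Method 1:
For each employee E1, count the distinct salaries in the same department that are greater than or equal to E1's salary; E1 is among the top three when that count is at most 3.
SELECT D1.Name Department, E1.Name Employee, E1.Salary
FROM Employee E1, Employee E2, Department D1
WHERE E1.DepartmentID = E2.DepartmentID
AND E2.Salary >= E1.Salary
AND E1.DepartmentID = D1.ID
GROUP BY D1.Name, E1.Id, E1.Name, E1.Salary
HAVING COUNT(DISTINCT E2.Salary) <= 3
ORDER BY D1.Name, E1.Salary DESC;
In addition, consider how to generalize this to the top N salaries in each department.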
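One way to approach the top-N variant is to rank salaries inside each department with DENSE_RANK and filter on the rank. The sketch below swaps in this window-function technique (it is not the self-join used above) and assumes Python's sqlite3 module with SQLite 3.25 or later; the function name top_n_per_department and its parameter n are made up for illustration.

```python
import sqlite3

def top_n_per_department(n):
    """Return (department, employee, salary) rows for the top-n distinct salaries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT,
                               Salary INTEGER, DepartmentId INTEGER);
        CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO Employee VALUES
            (1, 'Joe', 70000, 1), (2, 'Henry', 80000, 2), (3, 'Sam', 60000, 2),
            (4, 'Max', 90000, 1), (5, 'Janet', 69000, 1), (6, 'Randy', 85000, 1);
        INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
    """)
    rows = conn.execute("""
        SELECT Department, Employee, Salary
        FROM (
            SELECT d.Name AS Department, e.Name AS Employee, e.Salary AS Salary,
                   -- equal salaries share a rank, so ties do not use up extra slots
                   DENSE_RANK() OVER (PARTITION BY e.DepartmentId
                                      ORDER BY e.Salary DESC) AS rnk
            FROM Employee AS e
            JOIN Department AS d ON e.DepartmentId = d.Id
        ) AS ranked
        WHERE rnk <= ?
        ORDER BY Department, Salary DESC
    """, (n,)).fetchall()
    conn.close()
    return rows

for row in top_n_per_department(3):
    print(row)
```

Passing n as a bound parameter means the same query serves any N; with n = 1 it reduces to the Exercise 1 result.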
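Exercise 9: Shortest Distance in a Plane (LeetCode 612, difficulty: Hard)
The point_2d table holds the coordinates (x, y) of some points (more than two) in a plane.
Write a query that finds the shortest distance between these points, rounded to 2 decimal places.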
|x | y |
|----|----|
| -1 | -1 |
| 0 | 0 |
| -1 | -2 |
The shortest distance is 1, from point (-1,-1) to point (-1,-2). So the output is:
+--------+
|shortest|
+--------+
|1.00 |
+--------+
Note: the largest distance between any two points is less than 10000.
Method 1: self-join
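# List the closest pair together with its distance; sorting plus LIMIT 1
# avoids mixing MIN() with non-aggregated columns
SELECT p1.x, p1.y, p2.x, p2.y,
round(sqrt(power(p1.x - p2.x, 2) + power(p1.y - p2.y, 2)), 2) AS shortest
FROM point_2d AS p1, point_2d AS p2
WHERE p1.x != p2.x
OR p1.y != p2.y
ORDER BY shortest
LIMIT 1;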
The inequality can also be written with <>, comparing the coordinate pairs as row values:
SELECT p1.x, p1.y, p2.x, p2.y,
round(sqrt(power(p1.x - p2.x, 2) + power(p1.y - p2.y, 2)), 2) AS shortest
FROM point_2d AS p1, point_2d AS p2
WHERE (p1.x, p1.y) <> (p2.x, p2.y)
ORDER BY shortest
LIMIT 1;
To output only shortest:
select round(min(sqrt(power(p1.x-p2.x,2) + power(p1.y-p2.y,2))),2) shortest
from point_2d p1, point_2d p2
where (p1.x, p1.y) <> (p2.x, p2.y);
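The shortest-distance query can be tried on the sample points with the sketch below. It assumes Python's sqlite3 module as a stand-in for MySQL; because SQLite builds do not always include math functions, sqrt is registered from Python with create_function, and the squares are written as plain multiplications. The function name shortest_distance is hypothetical.

```python
import math
import sqlite3

def shortest_distance(points):
    """points: list of (x, y); returns the smallest pairwise distance, 2 decimals."""
    conn = sqlite3.connect(":memory:")
    # Provide sqrt() to SQL in case this SQLite build lacks the math functions
    conn.create_function("sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE point_2d (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO point_2d VALUES (?, ?)", points)
    (shortest,) = conn.execute("""
        SELECT ROUND(MIN(sqrt((p1.x - p2.x) * (p1.x - p2.x)
                            + (p1.y - p2.y) * (p1.y - p2.y))), 2)
        FROM point_2d AS p1, point_2d AS p2
        WHERE p1.x <> p2.x OR p1.y <> p2.y   -- skip pairing a point with itself
    """).fetchone()
    conn.close()
    return shortest

print(shortest_distance([(-1, -1), (0, 0), (-1, -2)]))
```

The self-join examines every ordered pair, so each distance is computed twice; that is harmless for MIN but quadratic in the number of points.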
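Exercise 10: Trips and Users (LeetCode 262, difficulty: Hard)
The Trips table holds all taxi trips. Each trip has a unique key Id; Client_Id and Driver_Id are foreign keys to Users_Id in the Users table. Status is an enum type with members ('completed', 'cancelled_by_driver', 'cancelled_by_client').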
Id | Client_Id | Driver_Id | City_Id | Status | Request_at |
1 | 1 | 10 | 1 | completed | 2013-10-1 |
2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-1 |
3 | 3 | 12 | 6 | completed | 2013-10-1 |
4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-1 |
5 | 1 | 10 | 1 | completed | 2013-10-2 |
6 | 2 | 11 | 6 | completed | 2013-10-2 |
7 | 3 | 12 | 6 | completed | 2013-10-2 |
8 | 2 | 12 | 12 | completed | 2013-10-3 |
9 | 3 | 10 | 12 | completed | 2013-10-3 |
10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-3 |
The Users table holds all users. Each user has a unique key Users_Id. Banned indicates whether the user is banned, and Role is an enum type with members ('client', 'driver', 'partner').
+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
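Write a SQL query that finds the cancellation rate of unbanned users between October 1, 2013 and October 3, 2013. For the tables above, your query should return the following result, with the Cancellation Rate rounded to two decimal places.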
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
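One possible approach to this exercise is sketched below: join Trips to Users twice (once for the client, once for the driver), keep only trips where both are unbanned, and divide the cancelled trips by all trips per day. Treating "unbanned users" as requiring both parties to be unbanned is an assumption, as is using Python's sqlite3 module in place of MySQL and storing the dates as zero-padded text; the function name cancellation_rates is made up.

```python
import sqlite3

def cancellation_rates(start="2013-10-01", end="2013-10-03"):
    """Return [(day, cancellation_rate), ...] for trips between unbanned users."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (Users_Id INTEGER PRIMARY KEY, Banned TEXT, Role TEXT);
        CREATE TABLE Trips (Id INTEGER PRIMARY KEY, Client_Id INTEGER, Driver_Id INTEGER,
                            City_Id INTEGER, Status TEXT, Request_at TEXT);
        INSERT INTO Users VALUES
            (1, 'No', 'client'), (2, 'Yes', 'client'), (3, 'No', 'client'),
            (4, 'No', 'client'), (10, 'No', 'driver'), (11, 'No', 'driver'),
            (12, 'No', 'driver'), (13, 'No', 'driver');
        INSERT INTO Trips VALUES
            (1, 1, 10, 1, 'completed', '2013-10-01'),
            (2, 2, 11, 1, 'cancelled_by_driver', '2013-10-01'),
            (3, 3, 12, 6, 'completed', '2013-10-01'),
            (4, 4, 13, 6, 'cancelled_by_client', '2013-10-01'),
            (5, 1, 10, 1, 'completed', '2013-10-02'),
            (6, 2, 11, 6, 'completed', '2013-10-02'),
            (7, 3, 12, 6, 'completed', '2013-10-02'),
            (8, 2, 12, 12, 'completed', '2013-10-03'),
            (9, 3, 10, 12, 'completed', '2013-10-03'),
            (10, 4, 13, 12, 'cancelled_by_driver', '2013-10-03');
    """)
    rows = conn.execute("""
        SELECT t.Request_at AS Day,
               ROUND(1.0 * SUM(CASE WHEN t.Status <> 'completed' THEN 1 ELSE 0 END)
                     / COUNT(*), 2) AS rate            -- cancelled trips / all trips
        FROM Trips AS t
        JOIN Users AS c ON t.Client_Id = c.Users_Id AND c.Banned = 'No'
        JOIN Users AS d ON t.Driver_Id = d.Users_Id AND d.Banned = 'No'
        WHERE t.Request_at BETWEEN ? AND ?
        GROUP BY t.Request_at
        ORDER BY t.Request_at
    """, (start, end)).fetchall()
    conn.close()
    return rows

for day, rate in cancellation_rates():
    print(day, f"{rate:.2f}")
```

On the first day, trip 2 is excluded because client 2 is banned, leaving one cancellation out of three trips, which matches the 0.33 in the expected result.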
Reference
(1) LeetCode solutions
(2) datawhale notebook
(3) SQL optimization with LIMIT 1