1. 前言
使用SQL对同一行数据进行列间比较很简单,只需要在WHERE子句里写上比较条件就可以了。但是,对不同行数据进行列间的比较却没那么简单。然而,这并不是说我们不能用SQL进行行与行之间的比较。 使用SQL进行行间比较时,发挥主要作用的技术是关联子查询,特别是与自连接结合的“关联子查询”。
2. 用列表展示与上一年的比较结果
Sales
year(年份)sale(年营业额:亿日元)
199050199151199252199352199450199550199649199755
SELECT S1
.year,S1
.sale
,
CASE WHEN S1
.sale
< (SELECT sale
FROM Sales
AS S2
WHERE S2
.year = S1
.year - 1) THEN '↓'
WHEN S1
.sale
= (SELECT sale
FROM Sales
AS S2
WHERE S2
.year = S1
.year - 1) THEN '→'
WHEN S1
.sale
> (SELECT sale
FROM Sales
AS S2
WHERE S2
.year = S1
.year - 1) THEN '↑'
ELSE '--' END AS var
FROM Sales
AS S1
ORDER BY S1
.year;
注意:在关联子查询中,对于外部查询返回的每一行数据,内部查询都要执行一次。另外,在关联子查询中是信息流是双向的。外部查询的每行数据传递一个值给子查询,然后子查询为每一行数据执行一次并返回它的记录。然后,外部查询根据返回的记录做出决策。
SELECT S1
.year,S1
.sale
,
CASE WHEN S1
.sale
> S2
.sale
THEN '↑'
WHEN S1
.sale
= S2
.sale
THEN '→'
WHEN S1
.sale
< S2
.sale
THEN '↓'
ELSE '--' END AS var
FROM Sales
AS S1
,Sales
AS S2
WHERE S2
.year = S1
.year - 1;
3. 时间轴有断点时:和过去最近的时间进行比较
对某一年,“过去最临近的年份”需要满足下面两个条件
与该年份相比是过去的年份在满足条件1的年份中,年份最早的一个
Sale2:有年份缺失
year(年份)sale(年营业额)
199050199250199352199455199755
SELECT year,sale
FROM Sales2 S1
WHERE sale
= (SELECT sale
FROM Sales S2
WHERE S2
.year = (SELECT max(S3
.year)
FROM Sales2
AS S3
WHERE S1
.year > S3
.year ))
ORDER BY year;
SELECT S1
.year AS year,S1
.sale
AS sale
FROM Sales2
AS S1
,Sales2
AS S2
WHERE S1
.sale
= S2
.sale
AND S2
.year =
(SELECT MAX(year) FROM Sales2 S3
WHERE S1
.year > S3
.year)
ORDER BY year;
注意:再次强调,不清楚请按照SQL内部执行流程在纸上来一遍(关联子查询是从外部开始,将每一条记录带入内部然后返回)
4. 移动累计值和移动平均值
Accounts
prc_date(处理日期)prc_amt(处理金额)
2006-10-2612 0002006–10-282 5002006–10-31-15 0002006–11-0334 0002006–11-04-5 0002006–11-067 2002006–11-1111 000
求移动累计值
SELECT prc_date
,prc_amt
, sum(prc_amt
) over(order by prc_date
) AS onhand_amt
FROM accounts
;
SELECT a1
.prc_date
,(SELECT sum(a2
.prc_amt
)
FROM accounts a2
WHERE a2
.prc_date
< a1
.prc_date
) AS onhand_amt
FROM accounts
as a1
ORDER BY a1
.prc_date
;
求移动平均值
prc_date(处理日期)prc_amt(处理金额)
2006-10-2612 0002006–10-282 5002006–10-31-15 0002006–11-0334 0002006–11-04-5 0002006–11-067 2002006–11-1111 000
SELECT prc_date
, prc_amt
, SUM(prc_amt
) OVER (ORDER BY prc_date
ROWS 2 PRECEDING) AS onhand_amt
FROM Accounts
;
SELECT prc_date
, A1
.prc_amt
,
( SELECT SUM(prc_amt
) FROM Accounts A2
WHERE A1
.prc_date
>= A2
.prc_date
AND (SELECT COUNT(*) FROM Accounts A3
WHERE A3
.prc_date
BETWEEN A2
.prc_date
AND A1
.prc_date
) <= 3
) AS mvg_sum
FROM Accounts A1
SELECT prc_date
, A1
.prc_amt
,
( SELECT SUM(prc_amt
) FROM Accounts A2
WHERE A1
.prc_date
>= A2
.prc_date
AND (SELECT COUNT(*) FROM Accounts A3
WHERE A3
.prc_date
BETWEEN A2
.prc_date
AND A1
.prc_date
) <= 3
HAVING COUNT(*) =3
) AS mvg_sum
FROM Accounts A1
ORDER BY prc_date
;
5. 查询重叠的时间区间
Reservations
reserver(入住客人)start_date(入住日期)end_date(离店日期)
木村2006-10-262006-10-27荒木2006-10-282006-10-31堀2006-10-312006-11-01山本2006-11-032006-11-04内田2006-11-032006-11-05水谷2006-11-062006-11-06
日期的重叠类型
自己入住日期在他人的住宿期间内自己的离店日期在他人的住宿期间内自己的入住日期的离店日期都在他人的住宿期间内
如何查处住宿日期重叠的客人并列表表示
SELECT reserver
, start_date
, end_date
FROM Reservations R1
WHERE EXISTS
(SELECT *
FROM Reservations R2
WHERE R1
.reserver
<> R2
.reserver
AND ( R1
.start_date
BETWEEN R2
.start_date
AND R2
.end_date
OR R1
.end_date
BETWEEN R2
.start_date
AND R2
.end_date
)
);
升级版:把完全包含别人的住宿期间的情况也输出
SELECT reserver
, start_date
, end_date
FROM Reservations R1
WHERE EXISTS
(SELECT *
FROM Reservations R2
WHERE R1
.reserver
<> R2
.reserver
AND ( R1
.start_date
BETWEEN R2
.start_date
AND R2
.end_date
OR R1
.end_date
BETWEEN R2
.start_date
AND R2
.end_date
)
OR (R2
.start_date
BETWEEN R1
.start_date
AND R1
.end_date
AND
R2
.end_date
BETWEEN R1
.start_date
AND R1
.end_date
)
);
本节要点:
关联子查询代码可读性不好在SELECT子句里面使用了关联子查询时,性能可能会变差。我们掌握任何一样东西,都像一把双刃剑,有着好和坏两面。所以请运用智慧利用它作为面向集合语言,SQL在比较多行数据时,不进行排序和循环。SQL的做法时添加比较对象数据的集合,通过关联子查询一行一行的偏移处理。