集合运算符
描述
集合运算符用于将两个输入关系组合成一个单一关系。Spark SQL 支持三种类型的集合运算符:
EXCEPT或MINUSINTERSECTUNION
请注意,输入关系必须具有相同数量的列以及兼容的相应列数据类型。
EXCEPT
EXCEPT 和 EXCEPT ALL 返回在一个关系中存在但不在另一个关系中存在的行。EXCEPT(或者说 EXCEPT DISTINCT)只返回不重复的行,而 EXCEPT ALL 不会从结果行中移除重复项。请注意,MINUS 是 EXCEPT 的别名。
语法
[ ( ] relation [ ) ] EXCEPT | MINUS [ ALL | DISTINCT ] [ ( ] relation [ ) ]
示例
-- Use number1 and number2 tables to demonstrate set operators in this page.
SELECT * FROM number1;
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
+---+
SELECT * FROM number2;
+---+
| c|
+---+
| 5|
| 1|
| 2|
| 2|
+---+
SELECT c FROM number1 EXCEPT SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
INTERSECT
INTERSECT 和 INTERSECT ALL 返回在两个关系中都存在的行。INTERSECT(或者说 INTERSECT DISTINCT)只返回不重复的行,而 INTERSECT ALL 不会从结果行中移除重复项。
语法
[ ( ] relation [ ) ] INTERSECT [ ALL | DISTINCT ] [ ( ] relation [ ) ]
示例
(SELECT c FROM number1) INTERSECT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
| 2|
+---+
UNION
UNION 和 UNION ALL 返回在任一关系中存在的行。UNION(或者说 UNION DISTINCT)只返回不重复的行,而 UNION ALL 不会从结果行中移除重复项。
语法
[ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ]
示例
(SELECT c FROM number1) UNION (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
(SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
SELECT c FROM number1 UNION ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
| 5|
| 1|
| 2|
| 2|
+---+