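CREATE FUNCTION

Description

The CREATE FUNCTION statement creates a temporary or permanent function in Spark. Temporary functions are scoped to a session, whereas permanent functions are created in the persistent catalog and are available to all sessions. The resources specified in the USING clause are made available to all executors the first time the function is executed. In addition to the SQL interface, Spark lets users create custom user-defined scalar and aggregate functions with the Scala, Python, and Java APIs. See Scalar UDFs and UDAFs for more information.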
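The difference in scope between temporary and permanent functions can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions: `Catalog` and `Session` are hypothetical classes, not Spark APIs; the model ignores resources and executors, and it only captures the rule that permanent functions live in one shared catalog while temporary functions belong to a single session.

```python
from typing import Dict, List, Optional


class Catalog:
    """Hypothetical stand-in for the persistent catalog shared by all sessions."""

    def __init__(self) -> None:
        # Permanent functions keyed by "database.function" -> implementing class name.
        self.functions: Dict[str, str] = {}


class Session:
    """Hypothetical session: sees the shared catalog plus its own temporary functions."""

    def __init__(self, catalog: Catalog, current_database: str = "default") -> None:
        self.catalog = catalog
        self.current_database = current_database
        # Temporary functions exist only in this session and carry no database qualifier.
        self.temp_functions: Dict[str, str] = {}

    def _qualify(self, name: str) -> str:
        # Unqualified names are qualified with the session's current database.
        return name if "." in name else f"{self.current_database}.{name}"

    def create_function(self, name: str, class_name: str, temporary: bool = False) -> None:
        if temporary:
            self.temp_functions[name] = class_name
        else:
            self.catalog.functions[self._qualify(name)] = class_name

    def lookup(self, name: str) -> Optional[str]:
        # In this sketch temporary functions are checked first; Spark's exact
        # resolution order is not modeled here.
        if name in self.temp_functions:
            return self.temp_functions[name]
        return self.catalog.functions.get(self._qualify(name))

    def show_user_functions(self) -> List[str]:
        # Same shape as SHOW USER FUNCTIONS in the examples: qualified permanent
        # names first, then unqualified temporary names.
        return sorted(self.catalog.functions) + sorted(self.temp_functions)
```

For instance, after one session creates `simple_udf` (permanent) and `simple_temp_udf` (temporary), a second session sharing the same `Catalog` can resolve `simple_udf` but not `simple_temp_udf`.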

Syntax

CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]
    function_name AS class_name [ resource_locations ]

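Parameters

OR REPLACE

If specified, the resources for the function are reloaded. This is mainly useful for picking up changes made to the function's implementation. This parameter is mutually exclusive with IF NOT EXISTS; the two cannot be specified together.
TEMPORARY

Indicates the scope of the function being created. When TEMPORARY is specified, the created function is valid and visible only in the current session. No persistent entry is made in the catalog for this kind of function.
IF NOT EXISTS

If specified, the function is created only if it does not already exist. If the specified function already exists in the system, the statement still succeeds (no error is thrown). This parameter is mutually exclusive with OR REPLACE; the two cannot be specified together.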
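function_name

Specifies the name of the function to be created. The function name may optionally be qualified with a database name.

Syntax: [ database_name. ] function_name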
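class_name

Specifies the name of the class that provides the implementation of the function to be created. The implementing class should extend one of the following base classes:
- UDF or UDAF in the org.apache.hadoop.hive.ql.exec package.
- AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in the org.apache.hadoop.hive.ql.udf.generic package.
- UserDefinedAggregateFunction in the org.apache.spark.sql.expressions package.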
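resource_locations

Specifies the list of resources that contain the implementation of the function along with its dependencies.

Syntax: USING { { (JAR | FILE | ARCHIVE) resource_uri } , ... }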
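To tie the syntax and the parameter rules together, the following sketch assembles the statement text from its parts, for example before handing it to a SQL client. It is a hypothetical helper (`build_create_function` and `RESOURCE_TYPES` are illustrative names, not part of any Spark API), and it enforces only the rules stated above: OR REPLACE and IF NOT EXISTS are mutually exclusive, the function name may be qualified with a database name, and each resource is a JAR, FILE, or ARCHIVE. It does not escape quotes and does not check any other restriction Spark may apply.

```python
from typing import Optional, Sequence, Tuple

# Resource types permitted by the USING clause: JAR, FILE, or ARCHIVE.
RESOURCE_TYPES = ("JAR", "FILE", "ARCHIVE")


def build_create_function(
    function_name: str,
    class_name: str,
    *,
    database_name: Optional[str] = None,
    or_replace: bool = False,
    temporary: bool = False,
    if_not_exists: bool = False,
    resources: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a CREATE FUNCTION statement (hypothetical helper, not a Spark API)."""
    # OR REPLACE and IF NOT EXISTS are mutually exclusive.
    if or_replace and if_not_exists:
        raise ValueError("OR REPLACE and IF NOT EXISTS cannot be specified together")

    parts = ["CREATE"]
    if or_replace:
        parts.append("OR REPLACE")
    if temporary:
        parts.append("TEMPORARY")
    parts.append("FUNCTION")
    if if_not_exists:
        parts.append("IF NOT EXISTS")

    # [ database_name. ] function_name
    name = f"{database_name}.{function_name}" if database_name else function_name
    # class_name is emitted as a string literal; embedded quotes are not escaped.
    parts.append(f"{name} AS '{class_name}'")

    # USING { { (JAR | FILE | ARCHIVE) resource_uri } , ... }
    if resources:
        locations = []
        for resource_type, uri in resources:
            resource_type = resource_type.upper()
            if resource_type not in RESOURCE_TYPES:
                raise ValueError(f"unsupported resource type: {resource_type}")
            locations.append(f"{resource_type} '{uri}'")
        parts.append("USING " + ", ".join(locations))

    return " ".join(parts)


if __name__ == "__main__":
    # Prints: CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR' USING JAR '/tmp/SimpleUdfR.jar'
    print(build_create_function(
        "simple_udf", "SimpleUdfR", or_replace=True,
        resources=[("jar", "/tmp/SimpleUdfR.jar")],
    ))
```

Keyword-only flags keep call sites readable, and raising `ValueError` up front surfaces the OR REPLACE / IF NOT EXISTS conflict before any statement is sent.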

Examples

-- 1. Create a simple UDF `SimpleUdf` that increments the supplied integral value by 10.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--    public class SimpleUdf extends UDF {
--      public int evaluate(int value) {
--        return value + 10;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdf.jar` in /tmp.

-- Create a table called `test` and insert two rows.
CREATE TABLE test(c1 INT);
INSERT INTO test VALUES (1), (2);

-- Create a permanent function called `simple_udf`.
CREATE FUNCTION simple_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the function is in the registry.
SHOW USER FUNCTIONS;
+------------------+
|          function|
+------------------+
|default.simple_udf|
+------------------+

-- Invoke the function. Every selected value should be incremented by 10.
SELECT simple_udf(c1) AS function_return_value FROM test;
+---------------------+
|function_return_value|
+---------------------+
|                   11|
|                   12|
+---------------------+

-- Create a temporary function.
CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the newly created temporary function is in the registry.
-- Please note that the temporary function does not have a qualified
-- database associated with it.
SHOW USER FUNCTIONS;
+------------------+
|          function|
+------------------+
|default.simple_udf|
|   simple_temp_udf|
+------------------+

-- 1. Create `SimpleUdfR`, a modified version of `SimpleUdf` that increments
--    the supplied integral value by 20.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--
--    public class SimpleUdfR extends UDF {
--      public int evaluate(int value) {
--        return value + 20;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdfR.jar` in /tmp.

-- Replace the implementation of `simple_udf`.
CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR'
    USING JAR '/tmp/SimpleUdfR.jar';

-- Invoke the function. Every selected value should be incremented by 20.
SELECT simple_udf(c1) AS function_return_value FROM test;
+---------------------+
|function_return_value|
+---------------------+
|                   21|
|                   22|
+---------------------+
