DESCRIBE TABLE
描述
DESCRIBE TABLE
语句返回表的基本元数据信息。元数据信息包括列名、列类型和列注释。可以选择指定分区规范或列名,以分别返回与分区或列相关的元数据。
语法
{ DESC | DESCRIBE } [ TABLE ] [ format ] table_identifier [ partition_spec ] [ col_name ] [ AS JSON ]
参数
-
格式
指定描述输出的可选格式。如果指定了
EXTENDED
或FORMATTED
,则会返回额外的元数据信息(如父数据库、所有者和访问时间)。此外,如果指定了EXTENDED
或FORMATTED
,则可以通过在语句末尾指定AS JSON
来以 JSON 格式返回元数据。 -
表标识符
指定表名,该表名可以选择性地使用数据库名进行限定。
语法:
[ database_name. ] table_name
-
分区规范
一个可选参数,用于指定分区键值对的逗号分隔列表。指定后,会返回额外的分区元数据。
语法:
PARTITION ( partition_col_name = partition_col_val [ , ... ] )
-
列名
一个可选参数,用于指定需要描述的列名。提供的列名可以选择性地进行限定。参数
partition_spec
和col_name
相互排斥,不能同时指定。目前不允许指定嵌套列。目前不支持针对单个列的 JSON 格式。
语法:
[ database_name. ] [ table_name. ] column_name
-
AS JSON
一个可选参数,用于以 JSON 格式返回表元数据。仅在指定
EXTENDED
或FORMATTED
格式时支持(两者生成等效的 JSON)。语法:
[ AS JSON ]
Schema
以下是完整的 JSON Schema。在实际输出中,空字段会被省略,并且 JSON 不会进行美化打印(参见示例)。
{ "table_name": "<table_name>", "catalog_name": "<catalog_name>", "schema_name": "<innermost_namespace_name>", "namespace": ["<namespace_names>"], "type": "<table_type>", "provider": "<provider>", "columns": [ { "name": "<name>", "type": <type_json>, "comment": "<comment>", "nullable": <boolean>, "default": "<default_val>" } ], "partition_values": { "<col_name>": "<val>" }, "partition_columns": ["col1", "col2"], "clustering_columns": ["col1", "col2"], "location": "<path>", "view_text": "<view_text>", "view_original_text": "<view_original_text>", "view_schema_mode": "<view_schema_mode>", "view_catalog_and_namespace": "<view_catalog_and_namespace>", "view_query_output_columns": ["col1", "col2"], // Spark SQL configurations captured at the time of permanent view creation "view_creation_spark_configuration": { "conf1": "<value1>", "conf2": "<value2>" }, "comment": "<comment>", "table_properties": { "property1": "<property1>", "property2": "<property2>" }, "storage_properties": { "property1": "<property1>", "property2": "<property2>" }, "serde_library": "<serde_library>", "input_format": "<input_format>", "output_format": "<output_format>", "num_buckets": <num_buckets>, "bucket_columns": ["<col_name>"], "sort_columns": ["<col_name>"], "created_time": "<yyyy-MM-dd'T'HH:mm:ss'Z'>", "created_by": "<created_by>", "last_access": "<yyyy-MM-dd'T'HH:mm:ss'Z'>", "partition_provider": "<partition_provider>", "collation": "<default_collation>" }
以下是
<type_json>
的 Schema 定义
Spark SQL 数据类型 | JSON 表示 |
---|---|
ByteType | { "name" : "tinyint" } |
ShortType | { "name" : "smallint" } |
IntegerType | { "name" : "int" } |
LongType | { "name" : "bigint" } |
FloatType | { "name" : "float" } |
DoubleType | { "name" : "double" } |
DecimalType | { "name" : "decimal", "precision": p, "scale": s } |
StringType | { "name" : "string", "collation": "<collation>" } |
VarCharType | { "name" : "varchar", "length": n } |
CharType | { "name" : "char", "length": n } |
BinaryType | { "name" : "binary" } |
BooleanType | { "name" : "boolean" } |
DateType | { "name" : "date" } |
VariantType | { "name" : "variant" } |
TimestampType | { "name" : "timestamp_ltz" } |
TimestampNTZType | { "name" : "timestamp_ntz" } |
YearMonthIntervalType | { "name" : "interval", "start_unit": "<start_unit>", "end_unit": "<end_unit>" } |
DayTimeIntervalType | { "name" : "interval", "start_unit": "<start_unit>", "end_unit": "<end_unit>" } |
ArrayType | { "name" : "array", "element_type": <type_json>, "element_nullable": <boolean> } |
MapType | { "name" : "map", "key_type": <type_json>, "value_type": <type_json>, "value_nullable": <boolean> } |
StructType | { "name" : "struct", "fields": [ {"name" : "field1", "type" : <type_json>, “nullable”: <boolean>, "comment": “<comment>”, "default": “<default_val>”}, ... ] } |
示例
-- Creates a table `customer`. Assumes current database is `salesdb`.
CREATE TABLE customer(
cust_id INT,
state VARCHAR(20),
name STRING COMMENT 'Short name'
)
USING parquet
PARTITIONED BY (state);
INSERT INTO customer PARTITION (state = 'AR') VALUES (100, 'Mike');
-- Returns basic metadata information for unqualified table `customer`
DESCRIBE TABLE customer;
+-----------------------+---------+----------+
| col_name|data_type| comment|
+-----------------------+---------+----------+
| cust_id| int| null|
| name| string|Short name|
| state| string| null|
|# Partition Information| | |
| # col_name|data_type| comment|
| state| string| null|
+-----------------------+---------+----------+
-- Returns basic metadata information for qualified table `customer`
DESCRIBE TABLE salesdb.customer;
+-----------------------+---------+----------+
| col_name|data_type| comment|
+-----------------------+---------+----------+
| cust_id| int| null|
| name| string|Short name|
| state| string| null|
|# Partition Information| | |
| # col_name|data_type| comment|
| state| string| null|
+-----------------------+---------+----------+
-- Returns additional metadata such as parent database, owner, access time etc.
DESCRIBE TABLE EXTENDED customer;
+----------------------------+------------------------------+----------+
| col_name| data_type| comment|
+----------------------------+------------------------------+----------+
| cust_id| int| null|
| name| string|Short name|
| state| string| null|
| # Partition Information| | |
| # col_name| data_type| comment|
| state| string| null|
| | | |
|# Detailed Table Information| | |
| Database| default| |
| Table| customer| |
| Owner| <TABLE OWNER>| |
| Created Time| Tue Apr 07 22:56:34 JST 2020| |
| Last Access| UNKNOWN| |
| Created By| <SPARK VERSION>| |
| Type| MANAGED| |
| Provider| parquet| |
| Location|file:/tmp/salesdb.db/custom...| |
| Serde Library|org.apache.hadoop.hive.ql.i...| |
| InputFormat|org.apache.hadoop.hive.ql.i...| |
| OutputFormat|org.apache.hadoop.hive.ql.i...| |
| Partition Provider| Catalog| |
+----------------------------+------------------------------+----------+
-- Returns partition metadata such as partitioning column name, column type and comment.
DESCRIBE TABLE EXTENDED customer PARTITION (state = 'AR');
+------------------------------+------------------------------+----------+
| col_name| data_type| comment|
+------------------------------+------------------------------+----------+
| cust_id| int| null|
| name| string|Short name|
| state| string| null|
| # Partition Information| | |
| # col_name| data_type| comment|
| state| string| null|
| | | |
|# Detailed Partition Inform...| | |
| Database| default| |
| Table| customer| |
| Partition Values| [state=AR]| |
| Location|file:/tmp/salesdb.db/custom...| |
| Serde Library|org.apache.hadoop.hive.ql.i...| |
| InputFormat|org.apache.hadoop.hive.ql.i...| |
| OutputFormat|org.apache.hadoop.hive.ql.i...| |
| Storage Properties|[serialization.format=1, pa...| |
| Partition Parameters|{transient_lastDdlTime=1586...| |
| Created Time| Tue Apr 07 23:05:43 JST 2020| |
| Last Access| UNKNOWN| |
| Partition Statistics| 659 bytes| |
| | | |
| # Storage Information| | |
| Location|file:/tmp/salesdb.db/custom...| |
| Serde Library|org.apache.hadoop.hive.ql.i...| |
| InputFormat|org.apache.hadoop.hive.ql.i...| |
| OutputFormat|org.apache.hadoop.hive.ql.i...| |
+------------------------------+------------------------------+----------+
-- Returns the metadata for `name` column.
-- Optional `TABLE` clause is omitted and column is fully qualified.
DESCRIBE customer salesdb.customer.name;
+---------+----------+
|info_name|info_value|
+---------+----------+
| col_name| name|
|data_type| string|
| comment|Short name|
+---------+----------+
-- Returns the table metadata in JSON format.
DESC FORMATTED customer AS JSON;
{"table_name":"customer","catalog_name":"spark_catalog","schema_name":"default","namespace":["default"],"columns":[{"name":"cust_id","type":{"name":"integer"},"nullable":true},{"name":"name","type":{"name":"string"},"comment":"Short name","nullable":true},{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],"location": "file:/tmp/salesdb.db/custom...","created_time":"2020-04-07T14:05:43Z","last_access":"UNKNOWN","created_by":"None","type":"MANAGED","provider":"parquet","partition_provider":"Catalog","partition_columns":["state"]}