How to create a external table in Hive?

Contents

External Table in Hive

When we create the table in Hive, we can define the type of a table. As per the Hive 1.2 version, It has two types of table as below.

Internal (Managed) Table :It is a default table type in Hive. It means that if we not define the table type in the Create table statement, it automatically create it as Internal Table. It is a Hive owned tables. So Hive itself manage the metadata and actual data of the table. Hive stores the actual data of the internal table in the metastore. Internal table is also called as Managed table.

External Table : Hive metastore store only the schema(metadata) of the table. Hive doesn’t manage the data of the external table. Since the data of external table is stored outside the Hive metastore, the DROP table statement remove only the metadata of the table. The original data of the table is still maintained in the given HDFS location.

In this tutorial, we will look at the external table in Hive with examples.

Syntax to Create External Table in Hive

The keyword External is used in the Create table statement to define the External table in Hive. Also we need to give the location as a HDFS path where we want to store the actual data of the table. If we not provide the location, it will store the data in the default hdfs location that is configured in the system.

In addition, we can define the row format of the table like adding the delimiter between the fields. File format is required after the Stored as keyword. Also we can mention the table properties in the Create table Statement.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <Database_Name>.<Table_Name>(
Column_1 <data_Type>,
Column_2 <data_Type>,....Column_N <data_Type>)
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)];

External table example in Hive

For this example, we will create a table called courier_details as External table in Hive

CREATE EXTERNAL TABLE product_db.courier_details (
 package_id BIGINT,
 type_of_package STRING,
 from_address STRING,
 to_address STRING,
 delivery_dt DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/sys/logistics/hive/daily_tables';

As we can see, we have defined the location of the table and other properties in the create table statement.

External table example in Hive
External table example in Hive

Drop the external table

We have inserted few records into the courier_details table. Lets select the rows from that table.

Select records from external table in hive
Select records from external table in hive

The table data are stored in the given hdfs location /sys/logistics/hive/daily_tables. Even though we drop this external table, we can still find the data in that hdfs location. Lets drop the table and check hdfs file.

Drop external table in Hive
Drop external table in Hive

Hive just removed the meta data of the table. So if we try to select the data in Hive, it will show the error message as ” table not found”.

Remove metadata of the external table in Hive
Remove metadata of the external table in Hive

But we can find the data in the HDFS location that is mentioned in the create table statement.

$ hdfs dfs -ls /sys/logistics/hive/daily_tables
Found 2 items
-rwxrwxrwx 3 test_user_1 hdfs 53 2021-09-12 08:03 /sys/logistics/hive/daily_tables/000000_0.deflate
-rwxrwxrwx 3 test_user_1 hdfs 49 2021-09-12 08:03 /sys/logistics/hive/daily_tables/000000_0_copy_1.deflate

Using hdfs dfs -text command, we can see the data inside the hdfs file.

$ hdfs dfs -text /sys/logistics/hive/daily_tables/000000_0.deflate
89933|Electronics|Oakland|Fontana|2021-09-11
$ hdfs dfs -text /sys/logistics/hive/daily_tables/000000_0_copy_1.deflate
76420|Medicine|Hayward|Pomona|2021-08-22

We created the external table in Hive. Then we executed the drop statement to see whether Hive is removing only metadata of the table or not. As we expected, it removed only the meta data of the table. It didn’t remove the actual data of the table in HDFS.

Recommended Articles