Partitioning is the way to dividing the table based on the key columns and organize the records in a partitioned manner. It is nothing but a directory that contains the chunk of data.
In Hive, the table is stored as files in HDFS. If we specify the partitioned columns in the Hive DDL, it will create the sub directory within the main directory based on partitioned columns.
Create a partitioned Hive table
CREATE TABLE Customer_transactions (
txn_amout DECIMAL(38, 2),
PARTITIONED BY (txn_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
The table Customer_transactions is created with partitioned by Transaction date in Hive.Here the main directory is created with the table name and Inside that the sub directory is created with the txn_date in HDFS.
Insert values to the partitioned table in Hive
By default the hive.exec.dynamic.partition.mode is set to strict, then we need to do at least one static partition. In non-strict mode, all partitions are allowed to be dynamic.Since we got the below error while insert the record, we changed the dynamic partition mode to nonstrict.
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
Show partitions in Hive
Lets check the partitions for the created table customer_transactions using the show partitions command in Hive
Partitioned directory in the HDFS for the Hive table
The sub directory has created under the table name for the partitioned columns in HDFS as below
hdfs dfs -ls /apps/hive/warehouse/cust.db/customer_transactions/*
Found 1 items
-rwxrwxrwx 3 revisit_user hdfs 24 2019-06-20 11:45 /apps/hive/warehouse/cust.db/customer_transactions/txn_date=2019-04-14/000000_0.deflate
Found 1 items
-rwxrwxrwx 3 revisit_user hdfs 23 2019-06-20 11:46 /apps/hive/warehouse/cust.db/customer_transactions/txn_date=2019-05-12/000000_0.deflate