How to create hourly partitions in Hive table

Hourly partitions in Hive table

When we have large quantities of data, we look for partition column to improve the query execution time. In some cases, we may get a requirement to load the data every day or hour. For that, we can create a daily or hourly partition to improve the query processing in Hive. In this article, we will create hourly partition and insert the data into a hive table.

Create table with hourly partition in Hive

In this example, we are going to load every day transaction data with the time interval of 4 hour. So the partition columns are date and hour. Lets create the customer_txn table with partitions (txn_date and hour) as below

Create table with hourly partition in Hive
Create table with hourly partition in Hive

Alter table to add hourly partition in Hive

Consider that we need to load the data for the date ‘2021-07-12’ and hour is 4 AM. Lets add the partitions for the same in the customer_txn table.

The alter statement has created the hourly partition as below

Alter table to add hourly partition in Hive
Alter table to add hourly partition in Hive

Insert the data into hourly partition in Hive

Here we are using insert statement to load the data into hive table. Also the partition columns such as txn_date and hour are included in the insert statement.

After inserting the values, Lets check the data in the customer_txn table.

Select values from hourly partitioned table in hive
Select values from hourly partitioned table in hive

Get the location of the partitions in Hive

Using the show table statement, we can get the hdfs location of the partitioned table as below

The output of the show table extended statement provided the location of partitions with other file level details(number of files, file size)

Drop the partitions in Hive

Lets say that we want to drop the partitions for the particular hour. This is can be done using the Alter table statement. As we inserted the data for the hour 4, we can drop this partition from the customer_txn table as below.

It will remove the directory of txn_date=2021-07-12/hour=4 in the hdfs location of this table.

Recommended Articles