How to execute HiveQL queries using Shell script?

Contents

Introduction

Shell script can be used to run the Hive queries in batch mode. It will handle the input values/arguments , execution of queries and errors during the execution. In this tutorial, we will write the hive queries in a file. Then the file name is given as argument to the shell script.

Also we are using hive -f option and hivevar option for executing the hive queries using shell script

Hive query to run in batch mode

Lets consider that we have a below Hive query to run in batch mode. This is saved in a file with the name of input_hive_query.q. The query is inserting the customer order details into product_order_master table. Here the source table is customer_orders which present in the cust_DB database.

Also the variables ${database} and {run_date} are used in the Hive query. We need to set the values for these variables during the execution.

USE ${database};

insert overwrite table  product_order_master
select cust_id,product_id,product_name,price
from cust_DB.customer_orders
where order_date ={run_date};

Shell script to run the Hive query

The below shell script has three functions

  • main () – The execution of the script is starts from here. It calls the other two functions one by one.
  • setup() – It will process the arguments that we pass while executing the script. Also it set the values for the variables ( run_date & database ).
  • run() – This function execute the hive query which is present in the file. The below command is used for the execution
hive --hivevar database=product_DB --hivevar run_date=2021-07-25 -v -f input_hive_query.q

We saved the below shell script in the name of execute_hive.sh.

#!/bin/bash

usage="$(basename $0) -f <script> -d <batch_date>"

# function setup to process the arguments and set the values for the variables
setup() {
    unset BATCH_DATE
    unset HIVEQL_FILE
    while getopts "d:f:" arg
    do
        case ${arg} in
            d)
                BATCH_DATE=$(date --date="${OPTARG}" +%Y-%m-%d)
                if [[ $? -ne 0 ]]
                then
                    echo "Invalid batch date ${BATCH_DATE}"
                    echo "${usage}"
                    exit 1
                fi
                HIVEVAR_LIST="${HIVEVAR_LIST} --hivevar run_date=${BATCH_DATE}"
                ;;
            f)
                HIVEQL_FILE=${OPTARG}
                if [[ ! -f ${OPTARG} ]]
                then
                    echo "File does not exist"
                    echo "${HIVEQL_FILE}"
                    exit 1
                fi

                HIVEVAR_DATABASE="product_DB"
                HIVEVAR_LIST="${HIVEVAR_LIST} --hivevar database=${HIVEVAR_DATABASE}"
                ;;

            \?)
                echo "Invalid option: -${OPTARG}"
                echo "${usage}"
                exit 1
                ;;

       esac
    done

    if [[ -z ${HIVEQL_FILE} ]]
    then
        echo "Missing hiveql file"
        echo "${usage}"
        exit 1
    fi

    NOW=$(date +%Y%m%d%H%M%S)
    BASE_LOG_DIR="/x/home/user/revisit_class"
    LOG_FILE="${BASE_LOG_DIR}/hive_query.${NOW}.log"
# -v means --verbose, Verbose mode (echo executed SQL to the console)
    HIVE_VERBOSE="-v"
}

# function run to execute the hive query
run() {
    echo "hive ${HIVEVAR_LIST} ${HIVE_VERBOSE} -f ${HIVEQL_FILE} 2>&1" | tee -a ${LOG_FILE}
    hive ${HIVEVAR_LIST} ${HIVE_VERBOSE} -f ${HIVEQL_FILE} 2>&1 | tee -a ${LOG_FILE}
    if [[ ${PIPESTATUS[0]} -ne 0 ]]
    then
        echo "   ********** Hive execution failed at `date` "
        exit 1
    else
        echo "   ********** Hive executed successfully at `date`."
    fi
}

main() {
    setup "$@"
    run
}

# Execution starts from here
main "$@"

Command to execute the shell script

We need two arguments to execute our shell script execute_hive.sh .

  • HiveQL file name – The file name input_hive_query.q is given as a first argument with the name of -f.
  • Batch date – The batch date is given as second argument with the name of -d.

sh execute_hive.sh -f input_hive_query.q -d ‘2021-07-25’

After we executing this command, the shell script process the arguments. Then it called hive command to execute the hive query. If there is any error in the execution, it will write the errors in the log file => /x/home/user/revisit_class/hive_query.${NOW}.log.

Recommended Articles