How to set different execution engines in Hive with examples

Execution engine in Hive

The execution engine communicates with Hadoop daemons such as the NameNode, DataNodes, and JobTracker to execute a Hive query on top of the Hadoop file system. It executes the execution plan created by the compiler.

Different types of execution engines in Hive

Hive queries can run on three different kinds of execution engines, listed below:

  • MapReduce
  • Tez
  • Spark

Previously, the default execution engine in Hive was MapReduce (MR). In newer Hive distributions, Apache Tez replaces MapReduce as the default execution engine. We can choose the execution engine for the current session with the SET command, for example: SET hive.execution.engine=tez;
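
As a minimal sketch of the session-level setting, the shell snippet below writes a small script that switches the engine and then prints the value currently in effect. The file name engine_check.hql is hypothetical, and the sketch assumes a Hive release where tez is an accepted value and Tez is installed on the cluster.

```shell
# Create a small HiveQL script. The quoted 'EOF' delimiter makes the shell
# copy the text literally, with no expansion.
cat > engine_check.hql <<'EOF'
-- Switch the engine for this session only
SET hive.execution.engine=tez;
-- SET with a property name and no value prints the current setting
SET hive.execution.engine;
EOF
```

Running the script with hive -f engine_check.hql should print the property together with its current value. The setting lasts only for that session.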

If you want to change the execution engine for all queries, you need to override the hive.execution.engine property in the hive-site.xml file.
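
For illustration, a property entry along these lines would go inside the <configuration> element of hive-site.xml. The value tez is only an example; mr or spark could be used instead where those engines are available.

```xml
<property>
  <!-- Default execution engine for sessions that do not override it -->
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```

A session can still override this default with the SET command.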

MapReduce (MR)

If we choose MR as the execution engine, the query is submitted as MapReduce jobs. Mappers and reducers are assigned to each job, and it runs in the traditional distributed way.

Tez execution engine

Apache Tez is an application framework built on top of Hadoop YARN. It is used for building high-performance batch and interactive data processing applications. Tez improves query performance by expressing a job as a directed acyclic graph (DAG) of tasks and by using data transfer primitives. It is an alternative to the traditional MapReduce design in Hadoop.

Spark execution engine

The Spark execution engine runs Hive queries on Apache Spark, which is used for large-scale data processing. It can be a faster engine for Hive queries and aims to overcome the performance issues faced by the MR and Tez engines.

Example to set the execution engine in Hive

Let's write the Hive queries in a file and set the execution engine only for those queries. The queries are stored in the test.hql file. The file uses the variable ${database} and sets the Hive execution engine to tez. When we execute the queries, we need to pass a value for the variable using the --hivevar option.
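
A test.hql file along the following lines would match that description. This is only a sketch: the table name sample_table is hypothetical, and the actual queries may differ. The snippet creates the file from the shell, and the quoted 'EOF' delimiter keeps the shell from expanding ${database}, leaving it for Hive to substitute.

```shell
# Write a hypothetical test.hql. Because the heredoc delimiter is quoted,
# the variable placeholder reaches the file unchanged.
cat > test.hql <<'EOF'
-- Use Tez for the statements in this script only
SET hive.execution.engine=tez;

-- The database name is supplied at run time through --hivevar
USE ${database};

-- sample_table is a made-up table name for illustration
SELECT COUNT(*) FROM sample_table;
EOF
```

Because the SET statement lives in the script, the engine choice applies only to the session that runs this file.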

Execution and output

Since the queries are stored in a file, we use the hive -f option to execute them, as shown below. We also use the --hivevar option to pass a value to the database variable.

hive -f <file_name> --hivevar <variable_name>=<value>
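
As a concrete illustration of this command, the sketch below wraps it in a small shell function. The function name run_hive_script, the HIVE_BIN override, and the database name sales_db used in the usage line are all assumptions made for the example.

```shell
# Hypothetical wrapper around the Hive CLI. HIVE_BIN is an assumed override
# for the client path and defaults to "hive".
run_hive_script() {
  # $1 = path to the .hql file, $2 = value for the ${database} variable
  "${HIVE_BIN:-hive}" -f "$1" --hivevar "database=$2"
}
```

Calling run_hive_script test.hql sales_db runs hive -f test.hql --hivevar database=sales_db.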

The Hive queries run on the Tez engine because the file sets the execution engine to Tez.
