How to connect Hive using JDBC connection?

Hive JDBC

Hive allows the applications to connect to it using the JDBC driver. JDBC driver uses Thrift to communicate with the Hive Server. Here Thrift is a software framework that allows application to access Hive remotely. HiveServer is built on Apache Thrift (http://thrift.apache.org/), therefore it is sometimes called as Thrift server. To connect Hive using JDBC, we need the below details

  • Hive JDBC URL
  • Hive JDBC driver
  • Kerberos config and keytab file details (if the Hadoop cluster is enabled with Kerberos authentication)

Hive JDBC url

For a secure cluster, Hive JDBC url is defined in the below format. The client needs to have a valid Kerberos ticket in the ticket cache before connecting.

jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>

  • Host: The host name or IP address of the server hosting HiveServer2.
  • Port: The port for the connection to the HiveServer2 instance. We can find the port number in the hive-site.xml file which is located in /etc/hive/conf. In this example, we have used port number 10005.

If the cluster is secure with kerberos authentication enabled, we can find the principal value in the hive-site.xml.

Hive JDBC driver

The Hive JDBC Driver enables users to connect with live Hive data, directly from any applications that support JDBC connectivity. For java application, we need to provide the Hive JDBC jar file which is specific to that hive version.

If it is a maven project, we just need to add the below dependency in the pom.xml. It will be downloaded automatically by maven.

Otherwise we can directly download from repository website and add the same as dependency in the project. Please find the jars for each hive version in hortonworks repository.

Kerberos config and keytab file details

Kerberos is an authentication protocol which helps user to prove its identity to the various services. It stores configuration information in a file named as krb5.conf. The Hadoop user must have a Kerberos principal or keytab to obtain Kerberos credentials to be allowed to access the cluster and use the Hadoop services.

krb5.conf – The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. Here is an example of kerberos config file

Keytab file – The user keytab file is created with the name of <username>.keytab. The principal name is added in this file and we have to use klist -kt <key_tab_file_name> command to see the values of this file.

Example principal : [email protected]

Hive JDBC connection with Kerberos authentication using Java

Let’s write a simple java program to connect Hive using Kerberos authentication. In this program, we are going to get the number of records of a Hive table.

Output:

Table contains 2 rows