How to execute a Select query on BigQuery using Java?

BigQuery

BigQuery is a fully managed, serverless data warehouse that is part of Google Cloud Platform. It can process massive amounts of data and return results quickly. In this tutorial, we read a table from a BigQuery dataset using a Java program that runs a SELECT query.

BigQuery Job

The unit of work in BigQuery is called a job. A job can load data, export data, query data, or copy data. When a job is created programmatically, BigQuery executes it asynchronously, and the job can be polled for its status.

Prerequisites to run a BigQuery job using Java

  • Project ID: To use Google Cloud Platform, we need to create a project in GCP. GCP then assigns a project ID, a unique string that distinguishes our project from all others in Google Cloud.
  • Service Account: A service account is a Google account associated with our Google Cloud project. To access the BigQuery API from a program or application, we need to create a service account and grant it the roles required to access our project.
  • Service Account Key: For the service account, we need to create a key, which the program uses as the service account credential. BigQuery verifies the client’s identity using the service account key.

Java program to execute a Select query on BigQuery:

Access BigQuery using Java

To access BigQuery, we need to add the Google Cloud BigQuery client library to our program. For this, we create a Maven project and declare the library as a dependency.

Step 1 : Add the client library to pom.xml
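
A minimal sketch of the dependency entry, assuming the standard Maven coordinates of the Google Cloud BigQuery client library. The version is intentionally left as a placeholder: bigquery.client.version is a hypothetical property name that we would define ourselves under <properties> with a released version of the library.

```xml
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <!-- Hypothetical property: define it under <properties> with a released version -->
    <version>${bigquery.client.version}</version>
  </dependency>
</dependencies>
```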

Step 2 : Import the Google Cloud BigQuery libraries in the program
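
The imports below are a sketch of what the remaining steps rely on, assuming the google-cloud-bigquery dependency is on the classpath. The ImportCheck class is a hypothetical helper added only so the snippet compiles and runs on its own; it is not part of the client library.

```java
// Credential class from the Google auth library (pulled in by the BigQuery client)
import com.google.auth.oauth2.ServiceAccountCredentials;
// BigQuery client types used in the later steps
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

// Hypothetical helper: referencing each class confirms the dependency resolves.
final class ImportCheck {
    static Class<?>[] importedTypes() {
        return new Class<?>[] {
            ServiceAccountCredentials.class, BigQuery.class, BigQueryOptions.class,
            FieldValueList.class, Job.class, JobId.class, JobInfo.class,
            QueryJobConfiguration.class, TableResult.class
        };
    }

    public static void main(String[] args) {
        for (Class<?> type : importedTypes()) {
            System.out.println(type.getName());
        }
    }
}
```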

Step 3 : Set the Service Account key credential

As mentioned earlier, a key must be created for the service account. From the GCP console, we can download that key as a JSON file. The program reads this JSON file from a path we specify and uses it to set the credential.
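
A minimal sketch of loading the key, assuming the JSON file has already been downloaded. CredentialLoader and the file path are hypothetical names chosen for illustration; ServiceAccountCredentials.fromStream comes from the Google auth library.

```java
import com.google.auth.oauth2.ServiceAccountCredentials;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for this tutorial.
final class CredentialLoader {
    /** Reads a service account key (JSON) from disk and builds a credential from it. */
    static ServiceAccountCredentials load(Path keyFile) throws IOException {
        try (InputStream in = Files.newInputStream(keyFile)) {
            return ServiceAccountCredentials.fromStream(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path: replace with the location of the downloaded key file.
        ServiceAccountCredentials credentials =
                load(Paths.get("/path/to/service-account-key.json"));
        System.out.println("Loaded credentials for " + credentials.getClientEmail());
    }
}
```

Keeping the path in one place makes it easier to move the key out of the source tree later, for example into an environment variable.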

Step 4 : Initialize a BigQuery client

In this step, we initialize the BigQuery client using our project ID and service account credential.
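
A sketch of the client setup, assuming the credential has been read from the key file. BigQueryClientFactory, the project ID, and the key path are hypothetical placeholders; BigQueryOptions and its builder methods belong to the client library.

```java
import com.google.auth.Credentials;
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for this tutorial.
final class BigQueryClientFactory {
    /** Builds a BigQuery client bound to one project and one credential. */
    static BigQuery create(String projectId, Credentials credentials) {
        return BigQueryOptions.newBuilder()
                .setProjectId(projectId)
                .setCredentials(credentials)
                .build()
                .getService();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical values: use your own project ID and key file location.
        String projectId = "my-project-id";
        String keyFile = "/path/to/service-account-key.json";
        try (InputStream in = new FileInputStream(keyFile)) {
            BigQuery bigquery = create(projectId, ServiceAccountCredentials.fromStream(in));
            System.out.println("Client ready for project "
                    + bigquery.getOptions().getProjectId());
        }
    }
}
```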

Step 5 : Define the query with a QueryJobConfiguration

Next, we define our query in a QueryJobConfiguration. In this example, we query the BigQuery public dataset table bigquery-public-data.github_repos.commits.
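
A sketch of the query definition. The SQL text is an assumption, since the exact SELECT statement is not reproduced here; it reads the commit and subject columns of the public commits table. CommitsQuery is a hypothetical class name.

```java
import com.google.cloud.bigquery.QueryJobConfiguration;

// Hypothetical helper class for this tutorial.
final class CommitsQuery {
    // Assumed query for illustration; adjust the columns and LIMIT as needed.
    static final String SQL =
            "SELECT commit, subject "
          + "FROM `bigquery-public-data.github_repos.commits` "
          + "LIMIT 10";

    /** Wraps the SQL text in a query job configuration. */
    static QueryJobConfiguration config() {
        return QueryJobConfiguration.newBuilder(SQL)
                // The backtick-quoted table name is standard SQL syntax.
                .setUseLegacySql(false)
                .build();
    }

    public static void main(String[] args) {
        System.out.println(config().getQuery());
    }
}
```

Note that LIMIT restricts the number of rows returned, not the amount of data BigQuery scans, so selecting only the needed columns matters on a table of this size.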

Step 6 : Start a BigQuery job

In this step, we create a job ID and start the BigQuery job with the BigQuery.create() method. The QueryJobConfiguration is wrapped in a JobInfo object, which is passed to this method.
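
A sketch of starting the job, assuming a client and a query configuration already exist. JobStarter is a hypothetical class name, and using a random UUID as the job ID is one common choice rather than a requirement.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import java.util.UUID;

// Hypothetical helper class for this tutorial.
final class JobStarter {
    /** Creates a job ID that is unique for each call. */
    static JobId newJobId() {
        return JobId.of(UUID.randomUUID().toString());
    }

    /** Submits the query as a job; BigQuery runs it asynchronously. */
    static Job start(BigQuery bigquery, QueryJobConfiguration queryConfig) {
        JobInfo jobInfo = JobInfo.newBuilder(queryConfig)
                .setJobId(newJobId())
                .build();
        return bigquery.create(jobInfo);
    }
}
```

With the client and configuration from the previous steps, the call would look like Job queryJob = JobStarter.start(bigquery, queryConfig);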

Step 7 : Wait for the query to complete
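
Because the job runs asynchronously, the program has to wait for it to finish. A minimal sketch, assuming a Job returned by BigQuery.create(); JobWaiter is a hypothetical class name, while Job.waitFor() is the client library call that polls the job until it is done.

```java
import com.google.cloud.bigquery.Job;

// Hypothetical helper class for this tutorial.
final class JobWaiter {
    /**
     * Blocks until the job has finished, whether it succeeded or failed.
     * Returns the refreshed job, or null if the job no longer exists.
     */
    static Job await(Job runningJob) throws InterruptedException {
        return runningJob.waitFor();
    }
}
```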

Step 8 : Check the errors

After the job completes, we check whether it finished with errors.
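
A sketch of the error check, assuming it receives the job returned by Job.waitFor(). JobErrorCheck is a hypothetical class name, and throwing IllegalStateException is an illustrative choice.

```java
import com.google.cloud.bigquery.Job;

// Hypothetical helper class for this tutorial.
final class JobErrorCheck {
    /** Throws if the job disappeared or finished with an error; otherwise does nothing. */
    static void requireSuccess(Job completedJob) {
        if (completedJob == null) {
            // waitFor() returns null when the job no longer exists.
            throw new IllegalStateException("Job no longer exists");
        }
        if (completedJob.getStatus().getError() != null) {
            // The error object carries the reason and message reported by BigQuery.
            throw new IllegalStateException(completedJob.getStatus().getError().toString());
        }
    }
}
```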

Step 9 : Get the results
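
Once the job has completed without errors, its rows can be fetched. A minimal sketch, assuming a successfully completed query job; QueryResults is a hypothetical class name, while Job.getQueryResults() and TableResult come from the client library.

```java
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.TableResult;

// Hypothetical helper class for this tutorial.
final class QueryResults {
    /** Fetches the result set of a completed query job. */
    static TableResult fetch(Job completedJob) throws InterruptedException {
        return completedJob.getQueryResults();
    }
}
```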

Step 10 : Print the results

Here we iterate over each row and print it to the output.
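
A sketch of the printing loop, assuming the result set contains columns named commit and subject (an assumption about the query, which is not reproduced here). RowPrinter is a hypothetical class name.

```java
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.FieldValueList;

// Hypothetical helper class for this tutorial.
final class RowPrinter {
    /** Returns the value as text, guarding against SQL NULL. */
    static String text(FieldValue value) {
        return value.isNull() ? "NULL" : value.getStringValue();
    }

    /** Formats one row; the column names are assumed to match the query. */
    static String format(FieldValueList row) {
        return "commit: " + text(row.get("commit"))
                + ", subject: " + text(row.get("subject"));
    }

    /** Prints every row, fetching further pages as the iteration proceeds. */
    static void printAll(Iterable<FieldValueList> rows) {
        for (FieldValueList row : rows) {
            System.out.println(format(row));
        }
    }
}
```

With a TableResult named result, the call would be RowPrinter.printAll(result.iterateAll());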

Complete program
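
The steps above can be combined into one program. This is a sketch under stated assumptions rather than the exact original listing: the project ID, the key file path, the class name BigQuerySelectExample, and the SQL text are hypothetical placeholders.

```java
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

public class BigQuerySelectExample {
    // Hypothetical values: replace with your own project ID and key file path.
    private static final String PROJECT_ID = "my-project-id";
    private static final String KEY_FILE = "/path/to/service-account-key.json";
    // Assumed query for illustration.
    private static final String SQL =
            "SELECT commit, subject "
          + "FROM `bigquery-public-data.github_repos.commits` "
          + "LIMIT 10";

    public static void main(String[] args) throws IOException, InterruptedException {
        // Step 3: read the service account key.
        ServiceAccountCredentials credentials;
        try (InputStream in = new FileInputStream(KEY_FILE)) {
            credentials = ServiceAccountCredentials.fromStream(in);
        }

        // Step 4: initialize the client.
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(credentials)
                .build()
                .getService();

        // Step 5: define the query.
        QueryJobConfiguration queryConfig =
                QueryJobConfiguration.newBuilder(SQL).setUseLegacySql(false).build();

        // Step 6: start the job with a unique job ID.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

        // Step 7: wait for the query to complete.
        queryJob = queryJob.waitFor();

        // Step 8: check for errors.
        if (queryJob == null) {
            throw new RuntimeException("Job no longer exists");
        }
        if (queryJob.getStatus().getError() != null) {
            throw new RuntimeException(queryJob.getStatus().getError().toString());
        }

        // Step 9: get the results.
        TableResult result = queryJob.getQueryResults();

        // Step 10: print each row, guarding against NULL values.
        for (FieldValueList row : result.iterateAll()) {
            FieldValue commit = row.get("commit");
            FieldValue subject = row.get("subject");
            System.out.println("commit: " + (commit.isNull() ? "NULL" : commit.getStringValue())
                    + ", subject: " + (subject.isNull() ? "NULL" : subject.getStringValue()));
        }
    }
}
```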

Output

Finally, we executed our Java program to read data from the BigQuery table bigquery-public-data.github_repos.commits. As shown below, the program printed the results.

BigQuery results using Java program

Output in Google Cloud console

We also ran the same SELECT query in the Google Cloud console to verify the results.

Select query results in Google Cloud Console
