How to execute a Select query on BigQuery using Java?

Contents

BigQuery

BigQuery is a fully managed and serverless datawarehouse system. It is part of Google Cloud Platform. It can process the massive amounts of data and provide the result quickly. In this tutorial, we are going to read the tables from BigQuery dataset using Java program. For this example, we are using the below SELECT query.

SELECT commit, author, repo_name
FROM `bigquery-public-data.github_repos.commits`
WHERE subject like '%bigquery%'
ORDER BY subject DESC LIMIT 10

BigQuery Job

The unit of work in BigQuery is called as job. It can be load data, export data, query data, or copy data. If the job is created programmatically, BigQuery execute it asynchronously and can be polled for the status.

Prerequisite to run a BigQuery job using Java

  • Project Id : To use the Google Cloud Platform, we need to create a project in GCP. Then GCP gives the project id which is a unique string used to differentiate our project from all others in Google Cloud.
  • Service Account : A service account is a Google Account that is associated with our Google Cloud project. To access the BigQuery API using program/application, we need to create a service account. Then to access our project, we need to grant the certain roles to that service account.
  • Service Account Key: For the service account, we need to create a key which is used in the program as a service account credential. BigQuery verify the client’s identity using the service account key.

Java program to execute a Select query on BigQuery:

Access BigQuery using Java
Access BigQuery using Java

To access the BigQuery, we need to install Google Cloud BigQuery Client libraries in our program. So we are creating the maven project as below.

Step 1: Set the client libraries in pom.xml

<!--  Using libraries-bom to manage versions.
See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>25.2.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
  </dependency>
</dependencies>

Step 2 : Import the Google Cloud BigQuery libraries in the program

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.util.UUID;

Step 3 : Set the Service Account key credential

As we mentioned earlier, key should be created for the service account. From GCP console, we can download that key in json file format. The same json file is used in the program to set the credential. The path of the json file is given in the code.

// Path of the Service Account key file
String jsonPath = "D:\\Learning\\GCP\\my-rcs-project-833123-ef45632b1b12.json";

//Set the Service account key as credential
Credentials credentials = GoogleCredentials
                .fromStream(new FileInputStream(jsonPath));

Step 4 : Initialize a BigQuery client

In this step, we are initializing the BigQuery client using our project id and service account credential.

//Initialize BigQuery Client by setting project id and credential
BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("my-rcs-project-833123")
                .setCredentials(credentials)
                .build().getService();

Step 5 : Define the query with a QueryJobConfiguration

Next we need to define our query in the QueryJobConfiguration as below. In this example, we are querying from bigquery public dataset bigquery-public-data.github_repos.commits .

QueryJobConfiguration queryConfig =
                QueryJobConfiguration.newBuilder(
                                "SELECT commit, author, repo_name "
                                        + "FROM `bigquery-public-data.github_repos.commits` "
                                        + "WHERE subject like '%bigquery%' "
                                        + "ORDER BY subject DESC LIMIT 10")
                        .build();

Step 6 : Start a Bigquery job

In this step, we are creating a job id and starting the BigQuery job with the BigQuery.create() method. The QueryJobConfiguration is passed to this method.

// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

Step 7 : Wait for the query to complete

// Wait for the query to complete.
queryJob = queryJob.waitFor();

Step 8 : Check the errors

The below code used to check the errors in the BigQuery job.

// Check for errors
if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
} else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
}

Step 9 : Get the results

 // Get the results.
TableResult result = queryJob.getQueryResults();

Step 10 : Print the results

Here we are iterating the each row and printing the same in the output screen.

// Print all pages of the results.
for (FieldValueList row : result.iterateAll()) {
    // String type
    String commit = row.get("commit").getStringValue();
    // Record type
    FieldValueList author = row.get("author").getRecordValue();
    String name = author.get("name").getStringValue();
    String email = author.get("email").getStringValue();
    // String Repeated type
    String repoName = row.get("repo_name").getRecordValue().get(0).getStringValue();
    System.out.printf(
            "Repo name: %s Author name: %s email: %s commit: %s\n", repoName, name, email, commit);
}

Complete program

import com.google.api.client.util.Lists;
import com.google.auth.Credentials;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.UUID;

public class QueryManager {
    public static void main(String[] args) throws InterruptedException, IOException {

        // Path of the Service Account key file
        String jsonPath = "D:\\Learning\\GCP\\my-rcs-project-833123-ef45632b1b12.json";

        //Set the Service account key as credential
        Credentials credentials = GoogleCredentials
                .fromStream(new FileInputStream(jsonPath));

        //Initialize BigQuery Client by setting project id and credential
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("my-rcs-project-833123")
                .setCredentials(credentials)
                .build().getService();

        // Define the query with a QueryJobConfiguration
        QueryJobConfiguration queryConfig =
                QueryJobConfiguration.newBuilder(
                                "SELECT commit, author, repo_name "
                                        + "FROM `bigquery-public-data.github_repos.commits` "
                                        + "WHERE subject like '%bigquery%' "
                                        + "ORDER BY subject DESC LIMIT 10")
                        .build();

// Create a job ID so that we can safely retry.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

// Wait for the query to complete.
        queryJob = queryJob.waitFor();

// Check for errors
        if (queryJob == null) {
            throw new RuntimeException("Job no longer exists");
        } else if (queryJob.getStatus().getError() != null) {
            // You can also look at queryJob.getStatus().getExecutionErrors() for all
            // errors, not just the latest one.
            throw new RuntimeException(queryJob.getStatus().getError().toString());
        }

// Get the results.
        TableResult result = queryJob.getQueryResults();

// Print all pages of the results.
        for (FieldValueList row : result.iterateAll()) {
            // String type
            String commit = row.get("commit").getStringValue();
            // Record type
            FieldValueList author = row.get("author").getRecordValue();
            String name = author.get("name").getStringValue();
            String email = author.get("email").getStringValue();
            // String Repeated type
            String repoName = row.get("repo_name").getRecordValue().get(0).getStringValue();
            System.out.printf(
                    "Repo name: %s Author name: %s email: %s commit: %s\n", repoName, name, email, commit);
        }
    }
}

Output

Finally we executed our Java program to read the data from BigQuery dataset bigquery-public-data.github_repos.commits. As we shown below, the program has printed the results.

BigQuery results using Java program
BigQuery results using Java program

Output in Google cloud console

Also we have verified the results of the select query in the Google cloud console. It looks good.

Select query results in Google Cloud Console
Select query results in Google Cloud Console

Recommended Articles

References