Category: Hadoop

How to access a Hive table using Pyspark?

Pyspark: Apache Spark is an in-memory data processing framework written in Scala. It processes the data 100 times faster

Create database in Hive: A database is an organised collection of tables. Hive has the default database with the name as

Hive JDBC: Hive allows applications to connect to it using the JDBC driver. The JDBC driver uses Thrift to communicate

External Table in Hive: When we create a table in Hive, we can define the type of the table. As

Introduction: A shell script can be used to run Hive queries in batch mode. It will handle the input values/arguments

Hourly partitions in Hive table: When we have large quantities of data, we look for a partition column to improve the

If condition/statement in Hive: Hive supports many conditional functions such as If, isnull, isnotnull, nvl, nullif, COALESCE and CASE. The

Sum() function in Hive: Sum is one of the aggregate functions that returns the sum of the values of the

WebHDFS: WebHDFS is a protocol based on an industry-standard RESTful mechanism. It provides the same functionality as HDFS,

Show databases like query in Hive: The Show databases or Show schemas statement lists all the database names in Hive