How to count the number of lines in an HDFS file

wc command

The wc (word count) command is used in Linux/Unix to report the number of lines, words, bytes, and characters in a file. It can also be combined with pipes to count the number of lines in an HDFS file.

Print the number of lines in Unix/Linux

The wc command with the -l option returns the number of lines in a file. We can combine it with the hadoop command to get the number of lines in an HDFS file.
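As a quick local illustration, the sketch below creates a throwaway file and counts its lines. The variable names tmpfile and line_count and the sample contents are made up for this example.

```shell
# Create a temporary sample file with three lines (contents are illustrative).
tmpfile=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmpfile"

# Reading from stdin makes wc print only the count, without the file name.
# tr strips the leading padding that some wc implementations add.
line_count=$(wc -l < "$tmpfile" | tr -d ' ')
echo "$line_count"

# Clean up the temporary file.
rm -f "$tmpfile"
```

Note that wc -l counts newline characters, so a final line that does not end with a newline is not included in the total.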

Count the number of lines in an HDFS file

Method 1:

If we pipe the output of the hdfs dfs -cat command into wc -l, we get the number of lines in an HDFS file.

Example:
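A minimal sketch of this approach, assuming the hdfs client is installed and on the PATH. The function name count_hdfs_lines and the path in the usage comment are hypothetical.

```shell
# Stream an HDFS file to stdout and count its lines with wc -l.
# $1 is the HDFS path of the file to count.
count_hdfs_lines() {
    hdfs dfs -cat "$1" | wc -l
}

# Usage (hypothetical path):
#   count_hdfs_lines /user/hadoop/data/sample.txt
```

Because the file is streamed through the pipe, it does not need to be copied to the local disk first, although the whole file is still read from the cluster.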

Method 2:

The hadoop fs -text command takes a source file and outputs it in text format. The allowed formats are zip and TextRecordInputStream. Piping its output into wc -l gives the number of lines.

Example:
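A minimal sketch of this approach, assuming the hadoop client is installed and on the PATH. The function name count_hdfs_text_lines and the path in the usage comment are hypothetical.

```shell
# Convert an HDFS file to text with hadoop fs -text, then count the lines.
# $1 is the HDFS path of the file to count.
count_hdfs_text_lines() {
    hadoop fs -text "$1" | wc -l
}

# Usage (hypothetical path):
#   count_hdfs_text_lines /user/hadoop/data/sample.txt
```

This variant is worth considering when the file is not stored as plain text, since -text converts the contents to text before they reach wc -l.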
