How to count the number of lines in an HDFS file?


wc command

The wc (word count) command is used in Linux/Unix to print the number of lines, words, bytes, and characters in a file. Combined with a pipe, it can also count the number of lines in an HDFS file.
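As a quick illustration on made-up local input (no HDFS involved), the sketch below feeds two lines of text to wc and captures each count separately. -l, -w, -c and -m are the standard options for lines, words, bytes and characters; -c and -m only differ for multibyte text.

```shell
# Made-up sample input: 2 lines, 3 words, 17 bytes ("alpha beta\n" + "gamma\n").
lines=$(printf 'alpha beta\ngamma\n' | wc -l)   # newline-terminated lines
words=$(printf 'alpha beta\ngamma\n' | wc -w)   # whitespace-separated words
bytes=$(printf 'alpha beta\ngamma\n' | wc -c)   # bytes
chars=$(printf 'alpha beta\ngamma\n' | wc -m)   # characters (same as bytes for ASCII)
echo "lines=$lines words=$words bytes=$bytes chars=$chars"
```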

Print the number of lines in Unix/Linux

wc -l <file name>

With the -l option, wc prints the number of lines in a file. We can combine it with the hdfs dfs command to get the number of lines in an HDFS file.
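One detail worth knowing before applying this to HDFS data: wc -l counts newline characters, so a final line that is not terminated by a newline is not counted. The local sketch below (made-up input, no cluster needed) shows the difference; awk's record counter NR is one common alternative that also counts an unterminated last line.

```shell
# Three lines, the last one terminated by a newline.
terminated=$(printf 'one\ntwo\nthree\n' | wc -l)

# Same three lines, but the last one has no trailing newline:
# wc -l only sees two '\n' characters.
unterminated=$(printf 'one\ntwo\nthree' | wc -l)

# awk counts records, so the unterminated last line is included.
awk_count=$(printf 'one\ntwo\nthree' | awk 'END { print NR }')

echo "wc (terminated): $terminated, wc (unterminated): $unterminated, awk: $awk_count"
```

The awk form can replace wc -l at the end of either HDFS pipeline below if a file may lack a trailing newline.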

Count the number of lines in an HDFS file

Method 1:

hdfs dfs -cat <HDFS file path> | wc -l

Piping the output of hdfs dfs -cat into wc -l returns the number of lines in an HDFS file.

Example:

hdfs dfs -cat /apps/revisit/employee_part12-0001 | wc -l
12893
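A pipeline's exit status is that of its last command, so if the HDFS path is wrong, hdfs dfs -cat reports an error on stderr while wc -l still prints 0 and exits successfully. The sketch below wraps Method 1 in a shell function that first checks the path with hdfs dfs -test -e; the function name hdfs_count_lines is made up for this example, and it assumes the hdfs client is on PATH.

```shell
# hdfs_count_lines (hypothetical helper): print the number of lines in one HDFS file.
# Assumes the `hdfs` client is installed and on PATH.
hdfs_count_lines() {
  # `hdfs dfs -test -e PATH` exits non-zero when PATH does not exist.
  if ! hdfs dfs -test -e "$1"; then
    echo "hdfs_count_lines: no such HDFS path: $1" >&2
    return 1
  fi
  # Stream the file to stdout and count the newline characters.
  hdfs dfs -cat "$1" | wc -l
}
```

Usage: hdfs_count_lines /apps/revisit/employee_part12-0001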

Method 2:

hdfs dfs -text <HDFS file path> | wc -l

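The hdfs dfs -text command (equivalent to hadoop fs -text) takes a source file and outputs it in text format. The Hadoop documentation lists zip and TextRecordInputStream as the allowed formats; in practice it also decompresses gzip files and decodes SequenceFiles, which -cat would print as raw bytes.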

Example:

hdfs dfs -text /apps/revisit/customer_20190611060814.txt | wc -l
1672
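Because -text decodes the data before printing it, it gives the real line count for a gzip-compressed file, where hdfs dfs -cat | wc -l would count newline bytes that happen to occur in the compressed stream. The sketch below is Method 2 as a shell function; the name hdfs_count_text_lines is hypothetical, and it assumes the hdfs client is on PATH.

```shell
# hdfs_count_text_lines (hypothetical helper): count lines after `hdfs dfs -text`
# has decoded the file (for example, gzip-compressed text).
# Assumes the `hdfs` client is installed and on PATH.
hdfs_count_text_lines() {
  # Fail loudly instead of letting wc report 0 for a missing path.
  if ! hdfs dfs -test -e "$1"; then
    echo "hdfs_count_text_lines: no such HDFS path: $1" >&2
    return 1
  fi
  # -text decompresses/decodes the file; wc -l counts the resulting lines.
  hdfs dfs -text "$1" | wc -l
}
```

Usage, with a hypothetical compressed path: hdfs_count_text_lines /apps/revisit/customer_20190611060814.txt.gz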
