How to count the number of lines in an HDFS file?
WC command
The wc (word count) command is used in Linux/Unix to report the number of lines, words, bytes, and characters in a file. It can also be combined with pipes to count the number of lines in an HDFS file.
Print the number of lines in Unix/Linux
wc -l
The wc command with the -l option returns the number of lines in a file. We can combine it with the hadoop file system commands to get the number of lines in an HDFS file.
Count the number of lines in an HDFS file
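As a quick local illustration, here is a minimal sketch that builds a small temporary file and counts its lines. The file contents and the variable names (sample_file, line_count) are made up for this example and are not from the article.

```shell
# Create a throwaway three-line file (hypothetical sample data).
sample_file="$(mktemp)"
printf 'alice\nbob\ncarol\n' > "$sample_file"

# Redirecting the file into wc prints only the count, without the file name.
# tr strips the leading spaces that some wc implementations add.
line_count="$(wc -l < "$sample_file" | tr -d ' ')"
echo "$line_count"

# Clean up the temporary file.
rm -f "$sample_file"
```

Note that wc -l counts newline characters, so a final line that does not end with a newline is not included in the count.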
Method 1:
hdfs dfs -cat <hdfs_file_path> | wc -l
Combining wc -l with the hdfs dfs -cat command returns the number of lines in an HDFS file: -cat writes the file's contents to standard output, and wc -l counts the lines it receives through the pipe.
Example:
hdfs dfs -cat /apps/revisit/employee_part12-0001 | wc -l
12893
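If this count is needed repeatedly, the pipeline can be wrapped in a small shell function. The sketch below is one possible way to do that; the function name hdfs_line_count is hypothetical, and it assumes the hdfs client is on the PATH and the given path is readable.

```shell
# Hypothetical helper: print the number of lines in one HDFS file.
# Assumes the hdfs client is installed and on PATH.
hdfs_line_count() {
    if [ "$#" -ne 1 ]; then
        echo "usage: hdfs_line_count <hdfs_path>" >&2
        return 2
    fi
    # Stream the file contents to wc; tr strips padding some wc versions add.
    hdfs dfs -cat "$1" | wc -l | tr -d ' '
}
```

It could then be called as hdfs_line_count /apps/revisit/employee_part12-0001. A pipeline's exit status is that of its last command, so if hdfs dfs -cat fails the helper still prints 0 and reports success; in bash, set -o pipefail makes that failure visible in the exit status.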
Method 2:
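hdfs dfs -text <hdfs_file_path> | wc -l
The hdfs dfs -text command (equivalent to hadoop fs -text) takes a source file and outputs it in text format. The allowed formats are zip and TextRecordInputStream. Piping that output to wc -l returns the number of lines in the file.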
Example:
hdfs dfs -text /apps/revisit/customer_20190611060814.txt | wc -l
1672
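The same pipeline can be applied to several files in a loop. The following is a rough sketch under the same assumptions as before (hdfs client on the PATH); the function name hdfs_text_line_counts and its output layout are invented for this example.

```shell
# Hypothetical helper: print "<path> <line count>" for each HDFS path given.
# Assumes the hdfs client is installed and on PATH.
hdfs_text_line_counts() {
    for hdfs_path in "$@"; do
        # -text writes the file out as text; wc -l counts the newlines.
        count="$(hdfs dfs -text "$hdfs_path" | wc -l | tr -d ' ')"
        printf '%s %s\n' "$hdfs_path" "$count"
    done
}
```

For instance, hdfs_text_line_counts /apps/revisit/customer_20190611060814.txt would print the path followed by its line count on a single line.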