Tuesday, 19 August 2008

awk: finding the count of a field

OK, so I had this file that had different types of events in it and I wanted to count how many occurrences there were of each event type. The file was comma separated and the event type was in $1 (the first field), looking like this:

TRAF:5
TRAF:8
TRAF:3

Here's the awk command that got the result I was after:

awk -F, '{ te[$1]++ } END { for ( i in te ) print i" : " te[i] }' traf.test
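For a quick sanity check, here's a self-contained run. The file name traf.test is from the post; the sample lines are made up for illustration (the post doesn't show a full input file), with repeats so the counts are interesting:

```shell
# Build a small comma-separated sample file with repeated event types.
printf 'TRAF:5,a\nTRAF:8,b\nTRAF:5,c\nTRAF:5,d\n' > traf.test

# Count occurrences of each unique value in the first field.
awk -F, '{ te[$1]++ } END { for ( i in te ) print i" : " te[i] }' traf.test

rm -f traf.test
```

With that input you get one line per unique event type, e.g. "TRAF:5 : 3" and "TRAF:8 : 1".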

Here's what it's doing:

The field separator flag '-F,' we're already familiar with. This tells awk that the fields are comma separated.

Now, in the next bit we're introducing awk arrays for the first time: "te[$1]++"

We're creating an array called 'te' (you can call it anything you like), indexed by the contents of $1 (the first field), which is our traffic event type. The double-plus says that the value of te[$1] is to be incremented. So an array element is created for every unique value found in $1, which means we've captured all the different possibilities of $1 without doing much work at all. Each time awk finds another line with that same index value ($1), it increments the value of that element.
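The reason the very first te[$1]++ works without any setup is that awk treats an uninitialized array element as the empty string, which is 0 in numeric context. A tiny demonstration (the array name 'seen' is just for this example):

```shell
# seen["x"] goes 0 -> 1 on first increment; seen["y"] was never touched,
# so forcing it numeric with +0 shows its default value of 0.
echo 'x' | awk '{ seen[$1]++ } END { print seen["x"], seen["y"]+0 }'
# prints: 1 0
```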

Once we've gone through the whole file we get to the END section of the code. Here we're seeing a for loop for the first time:

for (i in te) print i" : " te[i]

This loop iterates through the array, printing each index (i) followed by the value of that element. We end up with a printout of each unique value found at $1 and a count of the occurrences of that value.
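One thing worth knowing: 'for (i in te)' visits the indices in an unspecified order, so the output order can vary between awk implementations. If you want a stable listing, pipe the result through sort. A sketch, again with made-up sample data (printing the count first makes a numeric sort easy):

```shell
# Sample input with a repeated event type.
printf 'ERR:a,x\nTRAF:5,y\nERR:a,z\n' > traf.test

# Print "count value" and sort numerically, highest count first.
awk -F, '{ te[$1]++ } END { for ( i in te ) print te[i]" "i }' traf.test | sort -rn
# prints: 2 ERR:a
#         1 TRAF:5

rm -f traf.test
```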

2 comments:

Rob Berkes said...

thanks a bunch, that was exactly what I was looking for and very well explained!
Rob

Unknown said...

this is great! i was trying to do this in perl but it's a little bit painful and then i discovered awk and your post. with 2 lines of code i can count the errors in my log files so i can see the trends.
thank you!