Wednesday 15 October 2008

awk script to find the maximum value of many returned rows

Below is a complete script that pulls out the maximum value for two specific criteria (disk name and time). More than one row is returned for each combination, so we had to find the maximum of the values in $6.

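for FILE in /var/test/perf/archive/measure/input/*"$1"*disk2.dat.gz
do
    # Skip the unexpanded pattern when nothing matches
    [ -e "${FILE}" ] || continue

    # File name without the directory and the trailing .gz
    DATFILE=$( basename "${FILE}" .gz )
    echo "${DATFILE}"

    # Work on an uncompressed copy in the current directory
    cp "${FILE}" .
    gunzip "${DATFILE}.gz"

    # Date rebuilt from the 4th, 3rd and 2nd dash-separated parts of the name
    DATE=$( echo "${DATFILE}" | awk -F'-' '{ print $4"/"$3"/"$2 }' )

    # The first dash-separated part names the output file
    FILENAME=$( echo "${DATFILE}" | awk -F'-' '{ print $1 }' )

    # Keep the largest $6 seen for each disk at 20:00:00, indexed by time ($3)
    awk -v D="${DATE}" -F',' '{
        if ( $2 ~ /D0201/ && $3 ~ /20:00:00/ ) {
            max201[$3] = ( max201[$3] > $6 ? max201[$3] : $6 )
        }
        if ( $2 ~ /D0211/ && $3 ~ /20:00:00/ ) {
            max211[$3] = ( max211[$3] > $6 ? max211[$3] : $6 )
        }
    } END {
        for (i in max201) print D","i",D0201,"max201[i];
        for (i in max211) print D","i",D0211,"max211[i];
    }' "${DATFILE}" >> "${FILENAME}-disk2.csv"

    # Remove the uncompressed copy
    rm "${DATFILE}"
done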
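
The script above keeps one array per disk. As a rough sketch of an alternative (not what the script does), a single array indexed by time and disk name together avoids adding a new array for every disk. The function name `max_per_disk`, the sample rows and the date passed in are all made up for illustration; only the column positions ($2 disk, $3 time, $6 value) come from the script.

```shell
# Hypothetical sample rows in the same column layout: $2 disk, $3 time, $6 value
sample_disks() {
    printf '%s\n' \
        '1,D0201,20:00:00,a,b,5' \
        '2,D0201,20:00:00,a,b,8' \
        '3,D0211,20:00:00,a,b,7' \
        '4,D0211,20:00:00,a,b,2' \
        '5,D0211,21:00:00,a,b,9'
}

# One array for every disk: the index is "time,disk"
max_per_disk() {
    awk -v D="$1" -F',' '
    $2 ~ /D0201|D0211/ && $3 ~ /20:00:00/ {
        key = $3 "," $2
        max[key] = ( max[key] > $6 ? max[key] : $6 )
    }
    END { for (k in max) print D "," k "," max[k] }' | sort
}

# The date argument is a placeholder; sort makes the output order predictable
sample_disks | max_per_disk "15/10/2008"
```

The output lines have the same shape as the ones the script writes (date, time, disk, maximum). The `sort` is there because awk does not promise any particular order for `for (k in max)`.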

---

The clever bit (for me anyway) is this:

max211[$3] = ( max211[$3] > $6 ? max211[$3] : $6 )

The bit between the brackets compares the maximum stored so far in max211[$3] (indexed by the time in column $3) with the value in $6. If the stored value is larger it is kept; otherwise the new value of $6 is passed back, which then updates max211[$3].
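
To watch that line at work, here is a small sketch fed with a few made-up rows. The function names `sample_rows` and `running_max` and the data are invented for the example; only the column layout matches the script.

```shell
# Hypothetical sample rows: $2 disk, $3 time, $6 value
sample_rows() {
    printf '%s\n' \
        '1,D0211,20:00:00,a,b,12' \
        '2,D0211,20:00:00,a,b,47' \
        '3,D0211,20:00:00,a,b,30' \
        '4,D0211,21:00:00,a,b,99'
}

running_max() {
    awk -F',' '
    $2 ~ /D0211/ && $3 ~ /20:00:00/ {
        # Keep the stored value if it is larger, otherwise take $6
        max211[$3] = ( max211[$3] > $6 ? max211[$3] : $6 )
    }
    END { for (i in max211) print i "," max211[i] }'
}

sample_rows | running_max
```

On the first matching row max211[$3] is still empty, which awk treats as 0 in a numeric comparison, so 12 is stored. 47 then replaces it, 30 does not, and the 21:00:00 row is ignored, leaving `20:00:00,47`. Note that starting from 0 only works because the values here are not negative.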

To put it another way, the whole line could be written like this:


if ( max211[$3] > $6 )
max211[$3] = max211[$3]
else
max211[$3] = $6
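
As a quick check that the two forms really agree, this sketch runs both over the same made-up numbers. The function name `compare_forms` and the variables `a` and `b` are just for the example.

```shell
compare_forms() {
    awk '{
        # Conditional operator form
        a = ( a > $1 ? a : $1 )

        # Equivalent if/else form
        if ( b > $1 )
            b = b
        else
            b = $1
    }
    END { print a, b }'
}

printf '%s\n' 3 9 4 | compare_forms
```

Both variables end up holding 9, the largest of the three inputs.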


Or as my book tells me:

Awk provides a conditional operator that is found in the C programming language.
Its form is:


expr ? action1 : action2
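
Because the operator hands back a value, it can be used anywhere a value is expected, not only in an assignment. A minimal sketch, with an invented function name `label_values` and an arbitrary threshold of 50:

```shell
label_values() {
    # Print each number followed by a label chosen by the conditional operator
    awk '{ print $1, ( $1 > 50 ? "high" : "low" ) }'
}

printf '%s\n' 12 47 99 | label_values
```

This prints `12 low`, `47 low` and `99 high`, one per line.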