Pattern matching and processing

awk 'pattern {action}' filename
    reads one line at a time from the file, checks for a pattern match, and performs the action if the pattern matches

Patterns
    NR is a special awk variable meaning the line number of the current record
        a specific line can be selected by comparing its number to NR (for example: NR == 2)
        a range of line numbers can be specified (for example: NR == 2, NR == 4)
    a regular expression can be specified, to select all lines that match it
    $n are special awk variables meaning the value of the nth field (the default field delimiter is space or tab)
        $0 is the entire record
        field values can be used by comparing to $n (for example: $3 == 65)
    every line is selected if no pattern is specified

Instructions
    print - print line(s) that match the pattern, or print fields within matching lines
        print is the default if no action is specified
    there are many, many instructions, including just about all C statements with similar syntax
    other instructions will be covered in future courses

Examples, using the file cars from page 654 of "A Practical Guide to Linux":

awk 'NR == 2, NR == 4' cars        - print the 2nd through 4th lines (default action is to print the entire line)
awk '/chevy/' cars                 - print only lines matching the regular expression; same as grep 'chevy' cars
awk '{print $3, $1}' cars          - print the third and first field of all lines (default pattern matches all lines)
awk '/chevy/ {print $3, $1}' cars  - print the third and first field of lines matching the regular expression
awk '$3 == 65' cars                - print only lines with a third field value of 65
awk '$5 <= 3000' cars              - print only lines with a fifth field value that is less than or equal to 3000

The file testfile can be used with the following examples:

awk '{print $1}' testfile          - print the first field of every record
awk '{print $3 $1}' testfile       - print the third and first fields with no separator between them
awk '{print $3, $1}' testfile      - the comma inserts the output field separator (variable OFS, default is space)
awk -F, '{print $2}' testfile      - specifies that , is the input field separator; the default is space or tab
awk '$2 ~ /[0-9]/ {print $3, $1}' testfile
                                   - searches for the regular expression (a digit) only in the second field
awk '{printf "%-30s%20s\n", $3, $2}' testfile
                                   - print the 3rd field left-justified in a 30-character field and the 2nd field
                                     right-justified in a 20-character field, then skip to a new line
                                     (the \n is required with printf)
awk '$3 <= 23' testfile            - prints lines where the 3rd field has a value <= 23
awk '$3 <= '$var1' {print $3}' testfile
                                   - $var1 is a shell variable, not an awk variable; e.g. first execute: var1=23
awk '$3 <= '$2' {$3++} {print $0}' testfile
                                   - if field 3 <= positional parameter 2, then increment field 3;
                                     e.g. first execute: set xxx 23
awk '$3 > 1 && $3 < 23' testfile   - prints lines where the 3rd field is strictly between 1 and 23
awk '$3 < 2 || $3 > 4' testfile    - prints lines where the 3rd field is outside the range 2 to 4
awk '$3 < "4"' testfile            - double quotes force string comparison

NF is an awk variable meaning the number of fields in the current record
awk '! (NF == 4)' testfile         - prints lines that do not have 4 fields

NR is an awk variable meaning the number of the current record
awk 'NR == 2, NR == 7' testfile    - range of records from record number 2 to 7

BEGIN is an awk pattern meaning "before the first record is processed"
awk 'BEGIN {OFS="~"} {print $1, $2}' testfile
                                   - print the 1st and 2nd field of each record, separated by ~

END is an awk pattern meaning "after the last record is processed"
awk '{var += $3} END {print var}' testfile
                                   - sum of the 3rd fields of all records
awk '{var += $3} END {print var/NR}' testfile
                                   - average of the 3rd fields of all records; note that awk handles
                                     decimal arithmetic
awk '$5 > var {var = $5} END {print var}' testfile
                                   - maximum of the 5th fields of all records
sort -rk5 testfile | awk 'NR==1 {var=$5} var==$5 {print $0}'
                                   - print all records with the maximum 5th field

Simple awk operations involving functions within the command line:

awk '/chevy/' cars
# Match lines (records) that contain the keyword chevy
# Note that chevy is a regular expression...
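The BEGIN/END aggregation patterns above can be tried end to end. This sketch invents a small three-record stand-in for testfile (the course's real file differs, and the numeric field is the 3rd rather than the 5th):

```shell
# Invented stand-in for testfile: three records with a numeric 3rd field.
printf 'alpha x 10\nbeta y 20\ngamma z 30\n' > /tmp/testfile

awk '{var += $3} END {print var}' /tmp/testfile          # sum of 3rd fields -> 60
awk '{var += $3} END {print var/NR}' /tmp/testfile       # average of 3rd fields -> 20
awk '$3 > var {var = $3} END {print var}' /tmp/testfile  # maximum of 3rd fields -> 30
```

The maximum trick works because an uninitialized awk variable compares as 0 against a numeric field, so the first record always seeds var.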
awk '{print $3, $1}' cars
# Pattern not specified - therefore, fields 3 and 1 of all lines (records) are displayed
# Note that the comma (,) between fields represents the delimiter (i.e. a space)

awk '/chevy/ {print $3, $1}' cars
# Similar to above, but only for records containing chevy

awk '/^h/' cars
# Match records that begin with h

awk '$1 ~ /^h/' cars    ### useful ###
# Match records whose field #1 begins with h

awk '$1 ~ /h/' cars
# Match records whose field #1 contains the letter h anywhere

awk '$2 ~ /^[tm]/ {print $3, $2, "$" $5}' cars
# Match cars whose 2nd field begins with t or m and display field 3 (year),
# field 2 (model name), and then $ followed by field 5 (price)

--------------------------------------------------------------------------------------------------
Complex awk operations involving functions within the command line:

awk '/chevy/ {print $3, $1}' cars
# prints 3rd & 1st fields of records containing chevy

awk '$1 ~ /^c/ {print $2, $3}' cars
# prints 2nd & 3rd fields of records whose 1st field begins with c

awk 'NR==2 {print $1, $4}' cars
# prints 1st & 4th fields of record #2

awk 'NR==2, NR==8 {print $2, $3}' cars
# prints 2nd & 3rd fields of records 2 through 8

awk '$3 >= 65 {print $3, $1}' cars
# prints 3rd & 1st fields of records whose 3rd field is >= 65

awk '$5 >= "2000" && $5 < "9000" {print $2, $3}' cars
# prints 2nd & 3rd fields of records whose 5th field is in the range 2000 to under 9000
# (the double quotes force string comparison)

-----
mount -p | grep -v STAG | awk '{printf "mkdir -p %s\nmount -F vxfs %s %s\n", $3, $1, $3}'
# generates a mkdir and a mount command for each entry in the mount table

More useful awk
---------------------
string="test"
awk 'BEGIN {
    logical=5
    physical=11
    for (run=1; run<11; run++)
        printf "vxdg -g disk_group -o override adddisk '"$string"'%02d=c2t20d%ds2\n", logical++, physical++
    exit
}' #| sh

-----------------
Searching Apache logs and grouping URLs based on their column position ($11 - column 11, the referral field):

grep osmapapi jad.txt | awk '{print $11}' | sort | nawk '{split($0, url, "/"); if (url[3] != last) {print cnt, last; cnt=1} last=url[3]; cnt++}' | sort -rn | more

-----------
How to transpose a variable-length comma-delimited row:
------
/tmp/a.dat
DG1 1,2,3,4
DG2 10,14,5,6

nawk '{ for(i=1;i ...    (the rest of this one-liner is truncated in the notes)

Generating one egrep per search term (the start of this command is also truncated in the notes; the head below is a reconstruction):

awk -F, '{printf "egrep -rli %s csl >> output_file.txt\n", $1}' < /Volumes/Charles05/Temp/ctx_sl.txt | sh

Better single-pass way - but I get an "Egrep: Reg expression too long" error on the Mac:
----
awk -F, '{ search = sprintf("%s|%s", search, $1) }
     END { comm = sprintf("egrep -rli %c%s%c csl >> output_file.txt\n", 34, search, 34); print comm }' < ctx_sl.txt | sh

CSV file format
------
very simple (the last value does not have a comma):
string1, string2, stringn, last_string
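The truncated transpose one-liner above can be sketched as follows. The split/print body is a reconstruction (an assumption, since the original was cut off), using awk in place of nawk:

```shell
# Reconstructed sketch: turn "KEY v1,v2,..." rows into one "KEY value" pair per line.
printf 'DG1 1,2,3,4\nDG2 10,14,5,6\n' > /tmp/a.dat

awk '{
    n = split($2, vals, ",")     # split the comma list into an array
    for (i = 1; i <= n; i++)
        print $1, vals[i]        # emit one key/value pair per line
}' /tmp/a.dat
# -> DG1 1, DG1 2, ..., DG2 6 (one pair per line)
```

split() returns the number of elements produced, which bounds the loop regardless of row length.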
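For the simple CSV layout just described, -F can take a regular expression, so the optional space after each comma is absorbed into the separator. The sample file name and contents here are invented:

```shell
# Invented sample in the "string1, string2, ..." layout described above.
printf 'string1, string2, stringn, last_string\n' > /tmp/csvdemo.txt

# FS may be a regular expression: a comma followed by any number of spaces.
awk -F', *' '{print $NF}' /tmp/csvdemo.txt   # -> last_string
awk -F', *' '{print $2}' /tmp/csvdemo.txt    # -> string2
```

$NF is handy here: since the row has a variable number of values, it always names the last field without counting columns.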