Notes: awk


Monday, 2 April 2012

Get the text between specific lines using sed

1) sed -n 'linenumber1,linenumber2p' file
sed -n '10,20p' file

Prints lines 10 through 20 of the file.

Similarly, to print from line 10 to the end of the file, use: sed -n '10,$p' filename

This is especially useful if you are dealing with a large file. Sometimes you just want to extract a sample without opening the entire file.

If the line numbers are held in variables, use double quotes so that the shell expands them, and braces to separate the variable name from the p command:
set a = 1
set b = 5
sed -n "$a,${b}p" stations_igs_final

2) sed -n 5p file

Prints one specific line (line 5) of the file.

Or use awk:

awk -v n=5 'NR == n {print $0}' file
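
The two approaches above can be combined into a small helper. This is only a sketch: it is written for a POSIX sh rather than the csh used in these notes, and the function name print_range and the generated sample file are made up for illustration. It stops reading as soon as the last wanted line has been printed, which matters for large files.

```shell
# print_range START END FILE (hypothetical helper, not from the notes):
# print lines START..END and stop reading once END has been passed.
print_range() {
    awk -v first="$1" -v last="$2" 'NR > last { exit } NR >= first' "$3"
}

# Demo on a generated 30-line sample file.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 30; i++) print "row " i }' > "$sample"
print_range 10 12 "$sample"
sed -n '10,12p;12q' "$sample"   # sed equivalent: q quits after line 12
rm -f "$sample"
```

Both commands print rows 10 to 12; neither reads the remaining lines of the file.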

Tuesday, 27 March 2012

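asort: get the size of an array after split in awk

1) To get the size of the array produced by split in awk:

set a = ss_zz_0.1

echo $a | awk '{split($0,qq,"_"); n=asort(qq); print n}'

asort is a gawk extension that sorts the array and returns its number of elements. split itself already returns that number, so n=split($0,qq,"_") is enough when no sorting is needed.

2) To print the first field of each line of a colon-separated data file:
awk -F ':' '{print $1}' filename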
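
As a minimal sketch of counting the pieces without asort, the return value of split can be used directly. The helper name count_parts is hypothetical, and the code assumes a POSIX sh rather than csh.

```shell
# count_parts STRING SEP (hypothetical helper): print how many pieces
# STRING splits into when cut on the separator SEP.
count_parts() {
    printf '%s\n' "$1" | awk -v sep="$2" '{ n = split($0, parts, sep); print n }'
}

count_parts "ss_zz_0.1" "_"   # the pieces are ss, zz and 0.1
```

Because this does not rely on asort, it should also work in awk implementations other than gawk.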

Thursday, 15 March 2012

Test if a string contains a substring

1)
set exist_substring = `echo "$string" | grep -c "$substring"`
if ( $exist_substring == 1 ) then
echo "$substring exists in $string"
endif

2) To test whether "str" occurs in $mystring:
echo $mystring | awk '{print index($0, "str")}'

This searches the value of $mystring for an occurrence of "str". If it is found, index returns the position of the first match (counting from 1); otherwise it returns 0.
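
The index test can be wrapped so that it is usable directly in a shell condition. The following is a sketch under assumptions: it targets a POSIX sh rather than csh, and the names contains and position are invented here.

```shell
# contains STRING SUBSTRING (hypothetical helper): exit status 0 if
# SUBSTRING occurs in STRING, 1 otherwise.
contains() {
    awk -v s="$1" -v t="$2" 'BEGIN { exit (index(s, t) > 0 ? 0 : 1) }'
}

# position STRING SUBSTRING (hypothetical helper): print the 1-based
# position of the first match, or 0 if there is none.
position() {
    awk -v s="$1" -v t="$2" 'BEGIN { print index(s, t) }'
}

position "stations_igs_final" "igs"
if contains "stations_igs_final" "igs"; then echo "found"; fi
```

Unlike grep, index compares literal characters, so a substring containing characters such as "." or "*" is not interpreted as a pattern.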

Tuesday, 6 March 2012

awk: NF; NR

1) Get the last column (field) of the output of grep:
grep ss filename | awk '{print $NF}'

2) Get the second-to-last column:
grep ss filename | awk '{print $(NF-1)}'

3) Get the number of columns (valid if every line has the same number of fields, since NF in END refers to the last line read):

grep ss filename | awk 'END {print NF}'

4) Get the number of lines:

grep ss filename | awk 'END {print NR}'

5) Get the second-to-last line. Note that $(NR-1) would select a field of the last line, not a line, so the previous line has to be remembered while reading:

grep ss filename | awk '{prev=cur; cur=$0} END {print prev}'

6) For a table of numbers in the text file percentage_data, to get the mean of each column:

# get the number of fields in the table
set number_field = ` awk 'END{print NF}' percentage_data `

# for each column, find the mean and append it to a new file
# averaged_percentage_MII_MI

@ j = 1
while ( $j <= $number_field)
set mean_percentage_temp = `awk -v fd=$j '{print $fd}' percentage_data | awk '{sum+=$1}END{print sum/NR}'`
echo $mean_percentage_temp " " >> averaged_percentage_MII_MI

@ j += 1

end
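
The loop above reads the file once per column. As an alternative sketch, all column means can be accumulated in a single awk pass. The helper name column_means and the sample data are invented for illustration, the code is written for a POSIX sh rather than csh, and it assumes every line has the same number of numeric fields.

```shell
# column_means FILE (hypothetical helper): print the mean of every
# column, one value per line, reading the file only once.
column_means() {
    awk '{ for (i = 1; i <= NF; i++) sum[i] += $i; nf = NF }
         END { if (NR > 0) for (i = 1; i <= nf; i++) print sum[i] / NR }' "$1"
}

# Demo with a made-up two-column table.
sample=$(mktemp)
printf '1 10\n3 20\n' > "$sample"
column_means "$sample"
rm -f "$sample"
```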

Friday, 2 March 2012

numeric calculation with shell or awk

1) Processing two variables that each hold a single number

Shell languages are not usually used for complex scientific calculations, but sometimes we need to do simple arithmetic on values taken from text files. What can we do?

set a = 1.25E-02
set b = 3.289E-02

There are two ways to compute with these two variables in the shell: echo piped into bc, or awk.
For example, for addition:
1) echo "$a+$b" | /usr/bin/bc
2) echo "$a $b" | awk '{print $1+$2}'

For comparisons (the logical AND operator is &&, not &; in bc it is a GNU extension):
1) echo "$a > 0 && $b > 0" | /usr/bin/bc
2) echo "$a $b" | awk '{if ($1 > 0 && $2 > 0) print 1; else print 0}'
3) echo "$a" | awk '{if ($1 > 1 || $1 < -1) print 1; else print 0}'

For anything beyond trivial arithmetic, awk is the safer choice, especially when the data are in scientific notation (e.g., 1.236778E-02): awk parses such values as floating-point numbers, whereas bc does not accept the E exponent notation, so the bc commands above only give the intended result for plain decimal values.

2) Processing data from different files
To subtract two columns of data that come from two files, file1 and file2 (replace which_column_in_file1 and which_column_in_file2 with the actual column numbers, because awk would otherwise treat these names as empty variables):
awk '{column1=$which_column_in_file1; getline <"file2"; print column1 - $which_column_in_file2}' file1
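
A slightly more defensive sketch of the two-file subtraction passes the column numbers in with -v and checks the return value of getline, so that it stops cleanly if the second file is shorter. The helper name subtract_columns and the sample data are assumptions for illustration, and the code targets a POSIX sh rather than csh.

```shell
# subtract_columns FILE1 COL1 FILE2 COL2 (hypothetical helper):
# print FILE1[COL1] - FILE2[COL2], pairing the files line by line.
subtract_columns() {
    awk -v c1="$2" -v other="$3" -v c2="$4" '{
        x = $c1
        # getline returns 0 at end of file and -1 on error
        if ((getline line < other) <= 0) exit
        split(line, f, " ")          # split the FILE2 line on whitespace
        print x - f[c2 + 0]
    }' "$1"
}

# Demo with two made-up files.
f1=$(mktemp); f2=$(mktemp)
printf 'a 5.5\nb 7.0\n' > "$f1"
printf '1.5 x\n2.0 y\n' > "$f2"
subtract_columns "$f1" 2 "$f2" 1
rm -f "$f1" "$f2"
```

Reading the second file into a separate variable keeps the fields of the current file1 line intact, which the plain getline form overwrites.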

Thursday, 16 February 2012

reading and processing lines in a text file

Suppose a text file named toto contains the following table:

76039011873490122462 2592.624526 6407821.687553863 3 0.2461 0.0002892 9667 74 1 10 5320
76039011873490122462 2708.024084 6277975.529578525 3 0.2461 0.0002892 9667 74 2 10 5320
76039011873490122462 2835.623772 6179282.694306660 3 0.2461 0.0002892 9667 74 3 10 5320
76039011873490122462 2916.223642 6142426.064870278 3 0.2461 0.0002892 9667 74 4 10 5320
76039011873490122462 3055.424598 6126591.671642607 3 0.2461 0.0002892 9667 74 5 10 5320
76039011873490122462 3186.624722 6167350.779549360 3 0.2461 0.0002892 9667 74 6 10 5320

1) To read the whole file into a variable:
set file = ` awk '{print $0}' toto `

2) To perform some action on each row:

set num_lines = `cat toto | wc -l`

@ j = 1

while ( $j <= $num_lines)

set eachline = `awk -v ln=$j '{if (NR==ln) print $0}' toto`

# ... further actions on $eachline ...

@ j += 1

end

Note: foreach iterates over whitespace-separated words, so it visits each element of each row but never yields a complete line:

foreach tt ( `cat toto `)

set eachelement = $tt

end

3) To get one column of data, e.g., column 4:

awk '{print $4}' toto
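
When the per-row action can itself be written in awk, the explicit loop is not needed, because awk already runs its action once per line. The sketch below assumes a POSIX sh rather than csh; the helper name describe_rows is made up, and the sample file reuses two rows of toto.

```shell
# describe_rows FILE (hypothetical helper): for every row print its
# line number (NR), its number of fields (NF) and its 4th field.
describe_rows() {
    awk '{ printf "line %d: %d fields, column 4 = %s\n", NR, NF, $4 }' "$1"
}

# Demo on the first two rows of toto.
sample=$(mktemp)
printf '%s\n' \
  '76039011873490122462 2592.624526 6407821.687553863 3 0.2461 0.0002892 9667 74 1 10 5320' \
  '76039011873490122462 2708.024084 6277975.529578525 3 0.2461 0.0002892 9667 74 2 10 5320' \
  > "$sample"
describe_rows "$sample"
rm -f "$sample"
```

This reads the file once, whereas the while loop above starts a new awk process for every line.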

Monday, 13 February 2012

Get the integer and fractional parts of a number; formatted output

set a = 26400.24567

--- Subtract the integer part of the variable to get the fractional part:
echo $a | awk '{print $0-int($0)}'

--- Multiply the fractional part by the constant 86400:
echo $a | awk '{print $0-int($0)}' | awk '{print $0*86400}'

--- Output with a format:

echo $a | awk '{print $0-int($0)}' | awk '{print $0*86400}' | awk '{printf ( "%12.5f" ,$0)}'

The "f" conversion character must not be omitted: it marks the value as a floating-point number rather than an integer.

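The three chained awk calls for the fractional part can also be done in one awk call. This is a sketch assuming a POSIX sh rather than csh; the helper name frac_to_seconds is invented, and the trailing newline in the format is an addition for readable terminal output.

```shell
# frac_to_seconds VALUE (hypothetical helper): take the fractional part
# of VALUE, multiply it by 86400 and print it with the %12.5f format.
frac_to_seconds() {
    awk -v x="$1" 'BEGIN { printf "%12.5f\n", (x - int(x)) * 86400 }'
}

frac_to_seconds 26400.24567
```

Doing the arithmetic in one step also avoids the rounding that happens each time an intermediate result is printed and parsed again.
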
Test if a string exists in another string; get the last few letters

--- To test whether "str" occurs in $mystring:
echo $mystring | awk '{print index($0, "str")}'

This searches the value of $mystring for an occurrence of "str". If it is found, index returns the position of the first match (counting from 1); otherwise it returns 0.

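--- To get the last few letters of a string (here the last two; the third argument of substr is a length, not an end position, and omitting it takes everything up to the end of the string):
awk '{ print substr( $0, length($0) - 1 ) }' input_file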
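
To take an arbitrary number of trailing characters, the start position can be computed from the length. The following is a sketch for a POSIX sh rather than csh; the helper name last_chars is hypothetical.

```shell
# last_chars STRING N (hypothetical helper): print the last N characters
# of STRING. The start position is length - N + 1.
last_chars() {
    awk -v s="$1" -v n="$2" 'BEGIN { print substr(s, length(s) - n + 1) }'
}

last_chars "stations_igs_final" 5   # prints "final"
```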

Split a string on a special character with awk

--- To split a string on "/", store the pieces in the array a, and print the first piece:
set c = `echo "tahiti/la1_22460_mrb01" | awk '{split($0,a,"/");print a[1] }'`
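
Since split returns the number of pieces, the last piece is just as easy to reach as the first. This sketch assumes a POSIX sh rather than csh, and the helper name first_and_last is made up.

```shell
# first_and_last STRING SEP (hypothetical helper): split STRING on SEP
# and print the first piece, then the last piece.
first_and_last() {
    printf '%s\n' "$1" | awk -v sep="$2" '{ n = split($0, a, sep); print a[1]; print a[n] }'
}

first_and_last "tahiti/la1_22460_mrb01" "/"
```

For the string used above, this prints tahiti and then la1_22460_mrb01.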

Friday, 13 January 2012

Manipulate rows and columns with awk: mean, standard deviation

--- To find the lines containing the keyword, extract the 5th column, and compute its mean:

grep "MRB 7822" parametres_gins | awk '{print $5}' | awk '{sum+=$1} END {print sum/NR}'

--- To find the lines containing the keyword, extract the 5th column, and compute the root mean square of the data in this column:

grep "MRB 7822" parametres_gins | awk '{print $5}' | awk '{sum+=$1*$1} END {print sqrt(sum/NR)}'

--- To count the rows of the selected text:
grep "MRB 7822" parametres_gins | awk 'END {print NR}'

--- To find the (population) standard deviation of the data in the 3rd column of a file:
awk '{sum+=$3; sumsq+=$3*$3} END {print sqrt(sumsq/NR - (sum/NR)^2)}' data_file

--- To subtract two columns of data that come from two files (replace which_column_in_file1 and which_column_in_file2 with the actual column numbers):
awk '{column1=$which_column_in_file1; getline <"file2"; print column1 - $which_column_in_file2}' file1
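
The mean and the standard deviation of a column can be computed together in one pass. This is a sketch under assumptions: it targets a POSIX sh rather than csh, the helper name mean_std and the sample values are invented, and it computes the population standard deviation (dividing by N), as the formula above does.

```shell
# mean_std FILE COL (hypothetical helper): print the mean and the
# population standard deviation of column COL of FILE.
mean_std() {
    awk -v c="$2" '{ sum += $c; sumsq += $c * $c }
        END {
            if (NR == 0) exit 1          # no data, nothing to print
            mean = sum / NR
            var = sumsq / NR - mean * mean
            if (var < 0) var = 0         # guard against rounding error
            printf "%.6f %.6f\n", mean, sqrt(var)
        }' "$1"
}

# Demo with made-up values whose mean is 5 and standard deviation is 2.
sample=$(mktemp)
printf '%s\n' 2 4 4 4 5 5 7 9 > "$sample"
mean_std "$sample" 1
rm -f "$sample"
```

The guard on var matters for nearly constant columns, where rounding can make the difference slightly negative and sqrt would then fail.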
