
Wednesday, 30 January 2013

grep: printing several lines around a keyword, and the memory problem

1) To print several lines before or after the line that matches a keyword, grep's -A and -B options do the job: -A NUM prints NUM lines after each match, and -B NUM prints NUM lines before it.

For example, to print the 5 lines before and the 20 lines after each line containing the keyword:

set keyword = "Distance"
grep -B 5 -A 20 "$keyword" filename
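
To see the effect on a small input, here is a minimal sketch in POSIX sh (not csh as above). The sample file, its contents and the variable names are made up for illustration, and it assumes a grep that supports -A and -B, as GNU and BSD grep do.

```shell
#!/bin/sh
# Hypothetical sample file: 10 lines, with the keyword on line 5.
tmpfile=$(mktemp)
printf 'line1\nline2\nline3\nline4\nDistance = 42\nline6\nline7\nline8\nline9\nline10\n' > "$tmpfile"

keyword="Distance"

# Print 2 lines before and 3 lines after each matching line.
context=$(grep -B 2 -A 3 "$keyword" "$tmpfile")
printf '%s\n' "$context"

# Clean up the temporary file.
rm -f "$tmpfile"
```

With this input the output should run from line3 to line8, with the matching line in between.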

2) When we use grep to match or process files, we sometimes find that it uses a lot of memory. This happens in particular when grep runs over a large disk containing many files. On the Debian website I found a useful explanation:


grep uses a DFA algorithm to perform regexp matching. This DFA algorithm
is either implemented in grep, or in the libc (when re_search is used).
Which DFA algorithm is used depends on the version of grep and on the grep
options.

The DFA algorithm is a state machine and each time it is used, the
automaton which represents the regexp may use more memory because a new
transition is investigated.
The memory allocated for the automaton is not freed after each line is
parsed, but it is kept so that if a transition path is also used in a
later line, the processing will be faster. Thus more and more memory will be used.

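To reduce memory usage, we can use the option -F, which makes grep treat the pattern as a fixed string instead of a regular expression. This only helps when the keyword is a literal string and not a real regexp.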
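
The difference between the default matching and -F can be checked with a small sketch in POSIX sh. The sample file and variable names are hypothetical. It only shows that -F matches the pattern literally and does not measure memory.

```shell
#!/bin/sh
# Hypothetical sample file with two lines.
tmpfile=$(mktemp)
printf 'a.b\naxb\n' > "$tmpfile"

# Default: the pattern is a regexp, so '.' matches any single character.
regex_hits=$(grep -c 'a.b' "$tmpfile")

# -F: the pattern is a fixed string, so only the literal text 'a.b' matches.
fixed_hits=$(grep -c -F 'a.b' "$tmpfile")

echo "regexp matches: $regex_hits"
echo "fixed-string matches: $fixed_hits"

# Clean up the temporary file.
rm -f "$tmpfile"
```

Here the regexp form counts both lines and the -F form counts only the first. For the keyword search above, the same idea would be grep -F -B 5 -A 20 "$keyword" filename.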
