regex - Perl: iterate through a file to find and save lines separately according to different given patterns
I need to iterate through a log file (around 2 MB) line by line, compare each line with 5 different patterns, and save the lines that match in separate arrays according to the pattern they match. The patterns appear in the file as follows:
    (text)
    pattern1
    pattern2
    (m lines of text)
    pattern3
    (2 lines of text)
    pattern1
    pattern2
    (x lines of text)
    pattern3
    (this continues ~50-100 times; the number of lines between pattern2 and pattern3 varies)
    ...
    pattern3
    (5 lines of text)
    pattern4
    (2 lines of text)
    pattern5
    (text)
I am aware that a lot of similar questions have been asked and answered; however, I do not understand the code in the answers. My plan is:
1. Read each line and check for pattern1 and pattern2 until the first match, using 2 if statements, then exit the loop immediately.
2. Start a while (my $line = <FILE>) loop and check for pattern3. If a match is found, save the line for pattern3, and save the following pattern1 and pattern2 lines (which are the 3rd and 4th lines after pattern3).
Here are my questions:
a. When I exit a while (<FILE>) loop before it reaches the end and start a while ($line = <FILE>) loop after that, does the 2nd loop start reading from the top again, or does it continue where the first one stopped?
b. Can someone be kind enough to give a sample commented implementation of step 2?
c. How can I make use of the search for the pattern3 line to find pattern4 and pattern5, since the matches for pattern4 and pattern5 are a fixed distance away from the last match of pattern3?
d. Is my plan more efficient than using if-else statements that check all 5 patterns for every line (in which case, e.g. if there are 30 pattern1's, the total number of comparisons is 6*30 + 4 + 5 + 5 * the number of lines with no match)?
e. Is there a better/more efficient way to solve this problem? Only around 1% of the lines match any of the patterns.
I will appreciate any answer/suggestion/alternative provided. Thanks.
If you know the 5 patterns in advance, this is easy. Do you? Or are they part of the input file and unpredictable?
Assuming you know them in advance:
    use strict;
    use warnings;
    use Data::Dumper;

    my $current_pattern = 'else';
    my $pattern_arrays  = {
        'pattern1' => [],
        'pattern2' => [],
        'pattern3' => [],
        'pattern4' => [],
        'pattern5' => [],
        'else'     => [],
    };

    while ( my $line = <DATA> ) {
        chomp($line);    # remove the trailing "\n" from $line

        # See if we read one of our 5 patterns. If so, remember it
        # in $current_pattern and proceed with the next line.
        if ( $line =~ /^(pattern1|pattern2|pattern3|pattern4|pattern5)$/ ) {
            $current_pattern = $line;
            next;    # jump to "while ...", i.e. proceed with the next line
        }

        # If we get here, $current_pattern is one of "pattern1" ...
        # "pattern5" or "else". It is "else" at the beginning, when we
        # haven't found a pattern yet (i.e. the first line in this case).
        # Push $line onto the array that belongs to $current_pattern.
        push @{ $pattern_arrays->{$current_pattern} }, $line;
    }

    # Pretty-print the arrays.
    $Data::Dumper::Sortkeys = 1;    # sort Data::Dumper output by keys
    print Data::Dumper->Dump( [$pattern_arrays], ['pattern_arrays'] );

    __DATA__
    (text)
    pattern1
    pattern2
    (m lines of text)
    pattern3
    (2 lines of text)
    pattern1
    pattern2
    (x lines of text)
    pattern3
    (this continues ~50-100 times number of lines between pattern2 , pattern3 vary)
    ...
    pattern3
    (5 lines of text)
    pattern4
    (2 lines of text)
    pattern5
    (text)
This yields (the empty string at the end of pattern5 presumably comes from a blank last line in the data section):

    $pattern_arrays = {
      'else' => [
        '(text)'
      ],
      'pattern1' => [],
      'pattern2' => [
        '(m lines of text)',
        '(x lines of text)'
      ],
      'pattern3' => [
        '(2 lines of text)',
        '(this continues ~50-100 times number of lines between pattern2 , pattern3 vary)',
        '...',
        '(5 lines of text)'
      ],
      'pattern4' => [
        '(2 lines of text)'
      ],
      'pattern5' => [
        '(text)',
        ''
      ]
    };
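If you would rather keep the matching lines themselves, one array per pattern, a minimal sketch could look like the following. The regexes in `%regex_for`, the helper name `collect_matches` and the sample log are all placeholders I made up for illustration, since the real patterns are not given in the question. Each line is tested against the patterns in turn and the first hit wins.

```perl
use strict;
use warnings;

# Placeholder regexes: the real patterns are not given in the question.
my %regex_for = (
    pattern1 => qr/^pattern1\b/,
    pattern2 => qr/^pattern2\b/,
    pattern3 => qr/^pattern3\b/,
    pattern4 => qr/^pattern4\b/,
    pattern5 => qr/^pattern5\b/,
);

# Hypothetical helper: returns a hash ref mapping each pattern name
# to an array ref of the lines that matched it.
sub collect_matches {
    my ($fh) = @_;
    my %matches = map { $_ => [] } keys %regex_for;

    LINE:
    while ( my $line = <$fh> ) {
        chomp $line;
        for my $name ( sort keys %regex_for ) {
            if ( $line =~ $regex_for{$name} ) {
                push @{ $matches{$name} }, $line;
                next LINE;    # first match wins; skip the remaining patterns
            }
        }
    }
    return \%matches;
}

# Demo on an in-memory file so the sketch runs without a real log.
my $log = join "\n",
    '(text)', 'pattern1 a', 'pattern2 b', '(m lines of text)',
    'pattern3 c', 'pattern1 d', 'pattern4 e', 'pattern5 f', '';
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
my $matches = collect_matches($fh);
close $fh;

printf "%s: %d line(s)\n", $_, scalar @{ $matches->{$_} }
    for sort keys %$matches;
```

For a log of about 2 MB, testing five precompiled regexes per line is unlikely to be a bottleneck, so I would start with something this simple and only optimise if it turns out to be slow.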
Actually, I'm not sure if this is what you asked for. Of course, instead of <DATA> you can use any other <FILE>.
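On question (a): a filehandle keeps its read position, so a second while loop on the same handle continues with the line after the last one read rather than starting from the top. Here is a small sketch of that behaviour. It reads from an in-memory string so that it runs as-is; the sample lines and the variable names `$first_hit` and `@rest` are made up for the demo, and with a real log you would open the file by name instead.

```perl
use strict;
use warnings;

# In-memory stand-in for the log. A real script would use something like
#   open my $fh, '<', $path or die ...;
# where $path is the name of your log file.
my $log = join "\n",
    '(text)', 'pattern1', 'pattern2', '(m lines of text)',
    'pattern3', '(2 lines of text)', 'pattern1', 'pattern2', '';
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";

# First loop: stop at the first line that is exactly "pattern1".
my $first_hit;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^pattern1$/ ) {
        $first_hit = $line;
        last;    # leave the loop; the handle keeps its position
    }
}

# Second loop: continues with the line after the one read last.
my @rest;
while ( my $line = <$fh> ) {
    chomp $line;
    push @rest, $line;
}
close $fh;

print "second loop started at: $rest[0]\n";
```

If you do want to start again from the top, you have to rewind explicitly with seek($fh, 0, 0) or reopen the file.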