perl - mail server log filtering
I have several multi-GB mail server log files and a list of ~350k message IDs. I want to pull out of the large log files the lines whose IDs are on that long list ... and now I want to speed it up ... Currently I do it in Perl:
```perl
#!/usr/bin/perl
use warnings;

# open the file with the list of unique IDs - more than 350k of them
open(ID, '<', 'lista_id.txt') or die $!;   # one ID per line
my @lista_id = <ID>;
close(ID);
chomp @lista_id;

open(LOG, '<', 'maillog') or die $!;
# outer while over the log, so the whole log is never held in memory
while (<LOG>) {
    my $wiersz = $_;
    my @wiersz_split = split(' ', $wiersz);
    foreach my $id (@lista_id) {
        # the ID is the 6th column in maillog
        if ($wiersz_split[5] eq $id) {
            # print on match - to STDOUT, a file, or wherever
            print "@wiersz_split\n";
        }
    }
}
close(LOG);
```

It works, but it is slow ... every line of the log is compared against the whole ID list. Should I use a database and do a join somehow? Or compare substrings?
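As a quick non-Perl cross-check, `grep -F` (fixed strings) with `-f` (patterns read from a file) can do a similar filtering pass. The file names (`ids.txt`, `maillog`) and toy data below are illustrative assumptions, not the real files; note also that this matches an ID anywhere on the line, not only in column 6:

```shell
# Toy data standing in for the real ID list and maillog (assumed names).
printf 'ABC123\nDEF456\n' > ids.txt
printf 'a b c d e ABC123 sent\na b c d e ZZZ999 bounced\n' > maillog

# -F: treat patterns as fixed strings (no regex), -f: read patterns from a file.
# Caveat: matches the ID anywhere on the line, not just in the 6th column.
grep -F -f ids.txt maillog
```

With very large pattern lists `grep -F` builds an efficient multi-string matcher, so it is usually far faster than a nested loop, but it cannot restrict the match to one column.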
There are a number of tools for log analysis - e.g. pflogsumm ... but it only summarizes, so it would be fast yet useless here; I'll use it after filtering my log file.
Simple things like `grep -c "status=sent" maillog` (feeding pflogsumm-style stats) are fast - only the pattern changes; it's the per-ID lookup that is the problem.
Any suggestions?
------------------- - UPDATE -------------------
Thank you Dallahlen, I have successfully replaced the loop over `@lista_id` with this:

```perl
if (exists $lista_id_hash{$wiersz_split[5]}) {
    print "$wiersz";
}
```

where `%lista_id_hash` is a hash table whose keys are the items taken from my ID list. It works super fast: filtering 4 × 6 GB log files against 350k IDs takes less than 1 minute.
Use a hash
```perl
my %known;
$known{$_} = 1 for @lista_id;

# ...
while (<>) {
    # ... extract the ID into $id
    if ($known{$id}) {
        # process the line
    }
}
```

PS. If your log is huge, you might be better off splitting it on, e.g., the last two characters of the ID into 26² (or 36²?) smaller files - a poor man's map-reduce. The number of IDs held in memory at a time is also reduced (i.e., when you are processing maillog.split.cf, you only need the IDs ending in "cf" in the hash).