[Logwatch-Devel] Proposal/Suggestion for filter REs

Bjorn L. bl_logwatch at mblmail.net
Fri Dec 16 13:01:49 MST 2005


logwatch at mikecappella.com wrote:

> I would like to propose that sample (sanitized) log entries are provided
> with each filter and patches, perhaps as a comment within the if-then
> clauses.  The comment should minimally indicate platform, and software
> version, something like:
> 
>   LOG: Fedora3: Postfix 2.2.6: ... myhost postfix/smtpd[11128]: disconnect
>   from unknown[111.222.111.222]

I don't mind log collection, but I would prefer it not be included in
the source, for two reasons:

a) It just clutters the source.
b) People might think that the filter corresponds to the log entry.
    The source code of the application generating the log entries is the
    only reliable place to find out how the log entries are generated.

In summary, I think that log entries are useful to find where in the
source the log entry is created, but that it is dangerous to rely
on just log entries in building REs (regular expressions).  My
concern is that people will tune the RE to the log entry.

Because the above sounds a bit cryptic, I'll elaborate on it, using
examples from sendmail, which underwent a major cleanup earlier this
year.

1.) A lot of the existing code, and many submissions, are based on
   a single developer's log entries.  But for many services, the
   program builds each log message field by field; not every field
   is always included, and the parameter values change.

   By looking at the source code, you can see that when email
   is delivered and a to= statement is printed, only the stat
   field is always printed.  Many others, including ctrladdr,
   delay, xdelay, mailer, pri, relay, ntries, etc., are not
   always printed.

   Pitfall:
   I'm guessing that the first person built an RE filter
   based on their logs, using many of the above fields.
   Subsequently, more REs were added, each with a different
   combination of fields, and sometimes setting different
   variables.
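
A rough sketch of the alternative, in Python for brevity (Logwatch
filters themselves are Perl): instead of one RE per field combination,
split the line into key=value pairs and require only stat.  The helper
name parse_fields and the sample lines are made up for illustration;
they are not taken from a real sendmail log or from Logwatch.

```python
import re

# Split only on commas that are followed by "name=", so that a comma
# inside a value (for example in a long stat= text) is left alone.
FIELD_SPLIT_RE = re.compile(r',\s*(?=\w+=)')

def parse_fields(text):
    """Return the key=value pairs of a delivery line as a dict.

    Hypothetical helper: no field order is assumed, and any field
    may be absent.
    """
    fields = {}
    for part in FIELD_SPLIT_RE.split(text.strip()):
        key, sep, value = part.partition('=')
        if sep:  # skip fragments that contain no '=' at all
            fields[key.strip()] = value.strip()
    return fields

# Invented sample lines, one with many optional fields, one with few.
full = "to=<user@example.com>, delay=00:00:01, mailer=local, stat=Sent"
short = "to=<user@example.com>, stat=Deferred: host down, will retry"

for line in (full, short):
    fields = parse_fields(line)
    # Only stat is relied upon; everything else is optional.
    print(fields["stat"], "| mailer:", fields.get("mailer", "(not logged)"))
```

With this approach a line that lacks delay, mailer, relay and so on
still lands in the same code path as one that has them all.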

2.) Sometimes log messages include a cause and an action.
   Usually you want to match and report on the cause, not the
   action.

   An example is how sendmail handles undeliverable email.
   There are different options (such as return to sender,
   sender notify, postmaster notify, DSN) that show up as
   radically different log entries, so it might not be
   readily obvious that they are related to the same cause.

   Pitfall:
   We wound up with different groupings based on error
   reporting, not underlying causes, depending on how
   sendmail was configured for the different people
   writing REs.  We had similar situations when the
   detection mechanism and the error were reported on the
   same line.  For example, check_rcpt is a detection
   mechanism, and you want to group errors by cause, not
   by detection mechanism.

3.) It's very easy to mask relevant errors.

   For example, at one point a whole category of ruleset=
   statements was ignored.

   Pitfall:
   I assume someone had a ruleset= entry that indeed was
   not important, but it might not have been obvious at
   the time that other important errors were reported
   with the same mechanism.
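
A minimal sketch of a narrower approach, again in Python rather than
Perl: ignore only rulesets that are named explicitly, and let every
other ruleset= line fall through to the report.  The function name
classify, the choice of check_relay as the ignored ruleset, and the
sample lines are assumptions made purely for illustration.

```python
import re

RULESET_RE = re.compile(r'\bruleset=(\w+)')

# Hypothetical ignore list.  Which rulesets are safe to ignore is a
# per-site decision; check_relay is picked arbitrarily here.
IGNORED_RULESETS = {"check_relay"}

def classify(line):
    """Return 'ignored', 'report', or 'other' for a log line."""
    match = RULESET_RE.search(line)
    if match is None:
        return "other"       # not a ruleset= line at all
    if match.group(1) in IGNORED_RULESETS:
        return "ignored"     # explicitly listed, so dropped on purpose
    return "report"          # unknown ruleset: surface it, do not hide it

# Invented sample lines.
print(classify("ruleset=check_relay, rest of line"))  # ignored
print(classify("ruleset=check_rcpt, rest of line"))   # report
print(classify("stat=Sent"))                          # other
```

The point of the default is that a ruleset nobody anticipated shows up
in the report instead of being swallowed by a catch-all pattern.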

4.) Multi-line sequences may not be obvious.

   A single error may sometimes yield more than one log
   entry, so all but one of them can be ignored.  The trick
   is finding the entry that is always present, which must
   not be ignored, and the ones that only sometimes
   accompany it, which should be ignored so that the error
   is not double-counted.
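
One way to picture this, as a Python sketch under loud assumptions:
that the entries belonging to one error share some identifier, and
that one kind of entry (called "primary" here) is always logged while
the others ("secondary") only sometimes appear.  The identifiers, the
kind labels and the function name count_errors are all hypothetical.

```python
PRIMARY = "primary"  # hypothetical label for the always-present entry

def count_errors(events):
    """Count each error once, using only its primary entry.

    events is an iterable of (event_id, kind) pairs.  Returns the
    number of distinct errors and a sorted list of ids that had a
    secondary entry but no primary one.  A non-empty list suggests
    the assumption about which entry is always present is wrong.
    """
    primaries = set()
    secondaries = set()
    for event_id, kind in events:
        if kind == PRIMARY:
            primaries.add(event_id)
        else:
            secondaries.add(event_id)
    orphans = sorted(secondaries - primaries)
    return len(primaries), orphans

# Invented events: "a" logs both entries, "b" only the primary one,
# "c" only a secondary one.
events = [("a", "primary"), ("a", "secondary"),
          ("b", "primary"), ("c", "secondary")]
print(count_errors(events))
```

Checking for orphans is a cheap way to test, against real logs, the
claim that a given entry is the one that is always there.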

5.) Just by looking at the log lines, it might not be
   obvious whether a message is a common one, or one that
   can only occur given a user's custom configuration.

