Motivation
Crafting regular expressions can be a daunting task, one that puts off potential contributors, scares users, and discourages package maintainers from including their own patterns (or results in them doing it badly).
Most of the rules used in Logcheck contain a great deal of repetition, such as the patterns used to match IP addresses, hostnames, usernames, the timestamp and PID added by syslog, and many more. Amidst all of this repetition is an opportunity to automate a portion of the writing task using a simple meta-pattern syntax, replacing long, tedious regexes with easy-to-remember "tags". This page is here to coordinate an experiment to create a tool that does just that.
Once such macros are implemented, it will be easy to update, for example, the regex for a hostname: the change is made in one place and propagates to every rule that uses it, instead of being edited by hand in each rule.
Requirements
Implementation Thoughts
- Ideally we want nested macros, such that $email can be comprised of $username@$domain
- Make bigger rules from more granular pieces
- The number of "big" macros should be kept small. Eric says 20 max?
- As the list of tags/macros grows, the value of the tool (which comes from the fact that it is smaller/simpler/easier than regex) will decrease. Taken to the extreme, at some point it would basically become a re-implementation of regex.
- One possible way to mitigate the downside of a larger list would be to optimize a smaller set for the common case. The rest of the list could be obscured or less advertised/documented, a lesson for the diligent student.
- Which is better: expanding at install time, or expanding on every run?
- Perhaps something similar to the update-exim approach in Debian, where the generator is run at install time and the user can run it again later to update?
- GNU make uses = for "expand every time" and := for "expand once". Perhaps copy this idea.
- Want to maintain compatibility with existing rules, so this is just a layer on top of logcheck, a generator. The logcheck code is untouched.
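The nested-macro idea above can be sketched as a small recursive expander. This is only an illustration in Python, not the proposed implementation (which may well be Perl); the macro names, the `$name` reference syntax, and the regex fragments in `MACROS` are all assumptions made for the example.

```python
import re

# Hypothetical macro table; names and patterns are illustrative only.
MACROS = {
    "username": r"[a-z_][a-z0-9_-]*",
    "domain": r"[a-z0-9-]+(?:\.[a-z0-9-]+)+",
    "email": r"$username@$domain",  # a macro built from other macros
}

# Assumed reference syntax: a dollar sign followed by a lowercase name.
MACRO_REF = re.compile(r"\$([a-z_]+)")


def expand(template, macros=MACROS, _seen=()):
    """Recursively replace $name references with their regex fragments."""

    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # leave unknown references untouched
        if name in _seen:
            raise ValueError(f"macro refers to itself: {name}")
        return expand(macros[name], macros, _seen + (name,))

    return MACRO_REF.sub(substitute, template)


if __name__ == "__main__":
    print(expand("$email"))
```

Because the output is an ordinary regex, the generated rules stay compatible with an unmodified logcheck. One limitation of this sketch: a literal `\$` followed by letters in a rule would be mistaken for a macro reference, so a real syntax would need an escaping convention.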
I'm fine with Perl. Any objections? Does anyone know m4?
A scratch space for MacroIdeas
- Macros could also help to optimize logcheck by merging what used to be multiple rules into one regexp.
Group lines by match, i.e. all lines matching sshd\[$PID\]: Illegal user $USERNAME from $IP would be reported as:
Matched "$SYSLOG sshd\[$PID\]: Illegal user $USERNAME from $IP" 128 times:
DATE PID USERNAME IP
... 1234 admin 1.2.3.4
[...]
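A rough sketch of the grouping idea, assuming each macro expands to a named capture group so that the matched values can be tabulated per rule. The fragments in `MACROS` and the function names are hypothetical; `$SYSLOG` is left out to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical fragments; each macro becomes a named group of the same name.
MACROS = {
    "PID": r"(?P<PID>\d+)",
    "USERNAME": r"(?P<USERNAME>\S+)",
    "IP": r"(?P<IP>\d{1,3}(?:\.\d{1,3}){3})",
}


def compile_rule(rule):
    """Expand $NAME references and compile the resulting regex."""
    pattern = re.sub(
        r"\$([A-Z]+)",
        lambda m: MACROS.get(m.group(1), m.group(0)),
        rule,
    )
    return re.compile(pattern)


def group_by_rule(rules, lines):
    """Map each rule to the captured fields of every line it matched."""
    compiled = [(rule, compile_rule(rule)) for rule in rules]
    groups = defaultdict(list)
    for line in lines:
        for rule, regex in compiled:
            match = regex.search(line)
            if match:
                groups[rule].append(match.groupdict())
                break  # first matching rule wins
    return groups


RULE = r"sshd\[$PID\]: Illegal user $USERNAME from $IP"
```

Calling `group_by_rule([RULE], lines)` yields one dictionary of `PID`, `USERNAME` and `IP` per matched line, which is enough to print a "Matched ... N times" summary followed by a table. Note that using the same macro twice in one rule would produce duplicate group names, which Python's `re` rejects, so a real tool would need to number the groups.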
Macro Syntax
What syntax are we using for the macros?
Miscellanea
I've expanded on my original proof-of-concept (http://people.debian.org/~eevans) with something that works similarly but uses Template Toolkit. To pull a copy using darcs, run "darcs get http://sym-link.com/darcs/logcheck".
- I'm not sure we really want to go in this direction, but the pre-processor *could* optionally tighten rules upon generation to the particular machine on which it is running. Basically, what I'm talking about is dynamic macro substitution. For example, $fqdn would expand to, say, "foo.logcheck.org".
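To make the dynamic substitution idea concrete, here is a minimal sketch in which `$fqdn` is resolved at generation time from the local machine and escaped so that it matches literally. The function names and the syslog prefix in the sample rule are assumptions for illustration; `socket.getfqdn()` and `re.escape()` are standard Python library calls.

```python
import re
import socket


def dynamic_macros(fqdn=None):
    """Build macros whose values depend on the machine generating the rules."""
    if fqdn is None:
        fqdn = socket.getfqdn()  # e.g. "foo.logcheck.org" on that host
    # Escape the name so the dots match literally rather than any character.
    return {"fqdn": re.escape(fqdn)}


def tighten(rule, macros):
    """Replace $name references in a rule with machine-specific fragments."""
    return re.sub(
        r"\$([a-z_]+)",
        lambda m: macros.get(m.group(1), m.group(0)),
        rule,
    )


if __name__ == "__main__":
    rule = r"^\w{3} [ :0-9]{11} $fqdn sshd"
    print(tighten(rule, dynamic_macros("foo.logcheck.org")))
```

The trade-off is that rules generated this way are no longer portable between machines, and would have to be regenerated if the hostname changes.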