We use a large and growing number of self-developed alert rules for our customers, which we manage and develop further in an internal git repository via GitLab. For quality assurance in the continuous integration process, we still need a way to test the alert rules automatically.
The idea is to check whether each alert rule triggers on the necessary events and behaves as expected in borderline cases. Very similar to unit testing in software development, just for alert rules instead of source code.
Our idea so far is as follows:
- Connect an up-to-date LogPoint as a virtual machine as a QA system to our Director environment
- Create a snapshot of the "freshly installed" state
- Restore the snapshot via script from the GitLab CI pipeline
- Use the Director API to add a repo, routing policy, normalizer policy, processing policy for the different log types
- Use the Director API to add a device and syslog collector with the corresponding processing policy for each log type
- Use the Director API with our deployment script to deploy all alert rules
- For each alert rule, there is then a formal test specification that uses another script to send predefined log events with current timestamps to the LogPoint system and check them against the expected triggering behavior of the enabled alert rules in the specification
- The CI pipeline status is set to "passed" or "failed" accordingly
Are there any ready-made approaches here, or recommendations on how to implement the above?
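A sketch of the per-rule test step from the list above, added here as an example and not code from LogPoint or from the poster: it builds RFC 5424 syslog lines with the current timestamp, sends them to the collector, and compares which rules fired against a formal test specification that also contains a borderline (must-not-fire) case. The query that reports which alerts fired is not shown in the post, so it is assumed to be a caller-supplied function; the rule name, host names and events are constructed.

```python
import socket
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class AlertTestCase:
    """One entry of the formal test specification for an alert rule."""
    rule: str          # alert rule under test (constructed names below)
    events: list       # raw log messages; the timestamp is added at send time
    should_fire: bool  # expected triggering behaviour, incl. borderline cases


def rfc5424(msg: str, host: str = "qa-host", app: str = "alerttest", pri: int = 14) -> str:
    """Build an RFC 5424 syslog line carrying the current UTC timestamp."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"<{pri}>1 {ts} {host} {app} - - - {msg}"


def send_udp(lines: Iterable[str], target=("127.0.0.1", 514)) -> None:
    """Deliver the lines to the syslog collector (assumed address of the QA VM)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), target)


def run_case(case: AlertTestCase, send: Callable, fired_rules: Callable[[], set],
             settle_seconds: float = 0.0) -> dict:
    """Send the case's events, wait for processing, then compare with the spec.

    fired_rules() must return the names of alert rules that fired since the case
    started; how to obtain that from LogPoint is left to the caller (assumption)."""
    send(rfc5424(e) for e in case.events)
    time.sleep(settle_seconds)
    fired = case.rule in fired_rules()
    return {"rule": case.rule, "expected": case.should_fire, "fired": fired,
            "passed": fired == case.should_fire}


def run_suite(cases, send, fired_rules, settle_seconds: float = 0.0):
    """Run every case; the returned boolean decides the CI job status."""
    results = [run_case(c, send, fired_rules, settle_seconds) for c in cases]
    return results, all(r["passed"] for r in results)


# Constructed specification: a brute-force rule must fire on five failures, not on one.
SPEC = [
    AlertTestCase("ssh_bruteforce", ["sshd: Failed password for root"] * 5, True),
    AlertTestCase("ssh_bruteforce", ["sshd: Failed password for root"], False),
]
```

In the pipeline, `send_udp` and a real `fired_rules` implementation replace the fakes, and a `False` from `run_suite` fails the job.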