Ideas: How to automatically (unit) test alert rules?

  • 26 April 2022
  • 4 replies
  • 224 views

Userlevel 4
Badge +8

We use a large and growing number of self-developed alert rules for our customers, which we maintain and develop in an internal Git repository on GitLab. For quality assurance in our continuous integration process, we still need a way to test these alert rules automatically.

The idea is to check whether each alert rule triggers on the events it should trigger on and behaves as expected in borderline cases. This is very similar to unit testing in software development, just for alert rules instead of source code.

Our idea so far is as follows:

  • Connect an up-to-date LogPoint virtual machine to our Director environment as a QA system
  • Create a snapshot of the "freshly installed" state
  • Restore the snapshot via a script from the GitLab CI pipeline
  • Use the Director API to add a repo, a routing policy, a normalization policy, and a processing policy for the different log types
  • Use the Director API to add a device and a syslog collector with the corresponding processing policy for each log type
  • Use the Director API with our deployment script to deploy all alert rules
  • For each alert rule, there is then a formal test specification; another script uses it to send predefined log events with current timestamps to the LogPoint system and checks the triggering behavior of the enabled alert rules against the expectations in the specification (see the test-runner sketch after this list)
  • The CI pipeline status is set to "passed" or "failed" accordingly
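
To make the last two steps more concrete, here is a rough Python sketch of the test runner we have in mind. The hostname, the shape of the spec files, and the check_alert_triggered() hook are placeholders rather than any LogPoint API; the actual check would probably go through the LogPoint search API:

import json
import socket
import sys
import time
from pathlib import Path

LOGPOINT_HOST = "qa-logpoint.example.internal"  # placeholder QA VM hostname
SYSLOG_PORT = 514

def send_syslog_event(message: str) -> None:
    # Send one raw log line to the QA LogPoint via UDP syslog.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (LOGPOINT_HOST, SYSLOG_PORT))

def check_alert_triggered(test_name: str) -> bool:
    # Placeholder integration point: ask LogPoint (e.g. via its search API)
    # whether the rule under test raised an incident in the test window.
    raise NotImplementedError

def run_test(test_name: str, spec: dict) -> bool:
    for event in spec["events"]:
        time.sleep(event["delay"])
        send_syslog_event(event["event"])
    time.sleep(spec["behaviour"]["timeout"])  # wait out the search interval
    return check_alert_triggered(test_name) == spec["behaviour"]["triggers"]

if __name__ == "__main__":
    failed = 0
    for spec_file in Path("tests").glob("*.json"):
        for name, spec in json.loads(spec_file.read_text()).items():
            ok = run_test(name, spec)
            print(("PASS " if ok else "FAIL ") + name)
            failed += 0 if ok else 1
    sys.exit(1 if failed else 0)  # sets the CI pipeline status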

Are there any ready-made approaches here, or recommendations on how to implement the above?


4 replies

Userlevel 4
Badge +7

Sounds like a very cool idea. Unfortunately I don’t have anything ready to share, but we are looking at something somewhat similar for demos.

But I thought I’d mention two things that might help or speed things up. Firstly, at least for partners, we can probably make the Logfaker plugin available, which can pipe synthetic logs into LogPoint via syslog on localhost. It should be possible to upload those files to LogPoint using SSH and a partner account, and that might be easier than actually having to generate the log messages on another system.
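
Just to illustrate the SSH upload part (this is not Logfaker itself, and the host, user, and destination path are assumptions):

import subprocess

def upload_log_file(local_path: str,
                    host: str = "qa-logpoint.example.internal",
                    user: str = "partner") -> None:
    # Copy a prepared log file onto the LogPoint box so a local generator
    # like Logfaker could replay it; /tmp is an assumed staging directory.
    dest = f"{user}@{host}:/tmp/" + local_path.rsplit("/", 1)[-1]
    subprocess.run(["scp", local_path, dest], check=True)

upload_log_file("testdata/threatintel_sample.log")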

And secondly, perhaps LogPoint sync could be of use too, rather than having to use Director: you could export and import a pretty comprehensive “default” LogPoint configuration from a JSON file, and it might even be possible to replace certain blocks of text with custom-generated JSON. I don’t have much experience with the Director API, so I don’t know which approach would be easier, but I thought I’d mention both ideas in case they help.
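
As a rough sketch of that splicing idea (the key names here are pure assumptions about the export format, not the actual schema):

import json

# Load a previously exported "default" configuration ...
with open("default_config_export.json") as f:
    config = json.load(f)

# ... replace one block with custom-generated content, e.g. the devices
# needed for the current test run (hypothetical structure) ...
config["devices"] = [{"name": "test-windows", "ip": "10.0.0.10"}]

# ... and write the result back out for import.
with open("generated_config.json", "w") as f:
    json.dump(config, f, indent=2)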

Userlevel 4
Badge +8

Hello Nils,

But I thought I’d mention two things that might help or speed things up. Firstly, at least for partners, we can probably make the Logfaker plugin available, which can pipe synthetic logs into LogPoint via syslog on localhost. It should be possible to upload those files to LogPoint using SSH and a partner account, and that might be easier than actually having to generate the log messages on another system.

This sounds like a great idea!

Is this Logfaker able to send specific logs, or does it just send randomly generated logs that look like, for example, a Windows operating system?


To build a consistent and reproducible test case, we need to define the exact input to and the expected output of the LogPoint.

We currently manage our alert rules in JSON, so my idea was to simply extend each rule’s configuration parameters with a set of testing parameters: an array of test cases, the log events for each test case, and, if there is more than one event, a delay between the sent events, etc.

Something like this:

{
  "config": {
    "enabled": true,
    "log_types": [
      "ThreatIntel"
    ],
    "attack_tags": [],
    "sources": [],
    "knowledgebase": [],
    "authors": [],
    "lists": {}
  },
  "rule": {
    // [...]
  },
  "tests": {
    "TEST_THREAT_INTELLIGENCE_SCORE_70": {
      "events": [
        { "delay": 0, "event": "..." }
      ],
      "behaviour": {
        "triggers": true,
        "timeout": 60 // depending on search interval
      }
    },
    "TEST_THREAT_INTELLIGENCE_SCORE_69": {
      "events": [
        { "delay": 0, "event": "..." }
      ],
      "behaviour": {
        "triggers": false,
        "timeout": 60 // depending on search interval
      }
    }
  }
}
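
A pre-flight check in the CI pipeline could then enforce that every enabled rule ships with both a triggering and a non-triggering case. This sketch only assumes the file structure shown above, nothing LogPoint-specific:

import json
import sys
from pathlib import Path

errors = []
for rule_file in Path("rules").glob("*.json"):
    doc = json.loads(rule_file.read_text())
    if not doc.get("config", {}).get("enabled", False):
        continue  # disabled rules are not deployed, so skip them
    outcomes = {t["behaviour"]["triggers"] for t in doc.get("tests", {}).values()}
    if outcomes != {True, False}:
        errors.append(rule_file.name + ": needs a positive and a negative test")

if errors:
    print("\n".join(errors))
    sys.exit(1)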


Userlevel 4
Badge +7

Logfaker sends specific log files that are uploaded to it. They can be prefixed with the IP address of the device as well, so for all intents and purposes the logs look as if they came from that device “outside”. Inside the log file, certain placeholders can be used to generate an up-to-date timestamp in the log.
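
Conceptually it works like this sketch, although the {{TIMESTAMP}} token here is made up for illustration and Logfaker’s actual placeholder syntax may differ:

import time

def render(template_path: str, out_path: str) -> None:
    # Replace a timestamp placeholder with the current time so the replayed
    # log lands inside the search window of the rule under test.
    now = time.strftime("%b %d %H:%M:%S")
    text = open(template_path).read().replace("{{TIMESTAMP}}", now)
    open(out_path, "w").write(text)

render("testdata/threatintel.template", "testdata/threatintel.log")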

I’ll send you an email to discuss further.

As for the alert definition: if you are already using something similar through Director, then that is probably the better idea.

Userlevel 4
Badge +8

Logfaker sends specific log files that are uploaded to it. They can be prefixed with the IP address of the device as well, so for all intents and purposes the logs look as if they came from that device “outside”. Inside the log file, certain placeholders can be used to generate an up-to-date timestamp in the log.

We will have a look at it. The best solution would keep the test input as close as possible to the alert definition, so the information is not spread over many files. That is why my idea was to have the test definition in the same file as the alert definition.

As for the alert definition: if you are already using something similar through Director, then that is probably the better idea.

We currently use the Director API for this because we deploy our rules to several dozen systems. Although the Director API, with its token timeout and asynchronous REST design, is cumbersome to use, using alert pak files was not an option: the repo selection is ignored, and all rules then run on all available repos.

So we developed a fairly comprehensive program that manages the alert rules, including static lists, via the Director API.
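
For anyone attempting the same: the core of it boils down to a generic poll-until-done pattern around the asynchronous calls. The endpoint paths and response fields below are illustrative assumptions, not the documented Director API:

import time
import requests

BASE = "https://director.example.internal/api"  # placeholder Director URL
HEADERS = {"Authorization": "Bearer <token>"}   # token handling elided

def submit_and_wait(path: str, payload: dict, timeout: int = 120) -> dict:
    # Submit an asynchronous request, then poll a (hypothetical) monitor
    # endpoint until the job reports a terminal state or we time out.
    r = requests.post(BASE + path, json=payload, headers=HEADERS)
    r.raise_for_status()
    job_id = r.json()["request_id"]  # hypothetical response field
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(f"{BASE}/monitor/{job_id}", headers=HEADERS).json()
        if status.get("state") in ("success", "failed"):
            return status
        time.sleep(2)
    raise TimeoutError("job " + str(job_id) + " did not finish in time")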
