Solved

How to correlate more than 2 lines of logs?

  • 15 February 2022
  • 11 replies
  • 442 views

Userlevel 4
Badge +8

We have a Cisco IronPort which is analyzing emails.

Each email analysis process generates multiple lines of logs that can be related to each other by a unique id (the normalized field “message_id”).
However, i am now lacking ideas how i can correlate more than two log lines e.g. with a join.

My goal is to first search for logs where the DKIM verification failed. after that I would like to see ALL log lines that contain the same message_id as the first "DKIM" log line. The number of log lines can vary.

Here are some of my approaches, which unfortunately do not give the desired result:

[message="*DKIMVeri*"] as s1 join [device_ip=10.0.0.25] as s2 on s1.message_id = s2.message_id

This only retruns two log lines, not all matching s1.message_id = s2.message_id. Also a “right join” doesn’t work, even when the documentation indicates it.

[4 message="*DKIMVeri*" having same message_id]

“having same” needs to specify the exact amount of logs, while this information is unknown. Furthermore, a result is returned, where only the normalized fields behind the "having same" clause are further usable, not those of the found events. Also the filter “message” here breaks the whole concept.

 

Do you have any ideas how to solve the issue?

icon

Best answer by Adrien Debosschere 7 March 2022, 18:47

View original

11 replies

Userlevel 4
Badge +7

It seems to be my standard response to a lot of these types of queries so apologies - but perhaps a list would be the way to go? Or a second filter after the initial correlation/search?

Not sure I fully grasp the nature of the log messages, perhaps you could gives us some examples?

Userlevel 4
Badge +8

Hello @Nils Krumey

thank you for your quick reply!

Here is such a set of logs (11 logs in this case) which I want to correlate:

<22>Feb 14 16:29:01 mail_log_syslog: Info: Start MID 105970882 ICID 0
<22>Feb 14 16:29:01 mail_log_syslog: Info: Delivery start DCID 59057473 MID 105970882 to RID [0] to Centralized Policy Quarantine
<22>Feb 14 16:29:01 mail_log_syslog: Info: Message finished MID 105970882 done
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 ICID 0 RID 0 To: <censored@censored.com>
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 queued for delivery
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 ICID 0 From: <censored@censored.com>
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 was generated based on MID 105970871 by duplicate-quarantine filter 'DKIMVerificationFilter'
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 ready 23417 bytes from <censored@censored.com>
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 enqueued for transfer to centralized quarantine "EmailAuthQuarantine" (duplicated by content filter DKIMVerificationFilter)
<22>Feb 14 16:29:01 mail_log_syslog: Info: Message done DCID 59057473 MID 105970882 to RID [0] (centralized policy quarantine)
<22>Feb 14 16:29:01 mail_log_syslog: Info: MID 105970882 RID [0] Response 'ok: Message 1452832 accepted'

The idea is to have a table (e.g. with “fields”) to see log_ts, sender and receiver.

Userlevel 4
Badge +7

Hm. Not quite the same thing, but perhaps something like [message="*DKIMVeri*"] | process toList (failed_verification_ids, message_id) and then a separate second search just for message_id IN FAILED_VERIFICATION_IDS ? Running out of ideas on how to limit it to just the rows for a single message ID though...

Userlevel 4
Badge +8

Hello @Nils Krumey,

thanks for the advice.

But i believe that the result for a search template or alert rule must be available in a single search query.

 

Isn’t it a bug, that the right join is not returning all the elements of the right side (s2)?

According to the documentation, this could be understood as such:

Right Join

You can use a right join to return all the values from the table or stream on the right and only the common values from the table or stream on the left.

https://docs.logpoint.com/docs/search-query-language/en/latest/Search%20Query%20Language/Pattern%20Finding.html#right-join

So this query should return more than just 2 rows:

[device_ip=10.0.0.25 message="*DKIMVeri*"] as s1 right join [device_ip=10.0.0.25] as s2 on s1.message_id = s2.message_id

But it doesn’t...

Userlevel 4
Badge +7

That’s what I would think, yes. Does it behave differently if you swap the streams and do a LEFT join?

Userlevel 4
Badge +8

Hello @Nils Krumey,

 

I just tried it out.

It has the exact same result.

Userlevel 4
Badge +7

Well, that makes it slightly less likely to be a bug as we’re not normally that consistent with them :) but your actual use case aside the JOIN queries look fairly straightforward to me and I don’t know why they wouldn’t return more than two results. I might be missing something, but it’s probably a case for Support? I’ll try to find some example logs to test some JOINs with as well if I get the time. Is this on 7.0?

Userlevel 4
Badge +8

Is this on 7.0?

We are still using 6.12.2.

There are some more missing “features”  with having same and followed by.

E.g. a preceeded by would be nice too, especially for detecting attacks like Kerberoasting etc.

But these things should have their own idea or feature request.

Userlevel 4
Badge +7

Yes, I agree, and feel free to log them in the ideas portal. I raised a request for a “not followed by” a while back, but it really helps to have real life examples on why these things are useful.

Userlevel 3
Badge +8

Try with some aggregation, I had the same issue and I simply parsed the values I needed and then did some aggregation at the end as it will show all results, the join only shows the first result. 

Example:
[entityName=* risk > 70 | chart count() by entityName ] as s1 join
[user=* | chart count() by user, category, server, threat, alert, alertId, server_risk ] as s2
on s1.entityName = s2.user
| chart count() by s1.entityName, s2.category, s2.server, s2.threat, s2.alert, s2.alertId, s2.server_risk

Hello Markus,

I’ve not tested it on your log sample but the following query (with some adjustments) should do the trick:

message_id=*
| chart distinct_list(msg) as msg, distinct_list(sender) as sender, distinct_list(receiver) as receiver, max(log_ts) as log_ts by message_id
| search "*DKIMVeri*"
| rename msg as full_msg
| fields log_ts, sender, receiver, message_id, full_msg

The idea is to:

  1. Merge all events sharing the same Message ID into a “super event” while keeping some normalized fields
  2. Filter the super events we are interested in
  3. Display all relevant fields

The full_msg field is just here for debugging purposes and can be dropped.

 

Hope it helps !

 

Regards,

Adrien

Reply