Question

Preferred Way to Fetch Logs from Azure?

  • 2 January 2024
  • 6 replies
  • 307 views

Userlevel 4
Badge +8

Hi,

What is LogPoint's recommended way to get logs from Azure AD / Entra ID and other Azure applications into LogPoint?

We have noticed that the Azure EventHubs sometimes provide their logs several days late via the message queue. We were able to verify this independently of LogPoint using a fetcher developed in Python.

How does this work with the "Azure Log Analytics Workspace" module? Can the logs be expected in the SIEM in a timely manner?

A delay of several hours or days cannot be handled by the current alert rule concept, which searches over already indexed time ranges, unless the alert rules are run over unrealistically large time ranges.


6 replies

Userlevel 4
Badge +7

Hi,

We are obviously beholden to events in the platform getting logged in a timely manner - due to the Eventhub fetcher being a fetcher, there is a slight delay caused by Logpoint due to the fetch interval, but Logpoint should be able to handle that (especially with the new “Delay Alert” option). But if the events aren’t there, there isn’t much that can be done about that - I haven’t heard that about EventHubs, and I don’t know what the Microsoft SLAs are for that, although I have heard that events through the Office 365 Management API can sometimes be delayed quite significantly on the Microsoft side.

Using the Log Analytics Workspace fetcher would be an option, but again Azure Monitor would be configured to send the Entra ID events to a Log Analytics Workspace instead of an EventHub - so whether that would be any more reliable is an unanswered question. The approach so far has been via EventHubs, but if the data is sent to a Log Analytics Workspace and an appropriate KQL query can be constructed to retrieve them with the Log Analytics Workspace fetcher it would be interesting to see whether events arrive there earlier and more consistently.

Most promising are probably our very latest changes to the Universal REST Fetcher that enables us to use the Graph API to query the endpoints directly - now again, there is the risk that they wouldn’t return the events to us in time either when we ask for them, but there is probably less potential of something introducing a delay on the Microsoft side. The Graph API will become an option within Q1 at a guess.

Userlevel 4
Badge +8

Hi @Nils Krumey,

thank you for your reply!

As said, this Event Hub delay has nothing to do with LogPoint. It seems to be an inherent delay inside Azure. The delay varies completely from log message to log message. Some messages arrive just a few seconds after creation, followed by messages from the same application which were created a few days earlier.

So the eventhub mechanism seems to be totally unusable for a SIEM.

As the maximum “delay alert” value is 24 hours, this doesn’t work here either, especially as the delay is not constant.

Best regards,

Markus

Userlevel 4
Badge +7

The intention of the “delay alert” mechanism is to set it as high as the highest delay we would ever expect if it is variable - but we are really talking about something like perhaps 15 minutes or so, so 24 hours is of course pretty much unusable.

I have not heard from anyone else that they have had THAT much of a delay through their EventHub. What do Microsoft say? I remember seeing an SLA for their Office 365 Management API and it was frighteningly long, but weirdly anything I can find on EventHubs is from Chinese Microsoft pages.

There does seem to be something called “Throughput Units” and overall limits on a Namespace - is there anything else going on in that tenant that could somehow impact this, or perhaps any other EventHubs that might be eating up Throughput?

Userlevel 4
Badge +8

Microsoft itself provides some very unspecific explanations for this:

https://techcommunity.microsoft.com/t5/europe-tech-talks-forum/azure-event-hubs-possible-delay-causes/m-p/3739154

But this issue seems to be common:

https://stackoverflow.com/questions/40838666/getting-data-from-eventhub-is-delayed
https://community.splunk.com/t5/Getting-Data-In/Delay-during-log-ingestion-from-Azure/m-p/398711

The thing is that not all messages are delayed, so many messages still arrive on time, and your LP search will return plenty of results even on a “last 5 or 10 minutes” time range, for example.

So maybe most people just didn’t notice that 1/3 of their messages are delayed by hours or days and thus are out of scope for their alert rules.
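To make the effect concrete, here is a minimal Python sketch of a simplified model. It is not LogPoint's actual scheduler: it assumes a rule runs once per fixed window over the event-time range of that window, optionally shifted by a fixed delay, and that an event is only seen if it has arrived by the time that run executes. The function names and the three sample events are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def run_time_for(created, window, delay):
    """Time at which the (modelled) rule run covering `created` executes."""
    # Windows are aligned to the epoch; the run happens at the end of the
    # window that contains the creation time, plus the configured delay.
    window_index = (created - EPOCH) // window
    window_end = EPOCH + (window_index + 1) * window
    return window_end + delay


def is_seen(created, arrived, window, delay):
    """True if the event had arrived when its window was searched."""
    return arrived <= run_time_for(created, window, delay)


def missed_fraction(events, window, delay):
    """Share of (created, arrived) pairs that no rule run would see."""
    missed = sum(1 for c, a in events if not is_seen(c, a, window, delay))
    return missed / len(events)


# Hypothetical sample: one timely event, one 3 hours late, one 2 days late.
BASE = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
EVENTS = [
    (BASE + timedelta(minutes=1), BASE + timedelta(minutes=1, seconds=5)),
    (BASE + timedelta(minutes=2), BASE + timedelta(minutes=2, hours=3)),
    (BASE + timedelta(minutes=3), BASE + timedelta(minutes=3, days=2)),
]

if __name__ == "__main__":
    ten_min = timedelta(minutes=10)
    for delay in (timedelta(0), timedelta(hours=24)):
        share = missed_fraction(EVENTS, ten_min, delay)
        print(f"delay={delay}: missed {share:.0%} of events")
```

In this toy data set, two of the three events are missed with no delay, and the event that is two days late is still missed with a 24 hour delay, while the timely event keeps the search results looking healthy.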

Userlevel 4
Badge +8

Here is a test Python script (remove the .txt from the file name) you can use with your test eventhub, to see the timestamps of the incoming logs.


You just need to adjust the values in CLIENT_CREDENTIALS and install the azure-eventhub package like this:

pip install azure-eventhub

It may take a while, but you will eventually see the delayed messages.
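For readers who cannot open the attachment, the following is a rough sketch of the same idea, not the attached script itself. It assumes a connection string and Event Hub name supplied via the environment variables EVENTHUB_CONNECTION_STRING and EVENTHUB_NAME (names chosen here for illustration, in place of the CLIENT_CREDENTIALS block), and it assumes the message body is JSON with a "records" list whose entries carry a UTC "time" field ending in "Z", as Azure diagnostic logs typically do. The helper names are hypothetical.

```python
import json
import os
from datetime import datetime, timezone


def parse_record_time(value):
    """Parse a UTC timestamp such as '2024-01-02T10:15:30.1234567Z'."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1]
    # Fractions may have 7 digits; datetime only supports microseconds.
    if "." in text:
        head, frac = text.split(".", 1)
        frac = frac[:6].ljust(6, "0")
    else:
        head, frac = text, "000000"
    parsed = datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def record_lags(body_text, enqueued_time):
    """Yield (created, lag_until_enqueued) for each record in one message."""
    payload = json.loads(body_text)
    for record in payload.get("records", []):
        raw = record.get("time")
        if raw is None:
            continue  # record without a timestamp, nothing to compare
        created = parse_record_time(raw)
        yield created, enqueued_time - created


def main():
    # Imported here so the helpers above work without the package installed.
    from azure.eventhub import EventHubConsumerClient

    client = EventHubConsumerClient.from_connection_string(
        os.environ["EVENTHUB_CONNECTION_STRING"],
        consumer_group="$Default",
        eventhub_name=os.environ["EVENTHUB_NAME"],
    )

    def on_event(partition_context, event):
        received = datetime.now(timezone.utc)
        for created, lag in record_lags(event.body_as_str(), event.enqueued_time):
            print(
                f"partition={partition_context.partition_id} "
                f"created={created.isoformat()} "
                f"enqueued={event.enqueued_time.isoformat()} "
                f"created_to_enqueued={lag} "
                f"created_to_received={received - created}"
            )

    with client:
        # "@latest" reads only new events; "-1" would replay from the start.
        client.receive(on_event=on_event, starting_position="@latest")


if __name__ == "__main__" and os.environ.get("EVENTHUB_CONNECTION_STRING"):
    main()
```

Printing both the enqueue time and the receive time helps to tell apart a message that reached the Event Hub late from one that was enqueued promptly but delivered late to the consumer.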

Userlevel 4
Badge +7

Thanks for that - not sure what to suggest, even Microsoft say in that Tech Community post “if you do not find any information from client side Microsoft can help you on this matter”, so I guess that is the only hope we have of fixing those EventHub delays.

As you said initially, EventHubs aren’t the only way of getting logs out of Azure - most services now are able to log into EventHubs, log into Log Analytics workspaces, or can be queried using the Graph API. We have just had the internal preview of the Universal REST API Fetcher 3.0, which has the changes required to process the Graph API, but we now need to identify (and build) the API endpoints that we want to query, as well as writing/changing all the normalisers for that data. There are obviously loads of Microsoft services that we could query that way, but in my mind the Entra ID API(s), Azure Resource Monitor API and the various Defender APIs are the initial targets. Does that cover your intended use case?

It is a bit too early to share anything yet, but once we have something it’d be great to share it with you to verify whether you see the same type of delays, and to ensure that we ship the right log source template(s) for the type of data that you are looking to retrieve.
