
Hi,

I have a distributed system with dedicated collectors. While setting up a few hundred Linux servers to send their logs via rsyslog to one collector, that collector suddenly stopped forwarding the data to the data node. Rebooting the collector brought temporary relief, but after roughly two hours the problem resurfaced.

Using tcpdump on the collector I can see logs streaming in from the log sources (tcp/514), and on the data node I see OpenVPN UDP traffic coming in from the collector's IP. I'm not sure, however, whether the latter is only tunnel keepalive traffic, as the packets are very small (65–103 bytes).
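One way to tell keepalives from real payload is to filter on packet length. A sketch of the two captures described above (the interface name `eth0` and the OpenVPN port 1194 are assumptions; adjust to your setup):

```shell
# On the collector: confirm syslog is actually arriving from the sources
tcpdump -ni eth0 'tcp port 514'

# On the data node: watch the tunnel, but only show packets larger than
# a typical keepalive. If nothing appears here while logs stream into the
# collector, the collector is receiving but not forwarding.
tcpdump -ni eth0 'udp port 1194 and greater 150'
```

If the second capture stays silent, that points at the forwarding pipeline on the collector rather than at the network.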

I don’t have a clue how to work out what is going on or where to look. It seems like a resource problem to me, but in terms of memory and CPU the system is well equipped and bored 🙂

The thing is, when I search across all repos with "collected_at"="myCollectorName" I get no data at all, as if the collector didn’t exist.

Do you have any ideas on how to analyze this further?

Thx,

Peter

Hi @Peter Stumpf,

do you have “Buffering” enabled?

Did you try disabling it?

 

This sounds like an Apache Kafka producer issue.

It runs inside a Docker container, so maybe there is an issue hidden in the logs inside that container.

I had a similar issue when disk space temporarily ran out and the Kafka container became corrupted.
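To check whether you are hitting the same failure mode, you could look at disk usage and the Kafka container's state. A sketch (the container name `kafka` is an assumption; check `docker ps -a` for the actual name on your collector):

```shell
# Check free disk space; a full filesystem is the suspected trigger here
df -h

# List all containers, including stopped or crash-looping ones
docker ps -a

# Read recent log output from the Kafka container (name is assumed)
docker logs --tail 200 kafka

# Restart count and OOM-kill flag can hint at a crash loop
docker inspect --format '{{.RestartCount}} {{.State.OOMKilled}}' kafka
```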

What helped was switching the mode of operation back to DLP, waiting a few minutes, and then setting it back to Collector mode.

This triggers a clean rebuild of the Kafka container.

 

Best regards

Markus
