I have a distributed system with dedicated collectors. While setting up and configuring a few hundred Linux servers to send their logs via rsyslog to one collector, the collector suddenly stopped forwarding data to the data node. Rebooting the collector brought temporary relief, but after roughly two hours the problem resurfaced.
Using tcpdump on the collector, I can see logs streaming in from the log sources (tcp/514). On the data node I also see OpenVPN UDP traffic coming in from the collector's IP. However, I'm not sure whether that is only tunnel keepalive traffic, as the packets are very small (65-103 bytes).
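To get a better feel for whether the tunnel carries anything besides small packets, I was thinking of counting packet sizes from a saved tcpdump text capture, roughly like the Python sketch below. It assumes tcpdump's usual text output, where each UDP line ends in `length N`. The function name `summarize_lengths` and the 120-byte cutoff are my own arbitrary choices for illustration and are not taken from OpenVPN or the product.

```python
import re
from collections import Counter

# Matches the trailing "length N" that tcpdump prints for UDP packets.
LENGTH_RE = re.compile(r"\blength (\d+)\s*$")

def summarize_lengths(lines, small_max=120):
    """Count packets at or below small_max bytes versus larger ones.

    small_max is an assumed cutoff, not a documented keepalive size.
    Lines without a trailing "length N" are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LENGTH_RE.search(line)
        if match is None:
            continue
        size = int(match.group(1))
        counts["small" if size <= small_max else "large"] += 1
    return {"small": counts["small"], "large": counts["large"]}
```

I would feed it the lines of a text capture taken on the data node while the collector is known to be receiving logs. If only "small" packets show up, that would at least suggest that no log payload is going through the tunnel, though it would not prove it.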
I don't have a clue what is going on or where to look. It seems like some kind of resource problem to me, but in terms of memory and CPU the system is well equipped and bored :-).
The thing is, when I search across all repos with "collected_at"="myCollectorName", I don't get any data at all, as if the collector did not exist.
Do you have any ideas on how to analyze this better?