
We occasionally encounter cases where a search cannot be performed because a particular repo has failed. In that situation, the search UI does not allow any searches while that repo is selected.

What does "repo failed" mean?

The "readiness" status of the service responsible for searching a particular repo (indexsearcher) is kept by the central search service (merger) in an "alive" field for each repo. If the "alive" status of a repo is false in the merger service's config file, the repo failed error appears when searching in that repo. This can happen if the indexsearcher service for that repo is not running as expected, or if the merger service's config file has not been updated to reflect the status of the indexsearcher service.

Mitigation:

Whenever the repo failed error appears for a particular repo, first check the logs of the indexsearcher service for that repo.

tail -f -n 50 /opt/immune/var/log/service/indexsearcher_<repo_name>/current
# Replace <repo_name> with the actual repo name. For example, for a repo named "Windows", the command is:
tail -f -n 50 /opt/immune/var/log/service/indexsearcher_Windows/current

The above command prints the last 50 lines of the indexsearcher service log for that repo and, because of the -f option, keeps following new entries as they are written.
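The log check above can be wrapped in a small helper so the repo name is only typed once. This is a minimal sketch: the function names and the LOG_BASE variable are hypothetical and not part of LogPoint, and only the log path layout is taken from the article.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not LogPoint tools.

# Base directory for service logs (path from the article); override for testing.
LOG_BASE="${LOG_BASE:-/opt/immune/var/log/service}"

# Print the path of the indexsearcher log for the given repo name.
indexsearcher_log_path() {
    printf '%s/indexsearcher_%s/current\n' "$LOG_BASE" "$1"
}

# Print the last 50 lines of that log, or complain on stderr if it is unreadable.
show_recent_log() {
    log_file=$(indexsearcher_log_path "$1")
    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file (check the repo name)" >&2
        return 1
    fi
    tail -n 50 "$log_file"
}
```

For example, show_recent_log Windows prints the most recent entries once and exits, which is convenient when the output should be pasted elsewhere; use the tail -f form from the article to keep watching the log.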

You can also check whether the indexsearcher is replying to the alive probe by running tcpdump on the port configured in query.source.socket.

grep "query.source.socket" /opt/immune/etc/config/indexsearcher_<repo_name>/config.json
tcpdump -i any port <query.source.socket port> -Aq

 

If the indexsearcher is alive, you should see {"isalive":true} and {"alive":true} messages.

mceclip0.png
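The two steps above (look up the port, then capture on it) can be chained. The sketch below assumes that the query.source.socket value ends in ":<port>", as in the made-up sample line in the comment; the article does not show the actual value format, so adjust the pattern to whatever your config.json contains. socket_port is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read config lines on stdin and print the trailing port
# number of the query.source.socket entry, e.g. for a line such as
#   "query.source.socket": "tcp://127.0.0.1:5504",
# ASSUMPTION: the value ends in ":<port>". The sample value above is invented.
socket_port() {
    sed -n 's/.*query\.source\.socket.*:\([0-9][0-9]*\)[^0-9]*$/\1/p'
}

# Usage (run as root on the LogPoint machine, replacing <repo_name>):
#   port=$(socket_port < /opt/immune/etc/config/indexsearcher_<repo_name>/config.json)
#   tcpdump -i any port "$port" -Aq
```

If the helper prints nothing, the value does not match the assumed format; in that case read the port from the grep output by hand, as described above.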

If the tail output shows no errors and "alive":true messages appear in the tcpdump output, but the repo failed error still occurs when searching, check the alive status in the merger config.

grep -B1 alive /opt/immune/etc/config/merger/config.json
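When the merger config lists many repos, it can help to show only the entries marked as not alive. The following is a rough sketch that filters the grep -B1 output; it assumes the config is pretty-printed with one key per line and that the line before each "alive" key identifies the repo, which is what the -B1 option in the command above suggests but is not confirmed here. failed_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `grep -B1 alive` output on stdin and print the line
# that precedes each "alive": false entry.
# ASSUMPTION: one key per line, with the identifying line directly above "alive".
failed_entries() {
    awk '/"alive"[ \t]*:[ \t]*false/ { print prev } { prev = $0 }'
}

# Usage:
#   grep -B1 alive /opt/immune/etc/config/merger/config.json | failed_entries
```

An empty result means no entry matched "alive": false under the assumed layout, so compare against the unfiltered grep output before concluding that every repo is alive.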

Potential Scenarios

  1. Indexsearcher service was recently restarted
    If the indexsearcher service for a repo has just been restarted, large repos take a few minutes to scan the metadata of stored indexes before searches can be served. During that period, the repo failed error is observed.
  2. LogPoint machine was recently rebooted
    If a LogPoint machine was recently rebooted, the indexsearcher services take time to initialize. During those few minutes, the repo failed error can be seen for some repos.
     
  3. Issue in the indexsearcher service
    If there is an error in the indexsearcher service, the repo failed issue does not resolve on its own.
    In such scenarios, review the logs of the indexsearcher service as described above. We recommend creating a support ticket for further investigation and resolution of the problem. It is helpful to include the indexsearcher service log for that repo in the ticket. The log file is located at
    /opt/immune/var/log/service/indexsearcher_<repo_name>/current
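To attach the log to a support ticket, a compressed excerpt is usually easier to handle than the live file. This is a minimal sketch under stated assumptions: export_indexsearcher_log and LOG_BASE are hypothetical names, the default of 1000 lines is arbitrary, and only the log location comes from the article.

```shell
#!/bin/sh
# Hypothetical helper: save the last N lines of a repo's indexsearcher log as a
# gzip file that can be attached to a support ticket.
LOG_BASE="${LOG_BASE:-/opt/immune/var/log/service}"

export_indexsearcher_log() {
    repo=$1
    out_file=$2
    lines=${3:-1000}   # arbitrary default, not a LogPoint recommendation
    src="$LOG_BASE/indexsearcher_$repo/current"
    if [ ! -r "$src" ]; then
        echo "Cannot read $src (check the repo name)" >&2
        return 1
    fi
    tail -n "$lines" "$src" | gzip > "$out_file"
}

# Usage:
#   export_indexsearcher_log Windows /tmp/indexsearcher_Windows.log.gz
```

Support may ask for a longer excerpt or the complete file; pass a larger line count as the third argument in that case.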

Hi Cintia,

 

thanks for sharing this! Just a small format-correction: the first `grep`/`tcpdump` statement (right above the screenshot) is missing a line break ;-)

 

Best regards, Reinhard


Thanks a lot Reinhard 🙂 I am going to make sure we correct this :)


Thanks for writing this article, as I was panicking, searching for what this means. But indeed, it is likely because I just rebooted it.

But, it has been roughly 20 minutes now and my 300GB repo (maybe 20% full last I checked) still isn’t pulling up. Do you have any estimations on how long on average this reindex may take?


Hi Mike,

I have reached out to my colleague, @Matt Ellis. He will address this shortly. :) 


Hi Mike,

"Do you have any estimations on how long on average this reindex may take?" This should have taken a few minutes at most.

I would recommend that you raise a support ticket as there could be another issue, related to the storage.

