New in KB: Optimization of Process ti() based queries

  • 19 April 2022

  • Anonymous

The process ti() command provides search-time enrichment in Logpoint from the threat intelligence database. To use it, append | process ti(field_name) to the query.

Example:

source_address=* | process ti(source_address)  

This query enriches every log that has a source_address field by matching its value against the ip_address column of the threat_intelligence table. The mapping from source_address to ip_address (and other field mappings) can be configured in the threat intelligence plugin's configuration settings.

 

The search query above can take a long time to complete when the search time frame contains a large number of logs. We can optimize such queries by filtering the logs before piping them to the process ti() part of the query.

Example:

norm_id=* source_address=* | process ti(destination_address)

This query can be optimized by also filtering on the field that is passed to process ti():

norm_id=* source_address=* destination_address=* | process ti(destination_address)

This query selects only the logs that contain destination_address. Logs without a destination_address field cannot match anything in the threat intelligence database on that field, so there is no point in passing them to process ti().

Although this helps, the query can still be slow for a large number of search results, so further optimization is possible.
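The effect of filtering before enrichment can be illustrated with a toy model. This is only a sketch in Python and not how Logpoint executes searches internally; the sample logs, the threat_intel dictionary and the enrich() helper are all hypothetical. It simply counts how many records reach the enrichment stage with and without the destination_address=* filter.

```python
# Toy model only: this is NOT how Logpoint executes searches internally.
# All names and sample values here are hypothetical.

logs = [
    {"norm_id": "Firewall", "source_address": "10.0.0.1", "destination_address": "203.0.113.7"},
    {"norm_id": "Firewall", "source_address": "10.0.0.2"},  # no destination_address
    {"norm_id": "Proxy", "source_address": "10.0.0.3", "destination_address": "198.51.100.9"},
    {"norm_id": "Proxy", "source_address": "10.0.0.4"},  # no destination_address
]

# Stand-in for the ip_address column of the threat_intelligence table.
threat_intel = {"203.0.113.7": {"category": "botnet"}}


def enrich(records, field):
    """Look up `field` of every record; return (enriched records, lookup count)."""
    lookups = 0
    enriched = []
    for record in records:
        lookups += 1  # one lookup attempt per record that reaches this stage
        match = threat_intel.get(record.get(field))
        enriched.append({**record, **(match or {})})
    return enriched, lookups


# Unfiltered: every log reaches the enrichment stage.
_, lookups_unfiltered = enrich(logs, "destination_address")

# Pre-filtered: only logs that carry destination_address reach it.
with_destination = [r for r in logs if "destination_address" in r]
_, lookups_filtered = enrich(with_destination, "destination_address")

print(lookups_unfiltered, lookups_filtered)
```

In this toy data set, half of the logs never reach the enrichment stage once the filter is added.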

 

For this optimization, we first store the threat_intelligence table column that will be matched against the query results in a dynamic list. For example, to match an IP field such as source_address or destination_address, we can run the following query to populate a dynamic list (e.g. MAN_TI) with the ip_address field of the threat intelligence database.
 

Table "threat_intelligence" | chart count() by ip_address  | process toList("MAN_TI", ip_address) 

Similarly, for any other field, we can use:

Table "threat_intelligence" | chart count() by field_name | process toList("List_name", field_name)

Once the dynamic list is populated, we do not need to repopulate it until the age_limit set on the dynamic list (MAN_TI) expires.

After the list is populated, we can change the query

norm_id=* source_address=* destination_address=* | process ti(destination_address)

to

norm_id=* destination_address in MAN_TI  | process ti(destination_address)

In generic form, an optimized query would be

Filter_query field_name in Dynamic_list | process ti(map_field_name)

 

This query will complete much faster than the original query.
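The two query templates above can be filled in programmatically, for example when generating searches for several fields. The following is a minimal sketch in Python; the function names build_populate_query and build_optimized_query are hypothetical helpers, not part of Logpoint, and they only do string formatting based on the query forms shown in this article.

```python
def build_populate_query(list_name: str, ti_column: str) -> str:
    """Query that fills a dynamic list from a threat_intelligence column."""
    return (
        f'Table "threat_intelligence" | chart count() by {ti_column} '
        f'| process toList("{list_name}", {ti_column})'
    )


def build_optimized_query(filter_query: str, field_name: str, list_name: str) -> str:
    """Optimized search: pre-filter on the dynamic list, then enrich."""
    return f"{filter_query} {field_name} in {list_name} | process ti({field_name})"


print(build_populate_query("MAN_TI", "ip_address"))
print(build_optimized_query("norm_id=*", "destination_address", "MAN_TI"))
```

The second call reproduces the optimized destination_address query shown earlier.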

 

Note: If the dynamic list created above contains a large number of entries, the UI will show an error reporting that the max_boolean_clause count has been reached. In that case we need to tune a parameter in the Index-Searcher / Merger / Premerger services to increase the number of boolean clauses allowed.

Parameter_name: MaxClauseCount

We need to increase this configurable parameter based on the number of rows present in the dynamic list.

This can be configured in lp_services_config.json located at /opt/immune/storage as follows:

{"merger":{"MaxClauseCount":<value>},"premerger":{"MaxClauseCount":<value>},"index_searcher":{"MaxClauseCount":<value>}}

After this change, we need to regenerate the config file with the following command:

/opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config-updater/regenerateall.sh

Keep increasing the value of MaxClauseCount until the UI search no longer throws an error and completes smoothly. A good rule of thumb is to start with double the number of entries in the dynamic list.
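As a starting point, the rule of thumb can be turned into a config fragment. The sketch below, in Python, is only an illustration: the function max_clause_config and the example count of 5000 list entries are hypothetical, and the script only prints JSON in the shape shown above. It does not read or write lp_services_config.json, so any existing settings in that file still have to be merged by hand.

```python
import json

# The three services named in this article that take MaxClauseCount.
SERVICES = ("merger", "premerger", "index_searcher")


def max_clause_config(list_entries: int, factor: int = 2) -> dict:
    """Build the MaxClauseCount fragment, starting at factor x list size."""
    value = list_entries * factor
    return {service: {"MaxClauseCount": value} for service in SERVICES}


# Hypothetical example: a dynamic list with 5000 entries.
fragment = max_clause_config(5000)
print(json.dumps(fragment, indent=2))
```

If the search still reports the clause limit, raise the factor and regenerate the config again.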

This optimization can have some repercussions as well:

In a search_head-data_node configuration, if the bandwidth between the two instances is low, we can get issues in the search pipeline, because the query that has to be transferred from the search_head to the data_node grows in size after this optimization. If the bandwidth is not very low, the optimization has no significant negative impact on the system and improves search performance significantly.

