Process ti is used in Logpoint to apply search-time enrichment from the threat intelligence database. This is done by adding | process ti(field_name)
to the query.
Example:
source_address=* | process ti(source_address)
This query will enrich every log that has a source_address field by matching it against the ip_address column of the threat_intelligence table. The source_address to ip_address mapping (and others) can be configured from the threat intelligence plugin's configuration settings.
The search query above takes a long time to complete if a large number of logs fall within the search time frame. We can optimize such queries by filtering logs before piping them to process ti().
Example:
norm_id=* source_address=* | process ti(destination_address)
This query can be optimized by rewriting it as:
norm_id=* source_address=* destination_address=* | process ti(destination_address)
This query will select only the logs that contain a destination_address field, since logs without a destination_address cannot be matched against the threat intelligence database on that field anyway.
Although this helps, the query still takes time when the number of search results is large, so further optimization is needed to complete it quickly.
For this optimization, we first store the column of the threat_intelligence table that needs to be matched against the query results in a dynamic list. For example, to map an IP field such as source_address or destination_address, we can run the following query to populate a dynamic list (e.g., MAN_TI) with the ip_address column of the threat intelligence database.
Table "threat_intelligence" | chart count() by ip_address | process toList("MAN_TI", ip_address)
Similarly, for any other field, we can do
Table "threat_intelligence" | chart count() by field_name | process toList("List_name", field_name)
Here, List_name is the name of the dynamic list to populate and field_name is the threat_intelligence column to store in it.
Once the dynamic list has been populated, we do not need to populate it again until the age_limit set on the dynamic list (MAN_TI) expires.
After the list is populated, we can change the query
norm_id=* source_address=* destination_address=* | process ti(destination_address)
to
norm_id=* destination_address in MAN_TI | process ti(destination_address)
In generic form, an optimized query would be
Filter_query field_name in Dynamic_list | process ti(map_field_name)
This query will complete much faster than the original query.
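Continuing the hypothetical domain example above, and assuming a domain mapping has been configured in the threat intelligence plugin, the same pattern would be:
norm_id=* domain in MAN_TI_DOMAIN | process ti(domain)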
Note: If the dynamic list created above contains a large number of entries, the UI will report an error that the maximum boolean clause count has been reached. This means we need to tune a parameter in the Index-Searcher, Merger, and Premerger services to increase the number of boolean clauses allowed.
Parameter_name: MaxClauseCount
We need to increase this configurable parameter based on the number of rows present in the dynamic list.
This can be configured in lp_services_config.json, located at /opt/immune/storage, as follows:
{"merger":{"MaxClauseCount":<value>},"premerger":{"MaxClauseCount":<value>},"index_searcher":{"MaxClauseCount":<value>},}
After this change, we need to regenerate the config file with the following command:
/opt/immune/bin/lido /opt/immune/installed/config-updater/apps/config-updater/regenerateall.sh
Keep increasing the value of MaxClauseCount until the UI search no longer throws an error and the search completes smoothly. A good rule of thumb is to start with double the number of entries in the dynamic list.
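As an illustration of that rule of thumb (the entry count here is an assumed figure, not from any particular deployment), if MAN_TI held around 50,000 entries, a starting configuration would be:
{"merger":{"MaxClauseCount":100000},"premerger":{"MaxClauseCount":100000},"index_searcher":{"MaxClauseCount":100000}}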
This optimization can have some repercussions as well:
In a search_head-data_node configuration, if the bandwidth between the two instances is low, the search pipeline can run into issues because the query that must be transferred from the search_head to the data_node grows considerably in size after this optimization. If the bandwidth is adequate, this optimization has no significant negative impact on the system and increases search performance significantly.