Zoekt tuning max file size and trigrams
Problem to solve
context (internal): https://gitlab.com/gitlab-com/enablement-sub-department/section-enable-request-for-help/-/issues/64#note_1843615608
Zoekt skips files larger than 1 MiB entirely, whereas Advanced Search indexes the first 1 MiB of a file and skips the rest.
Zoekt also has a trigram limit (default 20_000). A file that exceeds this limit won't be indexed even if it is under the file size limit.
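The two limits can be illustrated with a minimal sketch. This is not Zoekt's implementation: `distinct_trigrams` and `skip_reason` are hypothetical helpers, and the sketch assumes the trigram limit applies to the number of distinct trigrams in a file, counted here over bytes for simplicity.

```python
SIZE_MAX = 1 << 20      # 1 MiB, the file size limit described above
TRIGRAM_MAX = 20_000    # default trigram limit described above


def distinct_trigrams(content: bytes) -> int:
    """Count distinct 3-byte windows in content (a byte-level simplification)."""
    return len({content[i:i + 3] for i in range(len(content) - 2)})


def skip_reason(content: bytes, size_max: int = SIZE_MAX,
                trigram_max: int = TRIGRAM_MAX):
    """Return why a file would be skipped, or None if it would be indexed."""
    if len(content) > size_max:
        return "too large"
    if distinct_trigrams(content) > trigram_max:
        return "too many trigrams"
    return None
```

Under these assumptions, a small but high-entropy file (for example minified or generated data) can be skipped by the trigram check even though it is far below 1 MiB.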
Proposal
Zoekt offers options to tune the behavior: SizeMax, TrigramMax, and LargeFiles.
We should investigate the disk usage and memory implications of changing any of these options. They could also be exposed in the Admin UI for self-managed (SM) customers to tune.
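As a rough sketch of how such settings might be modeled if they were exposed for tuning, the example below groups the three options into one object. The `IndexLimits` class and its method are hypothetical, and it assumes LargeFiles is a list of glob patterns for paths that should be indexed regardless of size. Whether a match also exempts a file from the trigram limit is left out, since that would need to be confirmed against Zoekt itself.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class IndexLimits:
    """Hypothetical container mirroring SizeMax, TrigramMax, and LargeFiles."""
    size_max: int = 1 << 20       # bytes
    trigram_max: int = 20_000     # trigram limit (not checked in this sketch)
    large_files: list[str] = field(default_factory=list)  # glob patterns

    def exceeds_size_limit(self, path: str, size: int) -> bool:
        """True if the file would be skipped for size under these settings."""
        # Assumption: a path matching any LargeFiles pattern bypasses size_max.
        if any(fnmatchcase(path, pattern) for pattern in self.large_files):
            return False
        return size > self.size_max


limits = IndexLimits(large_files=["*.json"])
```

With these example settings, `limits.exceeds_size_limit("data/big.json", 5 << 20)` is false because the path matches a LargeFiles pattern, while the same size for `data/big.txt` is over the limit.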