Labels: :Search Relevance/Analysis (how text is split into tokens), Team:Search Relevance
Description
Currently the options for the ngram and shingle tokenizers/token filters allow the user to set min_size and max_size to any values. This is dangerous: users can set values which produce huge numbers of terms, at best bloating their index and at worst causing problems such as #25841.
I think we should add soft (and/or maybe hard) limits so that neither min_size nor max_size can be more than, say, 6, and so that the difference between min_size and max_size can't be more than 2 or 3 (we may even want to make this limit 1).
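To make the proposal concrete, here is a rough sketch of what such a check could look like at settings-validation time. The constants, names, and exception type are illustrative assumptions, not actual Elasticsearch code:

```java
// Hypothetical limits matching the numbers floated above; names and the
// exception type are assumptions for illustration, not Elasticsearch internals.
final class GramLimits {
    static final int MAX_GRAM_SIZE = 6; // proposed cap on min_size/max_size
    static final int MAX_GRAM_DIFF = 2; // proposed cap on max_size - min_size

    static void validate(int minSize, int maxSize) {
        if (minSize > MAX_GRAM_SIZE || maxSize > MAX_GRAM_SIZE) {
            throw new IllegalArgumentException(
                "min_size/max_size [" + minSize + "/" + maxSize
                    + "] must not exceed [" + MAX_GRAM_SIZE + "]");
        }
        if (maxSize - minSize > MAX_GRAM_DIFF) {
            throw new IllegalArgumentException(
                "difference between max_size [" + maxSize + "] and min_size ["
                    + minSize + "] must not exceed [" + MAX_GRAM_DIFF + "]");
        }
    }
}
```

A soft limit could instead emit a deprecation warning rather than throw, which would ease migration for existing indices.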
Note that this does not apply to edge_ngrams, where it is useful to have higher values and a larger difference between the min and max values. We should probably decide whether there should be different limits there, though.