Closed
Labels
:Search Relevance/Analysis, >bug, Team:Search Relevance
Description
Elasticsearch version: 5.6.0, Build: 781a835/2017-09-07T03:09:58.087Z, JVM: 1.8.0_144
Plugins installed: ["analysis-phonetic"]
JVM version: java version "1.8.0_144"
OS version: 16.7.0 Darwin Kernel Version 16.7.0 (OS X 10.12.6)
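Description of the problem including expected versus actual behavior: With the beider_morse encoder, the phonetic token filter fails silently if languageset is not specified. Instead of emitting the phonetic encodings, it returns the original string as the only token, and no error or warning is reported.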
Steps to reproduce:
curl -XPUT 'http://localhost:9200/phonetictest?pretty' -d'{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": {
          "type": "phonetic",
          "encoder": "beider_morse",
          "name_type": "generic"
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter": "beider_morse_filter"
        }
      }
    }
  }
}'
curl -XGET 'http://localhost:9200/phonetictest/_analyze?pretty&analyzer=my_beider_morse' -d'ABADIAS'
Incorrectly returns:
{
  "tokens" : [
    {
      "token" : "ABADIAS",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
Expected token list based on the current BMPM PHP code at http://stevemorse.org/phoneticinfo.htm :
abadias abadia abadios abadio abodias abodia abodios abodio abYdias abYdios avadias avadios avodias avodios obadias obadia obadios obadio obodias obodia obodios obodio obYdias obYdios ovadias ovadios ovodias ovodios Ybadias Ybadios Ybodias Ybodios YbYdias YbYdios abadiaS abadioS abodiaS abodioS obadiaS obadioS obodiaS obodioS
Similar failures occurred with all other attempts.
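For comparison, here is a sketch of the same setup with languageset given explicitly, which is the case the description implies does not fail. This has not been verified here. The index name phonetictest_langset, the ES_URL variable, the helper functions create_index and analyze, and the choice of "any" as the language are all assumptions made for illustration; only the filter and analyzer names come from the steps above.

```shell
#!/bin/sh
# Assumption: Elasticsearch 5.6 with analysis-phonetic is listening at this URL.
ES_URL="${ES_URL:-http://localhost:9200}"
# Hypothetical index name, chosen so it does not clash with the index above.
INDEX="phonetictest_langset"

# Same filter and analyzer as in the steps to reproduce, plus an explicit
# languageset. "any" is an assumption; a concrete list such as ["spanish"]
# is another option.
SETTINGS='{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": {
          "type": "phonetic",
          "encoder": "beider_morse",
          "name_type": "generic",
          "languageset": ["any"]
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter": "beider_morse_filter"
        }
      }
    }
  }
}'

# Create the comparison index with the settings above.
create_index() {
  curl -s -XPUT "$ES_URL/$INDEX?pretty" \
    -H 'Content-Type: application/json' -d "$SETTINGS"
}

# Analyze $1 with my_beider_morse. The text is pasted into the JSON body
# unescaped, so this only suits plain words without quotes or backslashes.
analyze() {
  curl -s -XGET "$ES_URL/$INDEX/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"my_beider_morse\", \"text\": \"$1\"}"
}
```

Calling create_index and then analyze ABADIAS should show whether the token list changes once languageset is present. If it does, that narrows the problem to the code path taken when languageset is absent.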