Beider_morse phonetic encoder silently fails when languageset not specified #26771

@bkazez

Elasticsearch version: Version: 5.6.0, Build: 781a835/2017-09-07T03:09:58.087Z, JVM: 1.8.0_144

Plugins installed: ["analysis-phonetic"]

JVM version: java version "1.8.0_144"

OS version : 16.7.0 Darwin Kernel Version 16.7.0 (OS X 10.12.6)

Description of the problem including expected versus actual behavior: The Beider-Morse encoder fails silently when languageset is not specified. Instead of phonetic encodings, the analyzer returns the original string as the only token, with no error or warning.

Steps to reproduce:

curl -XPUT 'http://localhost:9200/phonetictest?pretty' -d'{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": { 
          "type":    "phonetic",
          "encoder": "beider_morse",
          "name_type": "generic"
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter":    "beider_morse_filter" 
        }
      }
    }
  }
}'

curl -XGET 'http://localhost:9200/phonetictest/_analyze?pretty&analyzer=my_beider_morse' -d'ABADIAS'

Incorrectly returns:

{
  "tokens" : [
    {
      "token" : "ABADIAS",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

Expected token list, based on the current BMPM PHP code at http://stevemorse.org/phoneticinfo.htm:

abadias abadia abadios abadio abodias abodia abodios abodio abYdias abYdios avadias avadios avodias avodios obadias obadia obadios obadio obodias obodia obodios obodio obYdias obYdios ovadias ovadios ovodias ovodios Ybadias Ybadios Ybodias Ybodios YbYdias YbYdios abadiaS abadioS abodiaS abodioS obadiaS obadioS obodiaS obodioS

Every other input I tried failed the same way, coming back as the original string.
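Since the failure only shows up when languageset is omitted, a possible workaround is to set it explicitly. The sketch below reuses the settings from the reproduction and adds a languageset entry. The value "any" is one of the languages listed in the analysis-phonetic plugin documentation, but choosing it here is an assumption, as is the index name phonetictest_langset. I have not confirmed what tokens this produces, so treat it as something to try rather than a verified fix.

```shell
# Same filter and analyzer as in the reproduction, plus an explicit languageset.
# Assumption: "any" is used only as an illustrative value.
SETTINGS='{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": {
          "type":        "phonetic",
          "encoder":     "beider_morse",
          "name_type":   "generic",
          "languageset": ["any"]
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter":    "beider_morse_filter"
        }
      }
    }
  }
}'

# Print the body so it can be inspected before it is sent anywhere.
printf '%s\n' "$SETTINGS"

# To try it against the same local node as above (hypothetical index name):
# curl -XPUT 'http://localhost:9200/phonetictest_langset?pretty' -d "$SETTINGS"
# curl -XGET 'http://localhost:9200/phonetictest_langset/_analyze?pretty&analyzer=my_beider_morse' -d'ABADIAS'
```

If the second request returns more than the single ABADIAS token, that would confirm the problem is limited to the case where languageset is left out.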
