[[analysis-hunspell-tokenfilter]]
=== Hunspell Token Filter

Basic support for hunspell stemming. Hunspell dictionaries are picked up
from a dedicated hunspell directory on the filesystem (`config/hunspell/`
by default). Each dictionary is expected to have its own directory, named
after its associated locale (language). This dictionary directory is
expected to hold a single `*.aff` file and one or more `*.dic` files, all
of which are picked up automatically. For example, assuming the default
hunspell location is used, the following directory layout defines the
`en_US` dictionary:

[source,js]
--------------------------------------------------
- conf
    |-- hunspell
    |    |-- en_US
    |    |    |-- en_US.dic
    |    |    |-- en_US.aff
--------------------------------------------------

Each dictionary can be configured with one setting:

`ignore_case`::
    If true, dictionary matching will be case insensitive
    (defaults to `false`).

This setting can be configured globally in `elasticsearch.yml` using

* `indices.analysis.hunspell.dictionary.ignore_case`

or for specific dictionaries:

* `indices.analysis.hunspell.dictionary.en_US.ignore_case`

It is also possible to add a `settings.yml` file under the dictionary
directory that holds these settings. This overrides any other settings
defined in `elasticsearch.yml`.

The hunspell stem filter is used by configuring it in the analysis
settings:

[source,js]
--------------------------------------------------
{
    "analysis" : {
        "analyzer" : {
            "en" : {
                "tokenizer" : "standard",
                "filter" : [ "lowercase", "en_US" ]
            }
        },
        "filter" : {
            "en_US" : {
                "type" : "hunspell",
                "locale" : "en_US",
                "dedup" : true
            }
        }
    }
}
--------------------------------------------------

The hunspell token filter accepts four options:

`locale`::
    A locale for this filter. If this is unset, `lang` or `language` is
    used instead, so one of these three has to be set.

`dictionary`::
    The name of a dictionary. The path to your hunspell dictionaries
    should be configured via
    `indices.analysis.hunspell.dictionary.location` beforehand.

`dedup`::
    If only unique terms should be returned, this needs to be set to
    `true`. Defaults to `true`.

`longest_only`::
    If only the longest term should be returned, set this to `true`.
    Defaults to `false`: all possible stems are returned.

NOTE: As opposed to the snowball stemmers (which are algorithm based),
this is a dictionary lookup based stemmer, and therefore the quality of
the stemming is determined by the quality of the dictionary.

[float]
==== Dictionary loading

By default, the default Hunspell directory (`config/hunspell/`) is checked
for dictionaries when the node starts up, and any dictionaries found are
loaded automatically.

Dictionary loading can be deferred until a dictionary is actually used by
setting `indices.analysis.hunspell.dictionary.lazy` to `true` in the config
file.

[float]
==== References

Hunspell is a spell checker and morphological analyzer designed for
languages with rich morphology and complex word compounding and character
encoding.

1. Wikipedia, http://en.wikipedia.org/wiki/Hunspell
2. Source code, http://hunspell.sourceforge.net/
3. Open Office Hunspell dictionaries, http://wiki.openoffice.org/wiki/Dictionaries
4. Mozilla Hunspell dictionaries, https://addons.mozilla.org/en-US/firefox/language-tools/
5. Chromium Hunspell dictionaries, http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/
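
[float]
==== Example node settings

As a rough sketch, the node-level settings named in the sections above could be combined in `elasticsearch.yml` along the following lines. The values are illustrative assumptions rather than recommendations, and the `location` path is a hypothetical placeholder that would need to match the actual dictionary directory on the node.

[source,yaml]
--------------------------------------------------
# elasticsearch.yml (illustrative values only)

# Hypothetical path: assumed here to be the directory that holds the
# per-locale dictionary directories (for example en_US/).
indices.analysis.hunspell.dictionary.location: /path/to/hunspell

# Case-insensitive matching for the en_US dictionary only.
indices.analysis.hunspell.dictionary.en_US.ignore_case: true

# Defer dictionary loading until a dictionary is first used.
indices.analysis.hunspell.dictionary.lazy: true
--------------------------------------------------

A `settings.yml` file placed in a dictionary directory would take precedence over the `ignore_case` value shown here.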