Elasticsearch Analyzers – Custom Analyzer

In this tutorial, we're going to look at how to create an Elasticsearch Custom Analyzer.

I. Custom Analyzer

A Custom Analyzer is a combination of:
character filters (optional) -> tokenizer -> token filters (optional)

Matching these components, it has the following parameters:
char_filter (optional): array of built-in or customised character filters.
tokenizer (required): a built-in or customised tokenizer (word-oriented, partial-word, or structured-text).
filter (optional): array of built-in or customised token filters.
position_increment_gap (optional): when indexing an array of text values, Elasticsearch inserts a fake “gap” between the last term of one value and the first term of the next value to ensure that a phrase query doesn’t match two terms from different array elements. Defaults to 100.

For example, with the array "titles": [ "Java Sample Approach", "Java Technology"], the “gap” between the term approach (last term of the first value) and the term java (first term of the second value) is position_increment_gap.
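
To make the position arithmetic concrete, here is a minimal Python sketch of how positions could be assigned across array values. It is not Elasticsearch code: the function name assign_positions is hypothetical, and the sketch assumes simple whitespace tokenization with lowercasing, and that the first term of each later value lands at the previous last position plus the gap plus one.

```python
def assign_positions(values, position_increment_gap=100):
    """Return (term, position) pairs for an array of text values.

    Illustrative only: assumes whitespace tokenization and lowercasing,
    and that the gap is added once after every value in the array.
    """
    positions = []
    next_position = 0
    for value in values:
        for term in value.lower().split():
            positions.append((term, next_position))
            next_position += 1
        # Skip ahead so a phrase query cannot span two array elements.
        next_position += position_increment_gap
    return positions


titles = ["Java Sample Approach", "Java Technology"]
for term, position in assign_positions(titles):
    print(term, position)
# java 0
# sample 1
# approach 2
# java 103
# technology 104
```

Under these assumptions, approach sits at position 2 and the second java at position 103, so a phrase query for "approach java" finds no adjacent positions and does not match.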

II. Example

We will create a Custom Analyzer that can:
– replace ^^ with _happy_ and T_T with _sad_ using the Mapping Character Filter
– split on spaces and punctuation characters using the Pattern Tokenizer
– lowercase token text using the Lowercase Token Filter
– remove words in the pre-defined list of English stop words using the Stop Token Filter

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "jsa_custom_analyzer": {
          "type": "custom",
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            "^^ => _happy_",
            "T_T => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "jsa_custom_analyzer",
  "text": "we're ^^ because of javasampleapproach, and you?"
}

Terms (the stop words "of" and "and" have been removed):

[ we're, _happy_, because, javasampleapproach, you ]
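
The same chain can be imitated in a few lines of plain Python to see what each stage contributes. This is only a rough sketch, not how Elasticsearch implements analysis: the function names (char_filter, tokenize, analyze) are made up for illustration, the mapping step uses simple sequential str.replace, and the stop-word set is assumed to approximate the _english_ list.

```python
import re

# Character filter mappings, as in the "emoticons" char_filter.
MAPPINGS = {"^^": "_happy_", "T_T": "_sad_"}

# Same character class as the "punctuation" pattern tokenizer.
SPLIT_PATTERN = re.compile(r"[ .,!?]")

# Assumption: an approximation of the _english_ stop word list.
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def char_filter(text):
    """Stage 1: rewrite the raw character stream before tokenizing."""
    for source, target in MAPPINGS.items():
        text = text.replace(source, target)
    return text


def tokenize(text):
    """Stage 2: split on the pattern and drop empty tokens."""
    return [token for token in SPLIT_PATTERN.split(text) if token]


def analyze(text):
    """Run char filter -> tokenizer -> lowercase -> stop filter."""
    tokens = tokenize(char_filter(text))
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in ENGLISH_STOP]


print(analyze("we're ^^ because of javasampleapproach, and you?"))
# ["we're", '_happy_', 'because', 'javasampleapproach', 'you']
```

The order matters here: the character filter runs before lowercasing, so a lowercase t_t in the input would not be mapped to _sad_ in this sketch.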
