Elasticsearch Simple Query String Query

In this tutorial, we look at the Elasticsearch Simple Query String Query, which lets us specify AND|OR|NOT… conditions and multi-field search within a single query string. Unlike the query_string query (which is recommended for expert users only), the simple_query_string query discards invalid parts of the query string and does not throw an exception on invalid syntax.

I. Simple Query String Query

The following example uses simple_query_string to find documents whose title or tags match the query string:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "flow +(processor | publisher) -submissionpublisher",
        "fields": [ "title^3", "tags" ],
        "default_operator": "and"
    }
  }
}

The response looks like this:

{
  ...
  "hits": {
    "total": 2,
    "max_score": 6.4622207,
    "hits": [
      {
        ...
        "_score": 6.4622207,
        "_source": {
          "title": "Java 9 Flow API example – Processor",
          ...
          "tags": [
            "java 9",
            "flow",
            "reactive programming",
            "reactive streams",
            "publisher",
            "subscriber",
            "processor"
          ]
        }
      },
      {
        ...
        "_score": 4.5613484,
        "_source": {
          "title": "Java 9 Flow API example – Publisher and Subscriber",
          ...
          "tags": [
            "java 9",
            "flow",
            "reactive programming",
            "reactive streams",
            "publisher",
            "subscriber"
          ]
        }
      }
    ]
  }
}

In the example above, we use some top-level parameters:
query: flow +(processor | publisher) -submissionpublisher
=> with default_operator = and, the query asks for documents that contain:
“flow” AND (“processor” OR “publisher”) AND NOT “submissionpublisher”

fields: [ "title^3", "tags" ]
It specifies which fields to search and how much weight each field gets: a match in title is boosted by a factor of 3 relative to a match in tags.
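
The boolean reading above can be checked with a small Python sketch. It is only a toy model, not how Elasticsearch evaluates queries: it assumes a naive lowercase tokenizer, pools the tokens of title and tags, and ignores scoring and the ^3 boost. The names tokens and matches, and the third document, are made up for illustration.

```python
import re

def tokens(doc):
    """Naive stand-in for analysis: lowercase word tokens from title and tags."""
    text = " ".join([doc["title"], *doc["tags"]])
    return set(re.findall(r"\w+", text.lower()))

def matches(doc):
    """flow AND (processor OR publisher) AND NOT submissionpublisher"""
    t = tokens(doc)
    return (
        "flow" in t
        and ("processor" in t or "publisher" in t)
        and "submissionpublisher" not in t
    )

docs = [
    {"title": "Java 9 Flow API example – Processor",
     "tags": ["java 9", "flow", "reactive programming", "reactive streams",
              "publisher", "subscriber", "processor"]},
    {"title": "Java 9 Flow API example – Publisher and Subscriber",
     "tags": ["java 9", "flow", "reactive programming", "reactive streams",
              "publisher", "subscriber"]},
    # Hypothetical document, added only to show the exclusion at work.
    {"title": "Java 9 SubmissionPublisher",
     "tags": ["java 9", "flow", "publisher", "submissionpublisher"]},
]

results = [matches(d) for d in docs]
print(results)  # [True, True, False]
```

The first two documents are the ones returned in the response above; the third is rejected only because of the negated term.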

Elasticsearch simple_query_string supports the following parameters:
– query
– fields
– default_operator
– flags
– analyze_wildcard
– analyzer
– quote_field_suffix
– lenient
– minimum_should_match

The next part of this tutorial explains these parameters in more detail.

II. Top Level Parameters

1. Query

query is the actual query string to be parsed. Elasticsearch supports the following special characters in the query string:
+ : AND operation
| : OR operation
- : negates a single token
" : wraps a number of tokens to signify a phrase for searching (e.g. “java sample approach”)
* : at the end of a term, signifies a prefix query (e.g. java*)
( and ) : signify precedence
~N : after a word, signifies edit distance (fuzziness)
~N : after a phrase, signifies slop amount

*Note: To search for any of these special characters literally, we must escape them with \.
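
When the query string comes from user input, the escaping can be done in client code before the request is sent. The following is a minimal Python sketch, assuming the set of special characters is exactly the one listed above plus the backslash itself. The helper name escape_simple_query is hypothetical and not part of any Elasticsearch client.

```python
import json

# Operator characters taken from the list above, plus the backslash itself.
SPECIAL_CHARS = set('+|-"*()~\\')

def escape_simple_query(text):
    """Prefix every operator character with a backslash so it is searched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

escaped = escape_simple_query("c++ (beta)")
print(escaped)  # c\+\+ \(beta\)

# json.dumps doubles the backslashes, as required inside a JSON request body.
body = json.dumps({"query": {"simple_query_string": {"query": escaped}}})
print(body)
```

Note that escaping everything also disables the operators, so this only makes sense for input that should be treated as plain text.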

Default Operator

The default_operator value (defaults to OR) matters because it can change the behavior of a query significantly.
For example:

"query": "flow +publisher -submissionpublisher",
"default_operator": "or"

With "default_operator": "or", this query behaves differently than with "default_operator": "and".
Under or, the negated term becomes just another alternative: Elasticsearch also returns every document that doesn’t contain “submissionpublisher”, whether or not it matches the rest of the query.

The response may therefore contain many documents unrelated to our intention (title: “How to integrate Angular 4 with SpringBoot RestApi” for example).

Fuzziness

Add ~N after a word to turn it into a fuzzy term, where N is the maximum edit distance allowed:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "angila~2 firebas~1",
        "fields": [ "title" ],
        "default_operator": "and"
    }
  }
}

Response:

{
  ...
  "hits": {
    "total": 2,
    "max_score": 1.0783218,
    "hits": [
      {
        ...
        "_score": 1.0783218,
        "_source": {
          "title": "Angular 4 Firebase Quick Start",
          ...
        }
      },
      {
        ...
        "_id": "4",
        "_score": 1.0171049,
        "_source": {
          "title": "Angular 4 Firebase - CRUD Operations example",
          ...
        }
      }
    ]
  }
}
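
To see why ~2 and ~1 are enough for these misspellings, we can compute the edit distances ourselves. This is a sketch of the classic Levenshtein distance in plain Python, and edit_distance is a made-up name. Elasticsearch's fuzzy matching also counts a transposition of two adjacent characters as a single edit by default, which this sketch does not model. That makes no difference for the two words used here.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or keep
        prev = curr
    return prev[-1]

print(edit_distance("angila", "angular"))    # 2
print(edit_distance("firebas", "firebase"))  # 1
```
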

2. Fields

fields specifies the array of fields to run the parsed query against. It defaults to *. So if we don’t add the fields parameter to the request, Elasticsearch automatically attempts to determine which existing fields in the index’s mapping are queryable, and searches those fields.

We can also use wildcard patterns in field names, and boost individual fields with the ^ symbol:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "flow +(processor | publisher) -submissionpublisher",
        "fields": [ "t*", "description^3" ],
        "default_operator": "and"
    }
  }
}

Elasticsearch expands t* to the title and tags fields, and multiplies the score calculated from the description field by 3.
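
The expansion of patterns and boosts can be pictured with a short Python sketch. It is an illustration only: it assumes the index maps exactly the three fields title, tags and description, and it uses fnmatch-style matching as a stand-in for Elasticsearch's own wildcard handling (fnmatch also gives ? and [] a special meaning). The function name expand_fields is hypothetical.

```python
from fnmatch import fnmatchcase

def expand_fields(specs, mapped_fields):
    """Resolve entries such as "t*" or "description^3" to {field name: boost}."""
    boosts = {}
    for spec in specs:
        pattern, _, boost = spec.partition("^")
        weight = float(boost) if boost else 1.0
        for name in mapped_fields:
            if fnmatchcase(name, pattern):
                boosts[name] = weight
    return boosts

# Hypothetical list of queryable fields in the index mapping.
mapped = ["title", "tags", "description"]
resolved = expand_fields(["t*", "description^3"], mapped)
print(resolved)  # {'title': 1.0, 'tags': 1.0, 'description': 3.0}
```
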

3. Flags

flags specifies which parsing features should be enabled (ALL, NONE, AND, OR, NOT, PREFIX, PHRASE, PRECEDENCE, ESCAPE, WHITESPACE, FUZZY, NEAR, and SLOP). Defaults to ALL.

To combine multiple flags, separate them with |:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "flow+(processor|react*)",
        "fields": [ "title" ],
        "default_operator": "AND",
        "flags" : "AND|PRECEDENCE|OR|PREFIX"
    }
  }
}

Response:

{
  ...
  "hits": {
    "total": 2,
    "max_score": 1.3594923,
    "hits": [
      {
        ...
        "_score": 1.3594923,
        "_source": {
          "title": "Java 9 Flow API – Reactive Streams",
          ...
        }
      },
      {
        ...
        "_score": 1.2379456,
        "_source": {
          "title": "Java 9 Flow API example – Processor",
          ...
        }
      }
    ]
  }
}
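
If the flags value is assembled in client code, a small helper can catch typos before the request is sent. The sketch below is plain Python and assumes the flag names are exactly those listed above. build_flags is a hypothetical helper, not an Elasticsearch client function.

```python
# Flag names as listed above.
KNOWN_FLAGS = {
    "ALL", "NONE", "AND", "OR", "NOT", "PREFIX", "PHRASE", "PRECEDENCE",
    "ESCAPE", "WHITESPACE", "FUZZY", "NEAR", "SLOP",
}

def build_flags(*names):
    """Join flag names with "|" after checking each one against KNOWN_FLAGS."""
    unknown = [n for n in names if n not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    return "|".join(names)

flags = build_flags("AND", "PRECEDENCE", "OR", "PREFIX")
print(flags)  # AND|PRECEDENCE|OR|PREFIX
```
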

4. Analyzing

analyze_wildcard: If true, Elasticsearch also analyzes the prefix in prefix (wildcard) terms. Defaults to false.

analyzer: forces the analyzer used to analyze each term of the query when creating composite queries.
For example:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "reactive and",
        "fields": [ "title" ],
        "default_operator": "and"
    }
  }
}

Documents with title containing both “reactive” and “and” will match the query.
But if we add an analyzer like this:

GET javasampleapproach/tutorial/_search
{
  "query": {
    "simple_query_string" : {
        "query": "reactive and",
        "fields": [ "title" ],
        "analyzer": "stop",
        "default_operator": "and"
    }
  }
}

Because “and” is a stop word, the stop analyzer removes it, so documents whose title contains “reactive” match the query (“and” is no longer required). The response could include the document with title: “Java 9 Flow API – Reactive Streams”.
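
The effect of the stop analyzer on the query text can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: tokens are runs of ASCII letters, everything is lowercased, and STOP_WORDS is a shortened, illustrative list rather than the full list Elasticsearch ships with. The name stop_analyze is made up.

```python
import re

# Shortened, illustrative English stop list (an assumption, not the real one).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "is", "it", "of", "on", "or", "the", "to", "with",
}

def stop_analyze(text):
    """Lowercase, split on non-letters, then drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

print(stop_analyze("reactive and"))  # ['reactive']
print(stop_analyze("Java 9 Flow API – Reactive Streams"))
# ['java', 'flow', 'api', 'reactive', 'streams']
```

After analysis only “reactive” is left in the query, which is why the second title matches even though it contains no “and”.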

5. Quote Field Suffix

quote_field_suffix: appends a suffix to fields for quoted parts of the query string.

For example, suppose the title field uses analyzer_A and its sub-field title.exact uses analyzer_B. quote_field_suffix lets us mix exact search with stemmed search in one query string by wrapping the exact words in double quotes:

GET jsa_tutorial/tutorial/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "title" ],
      "quote_field_suffix": ".exact",
      "query": "\"integrating\" jpa"
    }
  }
}

The query runs “integrating” against the title.exact field (analyzer_B) and “jpa” against the title field (analyzer_A).
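
The routing of quoted and unquoted parts can be sketched in Python as follows. It is a simplification that only separates double-quoted phrases from bare words and ignores all other operators. route_terms is a hypothetical name used for illustration.

```python
import re

def route_terms(query, field, quote_field_suffix):
    """Pair each part of the query with the field it would be searched in."""
    routed = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            # Quoted part: goes to the field with the suffix appended.
            routed.append((field + quote_field_suffix, phrase))
        else:
            # Bare word: goes to the field as given.
            routed.append((field, word))
    return routed

print(route_terms('"integrating" jpa', "title", ".exact"))
# [('title.exact', 'integrating'), ('title', 'jpa')]
```
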

Here is a full example:

PUT jsa_tutorial
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_exact": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "tutorial": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "english",
          "fields": {
            "exact": {
              "type": "text",
              "analyzer": "english_exact"
            }
          }
        }
      }
    }
  }
}

PUT jsa_tutorial/tutorial/1
{
  "title": "Integrate Spring Boot with Bootstrap and JQuery"
}

PUT jsa_tutorial/tutorial/2
{
  "title": "How to use Spring Integration Http Inbound with Spring Boot"
}

PUT jsa_tutorial/tutorial/3
{
  "title": "Spring JPA – One to Many Relationship"
}

Check the responses:

# Case 1:
GET jsa_tutorial/tutorial/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "title" ],
      "query": "integrating"
    }
  }
}
# Response:
# 1- "Integrate Spring Boot with Bootstrap and JQuery"
# 2- "How to use Spring Integration Http Inbound with Spring Boot"

# Case 2:
GET jsa_tutorial/tutorial/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "title" ],
      "quote_field_suffix": ".exact",
      "query": "\"integrating\" jpa"
    }
  }
}
# Response:
# 3- "Spring JPA – One to Many Relationship"

6. Others

lenient: If true, format-based failures (such as searching for text in a numeric field) are ignored. Defaults to false.

For example, this query causes a number_format_exception:

GET jsa_customer_idx/customer/_search
{
  "query": {
    "simple_query_string" : {
        "query": "dozen",
        "fields": [ "quantity" ]
    }
  }
}

If we set lenient to true:

GET jsa_customer_idx/customer/_search
{
  "query": {
    "simple_query_string" : {
        "query": "dozen",
        "fields": [ "quantity" ],
        "lenient": true
    }
  }
}

There is no exception, and the response is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
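
The difference between the two requests is similar to the difference between letting a parse error propagate and swallowing it. The Python analogy below is not Elasticsearch code: match_quantity is a made-up function that compares a stored number with query text, and the lenient argument only mimics the behavior described above.

```python
def match_quantity(stored_quantity, query_text, lenient=False):
    """Compare a numeric field value with query text, optionally ignoring bad formats."""
    try:
        wanted = int(query_text)
    except ValueError:
        if lenient:
            return False  # treat the unparsable term as "matches nothing"
        raise
    return stored_quantity == wanted

print(match_quantity(12, "12"))                   # True
print(match_quantity(12, "dozen", lenient=True))  # False
```

Calling match_quantity(12, "dozen") without lenient raises a ValueError, just as the first request above fails with a number_format_exception.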

minimum_should_match: The minimum number of clauses that must match for a document to be returned. See the minimum_should_match documentation for the accepted values.
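
The idea behind minimum_should_match can be shown with a toy Python check. The sketch only covers the plain integer form and treats a document as a set of tokens. The real parameter accepts more formats, which are not modeled here. enough_clauses_match is a hypothetical name.

```python
def enough_clauses_match(doc_tokens, optional_terms, minimum_should_match):
    """True when at least `minimum_should_match` of the optional terms occur."""
    matched = sum(1 for term in optional_terms if term in doc_tokens)
    return matched >= minimum_should_match

doc = {"java", "flow", "api", "reactive", "streams"}
terms = ["flow", "reactive", "publisher"]
print(enough_clauses_match(doc, terms, 2))  # True: "flow" and "reactive" match
print(enough_clauses_match(doc, terms, 3))  # False: "publisher" is missing
```
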
