Elasticsearch provider module

Edition: Incubator (services)

Latest: 1.0.1

Compatible with Magnolia 6.2, 6.1.

The Elasticsearch provider module lets you perform faceted searching (although facets are being deprecated) and aggregated searching efficiently.

This module is at the INCUBATOR level.

Installing with Maven

Maven is the easiest way to install the module. Add the following to your bundle:

<dependency>
  <groupId>info.magnolia.elasticsearch</groupId>
  <artifactId>magnolia-query-manager</artifactId>
  <version>1.0.1</version>
</dependency>

<dependency>
  <groupId>info.magnolia.elasticsearch</groupId>
  <artifactId>magnolia-es-content-delivery</artifactId>
  <version>1.0.1</version>
</dependency>

Elasticsearch

With Brew

brew install elasticsearch
brew services start elasticsearch

brew install logstash
brew services start logstash
brew install kibana
brew services start kibana

Without Brew

Download the Elasticsearch binaries from https://www.elastic.co/downloads/elasticsearch
Download the Kibana binaries from https://www.elastic.co/downloads/kibana
Follow the installation steps on the download pages listed above.
Run bin/elasticsearch from your command line.

Set up Elasticsearch cluster

Copy the config directory from /config to a new folder, for example <path-to-installation>/configurations/001/config. For Brew installations, the config directory is located at /usr/local/etc/elasticsearch.

Uncomment and modify the following in elasticsearch.yml:

cluster.name: magnolia-cluster
http.cors:
    enabled: true
    allow-origin: "*"

If you’re using multiple configurations, you’ll also need to uncomment and modify the following in elasticsearch.yml:

path.data: <path-to-installation>/data
path.logs: <path-to-installation>/logs
http.port: 9201 # Or whatever unique port you prefer.

Then, in the shell that starts the node, point Elasticsearch at that configuration directory:

export ES_PATH_CONF=<path-to-installation>/config

To validate that the servers are running, open http://localhost:9200 in a browser (along with any other node ports you have configured). Each node should return a JSON response that reports magnolia-cluster as its cluster name.
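The same check can be scripted instead of done in a browser. The following is a minimal sketch, not part of the module: it assumes each node's root endpoint returns JSON containing a cluster_name key (as stock Elasticsearch does), and the helper names reports_cluster and check_node are hypothetical.

```python
import json
from urllib.request import urlopen

EXPECTED_CLUSTER = "magnolia-cluster"  # cluster.name set in elasticsearch.yml


def reports_cluster(root_response: dict, expected: str = EXPECTED_CLUSTER) -> bool:
    """Return True if a node's root JSON response names the expected cluster."""
    return root_response.get("cluster_name") == expected


def check_node(url: str, timeout: float = 2.0) -> bool:
    """Fetch a node's root endpoint and compare its reported cluster name.

    Raises OSError (for example URLError) if the node cannot be reached.
    """
    with urlopen(url, timeout=timeout) as response:
        return reports_cluster(json.load(response))
```

With the nodes running, check_node("http://localhost:9200") should return True, and likewise for any additional port such as 9201.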

Usage

Several apps are available to build an index with the appropriate search analyzers and tokenizers.

Cluster Editor app: Configure a cluster and its server nodes.
Index Editor app: Define the index and provide the source data locations.
Mapping Editor app: Map the properties to data types and define how they should be mapped (including the analyzers applied to each field).

Individual analyzers and tokenizers can be created using the Analyzer Editor and Tokenizer Editor, respectively.

Tokenizers

What is that?

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
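As a rough illustration of the whitespace example above, plain Python string splitting behaves the same way. This is only a sketch of the idea, not Elasticsearch's implementation, and whitespace_tokenize is a hypothetical name.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split text into tokens on runs of whitespace; punctuation stays attached."""
    return text.split()


print(whitespace_tokenize("Quick brown fox!"))  # ['Quick', 'brown', 'fox!']
```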

See the Elasticsearch documentation on tokenizers for more details.

The Edge NGram tokenizer first breaks text into words whenever it encounters one of a list of specified characters, then emits n-grams of each word, anchored to the start of the word. This suits search-as-you-type situations: given the first few letters a user has typed, find the indexed words that begin with them.

From the JCR browser, in the query manager workspace, import: elasticsearch-provider/magnolia-es-configuration/src/main/resources/mgnl-bootstrap-manual/elastic-configuration/query-manager.tokenizers.xml.

Make sure to remove the empty "tokenizers" content node.

Name = edge_ngram_tokenizer
Type = edge_ngram
Min Gram = 2
Max Gram = 5
Token Chars = Letter
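To see what the field values above produce, the following sketch imitates an edge n-gram tokenizer with Min Gram = 2, Max Gram = 5 and Token Chars = Letter. It is an approximation written for illustration, not the Elasticsearch implementation, and edge_ngrams is a hypothetical helper.

```python
import re


def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 5) -> list[str]:
    """Emit prefixes of each word, from min_gram up to max_gram characters."""
    tokens = []
    # Token Chars = Letter: runs of letters form words; digits, punctuation
    # and whitespace act as separators.
    for word in re.findall(r"[^\W\d_]+", text):
        # Words shorter than min_gram produce no tokens at all.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


print(edge_ngrams("Quick fox!"))
# ['Qu', 'Qui', 'Quic', 'Quick', 'fo', 'fox']
```

Because every token is a prefix, a query such as "qui" can match "Quick" once a lower-case filter is applied at index time.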

Analyzers

What is that?

Text analysis is the process of converting text, like the body of any email, into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer which can be either a built-in analyzer or a custom analyzer defined per index.

See the Elasticsearch documentation on text analysis for more details.

The HTML analyzer below can be applied automatically to mapped fields when the field is assumed to contain HTML text.

From the JCR browser, in the query manager workspace, import: elasticsearch-provider/magnolia-es-configuration/src/main/resources/mgnl-bootstrap-manual/elastic-configuration/query-manager.analyzers.xml.
Make sure to remove the empty "analyzers" content node.

Here are four commonly used analyzers that you can apply to your search index fields.

Edge NGram Analyzer
Overview: Useful for suggestion and autocomplete.
Sample: This analyzer uses a default tokenizer that takes in search terms and applies them in lower case (making for a more efficient search).
Fields:
Name = edge_ngram_search_analyzer
Tokenizer = lowercase
Type = standard

Keyword Analyzer
Overview: Searches via keywords.
Sample: This analyzer is similar to the previous one, but applies the tokenizer created in the earlier section.
Fields:
Name = edge_ngram_analyzer
Tokenizer = edge_ngram_tokenizer
Filters = Lower Case
Type = standard
Default Analyzer? = selected
Name = edgengram
Type = text
Search Analyzer = edge_ngram_search_analyzer

Keyword HTML Analyzer
Overview: Tokenizes out HTML text, and provides clean keyword searching.
Sample: This analyzer uses keyword analysis against text that is in the form of HTML. The text is indexed so that search analysis runs against it with its HTML parts stripped out.
Fields:
Name = keyword_html_analyzer
Tokenizer = keyword
Filters = Lower Case, ASCII Folding, and Trim
Character Filters = HTML Strip
Type = custom

Edge NGram Search Analyzer
Overview: Same as above, but provided with a different name to distinguish it for future modifications.
Sample: This analyzer is similar to the previous one, with the exception that it is applied to text without HTML tags.
Fields:
Name = keyword_analyzer
Tokenizer = keyword
Filters = Lower Case, ASCII Folding, and Trim
Type = custom
HTML Alternative for = keyword_html_analyzer
Default Analyzer? = selected
Name = completion
Type = completion
This analyzer and the one named in its "HTML Alternative for" field are interchangeable. The system automatically detects whether the text value is of type "HTML" and, if so, applies the analyzer defined by the "HTML Alternative for:" field in its place.
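For orientation, the two analyzers of Type = custom listed above (keyword_html_analyzer and keyword_analyzer) correspond roughly to the following Elasticsearch index analysis settings. This is a sketch under assumptions: the tokenizer, filter and character filter names are Elasticsearch's built-in ones, but whether the module generates exactly this structure is not confirmed here.

```python
import json

# Field values are copied from the analyzer definitions above. The layout
# follows Elasticsearch's "settings.analysis" format; that the module emits
# this exact JSON is an assumption made for illustration.
analysis_settings = {
    "analysis": {
        "analyzer": {
            "keyword_html_analyzer": {
                "type": "custom",
                "tokenizer": "keyword",
                "char_filter": ["html_strip"],  # Character Filters = HTML Strip
                "filter": ["lowercase", "asciifolding", "trim"],
            },
            "keyword_analyzer": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase", "asciifolding", "trim"],
            },
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```

The only difference between the two definitions is the html_strip character filter, which is why one can stand in for the other depending on whether the field value contains HTML.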

Changelog

Version	Notes
1.0	Initial release of the module.
