Magnolia Elasticsearch Module

Introduction and Goals

This module provides the capability to connect an existing Elasticsearch instance to Magnolia in order to index Magnolia content and deliver full-text search for end users. This is achieved through a few configuration steps with minimal implementation effort.

During installation, the module adds indexing and de-indexing commands to the standard Magnolia Publication Workflow. When a page is published successfully, it is crawled on the public instance, and its content is stored as a document in the Elasticsearch index. Certain areas of a page can be excluded from indexing, such as the navigation, the footer, or recurring components like newsletter sign-ups. A whole page can also be excluded through a flag in its page properties, which is useful for error pages, for example. When a page is depublished, its document is removed from the index.
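The publish/depublish flow described above can be sketched as follows. This is not the module's implementation. It is a minimal, self-contained model in which an in-memory map stands in for the Elasticsearch index. The class `IndexingSketch`, the `Page` record, the `excludeFromIndex` flag, the area names in `EXCLUDED_AREAS`, and the document fields `url`, `templateId` and `content` are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the publish/depublish indexing flow.
// The in-memory map stands in for the Elasticsearch index; all names are illustrative.
public class IndexingSketch {

    // A page as crawled on the public instance: URL, template, opt-out flag, rendered areas.
    record Page(String url, String templateId, boolean excludeFromIndex, Map<String, String> areas) {}

    // Hypothetical area names that are never indexed (navigation, footer, newsletter sign-up).
    static final Set<String> EXCLUDED_AREAS = Set.of("navigation", "footer", "newsletter");

    // Stand-in for the Elasticsearch index: document ID (here the URL) -> document fields.
    final Map<String, Map<String, String>> index = new HashMap<>();

    // Build the document for a page, or return empty if the page is flagged as excluded.
    static Optional<Map<String, String>> toDocument(Page page) {
        if (page.excludeFromIndex()) {
            return Optional.empty();
        }
        StringBuilder content = new StringBuilder();
        for (Map.Entry<String, String> area : page.areas().entrySet()) {
            // Skip excluded areas; join the text of all remaining areas with single spaces.
            if (!EXCLUDED_AREAS.contains(area.getKey())) {
                if (content.length() > 0) {
                    content.append(' ');
                }
                content.append(area.getValue());
            }
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", page.url());
        doc.put("templateId", page.templateId());
        doc.put("content", content.toString());
        return Optional.of(doc);
    }

    // Called after a successful publication. Assumption: a flagged page also drops
    // any document that an earlier publication may have stored.
    void onPublish(Page page) {
        toDocument(page).ifPresentOrElse(
                doc -> index.put(page.url(), doc),
                () -> index.remove(page.url()));
    }

    // Called after depublication: remove the page's document from the index.
    void onDepublish(String url) {
        index.remove(url);
    }
}
```

Storing the template ID next to the content is what later allows results to be grouped by template. In a real setup, `onPublish` and `onDepublish` would correspond to the commands added to the workflow, and the map operations would be Elasticsearch index and delete requests.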

Each site definition can configure its own Elasticsearch query JSON, which is used for that site's search queries. This makes it possible, for instance, to enable fuzzy search on one site and disable it on another. Besides the actual content, additional information such as the template ID can be stored in the index, which makes it possible, for example, to group (cluster) results by template.
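A per-site query configuration might look roughly like the sketch below. The site keys `site-a` and `site-b`, the `%QUERY%` placeholder, the field names `title` and `content`, and the class `SiteQuerySketch` are hypothetical and do not reflect the module's actual site-definition syntax. The JSON itself uses standard Elasticsearch Query DSL: a `multi_match` query with `"fuzziness":"AUTO"` for the fuzzy site, and a `terms` aggregation on `templateId` to group hits by template (this assumes `templateId` is mapped as a `keyword` field).

```java
import java.util.Map;

// Hypothetical per-site query templates; "%QUERY%" is an illustrative placeholder,
// not necessarily the syntax used by the module's site definitions.
public class SiteQuerySketch {

    // Fuzzy search, plus a terms aggregation that groups hits by template ID.
    static final String FUZZY_SITE =
            "{\"query\":{\"multi_match\":{\"query\":\"%QUERY%\",\"fields\":[\"title\",\"content\"],"
            + "\"fuzziness\":\"AUTO\"}},"
            + "\"aggs\":{\"by_template\":{\"terms\":{\"field\":\"templateId\"}}}}";

    // Exact (non-fuzzy) search for another site.
    static final String EXACT_SITE =
            "{\"query\":{\"multi_match\":{\"query\":\"%QUERY%\",\"fields\":[\"title\",\"content\"]}}}";

    static final Map<String, String> QUERY_BY_SITE = Map.of(
            "site-a", FUZZY_SITE,
            "site-b", EXACT_SITE);

    // Escape user input so it can be embedded safely inside a JSON string literal.
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    // Look up the site's query template and insert the escaped search term.
    static String buildQuery(String site, String userInput) {
        String template = QUERY_BY_SITE.get(site);
        if (template == null) {
            throw new IllegalArgumentException("No query configured for site: " + site);
        }
        return template.replace("%QUERY%", escapeJson(userInput));
    }
}
```

Keeping the whole query as configurable JSON, rather than building it in code, means switching fuzziness or adding aggregations per site is a configuration change only.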

Requirements Overview

To use this module, you need a running Magnolia DXP installation and an Elasticsearch instance.