Installation instructions

Add the Ray Sono Marketplace repository to the mirrors in your settings.xml file. Ray Sono will provide the username and password required to authenticate against Nexus. Ideally, store the Nexus password in encrypted form in settings.xml. Instructions for this can be found here: https://maven.apache.org/guides/mini/guide-encryption.html#how-to-encrypt-server-passwords

E.g.:

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">

    <servers>
        <server>
            <id>raysono-repo</id>
            <username>{raysono_marketplace_repository_username}</username>
            <password>{raysono_marketplace_repository_password}</password>
        </server>
        <server>
            <id>mgnl-repo</id>
            <username>{magnolia_enterprise_repository_username}</username>
            <password>{magnolia_enterprise_repository_password}</password>
        </server>
    </servers>

    <mirrors>
        <mirror>
            <id>central</id>
            <mirrorOf>central</mirrorOf>
            <url>https://repo.maven.apache.org/maven2</url>
        </mirror>
        <mirror>
            <id>raysono-repo</id>
            <mirrorOf>marketplace</mirrorOf>
            <url>https://nexus.raysono.io/repository/marketplace</url>
        </mirror>
        <mirror>
            <id>mgnl-repo</id>
            <mirrorOf>magnolia</mirrorOf>
            <url>https://nexus.magnolia-cms.com/content/groups/enterprise/</url>
        </mirror>
    </mirrors>

    <profiles>
        <profile>
            <id>nexus</id>
            <repositories>
                <repository>
                    <id>central</id>
                    <url>http://central</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
                <repository>
                    <id>marketplace</id>
                    <url>http://marketplace</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
                <repository>
                    <id>magnolia</id>
                    <url>http://magnolia</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
            </repositories>
            <pluginRepositories>
                <pluginRepository>
                    <id>central</id>
                    <url>http://central</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </pluginRepository>
            </pluginRepositories>
        </profile>
    </profiles>

    <activeProfiles>
        <activeProfile>nexus</activeProfile>
    </activeProfiles>

</settings>

To use this module in your project, add the following dependency to your webapp pom.xml and set the ${raysono.elasticsearch.module.version} property:

<properties>
    <raysono.elasticsearch.module.version>1.0.1</raysono.elasticsearch.module.version>
</properties>

<dependencies>
    <dependency>
        <groupId>com.raysono.magnolia.module.community</groupId>
        <artifactId>raysono-magnolia-elasticsearch</artifactId>
        <version>${raysono.elasticsearch.module.version}</version>
    </dependency>
</dependencies>

After adding the dependency to the webapp pom.xml, start the Tomcat server; the module is installed automatically.

Elasticsearch App

When you log in to the Magnolia AdminCentral, you will see a new app called Elasticsearch in the 'Edit' section. Opening the app shows the basic configuration page of the module and a second tab for the Elasticsearch configuration.

Basic configuration

app_screen1

On the basic configuration page you can configure the following properties:

Property | Description
Service enabled | If this is set to false, the module will not index anything.
Ray Sono License Key | The license key for the Ray Sono Elasticsearch module. If you don’t have a license key, please contact Ray Sono. Without a valid license key the module will not index anything.
No index property | The name of the property that marks a page as non-indexable. Add a field for this property to the page dialog.
Workspaces | A comma-separated list of the workspaces to index (e.g., website, dam). For DAM, the supported document extensions are pdf, doc, and docx.

Elasticsearch configuration

app_screen2

On the Elasticsearch configuration page you can configure the following properties:

Property | Description
Elasticsearch hosts | The Elasticsearch hosts as a comma-separated list (e.g. http://localhost:9200, http://localhost:9201).
Authentication | If your Elasticsearch server is secured with basic authentication, you can define the username and password here.
Encryption | If you want to encrypt the communication between the module and the Elasticsearch server, you can choose between PKCS12, PEM and TLS.
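To illustrate the comma-separated format of the Elasticsearch hosts property, here is a minimal sketch of how such a value could be split into individual host URIs. The class `ElasticsearchHosts` and its `parse` method are hypothetical names chosen for this example; they are not part of the module, whose own parsing logic is not documented here.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the module: splits the configured
// host list on commas and turns each entry into a URI.
final class ElasticsearchHosts {

    static List<URI> parse(String commaSeparatedHosts) {
        return Arrays.stream(commaSeparatedHosts.split(","))
                .map(String::trim)                  // tolerate spaces after commas
                .filter(entry -> !entry.isEmpty())  // skip empty entries
                .map(URI::create)                   // throws IllegalArgumentException on malformed entries
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<URI> hosts = parse("http://localhost:9200, http://localhost:9201");
        hosts.forEach(host -> System.out.println(host.getHost() + ":" + host.getPort()));
    }
}
```

Each entry should include the scheme (http or https), as in the example value above.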

The page also contains a table that displays statistics for the Elasticsearch indices.

app_screen3

Site definition configuration

By default, the module adds its parameters to the fallback site definition. These parameters are:

Parameter | Description
elasticsearchTemplate | The Elasticsearch template name to be utilized for searching. If not specified, the default template included in this module will be employed. To utilize a custom template, you must place your own template in the root directory of your project’s resources folder and set this property to its name.
jsoupSelectId | This property is relevant for indexing the content of a page. It defines the id of the element (e.g. a div layer which holds all the main content id="content") which should be indexed. If this is not set, jsoupSelectTag will be evaluated.
jsoupSelectTag | This property is crucial for indexing page content when there’s no ID for the main content. It specifies the tag of the element, such as <main>, that should be indexed. In case there are multiple main tags on the page, only the first one will be selected. If left unset, along with the jsoupSelectId property, nothing will be indexed.
jsoupRemoveElements | Here you can define the attribute (e.g. [data-noindex]) which will be used to ignore content of a page. This attribute is particularly useful for excluding elements on a page such as a newsletter teaser. To implement this attribute in your components, assign it to a <div> tag or any other relevant element that should not be indexed.
indexfields | Here you can define the meta tags which should also be indexed.

E.g.:

For <meta name="description" content="This is the description"> you have to add 'description': 'description' to the indexfields.

For <meta name="title" content="This is the title"> you have to add 'title': 'title' to the indexfields.

See the example definition below.

These parameters are kept in the site definition because they can differ from one site definition to another.
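The precedence between jsoupSelectId and jsoupSelectTag described above could be sketched as follows. This is only an illustration of the documented behavior: the class `ContentSelector` and its `resolve` method are hypothetical, and the module's real implementation (which presumably passes the resulting selector to jsoup) is not shown in this documentation.

```java
import java.util.Optional;

// Hypothetical helper illustrating the documented precedence:
// id first, then tag, otherwise nothing is indexed.
final class ContentSelector {

    /** Returns the CSS selector of the element to index, or empty if nothing should be indexed. */
    static Optional<String> resolve(String jsoupSelectId, String jsoupSelectTag) {
        if (jsoupSelectId != null && !jsoupSelectId.isBlank()) {
            return Optional.of("#" + jsoupSelectId.trim()); // id wins, e.g. "#content"
        }
        if (jsoupSelectTag != null && !jsoupSelectTag.isBlank()) {
            return Optional.of(jsoupSelectTag.trim());      // tag fallback, e.g. "main"
        }
        return Optional.empty();                            // neither set: nothing is indexed
    }

    public static void main(String[] args) {
        System.out.println(resolve("content", "main")); // id configured
        System.out.println(resolve(null, "main"));      // only tag configured
        System.out.println(resolve(null, null));        // nothing configured
    }
}
```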

Default example for the parameters of a site definition:

'parameters':
  'elasticsearchTemplate': 'elasticsearch_query.json'
  'elasticsearch':
    'jsoupSelectId': 'content'
    'jsoupSelectTag': 'main'
    'jsoupRemoveElements':
      '0': '[data-noindex]'
    'indexfields':
      'description': 'description'
      'language': 'Content-Language'
      'title': 'title'
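As a rough illustration of what the indexfields mapping expresses, the sketch below copies meta tag contents into document fields. It assumes that each key is the name of the field in the index and each value is the name of the meta tag, which is how the 'language': 'Content-Language' entry reads; the documentation does not state the direction explicitly. `IndexFieldMapper` is a hypothetical class, not part of the module.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the module.
final class IndexFieldMapper {

    /**
     * @param indexfields index field name -> meta tag name (assumed direction)
     * @param metaTags    meta tag name -> content attribute, as found on the page
     * @return the additional fields to store in the Elasticsearch document
     */
    static Map<String, String> map(Map<String, String> indexfields, Map<String, String> metaTags) {
        Map<String, String> documentFields = new LinkedHashMap<>();
        indexfields.forEach((fieldName, metaName) -> {
            String content = metaTags.get(metaName);
            if (content != null) {            // meta tags missing on the page are skipped
                documentFields.put(fieldName, content);
            }
        });
        return documentFields;
    }

    public static void main(String[] args) {
        Map<String, String> indexfields = Map.of(
                "description", "description",
                "language", "Content-Language",
                "title", "title");
        Map<String, String> metaTags = Map.of(
                "description", "This is the description",
                "Content-Language", "de");
        System.out.println(map(indexfields, metaTags));
    }
}
```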

Config Repository

Some configurations cannot be edited through the app yet. You can find them under modules/raysono-magnolia-elasticsearch in the config repository. One of these configurations is nodeTypeMetaData, which holds metadata for indexing content. Here you can define metadata that should be stored in the Elasticsearch document alongside the page content. This is useful, for instance, for specifying the page template or the node type of the element containing the content (e.g., mgnl:page, mgnl:asset, etc.).

DAM Asset Indexation

To index DAM assets as well, you need to include a mapping in the site definition for the path within the DAM.

E.g.:

'mappings':
  'website':
    'URIPrefix': ''
    'handlePrefix': '/b2c_de'
    'repository': 'website'
  'dam':
    'URIPrefix': ''
    'handlePrefix': '/b2c'
    'repository': 'dam'

Note that currently only pdf, doc, and docx files are indexed.

You need to set the language of the file in the languages property within the asset dialog. If you do not, the asset is still indexed, but it cannot be retrieved through the search endpoint because the language information is missing. Also, only one language per asset is supported.

asset_indexation_locale

Asset exclusion

To exclude an asset from being indexed, you can decorate the asset dialog by adding the property which you also have defined for excluding pages from indexing. You can find the setting for this property in the Elasticsearch app in the Basic configuration tab (No index property).

Elasticsearch prerequisites

For each site definition you have to create an index in Elasticsearch that holds the content of that site. The name of the index has to be the same as the name of the site definition. For example, if you have a site definition called b2c, you have to create an index called b2c in Elasticsearch.
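One way to create such an index is the Elasticsearch create index API (a PUT request to the index name). The sketch below only builds the request with the JDK HTTP client; the host http://localhost:9200 is taken from the example value above, the class `IndexRequests` is a hypothetical name, and index settings or mappings that the module may require are not covered here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the module: builds a "create index" request
// whose index name equals the site definition name.
final class IndexRequests {

    static HttpRequest createIndex(String elasticsearchHost, String siteDefinitionName) {
        return HttpRequest.newBuilder(URI.create(elasticsearchHost + "/" + siteDefinitionName))
                .PUT(HttpRequest.BodyPublishers.noBody()) // empty body: default settings
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createIndex("http://localhost:9200", "b2c");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, adding authentication if your Elasticsearch server requires it.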

Elasticsearch Template

As described above, the module comes with a default Elasticsearch template. This template is located in the root directory of the module. If you want to use your own template, you have to place it in the root directory of your project's resources folder. To use the custom template, you have to implement your own SearchService and override the method createElasticsearchRequest.

Example for a custom SearchService:

@Slf4j
public class CustomSearchService extends SearchServiceImpl {

    private static final String ELASTIC_TEMPLATE = "elasticsearch_query_fallback.json";

    private static final String SITE_PARAM_SEARCH_KEY = "elasticsearchTemplate";
    private final SiteManager siteManager;

    @Inject
    public CustomSearchService(Provider<SearchServiceModule> searchServiceModule, EsStringSanitizer esStringSanitizer, ObjectMapper objectMapper, SiteManager siteManager) {
        super(searchServiceModule, esStringSanitizer, objectMapper, siteManager);
        this.siteManager = siteManager;
    }

    @Override
    public void createElasticsearchRequest(SearchPageRequest searchPageRequest, Request request, String queryTerm) {
        String templateName = tryGetTemplateNameFromSiteParameter(searchPageRequest);
        String queryTemplate = StringUtils.isNotBlank(templateName)
                ? readElasticQueryTemplateFromFile(templateName)
                : readElasticQueryTemplateFromFile(ELASTIC_TEMPLATE);

        if (StringUtils.isBlank(queryTemplate) || queryTemplate.equals("{}")) {
            // Fallback to Default Search Config
            super.createElasticsearchRequest(searchPageRequest, request, queryTerm);
        } else {
            request.setJsonEntity(String.format(queryTemplate,
                    searchPageRequest.getPageFrom(),
                    searchPageRequest.getPageSize(),
                    searchPageRequest.getLocale(),
                    StringEscapeUtils.escapeJson(queryTerm),
                    StringEscapeUtils.escapeJson(queryTerm),
                    StringEscapeUtils.escapeJson(queryTerm)));
        }
    }

    /**
     * This method tries to read the custom query template from the classpath.
     *
     * @param templateName the name of the template to read
     * @return the template as a string
     */
    private String readElasticQueryTemplateFromFile(String templateName) {
        try (var inputStream = getClass().getResourceAsStream("/" + templateName)){
            if (inputStream != null) {
                return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            log.error("Error reading custom query template", e);
        }
        return "{}";
    }

    /**
     * This method tries to get the template name from the site parameter.
     * @param searchPageRequest the search page request
     * @return the template name or null if not found
     */
    @Nullable
    private String tryGetTemplateNameFromSiteParameter(SearchPageRequest searchPageRequest) {
        Site site = this.siteManager.getSite(searchPageRequest.getSite());

        if (site != null && !(site instanceof NullSite) && site.getParameters() != null) {
            return site.getParameters().containsKey(SITE_PARAM_SEARCH_KEY) ? site.getParameters().get(SITE_PARAM_SEARCH_KEY).toString() : null;
        }
        return null;
    }
}

You also have to register your custom SearchService in the module descriptor:

<components>
    <component>
        <type>com.raysono.magnolia.search.service.service.SearchServiceImpl</type>
        <implementation>com.raysono.search.service.CustomSearchService</implementation>
        <scope>singleton</scope>
    </component>
</components>

REST Endpoint

To search for documents, use the REST endpoint /.rest/search/v1. The endpoint expects a GET request of the form /.rest/search/v1?q=Erntedankfest&s=b2c_de&l=de&o=0.

Endpoint URI: /.rest/search/v1

Query params:

  • q - query term

  • s - site name

  • l - language

  • o - offset

  • e - limit

The rest-anonymous role on a public instance should be granted GET access on /.rest/search/* URLs.
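A request URL with the query parameters listed above could be assembled as in the following sketch. The base URL https://www.example.com and the class `SearchUrl` are assumptions made for this example; only the endpoint path and the parameter names come from this documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper, not part of the module.
final class SearchUrl {

    private static final String ENDPOINT = "/.rest/search/v1";

    static String build(String baseUrl, String query, String site, String language, int offset, int limit) {
        return baseUrl + ENDPOINT
                + "?q=" + encode(query)     // query term
                + "&s=" + encode(site)      // site name
                + "&l=" + encode(language)  // language
                + "&o=" + offset            // offset
                + "&e=" + limit;            // limit
    }

    // URL-encodes a single parameter value so that spaces and umlauts are transmitted safely.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("https://www.example.com", "Erntedankfest", "b2c_de", "de", 0, 10));
    }
}
```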

The Elasticsearch query in elasticsearch_query.json is a JSON string with placeholders. The SearchService replaces these placeholders with the actual values of the URL parameters listed above.
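The placeholder mechanism can be pictured with the following sketch, which fills a format-string template the same way the custom SearchService example does with String.format. The template shown here is hypothetical (including the field names language and content); the module's actual elasticsearch_query.json is not reproduced in this documentation. The `escapeJson` method is a deliberately minimal stand-in for a full JSON escaper such as the StringEscapeUtils.escapeJson call used above.

```java
import java.util.Locale;

// Hypothetical demo, not part of the module.
final class QueryTemplateDemo {

    // Hypothetical template: %d and %s are the placeholders replaced at request time.
    static final String TEMPLATE =
            "{\"from\": %d, \"size\": %d, \"query\": {\"bool\": {"
            + "\"filter\": {\"term\": {\"language\": \"%s\"}}, "
            + "\"must\": {\"match\": {\"content\": \"%s\"}}}}}";

    static String fill(int from, int size, String locale, String queryTerm) {
        // Locale.ROOT keeps the number formatting independent of the server locale.
        return String.format(Locale.ROOT, TEMPLATE, from, size, locale, escapeJson(queryTerm));
    }

    // Minimal escaping for this sketch: handles backslashes and double quotes only.
    static String escapeJson(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(fill(0, 10, "de", "Erntedankfest"));
    }
}
```

Because the template is a format string, a literal percent sign inside a custom template has to be written as %%.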