Setting up Jackrabbit Clustering
Clustering in Jackrabbit works as follows: content is shared between all cluster nodes. That means all Jackrabbit cluster nodes need access to the same persistent storage (PersistenceManager, DataStore, and repository FileSystem).
The persistence manager must be clusterable (e.g., a central database that allows concurrent access). Any DataStore (file or DB) is clusterable by its very nature, because it stores content under unique hash IDs.
However, each cluster node needs its own (private) repository directory, including the repository.xml file, workspace FileSystem and search index. Every change made by one cluster node is recorded in a journal, which can be either file-based or stored in a database.
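The journal mechanism described above can be pictured with a small toy model. This is only a sketch of the idea (a shared, append-only log that each node replays into its private state); the class names `Journal` and `ClusterNode` are hypothetical and do not correspond to Jackrabbit classes or to how Jackrabbit actually implements synchronization.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Shared, append-only change log (toy stand-in for the cluster journal)."""
    records: list = field(default_factory=list)

    def append(self, node_id: str, change: str) -> int:
        self.records.append((node_id, change))
        return len(self.records)  # revision number after this record


@dataclass
class ClusterNode:
    """One cluster node with private state and a pointer into the shared journal."""
    node_id: str
    journal: Journal
    revision: int = 0  # last journal revision this node has seen
    local_view: list = field(default_factory=list)  # private state, e.g. a search index

    def sync(self) -> None:
        # Replay every record written by other nodes since the last seen revision.
        for origin, change in self.journal.records[self.revision:]:
            if origin != self.node_id:
                self.local_view.append(change)
        self.revision = len(self.journal.records)

    def write(self, change: str) -> None:
        # Catch up first, then record the change in the journal and apply it locally.
        self.sync()
        self.revision = self.journal.append(self.node_id, change)
        self.local_view.append(change)


journal = Journal()
author = ClusterNode("author_cluster", journal)
public = ClusterNode("public_cluster", journal)
author.write("add comment 1")
public.sync()  # public replays the change made by author
print(public.local_view)
```

The point of the model is that the journal is the only thing both nodes write to, while `local_view` stays private to each node, which mirrors the shared journal and private search index described above.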
In order to use clustering, the following prerequisites must be met:
- Each cluster node must have its own repository configuration.
- A DataStore, if used, must always be shared between nodes.
- The global repository FileSystem on the repository level must be shared (only the one that is on the same level as the DataStore; only in the repository.xml file).
- Each cluster node needs its own (private) workspace-level and version FileSystem (only those within the workspace and versioning configuration; the ones in the repository.xml and workspace.xml files).
- Each cluster node needs its own (private) search indexes.
- Every cluster node must be assigned a unique ID.
- A journal type must be chosen, either based on files or stored in a database.
- Each cluster node must use the same (shared) journal.
- The persistence managers must store their data in the same, globally accessible location.
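The prerequisites above lend themselves to a quick sanity check before starting the instances. The following is a minimal sketch, assuming each node's settings have been summarized by hand into a small record; `NodeSettings` and `check_cluster` are hypothetical names, not part of Jackrabbit or Magnolia.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSettings:
    """Hypothetical hand-written summary of one cluster node's configuration."""
    node_id: str
    journal: str       # journal location (file path or JDBC URL)
    persistence: str   # persistence manager location
    data_store: str    # DataStore path, or "" if no DataStore is used
    search_index: str  # search index directory


def check_cluster(nodes: list[NodeSettings]) -> list[str]:
    """Return a list of prerequisite violations (an empty list means none found)."""
    problems = []
    ids = [n.node_id for n in nodes]
    if len(set(ids)) != len(ids):
        problems.append("cluster node IDs must be unique")
    # These three must point to the same location on every node.
    for name in ("journal", "persistence", "data_store"):
        if len({getattr(n, name) for n in nodes}) > 1:
            problems.append(f"{name} must be shared by all nodes")
    # The search index must be private to each node.
    indexes = [n.search_index for n in nodes]
    if len(set(indexes)) != len(indexes):
        problems.append("each node needs its own search index")
    return problems


shared_db = "jdbc:mysql://localhost:3306/magnolia_shared"
nodes = [
    NodeSettings("author_cluster", shared_db, shared_db, "/shared/datastore", "/author/index"),
    NodeSettings("public_cluster", shared_db, shared_db, "/shared/datastore", "/public/index"),
]
print(check_cluster(nodes))
```

The paths in the usage example are placeholders chosen for illustration only.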
Scenario
We have a comments workspace to be shared by an author and a public instance. For simplicity we will use a preassembled bundle from our download site. The unclustered repositories will use an embedded H2 database. The clustered repository will use a MySQL database.
MySQL has been chosen as the persistence manager for the clustered repository because it supports concurrent access. Any DataStore (file system or DB) is clusterable by its very nature, because it stores content under unique hash IDs.
Prerequisites
The author and public instances must be installed and set up. See Getting started with Magnolia for details. For this example, author and public will run in separate Tomcat instances on ports 8080 and 7070 respectively.
Example: Cluster directory structure.
cluster-example
└── magnolia-dx-core
    ├── author-tomcat8080
    ├── public-tomcat7070
    └── repositories
The repositories folder will be the central place for the author, public and shared repositories.
Instructions
- Create the shared MySQL database.
% mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.31 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database magnolia_shared;
Query OK, 1 row affected (0.00 sec)
- Add the MySQL driver to the lib folder (i.e. /TOMCAT_HOME/WEBAPP_HOME/WEB-INF/lib) of both the author and public instances. See MySQL Connectors.
- Create a folder for the shared repository.
cluster-example
└── magnolia-dx-core
    ├── author-tomcat8080
    ├── public-tomcat7070
    └── repositories
        └── shared
In this example the entire setup is located on the same machine, with everything in the same parent folder. In a typical setup the instances will most likely run on different machines, so make sure the shared space can be accessed by all instances.
- Create a repository configuration file for the cluster setup.
System properties are used to set the path of the shared folder and the cluster ID. This approach allows the cluster repository configuration file to be identical for both instances.
  - org.apache.jackrabbit.core.cluster.shared_folder
  - org.apache.jackrabbit.core.cluster.node_id
- Add the system properties to the setenv.sh/bat file of each Tomcat instance.
/cluster-example/magnolia-dx-core/author-tomcat8080/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -Xms64M -Xmx2048M -Djava.awt.headless=true -Dorg.apache.jackrabbit.core.cluster.node_id=author_cluster -Dorg.apache.jackrabbit.core.cluster.shared_folder=/cluster-example/magnolia-dx-core/repositories/shared"
/cluster-example/magnolia-dx-core/public-tomcat7070/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -Xms64M -Xmx2048M -Djava.awt.headless=true -Dorg.apache.jackrabbit.core.cluster.node_id=public_cluster -Dorg.apache.jackrabbit.core.cluster.shared_folder=/cluster-example/magnolia-dx-core/repositories/shared"
- Create the configuration file for the shared repository.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN" "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<Repository>
  <DataSources>
    <DataSource name="magnolia">
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <!-- Use the magnolia_shared database -->
      <param name="url" value="jdbc:mysql://localhost:3306/magnolia_shared" />
      <!-- ******************************** -->
      <param name="user" value="root" />
      <param name="password" value="root" />
      <param name="databaseType" value="mysql"/>
      <param name="validationQuery" value="select 1"/>
    </DataSource>
  </DataSources>
  <!-- The repository level file system will be shared by both instances. Use the system property to set the path. -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${org.apache.jackrabbit.core.cluster.shared_folder}/repository" />
  </FileSystem>
  <!-- ******************************************************************* -->
  <Security appName="magnolia">
    <SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"/>
    <AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager">
    </AccessManager>
    <!-- login module defined here is used by the repo to authenticate every request. not by the webapp to authenticate user against the webapp context (this one has to be passed before thing here gets invoked) -->
    <LoginModule class="info.magnolia.jaas.sp.jcr.JackrabbitAuthenticationModule">
    </LoginModule>
  </Security>
  <!-- The repository level data store will be shared by both instances. Use the system property to set the path. -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${org.apache.jackrabbit.core.cluster.shared_folder}/repository/datastore"/>
    <param name="minRecordLength" value="1024"/>
  </DataStore>
  <!-- ***************************************************************** -->
  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />
  <Workspace name="default">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${wsp.home}/default" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
      <param name="dataSourceName" value="magnolia"/>
      <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
    </PersistenceManager>
    <SearchIndex class="info.magnolia.jackrabbit.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index" />
      <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
      <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration_${wsp.name}.xml"/>
      <param name="useCompoundFile" value="true" />
      <param name="minMergeDocs" value="100" />
      <param name="volatileIdleTime" value="3" />
      <param name="maxMergeDocs" value="100000" />
      <param name="mergeFactor" value="10" />
      <param name="maxFieldLength" value="10000" />
      <param name="bufferSize" value="10" />
      <param name="cacheSize" value="1000" />
      <param name="forceConsistencyCheck" value="false" />
      <param name="autoRepair" value="true" />
      <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
      <param name="respectDocumentOrder" value="true" />
      <param name="resultFetchSize" value="100" />
      <param name="extractorPoolSize" value="3" />
      <param name="extractorTimeout" value="100" />
      <param name="extractorBackLogSize" value="100" />
      <!-- needed to highlight the searched term -->
      <param name="supportHighlighting" value="true"/>
      <!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
      <param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
    </SearchIndex>
    <WorkspaceSecurity>
      <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
    </WorkspaceSecurity>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/workspaces/version" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
      <param name="dataSourceName" value="magnolia"/>
      <param name="schemaObjectPrefix" value="version_" />
    </PersistenceManager>
  </Versioning>
  <Cluster syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <!-- The revision log will be shared by both instances. Use the system property to set the path. -->
      <param name="revision" value="${org.apache.jackrabbit.core.cluster.shared_folder}/revision.log" />
      <!-- ******************************************************************************************** -->
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <param name="url" value="jdbc:mysql://localhost:3306/magnolia_shared" />
      <param name="user" value="root" />
      <param name="password" value="root" />
      <param name="schema" value="mysql" />
      <param name="schemaObjectPrefix" value="journal_" />
    </Journal>
  </Cluster>
</Repository>
- Add the clustered workspaces to WEB-INF/config/default/repositories.xml.
The repositories.xml file needs to be adjusted for the new clustered repository. In this example it is the same for both author and public, which share a comments workspace.
<JCR>
  <RepositoryMapping>
    <Map name="website" repositoryName="magnolia" workspaceName="website" />
    <Map name="config" repositoryName="magnolia" workspaceName="config" />
    <Map name="users" repositoryName="magnolia" workspaceName="users" />
    <Map name="userroles" repositoryName="magnolia" workspaceName="userroles" />
    <Map name="usergroups" repositoryName="magnolia" workspaceName="usergroups" />
  </RepositoryMapping>
  <!-- magnolia default repository -->
  <Repository name="magnolia" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
    <param name="configFile" value="${magnolia.repositories.jackrabbit.config}" />
    <param name="repositoryHome" value="${magnolia.repositories.home}/magnolia" />
    <!-- the default node types are loaded automatically
    <param name="customNodeTypes" value="WEB-INF/config/repo-conf/nodetypes/magnolia_nodetypes.xml" />
    -->
    <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
    <param name="providerURL" value="localhost" />
    <param name="bindName" value="${magnolia.webapp}" />
    <workspace name="website" />
    <workspace name="config" />
    <workspace name="users" />
    <workspace name="userroles" />
    <workspace name="usergroups" />
  </Repository>
  <!-- magnolia cluster repository -->
  <Repository name="cluster" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
    <param name="configFile" value="${magnolia.repositories.jackrabbit.cluster.config}" />
    <param name="repositoryHome" value="${magnolia.repositories.cluster}" />
    <!-- the default node types are loaded automatically
    <param name="customNodeTypes" value="WEB-INF/config/repo-conf/nodetypes/magnolia_nodetypes.xml" />
    -->
    <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
    <param name="providerURL" value="localhost" />
    <param name="bindName" value="cluster-${magnolia.webapp}" />
    <!-- since forum module has been deprecated, we switch to contacts module for demonstration. -->
    <!-- <workspace name="forum" /> -->
    <workspace name="comments" />
  </Repository>
  <RepositoryMapping>
    <Map name="comments" repositoryName="cluster" workspaceName="comments" />
  </RepositoryMapping>
</JCR>
- Configure the properties files.
Some of the properties configuration will differ between the instances. The author instance uses the context magnoliaAuthor while the public instance uses the context magnoliaPublic. For the sake of this cluster example, let’s reconfigure the repository creation to be centrally located. This gives a better overview of what is shared and what is private.
cluster-example
└── magnolia-dx-core
    ├── author-tomcat8080
    ├── public-tomcat7070
    └── repositories
        ├── author
        ├── public
        └── shared
WEB-INF/config/default/magnolia.properties
The shared properties go into the default properties file. As mentioned earlier in this tutorial, the unclustered repositories use an embedded H2 database and the clustered repository uses a MySQL database. Because the cluster configuration relies on system properties, the same file can be shared by both instances.
magnolia.repositories.config=WEB-INF/config/default/repositories.xml
magnolia.repositories.jackrabbit.config=WEB-INF/config/repo-conf/jackrabbit-bundle-h2-search.xml
magnolia.repositories.jackrabbit.cluster.config=WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-search.xml
WEB-INF/config/magnoliaAuthor/magnolia.properties
Properties specific to the author setup go into the magnoliaAuthor properties file.
magnolia.repositories.home=${magnolia.home}/../../../repositories/author
magnolia.repositories.cluster=${magnolia.repositories.home}/cluster
magnolia.clusterid=author_cluster
magnolia.repositories.jackrabbit.cluster.master=true
WEB-INF/config/magnoliaPublic/magnolia.properties
Properties specific to the public setup go into the magnoliaPublic properties file.
magnolia.repositories.home=${magnolia.home}/../../../repositories/public
magnolia.repositories.cluster=${magnolia.repositories.home}/cluster
magnolia.clusterid=public_cluster
magnolia.repositories.jackrabbit.cluster.master=false
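To see where the relative paths in these properties end up, the placeholders can be expanded by hand or with a few lines of code. The sketch below is an illustration only: `resolve` is a hypothetical helper, not Magnolia's own property resolution, and the value used for `magnolia.home` (a `webapps/magnoliaAuthor` folder inside the author Tomcat) is an assumption about the bundle layout.

```python
from __future__ import annotations

import posixpath
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(props: dict[str, str]) -> dict[str, str]:
    """Expand ${key} references; unknown keys are left untouched."""
    resolved = dict(props)
    # Bounded number of passes, so cyclic references cannot loop forever.
    for _ in range(len(resolved) + 1):
        changed = False
        for key, value in list(resolved.items()):
            new = PLACEHOLDER.sub(lambda m: resolved.get(m.group(1), m.group(0)), value)
            if new != value:
                resolved[key] = new
                changed = True
        if not changed:
            break
    return resolved


author = resolve({
    # Assumed location of the author webapp; adjust to the actual installation.
    "magnolia.home": "/cluster-example/magnolia-dx-core/author-tomcat8080/webapps/magnoliaAuthor",
    "magnolia.repositories.home": "${magnolia.home}/../../../repositories/author",
    "magnolia.repositories.cluster": "${magnolia.repositories.home}/cluster",
})
# Collapse the ../ segments to get the effective directory.
cluster_home = posixpath.normpath(author["magnolia.repositories.cluster"])
print(cluster_home)
```

Under the stated assumption, the three `../` segments climb from the webapp folder to magnolia-dx-core, so the private cluster repository of the author instance lands under repositories/author/cluster.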
Repository overview
Once both instances are running, all repositories are created in the same parent folder. The shared repository contains the shared file system, data store and revision log. The author and public folders each contain that instance's own magnolia repository, plus a cluster folder holding its private workspaces and search index.
cluster-example
└── magnolia-dx-core
    ├── author-tomcat8080
    ├── public-tomcat7070
    └── repositories
        ├── author
        │   ├── cluster
        │   │   └── workspaces
        │   └── magnolia
        ├── public
        │   ├── cluster
        │   │   └── workspaces
        │   └── magnolia
        └── shared
            ├── repository
            │   ├── datastore
            │   ├── meta
            │   ├── namespaces
            │   ├── nodetypes
            │   └── privileges
            └── revision.log
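A quick way to confirm that the layout above was created is to check for the expected directories after both instances have started. This is a minimal sketch; `missing_dirs` is a hypothetical helper, and the list of expected folders is taken from the overview above rather than from any guarantee about when Jackrabbit creates each folder.

```python
from __future__ import annotations

from pathlib import Path

# Directories taken from the repository overview, relative to the repositories folder.
EXPECTED = [
    "author/cluster/workspaces",
    "author/magnolia",
    "public/cluster/workspaces",
    "public/magnolia",
    "shared/repository/datastore",
]


def missing_dirs(repositories: Path) -> list[str]:
    """Return the expected directories that do not exist under the given folder."""
    return [rel for rel in EXPECTED if not (repositories / rel).is_dir()]


# Path used in this example setup; prints the folders that are not present yet.
print(missing_dirs(Path("/cluster-example/magnolia-dx-core/repositories")))
```

An empty list suggests that both private repositories and the shared repository were created where the configuration points.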