Extended Health Check module
Field | Value |
---|---|
Edition | Incubator (services) |
Git | |
Latest | 1.0 |
The Extended Health Check module provides an extensible, configurable endpoint for evaluating the "health" of a Magnolia instance. You can use the endpoint to monitor a Magnolia instance, either manually or automatically (for example, for autoscaling).
You configure the HTTP status values returned by the health check and the conditions that are checked for each HTTP status.
The Extended Health Check module also provides a store for "health events", significant events that indicate something about the health of the Magnolia instance and can be checked in the extended health check.
You can collect health events from the Magnolia log through log4j configuration. You can also collect health events relating to Magnolia publication failures.
This module is at the INCUBATOR level.
Installing with Maven
Maven is the easiest way to install the module. Add the following to your bundle:
<dependency>
<groupId>info.magnolia</groupId>
<artifactId>healthcheck</artifactId>
<version>1.0</version>
</dependency>
Usage
Health outcomes
Health outcomes define the conditions for which a specific HTTP status is returned by the extended health check.
A health outcome defines:
- A voter set including one or more health voters or boolean voter sets checking Magnolia health conditions
- Details returned if the conditions for the health outcome are met (HTTP status and description)
Health outcomes can be disabled (or enabled). A disabled health outcome won’t be examined when an extended health check is requested.
Health outcomes are defined through the module configuration at /modules/healthcheck/config/outcomes. You can add or modify the health outcomes defined there.
Node name | Value |
---|---|
modules healthcheck |
Health outcomes are checked in the order they are defined. The first health outcome whose health voters return true is returned as the result of the health check, and any remaining health outcomes are ignored.
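The first-match rule described above can be illustrated with a minimal Java sketch. It is not the module's implementation: the `Outcome` class, the `firstMatch` method, the outcome names and the HTTP status values are all hypothetical, chosen only to show that a disabled outcome is skipped and that outcomes after the first match are never evaluated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;

public class OutcomeOrderSketch {

    // Hypothetical stand-in for a configured health outcome (not a class from the module).
    static final class Outcome {
        final String name;
        final int httpStatus;
        final boolean enabled;
        final BooleanSupplier voters; // stands in for the outcome's voter set

        Outcome(String name, int httpStatus, boolean enabled, BooleanSupplier voters) {
            this.name = name;
            this.httpStatus = httpStatus;
            this.enabled = enabled;
            this.voters = voters;
        }
    }

    // Walks the outcomes in their defined order and returns the first enabled
    // outcome whose voters return true. Later outcomes are not evaluated.
    static Optional<Outcome> firstMatch(List<Outcome> outcomes) {
        for (Outcome outcome : outcomes) {
            if (outcome.enabled && outcome.voters.getAsBoolean()) {
                return Optional.of(outcome);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Outcome> outcomes = Arrays.asList(
                new Outcome("disabled", 500, false, () -> true),   // skipped: disabled
                new Outcome("noMatch", 500, true, () -> false),    // voters return false
                new Outcome("firstHit", 503, true, () -> true),    // first match wins
                new Outcome("neverReached", 200, true, () -> {
                    throw new IllegalStateException("should not be evaluated");
                }));

        // Prints "firstHit -> 503"
        System.out.println(firstMatch(outcomes)
                .map(o -> o.name + " -> " + o.httpStatus)
                .orElse("none"));
    }
}
```

Because the order matters, the most severe conditions would normally be placed first in the configuration.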
Here are the configurable properties of a health outcome:
Health voters
Health voters check a single, specific condition about the health of a Magnolia instance. They can be combined with other health voters and boolean voter sets to form complicated logical expressions for a particular health outcome.
The Extended Health Check module includes several health voters:
ContextAvailableVoter
Checks if a Magnolia context is available
The Magnolia context is fundamental to Magnolia operation. If one is not available, Magnolia has a serious problem.
ContextAvailableVoter has the following configuration:
Node name | Value |
---|---|
<voter name> | The name of the voter. |
class | Should be |
enabled | If |
not | If |
HealthEventPropertyVoter
Checks for specified health events.
The HealthEventPropertyVoter checks whether health events exist that meet the configured criteria. You can also specify a threshold for the number of health events found, as well as the expected value of a health event property.
HealthEventPropertyVoter has the following configuration:
Node name | Value |
---|---|
<voter name> | The name of the voter. |
class | Should be |
enabled | If |
not | If |
identifier | The identifier of the health event. Health events have the following identifiers: loggedMessage, publicationError. NOTE: If not specified, the identifier will be loggedMessage. |
propertyName | (required) The name of the health event property whose value will be checked. |
propertyValue | (required) The expected value of the health event property. |
predicate | Specifies how the value of propertyName will be compared to the expected value. The following comparisons are available: |
threshold | The number of health events matching the identifier, propertyName, propertyValue and predicate. If more health events than this are found, the voter will return true, otherwise false. If not specified, threshold will be |
interval | Defines an interval in milliseconds, counted back from the time the health voter is checked, within which health events are considered. Health events outside of the interval will not be checked. Use interval to limit the health events considered (e.g. publication errors within the last 30 minutes). If interval is less than |
MagnoliaUpdatedNeededVoter
Checks for Magnolia modules that need updating.
The MagnoliaUpdatedNeededVoter checks whether one or more Magnolia modules need updating.
MagnoliaUpdatedNeededVoter has the following configuration:
Node name | Value |
---|---|
<voter name> | The name of the voter. |
class | Should be |
enabled | If |
not | If |
PublicationFailureVoter
Checks for Magnolia publication failures.
The PublicationFailureVoter checks whether a publication failure has occurred.
PublicationFailureVoter has the following configuration:
Node name | Value |
---|---|
<voter name> | The name of the voter. |
class | Should be |
enabled | If |
not | If |
interval | Defines an interval in milliseconds, counted back from the time the health voter is checked, within which publication failures are considered. Publication failures outside of the interval will not be counted. Use the interval to limit the publication errors considered (e.g. publication errors within the last 30 minutes). If interval is less than |
threshold | The number of publication failures counted within the specified interval. If more publication failures than this are found, the voter will return true, otherwise false. If not specified, threshold will be |
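The way interval and threshold combine can be sketched in a few lines of Java. This is only an illustration of the rule stated above ("more than threshold events within the interval returns true"). The `vote` method and its parameters are hypothetical and do not come from the module, and the sketch does not model the default values or the special handling of small interval values, which the configuration decides.

```java
import java.util.Arrays;
import java.util.List;

public class ThresholdIntervalSketch {

    // Returns true when MORE than `threshold` event timestamps fall within the
    // last `intervalMillis` milliseconds before `nowMillis`.
    static boolean vote(List<Long> eventTimesMillis, long nowMillis, long intervalMillis, int threshold) {
        long cutoff = nowMillis - intervalMillis;
        long count = eventTimesMillis.stream()
                .filter(t -> t >= cutoff && t <= nowMillis)
                .count();
        return count > threshold;
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        long thirtyMinutes = 30L * 60 * 1000; // 1,800,000 ms

        List<Long> failures = Arrays.asList(
                now - 5_000L,               // 5 seconds ago: inside the interval
                now - 60_000L,              // 1 minute ago: inside the interval
                now - 2 * 60 * 60 * 1000L); // 2 hours ago: outside the interval

        System.out.println(vote(failures, now, thirtyMinutes, 1)); // 2 > 1, prints true
        System.out.println(vote(failures, now, thirtyMinutes, 2)); // 2 > 2 is false, prints false
    }
}
```

Note that the comparison is strict: with a threshold of 2, exactly two failures inside the interval do not trigger the voter.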
QueryVoter
Checks for nodes defined in the JCR repository.
The QueryVoter checks whether nodes matching a query exist in the JCR repository. This voter is useful for checking the messages workspace for system errors like the expiration of the Magnolia license.
QueryVoter has the following configuration:
Node name | Value |
---|---|
<voter name> | The name of the voter. |
class | Should be |
enabled | If |
not | If |
workspace | (required) The workspace that will be searched. |
query | A valid JCR-SQL2 query that will be evaluated in the workspace. |
threshold | The number of nodes returned by the query. If more nodes than this are found, the voter will return true, otherwise false. If not specified, threshold will be |
Health events
Health events are collected while Magnolia is running and provide a record that can be checked by health voters. Two health voters, PublicationFailureVoter and HealthEventPropertyVoter, use health events. The other voters, ContextAvailableVoter, MagnoliaUpdatedNeededVoter and QueryVoter, all check the state of Magnolia at the time of execution.
Health events are collected from two sources:
- The Magnolia log
- The results of Magnolia publications
Both sources can provide valuable insight into what has happened in a Magnolia instance outside of the time Magnolia’s health is being checked.
Health events have:
- an identifier to indicate where the health event came from: "loggedMessage" for health events from Magnolia logging and "publicationError" for errors occurring during a Magnolia publication
- name / value properties, depending on where the health event was collected
Health Log
Health events are stored in a health log. Health voters check the health log for health events matching their configuration to assess Magnolia’s health.
The health log can store a limited number of health events:
- up to 10,000 total health events
- health events older than 6 hours are discarded
Your health voters should not use intervals longer than 6 hours.
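The two retention limits can be pictured with a small Java sketch. It is not the module's health log: the class and method names are hypothetical, it assumes events arrive in chronological order, and it assumes the oldest events are the ones dropped when the 10,000 cap is exceeded (the text above does not say which events are discarded in that case).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HealthLogRetentionSketch {

    static final int MAX_EVENTS = 10_000;                   // cap on stored events
    static final long MAX_AGE_MILLIS = 6L * 60 * 60 * 1000; // 6 hours in milliseconds

    // Only the event timestamps are stored here; oldest first.
    private final Deque<Long> eventTimes = new ArrayDeque<>();

    // Adds an event and applies both retention limits.
    void add(long timestampMillis) {
        eventTimes.addLast(timestampMillis);
        prune(timestampMillis);
    }

    // Drops events from the old end while the log is over the size cap
    // or its oldest event is more than 6 hours older than `nowMillis`.
    void prune(long nowMillis) {
        while (!eventTimes.isEmpty()
                && (eventTimes.size() > MAX_EVENTS
                    || nowMillis - eventTimes.peekFirst() > MAX_AGE_MILLIS)) {
            eventTimes.removeFirst();
        }
    }

    int size() {
        return eventTimes.size();
    }

    public static void main(String[] args) {
        HealthLogRetentionSketch log = new HealthLogRetentionSketch();
        for (int i = 0; i < 10_005; i++) {
            log.add(1_000L);
        }
        System.out.println(log.size()); // prints 10000: the cap was applied

        log.prune(1_000L + MAX_AGE_MILLIS + 1);
        System.out.println(log.size()); // prints 0: every event is now older than 6 hours
    }
}
```

This is also why a voter interval longer than 6 hours gains nothing: events older than that are no longer in the log to be counted.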
Collecting health events from Magnolia logs
You can collect health events from Magnolia logs and save them in the health log through Magnolia’s log4j configuration.
You will need to set up two log4j elements:
- A health log "Appender" to store any matching messages in the health log
- One or more "Loggers" to select log messages to be saved by the health log appender
You can filter events by both the health log appender (using the "Filters" attribute) and the loggers (using the "level" attribute).
The health log appender is declared in the Extended Health Check module, so you can use it in your log4j configuration without further declarations.
Here’s a sample health log appender:
<HealthMonitor name="license-monitor" messagePattern=".+">
  <PatternLayout pattern="%-5p %c %d{dd.MM.yyyy HH:mm:ss} -- %m%n"/>
</HealthMonitor>
This HealthMonitor appender will save any log message directed toward it (messagePattern will match any non-empty message) with the specified layout pattern.
HealthMonitor will save any matching log message to the health log with the following name / value properties:
- logLevel: the log level of the message
- logMessage: the log message
- logThread: the thread where the message was logged
- logName: the name of the Logger
- logCallerFQCN: the fully qualified class name where the message was logged
Here are some sample loggers that select log messages and send them to the HealthMonitor appender above:
<Logger name="info.magnolia.multisite.sites.MultiSiteManager" level="WARN">
<AppenderRef ref="license-monitor"/>
</Logger>
<Logger name="info.magnolia.sitemesh.config.MagnoliaConfigurableSiteMeshFilter" level="WARN">
<AppenderRef ref="license-monitor"/>
</Logger>
These loggers select WARN level messages from the Magnolia Multi-Site module (specifically info.magnolia.multisite.sites.MultiSiteManager) and the Magnolia SiteMesh caching module (specifically info.magnolia.sitemesh.config.MagnoliaConfigurableSiteMeshFilter) and send them to the HealthMonitor appender named "license-monitor". MultiSiteManager and MagnoliaConfigurableSiteMeshFilter both report expired licenses at WARN level.
Collecting health events from publications
Errors during a Magnolia publication are not completely captured in the Magnolia logs: the specific error message returned by a Magnolia public instance to the Magnolia author is not recorded in the log of the public instance. Knowing why a publication failed is an important indication of the health of a Magnolia public instance. If the publication failed because of a failure of the JCR repository, the JCR repository of the Magnolia public instance may be corrupted and the instance should be replaced or repaired. On the other hand, some publication errors are recoverable. For example, publishing a child node whose parent has not been published causes a publication error that can be remedied by publishing the parent node and republishing the child node.
Publication errors can be collected by a filter. The filter detects publication requests and saves the results of the publication into the health log.
The Extended Health Check module will install a filter "publishingMonitor" before the publication filter "publishing" to collect the result of publications.
If you change either the publishingMonitor filter or the publishing filter, please note:
- the publishingMonitor filter must be located before the publishing filter in the filter chain to collect publication results
- the publishingMonitor filter should have the same bypasses configuration as the publishing filter to identify publication requests
If you don’t want to collect publication results in the health log, you can disable the publishingMonitor filter (set its enabled property to false) or delete the publishingMonitor filter.
Health outcomes provided
The Extended Health Check module includes a number of predefined health outcomes:
Name | HTTP status | Description | Returned when |
---|---|---|---|
 | | Magnolia has internal errors. | Couldn’t get a Magnolia context. |
 | | Magnolia has internal errors. | One or more Magnolia modules need to be updated. |
 | | Magnolia public instance has publishing failures. | One or more publication errors were found in the health log. |
 | | Magnolia license has expired! | One or more license expired messages were found in the messages workspace, or one or more license expired log messages were found in the health log. |
 | | Test health error (Magnolia is really OK). | A test outcome (will always be returned) for testing the health check endpoint. |