LinkAuditor
The info.magnolia.services.seo.audit.impl.LinkAuditor finds links in a rendered HTML page and checks whether they are accessible. The URLs contained in HTML anchor, link, and img elements are extracted and checked. Other URLs, such as those contained in JavaScript functions, are not detected and so are not checked.
Checking a large number of links can be time-consuming. You may want to use the excludedLinks property to ignore some links, or run the auditor only when necessary.
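The extraction step described above can be illustrated with a minimal sketch. It is not the module's implementation: the class name LinkExtractionSketch and the regex-based approach are assumptions made for illustration, and a real auditor would more likely rely on an HTML parser. It only shows why URLs in a, link, and img elements are found while URLs inside scripts are not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the SEO module. */
final class LinkExtractionSketch {

    // href on <a> and <link>, src on <img>; only quoted attribute values are handled.
    private static final Pattern LINK = Pattern.compile(
        "<(?:a|link)\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1"
            + "|<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\3",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the URLs found in anchor, link, and img elements, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Group 2 holds an href value, group 4 holds an img src value.
            links.add(m.group(2) != null ? m.group(2) : m.group(4));
        }
        return links;
    }
}
```

A URL that appears only inside a script element matches neither alternative, so it never reaches the checking step.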
- Class: info.magnolia.services.seo.audit.impl.LinkAuditor
Properties
In addition to the common auditor properties, this auditor can be configured with the following properties:
| Property | Required | Description |
|---|---|---|
| | required | Determines how a failed audit will be counted: |
| | required | Defines the property name for storing failed audit results. |
| fetcher | required | Defines the content fetcher for the selected node. The fetched content is then scanned for links. |
| | optional | Defines the property name for storing valid links. |
| rootUrl | required | Defines the base URL to be used when checking relative links. Relative links are appended to the base URL and then checked, so the base URL should not end with a slash. |
| targets | optional | Defines a list of credentials ( |
| excludedLinks | optional | Defines one or more patterns of URLs to be ignored, as Java regular expressions. If no regular expressions are defined, all links are checked. |
| | optional | Defines the expected HTTP status codes for the link to be considered valid. If not set, then the list of valid status codes is: |
| | optional | Defines a delay (in milliseconds) between checking links. You can set this property to a non-zero value to avoid flooding a server with HTTP requests. If not set, then the pause time will be |
| | optional | Validate that each link is syntactically correct before attempting to test it. If the link is not correct, it will not be retrieved. The default value of |
| | optional | Allow two slashes ( The default value of |
| | optional | Allow any scheme in a URL when validating URLs. If set to The default value of |
| | optional | Allow relative URLs when validating URLs. The default value of |
| | optional | Allow fragments in URLs when validating URLs. The default value of |
| | optional | Specifies the allowed schemes when validating URLs. Will only be used if |
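
To make the interplay of the base URL, the exclusion patterns, and the valid status codes more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the module's code: the class LinkFilterSketch and its methods are hypothetical, relative links are assumed to start with a slash, exclusion patterns are assumed to be matched against the whole URL, and the status codes used in any call are chosen by the caller rather than taken from the module's defaults.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the SEO module. */
final class LinkFilterSketch {

    /** Appends a relative link to the base URL; absolute http(s) links are returned unchanged. */
    static String resolve(String baseUrl, String link) {
        if (link.startsWith("http://") || link.startsWith("https://")) {
            return link;
        }
        // Plain concatenation: a base URL ending in "/" would produce "//" here.
        return baseUrl + link;
    }

    /** True if the whole URL matches at least one exclusion pattern (assumed semantics). */
    static boolean isExcluded(String url, List<Pattern> excluded) {
        return excluded.stream().anyMatch(p -> p.matcher(url).matches());
    }

    /** True if the HTTP status is in the caller-supplied set of valid codes. */
    static boolean isValidStatus(int status, Set<Integer> validStatusCodes) {
        return validStatusCodes.contains(status);
    }
}
```

With a base URL of http://localhost:8080, the link /travel/about.html resolves to http://localhost:8080/travel/about.html. With a trailing slash on the base URL, the same link would contain a double slash, which is why the base URL should not end with one. An empty pattern list excludes nothing, so every link is checked.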
Configuring credentials
The targets node lets you define one or more sets of credentials that are added to each request used to check a link. You will probably need to define credentials for links to your Magnolia instance if it is an author instance, because its pages and resources are protected with basic authentication.
Credentials are added per host, so you can define credentials for more than one host.
Here is how to configure target credentials within your LinkAuditor configuration:
| Property | Required | Description |
|---|---|---|
| | required | The name of the credentials. |
| class | required | The credentials class name, should be |
| host | required | The host or domain name for the credentials. |
| port | required | The port. |
| scheme | optional | The scheme (e.g. The default value for scheme is |
| user | required | The user name. |
| password | required | The user’s password. |
| preemptive | optional | If The default value for preemptive is |
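
The idea of per-host credentials can be sketched as follows. This is only an illustration: HostCredentialsSketch and its nested Target class are hypothetical stand-ins (they are not the module's HostTarget class), the matching rule on scheme, host, and port is an assumption, and the header shown is the standard HTTP Basic scheme. Sending that header with the first request, instead of waiting for a 401 challenge, is what preemptive authentication generally means.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper, not part of the SEO module. */
final class HostCredentialsSketch {

    /** Stand-in for one credentials entry; field names mirror the keys used in the configuration. */
    static final class Target {
        final String scheme;
        final String host;
        final int port;
        final String user;
        final String password;

        Target(String scheme, String host, int port, String user, String password) {
            this.scheme = scheme;
            this.host = host;
            this.port = port;
            this.user = user;
            this.password = password;
        }
    }

    /** Finds the first entry whose scheme, host, and port match the URL (assumed matching rule). */
    static Optional<Target> findTarget(List<Target> targets, URI url) {
        return targets.stream()
            .filter(t -> t.host.equalsIgnoreCase(url.getHost())
                && t.port == url.getPort()
                && t.scheme.equalsIgnoreCase(url.getScheme()))
            .findFirst();
    }

    /** Builds the value of an HTTP Basic Authorization header for the entry. */
    static String basicAuthHeader(Target t) {
        String raw = t.user + ":" + t.password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that URI.getPort() returns -1 when the URL has no explicit port, so in this sketch a link such as http://localhost/page would not match an entry configured with port 8080.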
Example
Here is an example from the SEO module. You can find this configuration at /modules/seo/config/auditManager/auditors/deadLinks.
deadLinks:
  active: true
  class: info.magnolia.services.seo.audit.impl.LinkAuditor
  description: Check for dead links (pre-prod, prod)
  level: auditWarnings
  rootUrl: http://localhost:8080
  targets:
    localhost:
      class: info.magnolia.services.seo.audit.impl.HostTarget
      host: localhost
      password: superuser
      port: 8080
      scheme: http
      user: superuser
  fetcher:
    class: info.magnolia.services.seo.audit.impl.RequestFetcher
    targets:
      localhost:
        class: info.magnolia.services.seo.audit.impl.HostTarget
        host: localhost
        password: superuser
        port: 8080
        scheme: http
        user: superuser