How to write metadata validators?

During the reviews feedback session, I saw that Aristotle has some ability to perform automatic checks for metadata. Are there examples online of how to write these, or is there help that I can access?

I was able to copy out the example from the video and get that working in our registry, but was wondering if there were more examples available?

- status: Standard
  object: DataElement
  checks:   
    - validator: RelationValidator
      severity: error
      field: valueDomain
      name: Value Domain linked to Data Element
    - validator: RelationValidator
      severity: error
      field: dataElementConcept
      name: DataElementConcept linked to Data Element
    - validator: RegexValidator
      name: Name pattern must include a colon separator
      severity: warning
      field: name
      regex: ".+: .+"

Also does someone know how to make the text show up here exactly as typed like in the screenshot, instead of converting it to bullet points?


@GemmaM congratulations on getting this to work. I am having no success in entering these validation rules. They just don’t stick for me. I enter the text and press submit and nothing happens. Can you share your success story please? :smiley:

I hope this works. I entered the text as preformatted text (prefix each line with 4 spaces):

- status: Standard
  object: DataElement
  checks:
- validator: RelationValidator
  severity: error
  field: valueDomain
  name: Value Domain linked to Data Element
- validator: RelationValidator
  severity: error
  field: dataElementConcept
  name: DataElementConcept linked to Data Element
- validator: RegexValidator
  name: Name pattern must include a colon separator
  severity: warning
  field: name
  regex: .+: .+

@GemmaM there isn’t a lot of help at the moment, but I’ll write up some more explanation about how they work below, and have this in the Knowledge Base soon. I’ve also edited your question to correctly format the rules.

@Adrian I’ve cleaned up yours and @GemmaM’s example a little further below, and tested it so it should work now. I’ll explain how the nesting and formatting works below.

- status: Standard
  object: DataElement
  checks:   
    - validator: RelationValidator
      severity: error
      field: valueDomain
      name: Value Domain linked to Data Element
    - validator: RelationValidator
      severity: error
      field: dataElementConcept
      name: DataElementConcept linked to Data Element
    - validator: RegexValidator
      name: Name pattern must include a colon separator
      severity: warning
      field: name
      regex: ".+: .+"

(We’re going to add this to the Knowledge Base soon, but in the meantime…)

Help for writing Validation Rules

How to write a “Validation Rules” set

Validation Rules are written as “YAML” configuration files. We’ve got some enhancements on the roadmap for validations for early next year, but in the meantime you can check that your rules are valid YAML with any standard YAML formatting tool, like this: http://www.yamllint.com/.

Each validation “ruleset” is just a list of rules. The example above is a list containing a single rule.

How to write a “Validation Rule”

A Validation Rule specifies for a metadata type (such as a Data Element) what should be checked for it to be valid at a given status.

In general, all rules will include a status, a list of checks and an optional metadata type, and will be structured like this:

- status: [required, either "any" or a valid status name, eg. Standard]
  object: [optional, either "any", or the name of a metadata type. eg. DataElement]
  checks: [required, a list of Checks (or tests) to run as part of this rule, see below]

For example, to write a ruleset with 3 rules (a test for Data Elements that are Standard, a test for Indicators to be run at all status levels, and a test for any metadata that is Candidate):

- status: Standard
  object: DataElement
  checks: [covered below]
- status: any
  object: Indicator
  checks: [covered below]
- status: Candidate
  object: any
  checks: [covered below]
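
As a rough illustration of the structure described above, here is a minimal Python sketch that checks the shape of a ruleset once it has been parsed from YAML into lists and dictionaries. The function name `find_rule_problems` is made up for this example and is not part of Aristotle; it only looks at the keys described above (status and checks are required, object is optional).

```python
def find_rule_problems(ruleset):
    """Return a list of human-readable problems found in a parsed ruleset.

    `ruleset` is assumed to be what a YAML parser returns for the rules:
    a list of dictionaries. Hypothetical helper, not Aristotle code.
    """
    if not isinstance(ruleset, list):
        return ["The ruleset must be a list of rules"]

    problems = []
    for number, rule in enumerate(ruleset, start=1):
        if not isinstance(rule, dict):
            problems.append(f"Rule {number}: must be a mapping of keys to values")
            continue
        if "status" not in rule:
            problems.append(f"Rule {number}: missing required key 'status'")
        if not isinstance(rule.get("checks"), list):
            problems.append(f"Rule {number}: 'checks' must be a list")
        # 'object' is optional, so its absence is not reported.
    return problems


# The parsed form of a two-rule ruleset; the second rule has no checks.
example = [
    {"status": "Standard", "object": "DataElement", "checks": []},
    {"status": "any", "object": "Indicator"},
]
print(find_rule_problems(example))
```

A ruleset where the validators are not indented underneath `checks:` will usually still be valid YAML, but it parses with an empty `checks` value, which is the kind of problem this sketch would report.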

How to write “Checks”

The “Checks” for a Rule is just a list of validators to be run for that metadata type. Because this is a list inside the rule, they need to be indented further in. For example, to write a rule that checks a Data Element registered as standard has a name that starts with a capital letter, you would write this:

- status: Standard
  object: DataElement
  checks:
    - validator: RegexValidator
      name: "Starts with a capital letter"
      severity: warning
      field: name
      regex: "^[A-Z].+"

Each check includes the name of the Validator type to run (see below), a name used to identify it when showing the outcome in Aristotle, a severity (either “warning” or “error”) and any extra arguments the Validator needs. The standard structure is:

    - validator: [required, see below]
      name: [optional, string, use quotes if there are special characters]
      severity: [required, either warning or error]
      # extra fields here
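
To make the required and optional keys of a check concrete, here is a small Python sketch that inspects a single check after it has been parsed from YAML into a dictionary. `find_check_problems` and `ALLOWED_SEVERITIES` are hypothetical names for this example only, and the sketch does not know about the extra fields that individual validators need.

```python
ALLOWED_SEVERITIES = {"warning", "error"}


def find_check_problems(check):
    """Return a list of problems with one parsed check (a dictionary)."""
    problems = []
    if "validator" not in check:
        problems.append("missing required key 'validator'")
    severity = check.get("severity")
    if severity not in ALLOWED_SEVERITIES:
        problems.append(f"severity must be 'warning' or 'error', got {severity!r}")
    # 'name' is optional, so it is not checked here.
    return problems


good_check = {
    "validator": "RegexValidator",
    "name": "Starts with a capital letter",
    "severity": "warning",
    "field": "name",
    "regex": "^[A-Z].+",
}
bad_check = {"name": "No validator given", "severity": "info"}

print(find_check_problems(good_check))
print(find_check_problems(bad_check))
```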

Using severity to provide information about errors

Severity is used by Aristotle to display a “red cross” for errors or an “orange warning light” for warnings in the user interface to advise users of a problem. This helps users differentiate between different classes of problem.

For example, business rules could specify that all metadata names must start with a capital letter for metadata to be standardised, and that definitions should be at least 100 characters long. Checks for these rules let registrars know instantly if a name is not formatted properly, and also advise them if a definition is shorter than recommended (but may still be a good, valid definition).

This would be done like this:

- status: any
  object: any
  checks:
    - validator: RegexValidator
      name: "Starts with a capital letter"
      severity: error
      field: name
      regex: "^[A-Z].+"
    - validator: RegexValidator
      name: "Definition is at least 100 characters"
      severity: warning
      field: definition
      regex: ".{100,}"
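
If you want to see roughly how these two checks would behave before entering them, the Python sketch below runs the same two regexes over a sample item and reports each failure with its severity. It is only an approximation: `run_checks` is a made-up helper, the item is a plain dictionary rather than real Aristotle metadata, and the use of `re.search` is an assumption about how the pattern is applied.

```python
import re

checks = [
    {"name": "Starts with a capital letter", "severity": "error",
     "field": "name", "regex": "^[A-Z].+"},
    {"name": "Definition is at least 100 characters", "severity": "warning",
     "field": "definition", "regex": ".{100,}"},
]


def run_checks(item, checks):
    """Return a (severity, check name) pair for every check the item fails."""
    failures = []
    for check in checks:
        value = item.get(check["field"], "")
        # Assumption: the pattern may match anywhere unless it is anchored.
        if re.search(check["regex"], value) is None:
            failures.append((check["severity"], check["name"]))
    return failures


item = {"name": "person: Age", "definition": "The age of a person."}
for severity, name in run_checks(item, checks):
    print(f"{severity}: {name}")
```

Here the lowercase name would be reported as an error and the short definition as a warning, which mirrors the red cross and orange warning light described above.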

Current Validators

RegexValidator: checking the formatting of text fields

A Regular Expression (or “Regex”) is a powerful tool for testing that strings follow specific patterns. Rules can include a “Regex” that specifies the format of metadata fields, such as a name or definition.

A RegexValidator takes two extra fields:

  • field - the name of the field to check. This is the field name from the API or GraphQL
  • regex - the regular expression to check the field against. Because a regex can include special characters, it’s recommended to enclose a regex in quotes. Note: Aristotle doesn’t currently verify that this is a ‘valid’ regex.

Examples:

  • Testing that a standard Data Element follows a naming business rule that requires its name to start with a capital letter and include a colon (:) between the Object Class and Property.

    - status: Standard
      object: DataElement
      checks:
        - validator: RegexValidator
          name: Name pattern must include a colon separator
          severity: warning
          field: name
          regex: "[A-Z].+: .+"
    

    Note: The regex “.+” specifies any character (.), repeated one or more times (+).

  • Testing that a standard Data Element has a definition of at least 100 characters.

    - status: Standard
      object: DataElement
      checks:
        - validator: RegexValidator
          name: Definition must be at least 100 characters
          severity: warning
          field: definition
          regex: ".{100,}"
    

Helpful regular expressions (regexes)

There are lots of ways to construct regular expressions, so these are just a few common examples. To help build more complex regexes, we recommend https://regex101.com/
Note: Unless otherwise stated, each token below matches a single character, and a pattern can match anywhere in a string.

Characters:

  • . - Any single character
  • [A-Z] - Any single upper case letter
  • [a-z] - Any single lowercase letter
  • [A-Za-z] - Any single upper or lower case letter
  • [0-9] - Any single digit
  • [0-9A-Za-z] - Any single digit, upper or lower case letter

Repetition:

  • ? - match 0 or 1 of the previous token (eg. [A-Z]? will match “” (an empty string) or any single upper case character)
  • * - match 0 or more of the previous token (eg. [A-Z]* will match any number of upper case characters, including none)
  • + - match 1 or more of the previous token (eg. [A-Z]+ will match a string with at least one upper case character)

Positioning:

  • ^ - Match at the start of the line (eg. ^A will only match if a string starts with a capital A)
  • $ - Match at the end of the line (eg. Z$ will only match if a string ends with a capital Z)
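
Because an unanchored pattern can match anywhere in a string, it is worth trying a regex against a few sample values first. The sketch below uses Python's `re` module to compare an unanchored pattern, the same pattern anchored with ^, and a whole-string match. Which of these behaviours Aristotle applies is not stated in this thread, so treat this as a way to explore your pattern rather than a description of Aristotle itself; `compare` is a made-up helper.

```python
import re


def compare(pattern, text):
    """Return (matches anywhere, matches from the start, matches whole string)."""
    anywhere = re.search(pattern, text) is not None
    from_start = re.search("^" + pattern, text) is not None
    whole = re.fullmatch(pattern, text) is not None
    return anywhere, from_start, whole


pattern = "[A-Z].+: .+"

# A name that follows the convention passes all three ways of matching.
print(compare(pattern, "Person: Age"))      # (True, True, True)

# A name with a lowercase prefix still matches "anywhere", because the
# pattern finds "Person: Age" inside it. Anchoring with ^ rejects it.
print(compare(pattern, "the Person: Age"))  # (True, False, False)
```

If a rule is meant to apply to the start of a field, adding ^ to the regex makes that intent explicit whichever way the pattern is applied.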

RelationValidator: checking that metadata links exist

Aristotle allows users to enter draft or incomplete metadata that may not be considered valid. To accommodate this, RelationValidators can be used to check that certain related metadata is attached to an item. For example, RelationValidators can be used to check if a Data Element has an associated Value Domain:

- status: Recorded
  object: DataElement
  checks:   
    - validator: RelationValidator
      severity: error
      field: valueDomain
      name: Value Domain should be linked to Data Element

A RelationValidator takes one extra field:

  • field - the name of the field to check. This is the field name from the API or GraphQL. It must be a field that is a relation to another metadata item

UniqueValuesValidator: checking that Value Domains have unique values

To accommodate in-progress data or unique business rules, Aristotle does not restrict the values used for codes in Value Domains. Where unique codes are required, a UniqueValuesValidator can be used to check that Value Domains have codes that are all unique. This tests all Permissible and Supplementary Values.

- status: Recorded
  object: ValueDomain
  checks:   
    - validator: UniqueValuesValidator
      severity: error
      name: Value Domain must have unique values

A UniqueValuesValidator takes no extra fields.
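
To illustrate the kind of test this describes, here is a short Python sketch that finds codes used more than once in a Value Domain. The data shape (dictionaries with a `value` key) and the function name `duplicated_codes` are assumptions for this example, as is treating Permissible and Supplementary Values as one combined pool.

```python
from collections import Counter


def duplicated_codes(permissible_values, supplementary_values):
    """Return a sorted list of codes that appear more than once."""
    # Assumption: each value is a dict with its code stored under "value".
    codes = [entry["value"] for entry in permissible_values + supplementary_values]
    counts = Counter(codes)
    return sorted(code for code, count in counts.items() if count > 1)


permissible = [
    {"value": "1", "meaning": "Male"},
    {"value": "2", "meaning": "Female"},
]
supplementary = [
    {"value": "9", "meaning": "Not stated"},
    {"value": "2", "meaning": "Unknown"},
]

# The code "2" is used twice, so this Value Domain would fail the check.
print(duplicated_codes(permissible, supplementary))
```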

StatusValidator: checking that metadata has followed a status lifecycle

Aristotle does not restrict how metadata is registered, and items can be registered at any status level.
Where business rules require an item to have held an earlier status, a StatusValidator can be used to capture these rules.

A StatusValidator takes one extra field:

  • status - a list of statuses that an item must have been at prior to being registered at the stated level. The metadata item must have been registered at one of these statuses immediately prior to the requested status level.

For example, to enforce that all metadata must have been either registered as Recorded or Candidate before being registered as a Standard:

- status: Standard
  checks:   
    - validator: StatusValidator
      name: Candidate or Recorded -> Standard
      description: Metadata should progress along the lifecycle correctly
      severity: warning
      status:
        - Candidate
        - Recorded
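
The "immediately prior" requirement can be pictured with the Python sketch below. It assumes an item's registration history is available as a simple list of status names, oldest first, which is a simplification for illustration. `previous_status_allowed` is a hypothetical helper, and treating an item with no earlier status as failing is also an assumption.

```python
def previous_status_allowed(status_history, allowed_previous):
    """Check that the status held just before the latest one is allowed.

    `status_history` is assumed to list status names in the order they
    were registered, with the requested (latest) status last.
    """
    if len(status_history) < 2:
        # Assumption: no earlier status means the lifecycle was skipped.
        return False
    return status_history[-2] in allowed_previous


allowed = ["Candidate", "Recorded"]

# Recorded immediately before Standard: follows the lifecycle.
print(previous_status_allowed(["Recorded", "Standard"], allowed))

# Registered straight to Standard: this sketch reports a failure.
print(previous_status_allowed(["Standard"], allowed))
```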

Special notes

Because Validation Rules are “YAML”, you can include comments in the rules to record important information. Aristotle won’t recognise these, but it won’t remove them either. Comments are inserted by adding a hash (#) and can either be for a whole line, or just at the end of a line:

# This comment will be ignored
- status: Standard
  object: DataElement  # This comment is also ignored
  checks: [covered below]

@sam Wonderful thankyou.
