Matchers

Filtering Against Data Using the Audisto Scripting Language

Matchers work on data that was gathered using scopes. They take the data and an argument and return TRUE if the data matches the argument, or FALSE if it does not.

Matchers may process different kinds of data:

  • Text: Text-based matchers process data as text. Case-insensitive data is matched case-insensitively; case-sensitive data is matched case-sensitively.
  • Number: Number-based matchers require the data to be a number. They accept only integers as arguments.
  • HTML: HTML-based matchers expect an HTML document as input. They provide easy access to elements of the document, such as tags, attributes, or text.
  • Content: Content-based matchers expect content as input, such as the content of scope "HTTP Body". They allow for easy detection of content changes.
  • Category: Category-based matchers expect a category as input. A category is one of "good", "improvable", or "poor".

Note: Numbers and HTML may also be processed as text. For example, this is valid:

HTTP Status Starts With "3"

Text Related Matchers

Matching Text against Text

The following text processing matchers are available:

  • Contains: Match is successful if the scope contains the desired string
  • Starts With: Match is successful if the scope starts with the desired string
  • Ends With: Match is successful if the scope ends with the desired string
  • Equals: Match is successful if the scope equals the desired string
  • Is Like: Match is successful if the scope matches the given pattern, which may contain wildcards. See below for details
  • Matches Regex: Match is successful if the scope matches the given regular expression. We support Java-style regular expressions

These matchers are case sensitive, that is, they distinguish between uppercase and lowercase letters. Case-insensitive variants are also available:

  • Contains Case Insensitive: Match is successful if the scope contains the desired string regardless of case
  • Starts With Case Insensitive: Match is successful if the scope starts with the desired string regardless of case
  • Ends With Case Insensitive: Match is successful if the scope ends with the desired string regardless of case
  • Equals Case Insensitive: Match is successful if the scope equals the desired string regardless of case
  • Is Like Case Insensitive: Match is successful if the scope matches the given wildcard pattern regardless of case

Each matcher can also be negated using the corresponding negative matcher:

  • Does Not Contain: Match is successful if the scope does not contain the desired string
  • Does Not Contain Case Insensitive: Match is successful if the scope does not contain the desired string regardless of case
  • Does Not Start With: Match is successful if the scope does not start with the desired string
  • Does Not Start With Case Insensitive: Match is successful if the scope does not start with the desired string regardless of case
  • Does Not End With: Match is successful if the scope does not end with the desired string
  • Does Not End With Case Insensitive: Match is successful if the scope does not end with the desired string regardless of case
  • Does Not Equal: Match is successful if the scope does not equal the desired string
  • Does Not Equal Case Insensitive: Match is successful if the scope does not equal the desired string regardless of case
  • Is Not Like: Match is successful if the scope does not match the given wildcard pattern
  • Is Not Like Case Insensitive: Match is successful if the scope does not match the given wildcard pattern regardless of case
  • Does Not Match Regex: Match is successful if the scope does not match the given regular expression.
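To make the relationships between these matchers concrete, here is a minimal Python sketch of the plain text matchers, their Case Insensitive variants, and their Does Not counterparts. All names (`contains`, `starts_with`, `negate`, and so on) are hypothetical and not part of Audisto, and modeling Case Insensitive with Python's `str.casefold` is an assumption about how case is compared.

```python
# Hypothetical model of Audisto's text matchers; names are illustrative only.
from typing import Callable

TextMatcher = Callable[[str, str, bool], bool]


def _fold(value: str, pattern: str, case_insensitive: bool) -> tuple[str, str]:
    # Case Insensitive variants compare case-folded copies of both sides (assumption).
    if case_insensitive:
        return value.casefold(), pattern.casefold()
    return value, pattern


def contains(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    v, p = _fold(value, pattern, case_insensitive)
    return p in v


def starts_with(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    v, p = _fold(value, pattern, case_insensitive)
    return v.startswith(p)


def ends_with(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    v, p = _fold(value, pattern, case_insensitive)
    return v.endswith(p)


def equals(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    v, p = _fold(value, pattern, case_insensitive)
    return v == p


def negate(matcher: TextMatcher) -> TextMatcher:
    # A "Does Not ..." matcher is the boolean inverse of its positive counterpart.
    def negated(value: str, pattern: str, case_insensitive: bool = False) -> bool:
        return not matcher(value, pattern, case_insensitive)
    return negated


does_not_contain = negate(contains)
does_not_start_with = negate(starts_with)
does_not_end_with = negate(ends_with)
does_not_equal = negate(equals)
```

For instance, `starts_with("/Blog/post.html", "/blog/")` returns `False`, while the same call with `case_insensitive=True` returns `True`. Calling `starts_with(str(301), "3")` mirrors the `HTTP Status Starts With "3"` example, where a number is processed as text.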

The Is Like matcher supports the asterisk (*) as a wildcard character that matches any number of characters. For example:

Path Is Like "/category/*/top-products.html"

Note that without a leading asterisk, the value must start with the pattern. With a trailing asterisk, the match covers the whole string; without one, it covers the shortest possible match. This distinction matters when Is Like is used for URL rewriting.
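The wildcard behavior of Is Like can be approximated by translating the pattern into a regular expression. The Python sketch below assumes that Is Like compares against the whole value, as the matcher list describes, and that `*` is the only special character; `is_like` is a hypothetical name. It reports only TRUE or FALSE and does not model which part of the value is matched, which is what matters for URL rewriting.

```python
import re


def is_like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    # Escape each literal segment so characters such as "." are not regex syntax,
    # then join the segments with a lazy "match anything" for every "*".
    regex = ".*?".join(re.escape(part) for part in pattern.split("*"))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    # fullmatch anchors the pattern at both ends: without a leading "*",
    # the value must start with the first literal segment.
    return re.fullmatch(regex, value, flags) is not None
```

Escaping each literal segment matters: without `re.escape`, the dot in `.html` would match any character. For a plain yes/no result, lazy and greedy wildcards give the same answer; the difference only shows up in which span is matched.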

When defining a match, especially with Starts With or Ends With, make sure you know exactly what the scope contains. Also keep in mind that regular expressions are powerful but can be confusing. Don't hesitate to contact us if you have any questions; we are always eager to help.

There are many regular expression testers on the web; just search for "java regex tester". We particularly like:

  • Regex101: Supports Java and other languages, explains expressions, and lets you define unit tests. For expressions that are also valid in other flavors, you can estimate an expression's cost by switching from Java to a flavor that displays runtime and step count.
  • Regex Planet: Supports full-fledged Java expressions

Contains, Ends With, and Starts With do not support wildcards! Use Is Like or regular expressions for that.

Counting Text Elements

You can count either the words or the characters of a text.

The matchers for counting words are:

  • Word Count Equals
  • Word Count Does Not Equal
  • Word Count Less Than
  • Word Count Less Than Or Equal
  • Word Count Greater Than
  • Word Count Greater Than Or Equal

The matchers for counting characters are:

  • Char Count Equals
  • Char Count Does Not Equal
  • Char Count Less Than
  • Char Count Less Than Or Equal
  • Char Count Greater Than
  • Char Count Greater Than Or Equal

When counting text elements, keep in mind that the input is usually normalized: leading whitespace has been removed, and runs of whitespace have been replaced by a single space character.

Splitting a text into words can vary by language. This is not taken into account yet; instead, we use a general approach that should fit most cases.

Punctuation that is not followed by whitespace is not treated as a word boundary. Therefore "audisto.com" counts as one word, not two. The same holds for abbreviations: "E.g." counts as one word, while "e. g." (note the space after the first dot) counts as two.

We therefore recommend testing against ranges with Greater Than, Less Than, and similar matchers rather than testing for an exact value with Equals.
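The counting rules above can be approximated in a few lines of Python. This sketch assumes that normalization consists only of the two steps described (removing leading whitespace and collapsing whitespace runs into a single space) and that whitespace is the only word separator; `normalize`, `word_count`, and `char_count` are hypothetical names, not Audisto functions.

```python
import re


def normalize(text: str) -> str:
    # Remove leading whitespace and collapse every whitespace run into one space.
    # Trailing whitespace, if any, remains as a single space.
    return re.sub(r"\s+", " ", text.lstrip())


def word_count(text: str) -> int:
    # Only whitespace separates words, so "audisto.com" and "E.g." are single words.
    return len(normalize(text).split())


def char_count(text: str) -> int:
    # Characters are counted after normalization.
    return len(normalize(text))
```

Under these rules, `word_count("E.g. this")` is 2 but `word_count("e. g. this")` is 3, which is why a range check such as `2 <= word_count(text) <= 5` is more robust than an exact comparison.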

Number Related Matchers

The following number related matchers are available to compare the input to a given value:

  • Less Than
  • Less Than Or Equal
  • Greater Than
  • Greater Than Or Equal
  • Exists: Short for Greater Than 0
  • Does Not Exist: Short for Equals 0

To compare numbers for equality, use Equals or Does Not Equal.

Note: Passing text into number-based matchers leads to an error.
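The following Python sketch models the number matchers, including the shorthands Exists and Does Not Exist and the error raised for text input. The function names are hypothetical, and raising `TypeError` is only a stand-in for whatever error Audisto actually reports.

```python
# Hypothetical model of Audisto's number matchers.
from numbers import Real


def _check(value: object, argument: object) -> None:
    # The data must be a number; text leads to an error.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"number-based matcher received non-number data: {value!r}")
    # Arguments must be integers.
    if isinstance(argument, bool) or not isinstance(argument, int):
        raise TypeError(f"number-based matcher requires an integer argument: {argument!r}")


def less_than(value: float, argument: int) -> bool:
    _check(value, argument)
    return value < argument


def less_than_or_equal(value: float, argument: int) -> bool:
    _check(value, argument)
    return value <= argument


def greater_than(value: float, argument: int) -> bool:
    _check(value, argument)
    return value > argument


def greater_than_or_equal(value: float, argument: int) -> bool:
    _check(value, argument)
    return value >= argument


def exists(value: float) -> bool:
    # Short for Greater Than 0.
    return greater_than(value, 0)


def does_not_exist(value: float) -> bool:
    # Short for Equals 0.
    _check(value, 0)
    return value == 0
```

Because Exists means greater than 0 and Does Not Exist means equal to 0, a negative value would satisfy neither.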

HTML Related Matchers

The following HTML processing matchers are available:

  • Matches XPath: Match is successful if the given XPath has one or more results
  • Matches CSS Selector: Match is successful if the given CSS Selector has one or more results

Each matcher can also be negated using the corresponding negative matcher:

  • Does Not Match XPath: Match is successful if the scope does not match the given XPath.
  • Does Not Match CSS Selector: Match is successful if the scope does not match the given CSS Selector.

XPath

HTML Matches XPath "//head/meta[@name='description' and @content='My best ranking meta description']"

We support XPath 1.0 only. Wikipedia has a good overview.

There are some free XPath testers on the web, but they usually support only valid XML documents, not HTML. However, most modern browsers support XPath queries in their developer tools out of the box. In both Chrome and Firefox, for example, you can inspect an element, right-click it, and choose "Copy XPath" to obtain that element's XPath.

CSS Selectors

HTML Matches CSS Selector "form#register input[name=username]:required"

We support CSS Selectors Level 3, with some notable adaptations:

  • Pseudo-elements like ::before or ::after always match, but the deprecated notation :before and :after is not supported
  • Dynamic pseudo-classes like :visited or :hover always match
  • Target pseudo-class :target never matches anything
  • UI element pseudo-class :indeterminate never matches anything
  • Language pseudo-class :lang matches if the element or one of its ancestors has an HTML lang attribute with a matching value. xml:lang is not taken into account

Additionally, these parts of the upcoming CSS Selectors Level 4 are already supported:

  • Input pseudo-classes :required and :optional

Content Related Matchers

The following content processing matchers are available:

  • Fingerprint Equals: Content has the given fingerprinting hash
  • Fingerprint Does Not Equal: Content does not have the given fingerprinting hash

Fingerprints

Fingerprints identify content. They are much shorter than the original content but still effectively unique, so they can be used to verify that a resource has not changed.

Fingerprints are shown on the "Content" tab of a URL report.
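To illustrate how a fingerprint lets you detect changes, the sketch below uses SHA-256 from Python's `hashlib` as a stand-in. This is an assumption made purely for illustration: the fingerprinting algorithm Audisto uses is not described here, so these digests will not match the values shown on the "Content" tab. The function names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # Stand-in fingerprint: a SHA-256 hex digest, NOT Audisto's algorithm.
    return hashlib.sha256(content).hexdigest()


def fingerprint_equals(content: bytes, expected: str) -> bool:
    return fingerprint(content) == expected


def fingerprint_does_not_equal(content: bytes, expected: str) -> bool:
    return not fingerprint_equals(content, expected)
```

The workflow is the same with real fingerprints: record the value for a known-good version of the content, then compare later crawls against it with Fingerprint Equals or Fingerprint Does Not Equal.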

Category Related Matchers

There is a matcher for each of the three categories "good", "improvable" and "poor":

  • Is Good: Given category is "good"
  • Is Improvable: Given category is "improvable"
  • Is Poor: Given category is "poor"