Regular Expressions in the API Toolkit

Rules in the Custom API Toolkit can optionally include Search/Replace filters. These are regular expression operators that allow you to make changes to the returned output.

A simple example: removing “By: ” from author bylines.
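
Because the filters run on Java's regular expression engine, you can approximate the effect of this filter locally before saving a rule. The following is a minimal sketch, not Diffbot's actual implementation: the class name, method name, and the anchored pattern ^By:\s* are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

final class BylineFilter {
    // Assumed search pattern: a leading "By:" followed by optional whitespace.
    private static final Pattern BYLINE_PREFIX = Pattern.compile("^By:\\s*");

    // Replacing with the empty string mirrors leaving "replace with" blank.
    static String stripPrefix(String author) {
        return BYLINE_PREFIX.matcher(author).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("By: Jane Doe")); // prints "Jane Doe"
    }
}
```

Anchoring the pattern with ^ keeps it from removing a "By: " that happens to appear later in the value.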

Diffbot’s underlying regular expression engine is Java’s, which differs in some respects from the implementations in other languages. For an overview and interactive comparison of which operators each implementation supports, see http://www.regular-expressions.info/refcharacters.html.

The live preview (“Diffy the Robot,” above) operates on the live HTML of the page being previewed, so its result may differ from the actual output. Be sure to confirm the output after saving your rule.

Specific regular expression notes:

  • The ‘wildcard’ dot character (.) does not match the newline character, \n. If your output includes a line break, prefix your regular expression with the “single-line mode modifier”: (?s)
  • To remove matching content, simply leave “replace with” blank, as in the example above.
  • Backreferences are supported. For example, to prepend text, match the whole value with the search pattern (^.*$) and use a replacement such as “prefix: $1”, where $1 stands for the text captured by the parentheses.
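
The notes above can be tried out in plain Java. This is a hedged sketch rather than the Toolkit's own code: the class and method names, the "Related:" marker, and the sample strings are all hypothetical. It shows how (?s), which Java calls DOTALL mode, changes what the dot matches, and how $1 reuses a captured group in the replacement.

```java
final class RegexNotes {
    // Removes everything from a (hypothetical) "Related:" marker onward.
    // Without (?s) the dot stops at the first newline; with it, the match
    // runs to the end of the text.
    static String dropRelated(String text, boolean singleLine) {
        String pattern = (singleLine ? "(?s)" : "") + "Related:.*";
        return text.replaceAll(pattern, "");
    }

    // Prepends text by capturing the whole value and referring back to it as $1.
    static String addPrefix(String value) {
        return value.replaceAll("(^.*$)", "prefix: $1");
    }

    public static void main(String[] args) {
        String text = "Intro\nRelated: a\nb";
        System.out.println(dropRelated(text, false)); // only "Related: a" is removed
        System.out.println(dropRelated(text, true));  // "Related: a" and "b" are removed
        System.out.println(addPrefix("Jane Doe"));    // prints "prefix: Jane Doe"
    }
}
```

Note that (^.*$) without (?s) will not match a value that contains a line break in the middle, so a multi-line value needs the modifier here as well.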