Can I create multiple custom rules for a single site?

Sometimes a single site needs multiple custom rules, for example because different sections use different page templates, or because you want to extract different data from different types of pages.

If you’re creating a completely custom API, you can always create multiple APIs for the same site. For instance:

  • /api/categories for category extraction
  • /api/item for item extraction

You can then call whichever of these APIs suits each page on the customized site.
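One way to use separate APIs is to pick the API from the page's URL before making the request. The sketch below assumes a hypothetical site whose category pages live under /category/ and item pages under /item/, and uses a placeholder base URL (https://api.example.com) and a "token" query parameter that are illustrative only, not a documented endpoint. The /api/categories and /api/item names come from the list above.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse

# Placeholder base URL: substitute the endpoint your account actually uses.
API_BASE = "https://api.example.com"

# Hypothetical mapping from a site's URL path prefixes to the custom APIs above.
API_FOR_PATH_PREFIX = {
    "/category/": "/api/categories",
    "/item/": "/api/item",
}


def choose_api(page_url: str) -> Optional[str]:
    """Return the custom API path for a page, or None if no prefix matches."""
    path = urlparse(page_url).path
    for prefix, api_path in API_FOR_PATH_PREFIX.items():
        if path.startswith(prefix):
            return api_path
    return None


def build_request_url(page_url: str, token: str) -> Optional[str]:
    """Build the request URL for the chosen API (query parameter names are assumed)."""
    api_path = choose_api(page_url)
    if api_path is None:
        return None
    return API_BASE + api_path + "?" + urlencode({"token": token, "url": page_url})
```

For example, build_request_url("https://shop.example.com/item/123", "TOKEN") targets /api/item, while a URL under /category/ targets /api/categories.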

If, however, you need to apply the same API to different parts of the same site, you can control where each rule takes effect by tailoring the rule’s Domain Regex (URL pattern) in the API Toolkit.


By default, when you create a new rule, its Domain Regex applies the rule to the entire domain. By writing a custom regular expression, you can limit the rule to a subset of the website. For example:

  • Changing the default Domain Regex to (http(s)?://)?(.*\.)?diffbot.com/products.* restricts the rule to URLs that contain diffbot.com/products.
  • Changing the Domain Regex to (http(s)?://)?(.*\.)?diffbot.com/company.* restricts the rule to URLs that contain diffbot.com/company.
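Before saving a pattern, it can help to test it locally against a few URLs. This sketch assumes a rule is in effect whenever its Domain Regex matches anywhere in the URL (Python's re.search). The API Toolkit's exact matching semantics may differ, and the sample URLs are illustrative only.

```python
import re

# The two Domain Regex values from the examples above.
PRODUCTS_RULE = r"(http(s)?://)?(.*\.)?diffbot.com/products.*"
COMPANY_RULE = r"(http(s)?://)?(.*\.)?diffbot.com/company.*"

# Sample URLs chosen for illustration only.
SAMPLE_URLS = [
    "https://www.diffbot.com/products/automatic/",
    "diffbot.com/company/about",
    "http://blog.diffbot.com/some-post",
]


def rule_applies(domain_regex: str, url: str) -> bool:
    # Assumption: "in effect" means the pattern matches somewhere in the URL.
    return re.search(domain_regex, url) is not None


for url in SAMPLE_URLS:
    print(url, rule_applies(PRODUCTS_RULE, url), rule_applies(COMPANY_RULE, url))
```

The first URL matches only the products pattern, the second only the company pattern, and the blog post matches neither. Note that the unescaped "." in diffbot.com matches any character; writing diffbot\.com makes the pattern stricter.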

Giving each rule its own Domain Regex lets you apply multiple rulesets within the same API to the same site.
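This article does not say which ruleset wins when two Domain Regex patterns match the same URL, so a cautious approach is to check that your patterns do not overlap and that every page you care about is covered. The sketch below is a hypothetical local check. The ruleset labels are our own, and it again assumes re.search matching semantics.

```python
import re
from typing import List, Tuple

# Hypothetical rulesets within one custom API, each scoped by a Domain Regex.
RULESETS = {
    "products": r"(http(s)?://)?(.*\.)?diffbot.com/products.*",
    "company": r"(http(s)?://)?(.*\.)?diffbot.com/company.*",
}


def matching_rulesets(url: str) -> List[str]:
    """Return every ruleset whose Domain Regex matches the URL."""
    return [name for name, pattern in RULESETS.items() if re.search(pattern, url)]


def check_urls(urls: List[str]) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split URLs into those matched by no ruleset and those matched by several."""
    uncovered: List[str] = []
    overlapping: List[Tuple[str, List[str]]] = []
    for url in urls:
        hits = matching_rulesets(url)
        if not hits:
            uncovered.append(url)
        elif len(hits) > 1:
            overlapping.append((url, hits))
    return uncovered, overlapping
```

Running check_urls over a sample of real page URLs before deploying the rules flags gaps (pages no rule handles) and ambiguities (pages more than one rule claims).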