How to correct Article, Product, or other API output with a custom rule

Have you run into a problem where the Diffbot extraction from a particular site is incorrect or needs adjusting? Our API Toolkit not only allows you to create new APIs entirely, but also to override or correct the output returned by our Automatic APIs.

Correcting a field’s output takes immediate effect for your account, and also serves to train our system, improving Diffbot extraction over the long run.

Here’s how to make a correction if you have a problem with a particular site:

Find a problematic URL

Start with a web page that is exhibiting the problem, then visit the API Toolkit in your Developer Dashboard.

Create a rule in the API Toolkit


Select the API you want to correct from the drop-down list, enter your sample URL, and click “Test” to see its current output.

Optional: adjust the domain matching for your rule

By default, your rule applies to any page whose URL is on the same subdomain as the sample URL. In our example, the sample URL is on support.diffbot.com, so the rule affects all pages at support.diffbot.com:

(http(s)?://)?(.*\.)?support.diffbot.com.*

To adjust this, click the “Change this” link. This reveals the rule’s regular expression, which you can edit to narrow or broaden the matches. For example, to apply the rule to all pages at diffbot.com:

(http(s)?://)?(.*\.)?diffbot.com.*

To apply only to pages within the “/apitoolkit/” section:

(http(s)?://)?(.*\.)?support.diffbot.com/apitoolkit/.*

Or to apply to all pages at any domain:

.*
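You can check a pattern against sample URLs before saving it. This sketch makes two assumptions that this article does not confirm: the Toolkit uses standard regular-expression syntax, and a rule applies only when the pattern matches the whole URL. It runs the four example patterns above against a few hypothetical URLs using Python’s re.fullmatch. The URL list and the rule_applies helper are illustrative only.

```python
import re

# The four example patterns from this article, unchanged.
PATTERNS = {
    "support subdomain (default)": r"(http(s)?://)?(.*\.)?support.diffbot.com.*",
    "all of diffbot.com": r"(http(s)?://)?(.*\.)?diffbot.com.*",
    "only /apitoolkit/": r"(http(s)?://)?(.*\.)?support.diffbot.com/apitoolkit/.*",
    "any domain": r".*",
}

# Hypothetical URLs, used only to exercise the patterns.
URLS = [
    "https://support.diffbot.com/apitoolkit/example-post",
    "https://support.diffbot.com/automatic-apis/example-post",
    "https://www.diffbot.com/products/",
    "https://example.com/blog/post",
]


def rule_applies(pattern: str, url: str) -> bool:
    # Assumption: the pattern must match the entire URL, not just part of it.
    return re.fullmatch(pattern, url) is not None


for name, pattern in PATTERNS.items():
    hits = [u for u in URLS if rule_applies(pattern, u)]
    print(f"{name}: {hits}")
```

The check also exposes a quirk of these patterns. Their dots are unescaped, so . matches any character, and the default pattern would also accept a URL such as https://supportXdiffbot.com/. If the Toolkit accepts escaped dots as standard regex engines do, writing support\.diffbot\.com gives a stricter match.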

Edit the field you wish to correct

The API Toolkit shows a preview of the current API output. To correct a field, click “edit” next to it. In our example, we’ll edit the author field, which is hidden for Diffbot support posts.


In the resulting preview window, you can either enter a CSS selector manually or point and click to choose the correct element. A preview of the output is displayed at the top of the screen.

In our example, the CSS selector we want is .byline .author.

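The selector .byline .author uses the CSS descendant combinator. It matches any element with class author that is nested, at any depth, inside an element with class byline. The sketch below imitates that one selector form on hypothetical markup using only Python’s standard html.parser, so you can see which element it picks. It is not how the API Toolkit evaluates selectors. It handles only the class-inside-class form, assumes well-formed markup, and captures only the outermost match. A real CSS engine, or a library such as BeautifulSoup’s select(), covers full selector syntax. SAMPLE_HTML and the helper names are illustrative.

```python
from html.parser import HTMLParser

# Hypothetical markup resembling a post's byline; not copied from any real page.
SAMPLE_HTML = """
<article>
  <h1 class="title">How to correct API output</h1>
  <div class="byline">Posted by <span class="author">Jane Doe</span></div>
  <p>Body text mentioning an <span class="author">unrelated author</span>.</p>
</article>
"""

# Void elements have no end tag and no content, so they are never pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DescendantClassMatcher(HTMLParser):
    """Collect text of `.child` elements nested inside an `.ancestor` element."""

    def __init__(self, ancestor: str, child: str) -> None:
        super().__init__()
        self.ancestor = ancestor
        self.child = child
        self.stack = []            # class sets of currently open elements
        self.capture_depth = None  # stack depth of the element being captured
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Start capturing if this is a .child with some open .ancestor above it.
        if (self.capture_depth is None and self.child in classes
                and any(self.ancestor in c for c in self.stack)):
            self.capture_depth = len(self.stack)
            self.matches.append("")
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        # Stop capturing once the matched element itself has closed.
        if self.capture_depth is not None and len(self.stack) == self.capture_depth:
            self.capture_depth = None

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.matches[-1] += data


def select_descendant_text(html: str, ancestor: str, child: str) -> list:
    parser = DescendantClassMatcher(ancestor, child)
    parser.feed(html)
    parser.close()
    return [m.strip() for m in parser.matches]


print(select_descendant_text(SAMPLE_HTML, "byline", "author"))  # ['Jane Doe']
```

The second author span is ignored because it sits inside a paragraph, not inside the byline. That is why a descendant selector can be more precise than .author alone.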

Click “Save” to save your rule

Once saved, your rule takes immediate effect for API calls that (a) use the specified API and (b) request a URL matching the domain regular expression.


Any page that doesn’t contain an element matching your CSS selector will return the default Diffbot API output.
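One way to confirm a saved rule is to call the API it targets with a URL that its regular expression matches. This sketch assumes the Article API v3 endpoint https://api.diffbot.com/v3/article. It also assumes that the endpoint takes token and url query parameters and returns extracted items in an objects array. Check those details against the API documentation for your account. The DIFFBOT_TOKEN and PAGE_URL environment variables and the helper names are hypothetical.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Article API v3 endpoint; swap in the API your rule targets.
ARTICLE_API = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Build a GET URL for the Article API; urlencode escapes the page URL."""
    return f"{ARTICLE_API}?{urlencode({'token': token, 'url': page_url})}"


def author_from_response(payload: dict):
    """Return the author of the first extracted object, or None if absent.
    Assumes a response shaped like {"objects": [{"author": ...}, ...]}."""
    objects = payload.get("objects") or []
    return objects[0].get("author") if objects else None


def fetch_author(token: str, page_url: str):
    """Call the API and return the (possibly rule-corrected) author field."""
    with urlopen(build_article_request(token, page_url), timeout=30) as resp:
        return author_from_response(json.load(resp))


if __name__ == "__main__":
    token = os.environ.get("DIFFBOT_TOKEN")  # hypothetical variable name
    page_url = os.environ.get("PAGE_URL")    # a URL your rule's regex matches
    if token and page_url:
        print(fetch_author(token, page_url))
    else:
        print("Set DIFFBOT_TOKEN and PAGE_URL to run a live request.")
```

Run it before and after saving the rule to see the corrected field. For a page with no element matching your selector, expect the default output instead.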

For more advanced techniques, see the following resources: