How Diffbot handles multiple-page articles and discussions

Diffbot’s Article and Discussion APIs support automatic page concatenation: the ability to string together multiple pages into a single response.

By default, the Article API automatically concatenates multi-page articles (up to twenty pages in total) into single ‘text’ and ‘html’ responses, and collects media items from all pages into the ‘images’ and ‘videos’ arrays.

To disable this functionality, pass paging=false in your Article API request.

The Discussion API does not concatenate by default. To enable concatenation, use the maxPages argument to set the maximum number of pages returned in a response. Use maxPages=all to return all pages regardless of the thread’s length.
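
As a rough illustration of the two arguments described above, the sketch below builds request URLs with paging=false for the Article API and maxPages for the Discussion API. The base URL, the token and url query parameters, and the build_request_url helper are assumptions made for this example and are not taken from this article; check them against your own account documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL; it is not stated in this article.
API_BASE = "https://api.diffbot.com/v3"


def build_request_url(api, token, page_url, **options):
    """Return a GET URL for the given API name ('article' or 'discussion').

    Hypothetical helper: extra keyword arguments become query parameters.
    """
    params = {"token": token, "url": page_url}
    for name, value in options.items():
        # Booleans are sent as lowercase strings, e.g. paging=false.
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{API_BASE}/{api}?{urlencode(params)}"


# Article API: turn off the default concatenation.
article_url = build_request_url(
    "article", "YOUR_TOKEN", "https://example.com/story", paging=False
)

# Discussion API: concatenate up to five pages, or every page.
discussion_url = build_request_url(
    "discussion", "YOUR_TOKEN", "https://example.com/thread", maxPages=5
)
all_pages_url = build_request_url(
    "discussion", "YOUR_TOKEN", "https://example.com/thread", maxPages="all"
)

print(article_url)
print(discussion_url)
print(all_pages_url)
```

The helper only assembles the URL; sending the request and handling errors is left to whichever HTTP client you already use.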

When an article or discussion thread has had multiple pages concatenated, the default response includes two additional fields:

  • numPages: the total number of pages concatenated to form the full output
  • nextPages: a list of additional URLs that were extracted
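
One way to check whether concatenation happened is to look for these two fields on the extracted object. The sketch below assumes the object has already been parsed from JSON into a Python dict; the sample data and the pagination_summary helper are hypothetical, and treating a missing numPages as a single page is an assumption of this example.

```python
def pagination_summary(extracted):
    """Return (num_pages, next_pages) for one extracted article or discussion.

    Hypothetical helper: when the fields are absent, the content is assumed
    to be a single page with no additional URLs.
    """
    num_pages = extracted.get("numPages", 1)
    next_pages = list(extracted.get("nextPages", []))
    return num_pages, next_pages


# Made-up sample object for illustration only.
sample = {
    "title": "Example story",
    "numPages": 3,
    "nextPages": [
        "https://example.com/story?page=2",
        "https://example.com/story?page=3",
    ],
}

num_pages, next_pages = pagination_summary(sample)
print(f"{num_pages} pages concatenated; additional URLs: {next_pages}")
```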

Occasionally, a site’s unique pagination design or terminology will confuse our concatenator. In that case, you can add concatenation for that particular site using our Custom API Toolkit, located in the Developer Dashboard.

Read about creating a rule for the nextPage field here: Automatically concatenating pages using the ‘nextPage’ field.