How do I set custom headers in API calls or while crawling?

Diffbot supports sending the following custom headers in direct API calls, while crawling, and within bulk processing jobs. Diffbot uses these headers when it requests content from third-party sites:

  • User-Agent
  • Referer
  • Cookie
  • Accept-Language

Direct API Calls

To send a custom header in a direct API call (Automatic or Custom APIs), add the header to your request to http://api.diffbot.com with its name prefixed by X-Forward-. For example, to have Diffbot use a Referer of “http://www.diffbot.com” when it requests content, send the header X-Forward-Referer: http://www.diffbot.com in your API call.
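As a rough illustration of the X-Forward- prefix, the Python sketch below builds (but does not send) a request carrying a forwarded Referer. The /v3/article endpoint, the token and url query parameters, the placeholder token, and the helper names build_forwarded_headers and build_api_request are assumptions made for this example; check the documentation for the API you are calling.

```python
from urllib.parse import urlencode
from urllib.request import Request

FORWARD_PREFIX = "X-Forward-"


def build_forwarded_headers(custom_headers):
    """Prefix each header name with X-Forward- so Diffbot forwards it."""
    return {FORWARD_PREFIX + name: value for name, value in custom_headers.items()}


def build_api_request(endpoint, token, page_url, custom_headers):
    """Build a GET request object; nothing is sent until urlopen() is called."""
    # "token" and "url" are assumed query parameter names for this sketch.
    query = urlencode({"token": token, "url": page_url})
    return Request(endpoint + "?" + query,
                   headers=build_forwarded_headers(custom_headers))


forwarded = build_forwarded_headers({"Referer": "http://www.diffbot.com"})

request = build_api_request(
    "http://api.diffbot.com/v3/article",  # assumed endpoint, for illustration
    "YOUR_TOKEN",                         # placeholder token
    "http://www.example.com/story",       # hypothetical page to extract
    {"Referer": "http://www.diffbot.com"},
)
# To actually send it: urllib.request.urlopen(request)
```

The same helper works for the other supported headers, for example passing {"Accept-Language": "en-us"} produces an X-Forward-Accept-Language header.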

More details can be found within the documentation of individual APIs at https://www.diffbot.com/dev/docs.

While Crawling or Bulk Processing

If you wish to use custom headers while crawling or in the processing of a bulk job, send them in the initial request that creates the job, at http://api.diffbot.com/v3/bulk or http://api.diffbot.com/v3/crawl. Your custom headers will then be used both while crawling for links and in any extraction processing.

To send custom headers in your Crawlbot or Bulk service API requests, include one customHeaders parameter per header in your POST body (Bulk jobs or Crawlbot jobs) or GET request (Crawlbot only). Within each value, separate the header name from the header value with a colon, and URL-encode the whole value:

&customHeaders=Referer%3Ahttp%3A%2F%2Fwww.diffbot.com&customHeaders=Accept-Language%3Aen-us
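One way to produce this encoding is with Python's standard library, as in the minimal sketch below. The helper name encode_custom_headers is hypothetical; the output is only the customHeaders portion of the request, to be combined with the other parameters your job requires.

```python
from urllib.parse import urlencode


def encode_custom_headers(headers):
    """Encode (name, value) pairs as repeated customHeaders parameters.

    Each pair is joined with a colon, then URL-encoded by urlencode().
    """
    pairs = [("customHeaders", name + ":" + value) for name, value in headers]
    return urlencode(pairs)


encoded = encode_custom_headers([
    ("Referer", "http://www.diffbot.com"),
    ("Accept-Language", "en-us"),
])
print(encoded)
# customHeaders=Referer%3Ahttp%3A%2F%2Fwww.diffbot.com&customHeaders=Accept-Language%3Aen-us
```

A list of pairs is used rather than a dictionary so that the parameter name customHeaders can repeat once per header.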