Can Diffbot crawl sites that use “infinite” or “endless” scrolling?

Currently, Crawlbot does not interact with pages to retrieve or follow links that appear only when a page is scrolled (so-called “infinite” or “endless” scrolling). Crawlbot will only follow links that are available on the initial page load.

(Related: How to find and access Ajax-generated links while crawling.)

In most cases sites will offer alternative means to find the same links:

  • related links (to other posts or products) on individual post or product pages
  • search filters or category links that narrow the number of results
  • a sitemap file (e.g. sitemap.xml) or similar map to individual item pages
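As an illustration of the sitemap option, here is a minimal sketch that pulls the item URLs out of sitemap XML you have already downloaded. The function name extractSitemapUrls is hypothetical, and the sketch assumes a standard sitemap in which each URL sits in a plain <loc> element (no CDATA sections). The resulting list could then serve as a starting point for a crawl.

```javascript
// Hypothetical helper (not part of any Diffbot API): collect every <loc> URL
// from the text of a sitemap.xml file.
function extractSitemapUrls(xml) {
  var urls = [];
  var pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so restore it in the extracted URL.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

// Example input using placeholder example.com URLs.
var sampleXml =
  '<urlset>' +
  '<url><loc>https://example.com/products/1</loc></url>' +
  '<url><loc> https://example.com/products/2?ref=a&amp;page=2 </loc></url>' +
  '</urlset>';
var sampleUrls = extractSitemapUrls(sampleXml);
```

A sitemap index file (one that lists other sitemaps) uses the same <loc> element, so the same function could be applied to it first and then to each sitemap it names.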

If you find a site that cannot be crawled without page scrolling, you may be able to improve results with the following approach:

  1. Write custom JavaScript for Diffbot’s custom X-Evaluate header that implements one or more click or scroll events.
  2. Store your X-Evaluate header as a custom rule against the Analyze API for the site in question.
  3. Use the aforementioned method to execute Ajax/JavaScript while crawling.
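The scroll logic for the first step might look something like the sketch below. It is only an illustration: scrollUntilStable and its options (maxRounds, waitMs, schedule) are hypothetical names, not part of any Diffbot API. The helper scrolls to the bottom of the page, waits for new content, and repeats until the page height stops growing or a round limit is reached.

```javascript
// Hypothetical helper: scroll to the bottom repeatedly until the page stops
// growing or maxRounds scrolls have been made, then call done(rounds).
// `win` is the browser window object; `schedule` defaults to setTimeout and
// can be swapped out for testing.
function scrollUntilStable(win, options, done) {
  var maxRounds = options.maxRounds || 5;
  var waitMs = options.waitMs || 1000;
  var schedule = options.schedule || function (fn, ms) { setTimeout(fn, ms); };
  var rounds = 0;

  function step(previousHeight) {
    var height = win.document.body.scrollHeight;
    // Stop when the last scroll loaded nothing new or the budget is spent.
    if (height === previousHeight || rounds >= maxRounds) {
      done(rounds);
      return;
    }
    rounds += 1;
    win.scrollTo(0, height);
    // Give the page time to load more items before measuring again.
    schedule(function () { step(height); }, waitMs);
  }

  step(-1);
}

// Assumed usage inside an X-Evaluate function. The start()/end() signals are
// written here as an assumption; confirm the exact wrapper in the X-Evaluate
// documentation before relying on it:
//
//   function() {
//     start();
//     scrollUntilStable(window, { maxRounds: 5, waitMs: 1000 }, function () {
//       end();
//     });
//   }
```

Capping the number of rounds keeps a truly endless feed from stalling the request. Suitable values for maxRounds and waitMs depend on how quickly the site loads additional items.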

For assistance with the above, feel free to contact us at support@diffbot.com.