In a recurring crawl, how do I download only the latest round’s content?

Crawlbot’s recurring-crawl functionality allows you to repeat a crawl to access the latest data on a regular basis. Crawl “rounds” can be started automatically on a schedule, or you can manually start new rounds via the Crawlbot API.

Once your recurring crawl is running, use the Search API to retrieve only the latest crawled data. Filtering on the timestamp field limits the retrieved output to content that was processed on or after the date you provide.

For instance, if your latest crawl round started on October 1, 2014, a search using the query min:timestamp:2014-10-01 will retrieve only those objects that were processed on or after the date of the latest round.
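As a rough sketch, the query above could be assembled into a Search API request URL as follows. Only the min:timestamp:YYYY-MM-DD query syntax comes from this article. The endpoint URL, the token, col and query parameter names, and the helper function names are assumptions made for illustration, so check them against the current Search API documentation before relying on them.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed Search API endpoint; verify against the official documentation.
SEARCH_ENDPOINT = "https://api.diffbot.com/v3/search"


def build_latest_round_query(round_start: date) -> str:
    """Return a query matching objects processed on or after round_start."""
    # isoformat() yields YYYY-MM-DD, the date format used in the example above.
    return f"min:timestamp:{round_start.isoformat()}"


def build_search_url(token: str, collection: str, round_start: date) -> str:
    """Build a request URL (hypothetical helper; parameter names are assumed)."""
    params = {
        "token": token,        # your API token (assumed parameter name)
        "col": collection,     # name of the recurring crawl (assumed parameter name)
        "query": build_latest_round_query(round_start),
    }
    # urlencode percent-encodes the colons in the query string.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder token and crawl name; replace with your own values.
    print(build_search_url("YOUR_TOKEN", "my_recurring_crawl", date(2014, 10, 1)))
```

The resulting URL can be fetched with any HTTP client. Taking the round's start date as a parameter means the same helper can be reused after each new round.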