How are repeating/recurring crawls scheduled?

Crawl supports repeating (recurring) crawls, which you can use to monitor sites for content updates or to find new content on a regular schedule.

Repeats are scheduled at a frequency you choose, expressed in days: 1.0 means daily / “every 24 hours.” To re-crawl a site weekly, specify 7.0. To crawl roughly ten times per day, specify 0.1 (every 2.4 hours).
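As a rough illustration of how these values map to time intervals, the sketch below converts a repeat frequency in days to a Python timedelta. The function name repeat_interval is hypothetical and is not part of Crawl; it only restates the arithmetic above.

```python
from datetime import timedelta


def repeat_interval(repeat_days: float) -> timedelta:
    """Convert a repeat frequency in days (hypothetical helper) to an interval."""
    return timedelta(days=repeat_days)


# The three frequencies mentioned above: daily, weekly, ten times per day.
for value in (1.0, 7.0, 0.1):
    print(value, "->", repeat_interval(value))
```

For example, a value of 0.1 works out to an interval of 2 hours and 24 minutes.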

Each repetition of a crawl is a crawl “round.” A new round starts once the specified interval has elapsed after the conclusion of the prior round, not after its start. To illustrate this, assume the following crawl settings:

  • We schedule a crawl to repeat daily (1.0).
  • We start the crawl at 12:00pm on January 1.
  • The initial round takes four hours.

Based on the above, the first round ends at 4:00pm on January 1, so the second crawl round will start at 4:00pm on January 2: 24 hours after the conclusion of the first round.
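The same calculation can be sketched in a few lines of Python. This is only a model of the rule described above, not Crawl's actual scheduler; the function name next_round_start is hypothetical, and the year is arbitrary because the example does not specify one.

```python
from datetime import datetime, timedelta


def next_round_start(round_start: datetime,
                     round_duration: timedelta,
                     repeat_days: float) -> datetime:
    """Model of the rule above: the interval is counted from the END of the round."""
    round_end = round_start + round_duration
    return round_end + timedelta(days=repeat_days)


# Example from the text: daily repeat, started at 12:00pm on January 1,
# first round takes four hours. The year (2024) is an arbitrary assumption.
first_start = datetime(2024, 1, 1, 12, 0)
second_start = next_round_start(first_start, timedelta(hours=4), 1.0)
print(second_start)
```

With these inputs the result is 4:00pm on January 2, matching the example. Note that because each round's duration pushes the next start later, start times drift over successive rounds rather than staying fixed at 12:00pm.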

If you want more precise control over your crawl round start times, you can use the roundStart argument in the Crawl API to manually start a new crawl round, or you can click “Start a new round” within the Crawl interface.
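A minimal sketch of what such a request might look like follows. Only the roundStart argument comes from the text above; the endpoint URL, the token and name parameters, and the value 1 are assumptions for illustration and should be checked against the Crawl API documentation. The code only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Crawl API documentation.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"


def build_round_start_url(token: str, crawl_name: str) -> str:
    """Build a URL asking for a new round of an existing crawl (hypothetical helper)."""
    params = {
        "token": token,       # assumed parameter: your API token
        "name": crawl_name,   # assumed parameter: the crawl to restart
        "roundStart": 1,      # argument named in the text; the value 1 is assumed
    }
    return f"{CRAWL_ENDPOINT}?{urlencode(params)}"


# Placeholder values; substitute your own token and crawl name.
url = build_round_start_url("TOKEN", "my-crawl")
print(url)
```

You could issue this request from your own scheduler (for example, a cron job) to start rounds at fixed clock times instead of relying on the interval after the previous round.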