- Numerous improvements to `normalizedSpecs` in the Product API.
- Diffbot Automatic APIs now process PDFs. PDF URLs will be converted to HTML and then analyzed for extractable content. PDFs are not currently supported while crawling.
- Crawlbot fixes to reduce DNS errors when starting new crawls or crawl rounds.
- Crawlbot and Bulk Processing: deletion of a nonexistent job will no longer return a "success" message.
- Improved handling of UTF-8 encoded characters within Crawlbot.
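
As a rough illustration of the PDF item above, the sketch below builds a request URL that passes a PDF link to an Automatic API. The base URL, API version, `article` endpoint choice, token placeholder, and the `build_request_url` helper are all assumptions for illustration, not taken from these notes; check the current Diffbot documentation before relying on them. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL and version (not stated in these release notes).
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_request_url(api: str, token: str, target_url: str) -> str:
    """Hypothetical helper: return a GET URL for an Automatic API call.

    The target URL is percent-encoded so it survives as a query parameter.
    """
    query = urlencode({"token": token, "url": target_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


# Example: point the (assumed) article endpoint at a PDF URL.
request_url = build_request_url(
    "article", "YOUR_TOKEN", "https://example.com/files/report.pdf"
)
print(request_url)
```

Because PDFs are not supported while crawling, a call like this would be made directly for each PDF URL rather than through a Crawlbot job.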