Can Diffbot APIs Extract Content from PDFs or Other Documents?

As of September 2016, Diffbot’s Automatic APIs can structure content from PDF files.

This functionality is in beta and is available only through direct API calls: it is not currently possible to process PDFs with Crawlbot. (PDF URLs are, however, processed successfully in Bulk Service jobs.)
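
As a rough illustration of a direct API call, the sketch below passes a PDF URL to an Automatic API over HTTP. It assumes the v3 endpoint pattern (https://api.diffbot.com/v3/<api>) with `token` and `url` query parameters, and uses the Analyze API as the default. The function names, the placeholder token, and the example PDF URL are hypothetical and not part of the original article, so check them against the current API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed v3 base URL; verify against the current Diffbot documentation.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_request_url(token, document_url, api="analyze"):
    """Build the GET URL for a direct API call (hypothetical helper).

    `api` is the API name in the URL path, e.g. "analyze" or "article".
    The document URL is percent-encoded so it survives as a query value.
    """
    query = urllib.parse.urlencode({"token": token, "url": document_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


def extract_pdf(token, pdf_url, api="analyze", timeout=60):
    """Send the request and return the parsed JSON response.

    PDF processing may be slow, so a generous timeout is used here;
    the value of 60 seconds is an arbitrary choice for this sketch.
    """
    request_url = build_request_url(token, pdf_url, api)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return json.load(response)


# Example usage (requires a valid token and network access):
#   result = extract_pdf("YOUR_TOKEN", "https://example.com/report.pdf")
#   print(json.dumps(result, indent=2))
```

Because extraction quality depends on the document, it is worth inspecting the returned JSON for each PDF rather than assuming that any particular field is populated.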

Quality of PDF extraction varies and depends significantly on the underlying structure of the document itself.