This does not mean Diffbot is unable to process the page; it merely means it’s unable to display it in the preview window. You can still use selectors normally for your custom fields. It’ll just take a little longer, as you now have to find the selectors you need manually in a regular browser session and then enter them in the Custom API user interface. Let’s go through an example.
The page kreo.net is failing to render, as evidenced by the screenshot below.
Let’s assume we want to extract the second paragraph of the text in every Kreo blog article into a field called "secondary-intro", but we cannot preview the site and click on the paragraph.
We open the page in a normal browser session, right-click on the element we’re interested in, and click "Inspect". The developer tools will open, probably targeting our desired element. If not, you can expand the parent elements in the element list that is now open and move your mouse cursor over their children to find the desired element…
… or even click the top-most "inspect element" button, which turns your cursor into a pointer you can use to select elements.
Once you’ve managed to target the specific element you’re looking to extract, right-click on it in the element list and select the option that copies its CSS selector (in Chrome, this is Copy > Copy selector). Paste that selector into the custom field you’re building.
Clicking save will reveal that the paragraph is now being extracted.
In this case, our automatically copied selector is fine, but sometimes we need more precision. Don’t be afraid to experiment with selectors: nothing can go wrong, and a bit of CSS is all you need. For example, the selector that got copied for us, #hs_cos_wrapper_post_body > p:nth-child(3), might look scary, but in reality it means: "in the HTML element with the ID hs_cos_wrapper_post_body, look at its direct (>) child elements and select the one of type p (meaning paragraph) that is the third child (nth-child(3))".
Note that we’re using nth-child(3) even though the paragraph we want is the second paragraph of text, because the first paragraph is occupied by the header image.
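To make the counting concrete, here is a minimal Python sketch that mimics what the copied selector picks. Everything in it is an assumption for illustration: the markup is a made-up stand-in for a Kreo article body (not the real page source), select_nth_child is a hypothetical helper, and the standard library's ElementTree parser only handles well-formed fragments. It is not how Diffbot or a browser evaluates selectors; it only shows the logic of "third direct child, and it must be a p".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified article body: the first <p> only wraps the header image.
SAMPLE = """
<div id="hs_cos_wrapper_post_body">
  <p><img src="header.png" alt="header"/></p>
  <p>First paragraph of text.</p>
  <p>Second paragraph of text.</p>
  <p>Third paragraph of text.</p>
</div>
"""


def select_nth_child(root, parent_id, tag, n):
    """Roughly mimic `#parent_id > tag:nth-child(n)` on a well-formed fragment."""
    for parent in root.iter():
        if parent.get("id") != parent_id:
            continue
        children = list(parent)  # direct child elements only, like `>`
        # nth-child counts every child element, then the tag has to match.
        if len(children) >= n and children[n - 1].tag == tag:
            return children[n - 1]
    return None


root = ET.fromstring(SAMPLE)
match = select_nth_child(root, "hs_cos_wrapper_post_body", "p", 3)
print(match.text)  # prints: Second paragraph of text.
```

Changing the last argument to 1 returns the paragraph that holds the header image, which is why the selector for the second paragraph of text ends in nth-child(3) rather than nth-child(2).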