Fixing a misidentified page type with Analyze API

Diffbot's Analyze API sometimes misidentifies a page as an unsupported type. When that happens, the page type is returned as "other" and no extracted content comes back. You can override this with a custom rule.
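To confirm that a page is affected before creating a rule, you could inspect the type field of the Analyze API response. The following is a minimal sketch, not an official client. It assumes the v3 endpoint https://api.diffbot.com/v3/analyze with token and url query parameters, and a JSON response with a top-level "type" field and an "objects" list. The helper names build_analyze_url and is_misidentified are hypothetical.

```python
from urllib.parse import urlencode

# Assumed Analyze API v3 endpoint; check it against the current Diffbot docs.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_url(token: str, page_url: str) -> str:
    """Build the request URL for one page (hypothetical helper)."""
    query = urlencode({"token": token, "url": page_url})
    return f"{ANALYZE_ENDPOINT}?{query}"


def is_misidentified(response: dict) -> bool:
    """Return True when the parsed response reports type "other"
    and carries no extracted objects (assumed response shape)."""
    return response.get("type") == "other" and not response.get("objects")
```

Fetch the URL with any HTTP client, parse the JSON body, and pass the resulting dictionary to is_misidentified.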

  1. Go to the Custom API UI.
  2. Preview the page that is being misidentified.

  3. Edit the type field by clicking on Edit.

  4. Enter the literal value of the page type. Five values are supported: product, article, video, image, discussion.

Save the rule and process the page again. Analyze API will now treat pages that match this rule's domain regex as the type you defined.
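After saving, you may want to check programmatically that the override took effect. The sketch below assumes the Analyze API response has already been parsed into a dictionary with a top-level "type" field. The function name override_applied is hypothetical, and the set of types is taken from the list in step 4.

```python
# The five page types listed in step 4.
SUPPORTED_TYPES = {"product", "article", "video", "image", "discussion"}


def override_applied(response: dict, expected_type: str) -> bool:
    """Return True when the response reports the type set in the rule.

    Raises ValueError if expected_type is not one of the five
    supported values, since such a rule could not have been valid.
    """
    if expected_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported page type: {expected_type!r}")
    return response.get("type") == expected_type
```

If this still returns False after the rule is saved, the page URL may not match the rule's domain regex.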