Document Structure Analysis extracts the different sections of a document with markup content (including formatted documents such as PDF or Microsoft Word files): the title, headings, abstract, and the parts of an email.

Although this process takes some language markers into account, it relies mainly on the document's markup, so it can be applied to documents in any language.
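To illustrate why a markup-driven approach is language independent, here is a minimal sketch that pulls the title and headings out of an HTML document by looking only at its tags. It is not the API's implementation and does not call the API: the class `StructureSketch` and the function `extract_structure` are hypothetical names introduced for this example, and only Python's standard `html.parser` module is assumed.

```python
from html.parser import HTMLParser


class StructureSketch(HTMLParser):
    """Hypothetical helper: collects the title and headings from HTML tags alone."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []    # (level, text) pairs in document order
        self._current = None  # tag currently being captured, if any
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Start capturing text when a title or heading element opens.
        if tag == "title" or tag in self.HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Ignore closing tags of inline elements nested in the captured one.
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if tag == "title":
            self.title = text
        else:
            self.headings.append((int(tag[1]), text))
        self._current = None


def extract_structure(html):
    """Return the title and headings found in an HTML string."""
    parser = StructureSketch()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings}
```

Because the sketch never inspects the words themselves, it behaves the same for a document written in Spanish, English, or any other language; only the tags matter.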

This API is currently in beta. Send us feedback and help us improve it!


Everything and anything you need to take advantage of this API's full potential.

Test Console

Choose an input and a configuration, and immediately check the results!

Developer Tools

Do you want to integrate this API into your environment? Check our Developer Tools!


Version Date Status
1.0 26/December/2018

1.0.1 (26/December/2018)

  • Fixed minor bugs and a bug affecting the order of headings in HTML documents.

1.0 (08/August/2017)

  • Initial version.


Related Links

Contact Us

Do you have any questions? Have you detected a bug? Contact us through our feedback section or at