Improved Text Area Extraction for PDF Documents in GroupDocs.Parser for Java 18.11

We are delighted to announce the release of GroupDocs.Parser for Java 18.11. This version comes with one new feature and three enhancements. You can now get information about the extractors supported for a document, and text area extraction for PDF documents has been improved. For more details, please have a look at the release notes of version 18.11.

Features Introduced

Getting Information of Supported Extractors for a Document

This feature reports which extractors are supported for a given document. For example, you can check whether plain text, formatted text, and metadata can be extracted from a particular document. You can also check whether the document is a container that holds other documents. For a working example of this feature, please refer to this documentation article.

Enhancements

IFastTextExtractor Interface

GroupDocs.Parser allows changing the default behavior of text extraction. By default, text is extracted using the Standard Extract mode, which gives better quality but takes more time. This enhancement allows enabling fast text extraction via the IFastTextExtractor interface. Support for the IFastTextExtractor interface has been added to the following classes:
  • PdfTextExtractor
  • CellsTextExtractor
  • SlidesTextExtractor
For a working example of this feature, please refer to this documentation article.
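Because fast extraction is exposed through an interface that only some extractor classes implement, calling code typically checks for the capability before using it. The following is a minimal sketch of that pattern only. Every type in it (ExtractionMode, TextSource, FastModeCapable, PdfLikeSource, PlainSource, FastModeDemo) is a hypothetical stand-in written for illustration and is not the GroupDocs.Parser API; please refer to the documentation article for the library's actual members.

```java
// Stand-in types for illustration only; these are NOT the GroupDocs.Parser declarations.
enum ExtractionMode { STANDARD, FAST }

// Hypothetical base type for anything that can produce text.
interface TextSource { }

// Hypothetical capability interface, playing the role that IFastTextExtractor plays in the text above.
interface FastModeCapable {
    ExtractionMode getMode();
    void setMode(ExtractionMode mode);
}

// A source that supports switching modes; it starts in the standard (slower, higher quality) mode.
final class PdfLikeSource implements TextSource, FastModeCapable {
    private ExtractionMode mode = ExtractionMode.STANDARD;
    public ExtractionMode getMode() { return mode; }
    public void setMode(ExtractionMode mode) { this.mode = mode; }
}

// A source that does not implement the capability interface.
final class PlainSource implements TextSource { }

final class FastModeDemo {
    // Switches the source to fast mode if it supports it; returns whether the switch happened.
    static boolean enableFastMode(TextSource source) {
        if (source instanceof FastModeCapable) {
            ((FastModeCapable) source).setMode(ExtractionMode.FAST);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(enableFastMode(new PdfLikeSource())); // true
        System.out.println(enableFastMode(new PlainSource()));   // false
    }
}
```

Checking the interface at run time keeps the calling code working for extractors that do not offer the fast mode, at the cost of silently falling back to the default mode for them.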

IDocumentContentExtractor Interface

This enhancement provides access to the Text Analysis API via the IDocumentContentExtractor interface. Support for the IDocumentContentExtractor interface has been added to the following classes:
  • PdfTextExtractor
  • CellsTextExtractor
  • SlidesTextExtractor
  • WordsTextExtractor
For a working example of this feature, please refer to this documentation article.

Improved Text Area Extraction for PDF Documents

This enhancement improves text area extraction for PDF documents. As of this version, the Y-coordinates of text areas are measured from the top of the page.
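If you have code or stored data that follows the default PDF coordinate convention (origin at the bottom-left of the page, with Y increasing upward), a conversion like the one below may help when comparing against top-origin values. This is a rough sketch under stated assumptions: TextAreaCoordinates and toTopOrigin are hypothetical names, not part of GroupDocs.Parser, and the sketch assumes all values are in the same unit and that the top-origin Y refers to the top edge of the area.

```java
// Sketch only: TextAreaCoordinates is a hypothetical helper, not a GroupDocs.Parser class.
final class TextAreaCoordinates {

    /**
     * Converts the Y-coordinate of an area's bottom edge, measured upward from the
     * bottom of the page, into the Y-coordinate of its top edge, measured downward
     * from the top of the page. All arguments must use the same unit.
     */
    static double toTopOrigin(double pageHeight, double bottomOriginY, double areaHeight) {
        if (pageHeight < 0 || areaHeight < 0) {
            throw new IllegalArgumentException("pageHeight and areaHeight must be non-negative");
        }
        return pageHeight - bottomOriginY - areaHeight;
    }

    public static void main(String[] args) {
        // Assumed example: a 792-unit-high page and a 12-unit-tall area
        // whose bottom edge sits 700 units above the bottom of the page.
        System.out.println(toTopOrigin(792.0, 700.0, 12.0)); // prints 80.0
    }
}
```

In this example the top edge of the area lies 80 units below the top of the page, since 792 - 700 - 12 = 80.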

Available Channels and Resources

Here are a few channels and resources for you to download, learn, try and get technical support on GroupDocs.Parser:

Feedback

As always, if you have any questions or suggestions, feel free to write on our forum.