Text and Metadata Extraction APIs for Java Applications Accurately Parse MS Office, Emails, Zip and Legal Documents
GroupDocs.Parser is a Java document parsing and text extraction API for analyzing documents. It lets users extract metadata as well as raw and formatted text from the supported document formats (Word, Excel, PowerPoint, PDF, OneNote, Visio, Text, HTML, Zip, Markdown and Email messages). The API is designed to parse content accurately and quickly.
Also available for: .NET
Product News
Add Watermark Annotation to Microsoft Excel and Word in .NET
GroupDocs.Annotation for .NET now adds new annotation types for Microsoft Excel spreadsheets, such as Text Replacement, Resource Redaction and Watermark. You can also add watermark annotations to Word documents. Read more details here.
Substitute Specific Fonts when Converting between Office Documents in .NET
GroupDocs.Conversion for .NET supports explicit font substitution when converting from Microsoft Word, Excel and Presentation documents. For email conversion, email-specific options (EmailOptions) can be defined in any SaveOptions class such as PdfSaveOptions or CellsSaveOptions. Read more details here.
Dynamically Change the Color of Chart and Point Series While Generating Documents
GroupDocs.Assembly now supports the null-conditional operators ?. and ?[]. Within .NET and Java applications, you can also change the color of chart series and individual point series dynamically, including for email messages with HTML and RTF bodies. For a chart with dynamic data, these colors can be set based upon expressions.
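For readers unfamiliar with null-conditional operators, the sketch below shows in plain Java what chains such as order?.customer?.name and order?.customer?.phones?[0] evaluate to, assuming the operators follow the usual C# semantics: as soon as one link in the chain is null, the rest of the chain is skipped and the whole expression yields null. The Order and Customer types and the customerName and phoneAt helpers are hypothetical names made up for this illustration; none of them are part of GroupDocs.Assembly.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class NullConditionalSketch {
    // Hypothetical data types, used only for illustration.
    static class Customer {
        final String name;
        final List<String> phones;

        Customer(String name, List<String> phones) {
            this.name = name;
            this.phones = phones;
        }
    }

    static class Order {
        final Customer customer;

        Order(Customer customer) {
            this.customer = customer;
        }
    }

    // Plain-Java equivalent of: order?.customer?.name
    // Optional.map skips the remaining steps once a value is null.
    static String customerName(Order order) {
        return Optional.ofNullable(order)
                .map(o -> o.customer)
                .map(c -> c.name)
                .orElse(null);
    }

    // Plain-Java equivalent of: order?.customer?.phones?[index]
    // Only null links are guarded; an out-of-range index still throws.
    static String phoneAt(Order order, int index) {
        return Optional.ofNullable(order)
                .map(o -> o.customer)
                .map(c -> c.phones)
                .map(p -> p.get(index))
                .orElse(null);
    }

    public static void main(String[] args) {
        Order complete = new Order(new Customer("Ann", Collections.singletonList("555-0100")));
        Order noCustomer = new Order(null);

        System.out.println(customerName(complete));
        System.out.println(customerName(noCustomer));
        System.out.println(phoneAt(complete, 0));
        System.out.println(phoneAt(null, 0));
    }
}
```

Without such operators, each link in the chain would need its own explicit null check before the next member or index could be accessed.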
Merge Different Diagram File Formats in .NET
GroupDocs.Merger for .NET API allows merging multiple files into one document. The latest version supports new diagramming formats (VSDX, VSDM, VSSX, VSSM, VTX, VSTM, VDX, VSTX, VSX) for multiple methods such as Join, MovePage or RemovePage. Read more details here.
From the Library
Code Example: Render ISFF-based DGN Files in .NET Applications
GroupDocs.Viewer for .NET has extended its list of supported file formats with the ISFF-based DGN (V7) format. GroupDocs.Viewer lets you view documents of many formats as HTML, image, PDF or in their original format within C#, WPF and ASP.NET applications. Read more details here.
Code Example: Update and Remove IPTC Metadata in PSD File
GroupDocs.Metadata allows manipulating metadata in all popular file formats. The latest version supports updating and removing IPTC metadata in PSD files within .NET and Java applications. Moreover, the memory consumption of metadata operations on PSD and MP3 files has been reduced.
Code Example: Manually Break Indexing Operations within Large Documents in .NET
GroupDocs.Search for .NET adds advanced indexing and searching capabilities for retrieving full text and metadata from business document formats. Using the GroupDocs.Search API, you can break an indexing operation manually. The break is not instantaneous: when large documents are being indexed, it can take about a second to take effect. Read more details here.
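The newsletter does not say why a break is not instantaneous. One common cause in long-running jobs, assumed here purely for illustration, is cooperative cancellation: the worker checks a stop flag only at safe points, so the document currently in progress is finished first. The Java sketch below shows that pattern. The class CooperativeBreakSketch and its methods requestBreak, indexAll and indexOne are hypothetical and are not the GroupDocs.Search API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeBreakSketch {
    // Stop flag that another thread may set at any time.
    private final AtomicBoolean breakRequested = new AtomicBoolean(false);

    // Asks the indexer to stop; takes effect at the next check.
    public void requestBreak() {
        breakRequested.set(true);
    }

    // Indexes documents one by one and returns how many were completed.
    public int indexAll(String[] documents) {
        int indexed = 0;
        for (String document : documents) {
            // The flag is checked only between documents, so a large
            // document that is already being indexed runs to completion.
            if (breakRequested.get()) {
                break;
            }
            indexOne(document);
            indexed++;
        }
        return indexed;
    }

    // Placeholder for the real per-document work (hypothetical).
    protected void indexOne(String document) {
        // Parsing and indexing of a single document would happen here.
    }

    public static void main(String[] args) {
        CooperativeBreakSketch indexer = new CooperativeBreakSketch();
        indexer.requestBreak(); // break requested before indexing starts
        System.out.println(indexer.indexAll(new String[] {"a.docx", "b.pdf"}));
    }
}
```

In a design like this, the delay before a break takes effect grows with the size of the document being processed when the request arrives.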
Code Example: Sign Documents with Stamp Signatures
The GroupDocs.Signature for Java API now supports signing documents with stamp signatures in either a round or a square shape within all types of Java applications. This version also works with new PowerPoint formats (OTP, POTX, POTM, PPSM) and enables users to add custom objects to QR-Code signatures and search for them. Read more details here.
Code Example: Extract Text Areas from Document Pages for Text Analysis
GroupDocs.Parser for .NET is a text extraction API for analyzing documents. The latest version supports extracting text areas from document pages, which can be helpful for getting data for text analysis. To extract text areas, text extractors implement their own private internal class and provide a DocumentContent property. Read more details here.
Code Example: Locking Watermark and Protecting or Unprotecting Word Documents
Feedback
How Can We Help You?
Do you have ideas for what you’d like to see us do in the coming months, or do you have any questions for us? Reply to our newsletter or share your thoughts via the forums. We’ll be happy to hear from you!
Product Releases and Updates
GroupDocs.Total for .NET – The latest versions of GroupDocs .NET APIs packaged into one product suite.
GroupDocs.Total for Java – The latest versions of GroupDocs Java APIs packaged into one product suite.
GroupDocs.Conversion for Java 18.6.1 – Hide the mail header when converting MSG to PDF.
GroupDocs.Comparison for .NET 18.7 – Implement group shapes and GluedShaped in diagrams, along with comparing different formats as images.
Check out more releases from the last month.