Category Archive: GroupDocs.Parser Product Family

Official blog with announcements of the latest supported features, hotfixes, technical articles, tips, and videos for GroupDocs.Text, a text extraction API for .NET.

Extract ZIP Files Data in Java

ZIP archives are one of the most popular and commonly used compressed file formats. ZIP files are mainly used to reduce the total file size and to send multiple files as a single archive. As a developer, you can extract the text, images, and even metadata from the files compressed within ZIP archives. In this article, we will discuss how to extract data from ZIP archives in Java.

The following topics are covered in this article:

  • Java API for ZIP file data extraction
  • How to extract ZIP file data using Java
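
As a minimal sketch of the idea, the class below walks a ZIP archive with the JDK's own `java.util.zip` package and collects the text of each file. It is not GroupDocs.Parser, whose API is not reproduced here; it assumes that "text" means entries ending in `.txt`, decoded as UTF-8, whereas a full extractor would hand each entry to a format-specific parser. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipTextExtractor {

    /** Returns entry name -> UTF-8 text for every "*.txt" entry in the archive. */
    public static Map<String, String> extractText(InputStream zipData) {
        Map<String, String> texts = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Skip folders and anything that is not a plain-text file (an assumption of this sketch).
                if (entry.isDirectory() || !entry.getName().endsWith(".txt")) {
                    continue;
                }
                // Reads only up to the end of the current entry.
                byte[] content = zis.readAllBytes();
                texts.put(entry.getName(), new String(content, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    /** Builds a small in-memory ZIP from name -> content pairs (for demonstration only). */
    public static byte[] buildZip(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("docs/readme.txt", "Hello from inside the archive");
        files.put("images/logo.png", "(binary data would go here)");
        byte[] zip = buildZip(files);
        // Prints: docs/readme.txt: Hello from inside the archive
        extractText(new ByteArrayInputStream(zip))
                .forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```

`readAllBytes()` on a `ZipInputStream` stops at the end of the current entry, which is why the loop can read each file in turn without tracking entry sizes.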

Posted in GroupDocs.Parser Product Family

Extract ZIP Files Data in C#

Archive and compression formats such as ZIP, RAR, TAR, GZIP, and BZIP2 are commonly used to bundle multiple files and folders into a single container (GZIP and BZIP2 usually in combination with TAR) and to reduce the total file size through compression. Just as you parse and extract data from documents of various file formats, you can treat archive files the same way: you can extract the text, images, and even metadata from the files compressed within them. In this article, we will discuss how to extract data from ZIP archives using C# in your .NET applications.

The following topics are covered in this article:

  • .NET API for ZIP file data extraction
  • How to extract ZIP file data
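
Much of an archive's metadata lives in its central directory and can be read without decompressing anything. The sketch below shows this in Java rather than C#, using the JDK's `java.util.zip.ZipFile`; it is not the .NET or GroupDocs.Parser API, and the `EntryInfo` holder and the sample archive are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipMetadataLister {

    /** Metadata stored in the ZIP central directory for one entry. */
    public static final class EntryInfo {
        public final String name;
        public final boolean directory;
        public final long size;           // uncompressed size in bytes
        public final long compressedSize; // bytes actually stored in the archive
        public final long lastModified;   // milliseconds since the epoch, or -1 if unknown

        EntryInfo(ZipEntry e) {
            name = e.getName();
            directory = e.isDirectory();
            size = e.getSize();
            compressedSize = e.getCompressedSize();
            lastModified = e.getTime();
        }
    }

    /** Lists every entry of a ZIP file without decompressing any content. */
    public static List<EntryInfo> list(Path zipPath) {
        List<EntryInfo> result = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                result.add(new EntryInfo(entries.nextElement()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Writes a hypothetical sample archive (one folder, one text file) to a temp file. */
    public static Path createSampleZip() {
        try {
            Path tmp = Files.createTempFile("sample", ".zip");
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
                zos.putNextEntry(new ZipEntry("notes/"));
                zos.closeEntry();
                zos.putNextEntry(new ZipEntry("notes/todo.txt"));
                zos.write("a".repeat(1000).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        for (EntryInfo info : list(zip)) {
            System.out.printf("%s dir=%b size=%d compressed=%d%n",
                    info.name, info.directory, info.size, info.compressedSize);
        }
        Files.delete(zip);
    }
}
```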

Posted in GroupDocs.Parser Product Family

Extract Images from EPUB, FB2, CHM eBooks in Java

eBooks in various formats are common in everyday use, and an eBook can contain images as well as text. If you want to reuse the images of an eBook elsewhere, you can extract them programmatically within your Java application. In this article, you will learn how to automate image extraction from eBook files such as EPUB, PDF, FB2, and CHM in Java.

The following topics will be covered below:

  • Java API - Image Extraction from eBooks
  • Extract Images from EPUB eBook in Java
  • Extract Images from PDF, FB2, CHM eBooks in Java
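
An EPUB file is a ZIP container: `META-INF/container.xml` names the package (OPF) document, and the OPF manifest lists every resource with its `media-type`. The sketch below relies on that structure, using only the JDK, to pull out manifest items whose media type starts with `image/`. It is not GroupDocs.Parser's implementation, it does not resolve `../` or percent-encoded hrefs, and the sample book it builds is a hypothetical, minimal archive rather than a spec-conformant EPUB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class EpubImageExtractor {

    /** Returns archive path -> bytes for each manifest item whose media-type is image/*. */
    public static Map<String, byte[]> extractImages(byte[] epub) {
        Map<String, byte[]> entries = readEntries(epub);
        // META-INF/container.xml points to the package (OPF) document.
        Document container = parseXml(entries.get("META-INF/container.xml"));
        Element rootfile = (Element) container.getElementsByTagNameNS("*", "rootfile").item(0);
        String opfPath = rootfile.getAttribute("full-path");
        // Manifest hrefs are relative to the OPF's folder ("" when the OPF sits at the root).
        String baseDir = opfPath.substring(0, opfPath.lastIndexOf('/') + 1);
        NodeList items = parseXml(entries.get(opfPath)).getElementsByTagNameNS("*", "item");
        Map<String, byte[]> images = new LinkedHashMap<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (item.getAttribute("media-type").startsWith("image/")) {
                String path = baseDir + item.getAttribute("href"); // no "../" or %-decoding
                byte[] data = entries.get(path);
                if (data != null) {
                    images.put(path, data);
                }
            }
        }
        return images;
    }

    /** Loads every file entry into memory: fine for a sketch, not for very large books. */
    static Map<String, byte[]> readEntries(byte[] zip) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                if (!e.isDirectory()) {
                    entries.put(e.getName(), zis.readAllBytes());
                }
            }
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return entries;
    }

    static Document parseXml(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot parse XML", ex);
        }
    }

    /** Builds a tiny, hypothetical EPUB-like archive (not spec-conformant) for demonstration. */
    public static byte[] buildSampleEpub() {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("META-INF/container.xml", utf8("<container xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\""
                + " version=\"1.0\"><rootfiles><rootfile full-path=\"OEBPS/content.opf\""
                + " media-type=\"application/oebps-package+xml\"/></rootfiles></container>"));
        files.put("OEBPS/content.opf", utf8("<package xmlns=\"http://www.idpf.org/2007/opf\" version=\"3.0\"><manifest>"
                + "<item id=\"ch1\" href=\"chapter1.xhtml\" media-type=\"application/xhtml+xml\"/>"
                + "<item id=\"cover\" href=\"images/cover.png\" media-type=\"image/png\"/>"
                + "</manifest></package>"));
        files.put("OEBPS/chapter1.xhtml", utf8("<html xmlns=\"http://www.w3.org/1999/xhtml\"/>"));
        files.put("OEBPS/images/cover.png", new byte[] {(byte) 0x89, 'P', 'N', 'G'}); // placeholder bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey()));
                zos.write(f.getValue());
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toByteArray();
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        extractImages(buildSampleEpub())
                .forEach((path, bytes) -> System.out.println(path + " (" + bytes.length + " bytes)"));
    }
}
```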

Posted in GroupDocs.Parser Product Family

Extract Images from EPUB, FB2, CHM eBooks in C#

An electronic book, popularly known as an eBook, is a book in digital form that can be read on various electronic devices. These include dedicated eReaders such as Kindle, as well as laptops, desktop computers, and smartphones. Many eBook file formats are in use, including EPUB, FictionBook (FB2), Microsoft Compiled HTML Help (CHM), DjVu, MOBI, and PDF. This article shows programmers how to extract images from eBooks programmatically in C# within .NET applications.

Figure: Extract Images from eBooks in C# .NET (EPUB eBook from the Adobe Sample eBook Library)

The following topics will be covered in this article:

  • .NET API for Image Extraction from eBooks
  • Extract Images from EPUB eBook in C#
  • Extract Images from FB2, CHM eBooks in C#
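
FictionBook 2 (FB2) is an XML format that embeds images as Base64 text inside `<binary>` elements carrying `id` and `content-type` attributes. A JDK-only sketch of decoding those images is shown below in Java rather than C#; it is not the GroupDocs.Parser API, the sample document is hypothetical, and CHM (a compiled binary format) is not covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class Fb2ImageExtractor {

    /** Returns binary id -> decoded bytes for every <binary> element whose content-type is image/*. */
    public static Map<String, byte[]> extractImages(byte[] fb2) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(fb2));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not a well-formed FB2 document", ex);
        }
        Map<String, byte[]> images = new LinkedHashMap<>();
        // Match by local name so the FictionBook namespace URI does not need to be hard-coded.
        NodeList binaries = doc.getElementsByTagNameNS("*", "binary");
        for (int i = 0; i < binaries.getLength(); i++) {
            Element binary = (Element) binaries.item(i);
            if (!binary.getAttribute("content-type").startsWith("image/")) {
                continue;
            }
            // The MIME decoder skips the line breaks that usually wrap the Base64 payload.
            byte[] data = Base64.getMimeDecoder().decode(binary.getTextContent());
            images.put(binary.getAttribute("id"), data);
        }
        return images;
    }

    /** Builds a minimal, hypothetical FB2 document with one embedded image. */
    public static byte[] sampleFb2(byte[] imageBytes) {
        String payload = Base64.getMimeEncoder().encodeToString(imageBytes);
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<FictionBook xmlns=\"http://www.gribuser.ru/xml/fictionbook/2.0\">"
                + "<body><section><p>Chapter text</p></section></body>"
                + "<binary id=\"cover.jpg\" content-type=\"image/jpeg\">\n" + payload + "\n</binary>"
                + "</FictionBook>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] fakeJpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        extractImages(sampleFb2(fakeJpeg))
                .forEach((id, bytes) -> System.out.println(id + ": " + bytes.length + " bytes"));
    }
}
```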

Posted in GroupDocs.Parser Product Family

Extract Data from Invoices and Receipts in Java

In the era of online business, the use of digital invoices and receipts has grown considerably, and so has the demand for efficient data extraction from them. In this article, you will learn how to extract data from PDF invoices and receipts programmatically in Java.
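
Once the text of an invoice has been extracted, individual fields can be picked out with patterns. The sketch below does this with plain `java.util.regex` and is not necessarily the approach taken in the linked article; the field labels (`Invoice Number`, `Date`, `Total Due`), the ISO date format, and the patterns themselves are hypothetical and would need tuning for real invoice layouts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceFieldExtractor {

    // Hypothetical labels and formats; real invoices need patterns tuned to their layout.
    private static final Map<String, Pattern> FIELDS = new LinkedHashMap<>();

    static {
        FIELDS.put("invoiceNumber", Pattern.compile(
                "\\bInvoice\\s*(?:No\\.?|Number|#)\\s*:?\\s*([A-Z0-9-]+)", Pattern.CASE_INSENSITIVE));
        FIELDS.put("date", Pattern.compile(
                "\\bDate\\s*:?\\s*(\\d{4}-\\d{2}-\\d{2})", Pattern.CASE_INSENSITIVE));
        // The leading \b keeps "Subtotal" from matching the "Total" pattern.
        FIELDS.put("total", Pattern.compile(
                "\\bTotal(?:\\s+Due)?\\s*:?\\s*\\$?([0-9,]+\\.\\d{2})", Pattern.CASE_INSENSITIVE));
    }

    /** Returns field name -> first captured value; fields that are not found are omitted. */
    public static Map<String, String> extract(String invoiceText) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> field : FIELDS.entrySet()) {
            Matcher m = field.getValue().matcher(invoiceText);
            if (m.find()) {
                values.put(field.getKey(), m.group(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "ACME Corp\n"
                + "Invoice Number: INV-2024-001\n"
                + "Date: 2024-03-15\n"
                + "Subtotal: $1,150.00\n"
                + "Total Due: $1,234.50\n";
        // Prints {invoiceNumber=INV-2024-001, date=2024-03-15, total=1,234.50}
        System.out.println(extract(text));
    }
}
```

Keeping each field as a separate named pattern makes it easy to add or adjust fields per invoice template without touching the extraction loop.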

Posted in GroupDocs.Parser Product Family

Read PDF Form Fields using C#

In this article, we will learn how to read and parse PDF documents and then extract PDF form field values programmatically in C#. Earlier, we saw how to extract values from PDF forms in Java. With these articles, you can extract the values of filled feedback forms within your .NET and Java applications, then analyze them or save them to a database.

Parse PDF Forms to Extract Values in C#

Posted in GroupDocs.Parser Product Family

Read PDF Form Fields in Java

In this article, we will discuss how to parse a PDF document and extract values from PDF forms programmatically in Java. In many situations, we collect a large number of filled survey or feedback forms in PDF format. We can easily extract the filled field values and use them for analysis. Let us now move straight to reading these PDF forms and extracting the filled field values within Java applications.
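
After the field values have been extracted from each filled form, the analysis itself is ordinary Java. The sketch assumes each form has already been turned into a `Map` of field name to value (the GroupDocs.Parser extraction call is not shown) and uses hypothetical field names `Rating` and `Recommend`.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.TreeMap;

public class SurveyAggregator {

    /** Average of a numeric field across forms; blank or non-numeric answers are skipped. */
    public static OptionalDouble average(List<Map<String, String>> forms, String field) {
        return forms.stream()
                .map(form -> form.get(field))
                .filter(v -> v != null && v.trim().matches("-?\\d+(\\.\\d+)?"))
                .mapToDouble(v -> Double.parseDouble(v.trim()))
                .average();
    }

    /** Counts how often each non-blank answer appears for a field (e.g. a yes/no choice). */
    public static Map<String, Integer> tally(List<Map<String, String>> forms, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> form : forms) {
            String v = form.get(field);
            if (v != null && !v.isBlank()) {
                counts.merge(v.trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for fields extracted from three filled PDF forms.
        List<Map<String, String>> forms = List.of(
                Map.of("Name", "Ann", "Rating", "5", "Recommend", "Yes"),
                Map.of("Name", "Bob", "Rating", "3", "Recommend", "No"),
                Map.of("Name", "Cy", "Rating", "", "Recommend", "Yes"));
        System.out.println("Average rating: " + average(forms, "Rating").orElse(Double.NaN)); // 4.0
        System.out.println("Recommend: " + tally(forms, "Recommend")); // {No=1, Yes=2}
    }
}
```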

Posted in GroupDocs.Parser Product Family

Extract Images from Documents using C#

In this article, we will learn how to programmatically extract images from PDF, Excel, PowerPoint, and Word documents in a C# application using a document parsing .NET API.

GroupDocs.Parser for .NET is a document parsing and data extraction API for .NET. It supports parsing documents and extracting images, text, and metadata from word-processing documents, spreadsheets, presentations, archives, and email documents.

Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats.
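
When saving extracted images, it helps to choose the file extension from the image bytes themselves. The sketch below checks the well-known file signatures of the five formats listed above. It is plain Java (not C#, and not part of GroupDocs.Parser); the `.bin` fallback is an arbitrary choice, and the two-byte `BM` check for BMP can give false positives on arbitrary data, so it runs last.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ImageFormatSniffer {

    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    /** Returns "png", "jpeg", "gif", "webp", "bmp", or "unknown" based on the leading bytes. */
    public static String detect(byte[] data) {
        if (matchesAt(data, 0, PNG)) return "png";
        if (matchesAt(data, 0, JPEG)) return "jpeg";
        if (matchesAt(data, 0, ascii("GIF87a")) || matchesAt(data, 0, ascii("GIF89a"))) return "gif";
        // WebP is a RIFF container: "RIFF", a 4-byte little-endian size, then "WEBP".
        if (matchesAt(data, 0, ascii("RIFF")) && matchesAt(data, 8, ascii("WEBP"))) return "webp";
        // Only two signature bytes, so this is the weakest check; keep it last.
        if (matchesAt(data, 0, ascii("BM"))) return "bmp";
        return "unknown";
    }

    /** Suggests a file name extension for saving the image (".bin" when unrecognized). */
    public static String extensionFor(byte[] data) {
        String format = detect(data);
        return format.equals("jpeg") ? ".jpg" : format.equals("unknown") ? ".bin" : "." + format;
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] signature) {
        return data.length >= offset + signature.length
                && Arrays.equals(data, offset, offset + signature.length, signature, 0, signature.length);
    }

    public static void main(String[] args) {
        byte[] gif = ascii("GIF89a...");
        System.out.println(detect(gif) + " -> image" + extensionFor(gif)); // gif -> image.gif
    }
}
```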

Posted in GroupDocs.Parser Product Family

Extract Images from Documents using Java

Today, we will learn how to programmatically extract images from PDF, Excel, PowerPoint, and Word documents using Java. For the extraction of images, we will use GroupDocs.Parser for Java. This Java API supports parsing documents and extracting images, text, and metadata from word-processing documents, spreadsheets, presentations, archives, and email documents. Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats.

The following topics are covered in this article:

  • Image Extraction Java API
  • Image Extraction from PDF documents in Java
  • Extract Images from Word, Excel, PowerPoint documents in Java
  • Extract Images from a Specific Page in Java
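
To save an extracted image in a different format, the JDK's `javax.imageio.ImageIO` can re-encode it. This sketch is not the GroupDocs.Parser API. It flattens the image onto an opaque RGB canvas, so transparent areas are composited over black, because some writers, JPEG for one, do not accept an alpha channel. The JDK ships no WebP writer, so WebP output would need a third-party ImageIO plugin.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageReencoder {

    /**
     * Decodes an image and re-encodes it with the named ImageIO writer, e.g. "png", "jpg", "bmp" or "gif".
     * Throws IllegalArgumentException if the input cannot be decoded or no writer exists for the format.
     */
    public static byte[] convert(byte[] imageBytes, String format) {
        try {
            BufferedImage source = ImageIO.read(new ByteArrayInputStream(imageBytes));
            if (source == null) {
                throw new IllegalArgumentException("No ImageIO reader recognizes this input");
            }
            // Flatten onto an opaque RGB canvas; transparent pixels are drawn over black.
            BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D g = rgb.createGraphics();
            g.drawImage(source, 0, 0, null);
            g.dispose();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // ImageIO.write returns false when no writer is registered for the format name.
            if (!ImageIO.write(rgb, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for format: " + format);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a small solid-colour PNG in memory, standing in for an extracted image. */
    public static byte[] samplePng(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, 0xFF3366CC); // opaque blue pixel
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(img, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] png = samplePng(4, 3);
        byte[] bmp = convert(png, "bmp");
        byte[] jpg = convert(png, "jpg");
        System.out.println("PNG " + png.length + " bytes, BMP " + bmp.length + " bytes, JPEG " + jpg.length + " bytes");
    }
}
```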

Posted in GroupDocs.Parser Product Family