Extract Data from Invoices and Receipts in Java

With the growth of online business, digital invoices and receipts have become far more common, and so has the need to extract data from them efficiently. In this article, you will learn how to extract data from PDF invoices or receipts programmatically in Java. An earlier post covered extracting invoice data using C#.

Document Parsing and Data Extraction Java API

I will be using GroupDocs.Parser for Java to parse PDF invoices and extract data values within a Java application. This API also allows extracting text, images, and metadata from documents, images, presentations, archives, emails, and many other supported document formats.

Download or Configure

From the downloads section, you may download the JAR file, or get the repository and dependency configurations for the pom.xml of your Maven-based Java applications.

How to Extract PDF Invoice Data in Java

The following steps allow you to extract data from PDF invoices using Java.

  • Create a template.
  • Parse the PDF invoice according to the created template.
  • Extract the information from the parsed PDF.
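To make these three steps concrete before turning to the library, here is a minimal, library-independent sketch that applies the same idea to plain text. It is not the GroupDocs.Parser API: the class InvoiceTextTemplate and its methods labeledField and extract are hypothetical names invented for illustration. The sketch assumes the invoice text has already been extracted and that each value sits on the same line as its label.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical stand-in, NOT the GroupDocs.Parser API.
// A "template" here is an ordered map of field name -> regex with one capture group.
final class InvoiceTextTemplate {

    private final Map<String, Pattern> fields = new LinkedHashMap<>();

    // Step 1 (create a template): register a field whose value is the text
    // that follows the given label on the same line, with an optional colon.
    InvoiceTextTemplate labeledField(String name, String label) {
        fields.put(name, Pattern.compile(Pattern.quote(label) + "[ \\t]*:?[ \\t]*(.+)"));
        return this;
    }

    // Steps 2 and 3 (parse and extract): apply every field pattern to the text.
    // Fields whose label is not found are left out of the result.
    Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> field : fields.entrySet()) {
            Matcher matcher = field.getValue().matcher(text);
            if (matcher.find()) {
                result.put(field.getKey(), matcher.group(1).trim());
            }
        }
        return result;
    }
}
```

For example, registering a field named "TotalDueValue" with the label "Total Due" and calling extract on text that contains the line "Total Due $93.50" yields the value "$93.50" for that field. A real PDF template additionally has to locate fields by position on the page, which is what the parsing library is for.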

Create Template for the Invoice

The template is created to match the layout of the invoice, with one named field for each value to extract. You may also download the invoice used here from the sample files available in the GitHub repository.

Parse PDF Invoice/Receipt for Data Extraction

The following code parses the PDF invoice according to the created template and extracts the invoice data using simple Java code.

The Output

The following is the output produced after extracting the data from the sample invoice with the template.

FROMCOMPANY:    DEMO - Sliced Invoices
FROMADDRESS:    Suite 5A-1204
123 Somewhere Street
Your City AZ 12345
FROMEMAIL:     admin@slicedinvoices.com
TOCOMPANY:    Test Business
TOADDRESS:    123 Somewhere St
Melbourne, VIC 3000
INVOICENUMBER:             Invoice Number
INVOICENUMBERVALUE: NV-3337
INVOICEORDER:                Order Number
INVOICEORDERVALUE:    12345
INVOICEDATE:                    Invoice Date
INVOICEDATEVALUE:        January 25, 2016
DUEDATE:                           Due Date
DUEDATEVALUE:               January 31, 2016
TOTALDUE:                         Total Due
TOTALDUEVALUE:             $93.50
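
If the values need further processing, the printed name and value pairs can be read back into a map. The sketch below is an illustration using only the Java standard library, and ExtractedFieldReader is a hypothetical helper, not part of any API. It assumes that every field line starts with an upper-case name followed by a colon, and that any line without such a prefix continues the previous value, as with the multi-line addresses above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper for reading "NAME: value" output lines.
final class ExtractedFieldReader {

    // A field line looks like "TOTALDUEVALUE:   $93.50": upper-case name, colon, value.
    private static final Pattern FIELD_LINE = Pattern.compile("^([A-Z]+):\\s*(.*)$");

    static Map<String, String> read(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        String current = null; // name of the field that continuation lines belong to
        for (String line : lines) {
            Matcher matcher = FIELD_LINE.matcher(line);
            if (matcher.matches()) {
                current = matcher.group(1);
                fields.put(current, matcher.group(2).trim());
            } else if (current != null && !line.trim().isEmpty()) {
                // Continuation line: append to the previous value on a new line.
                fields.merge(current, line.trim(),
                        (oldValue, extra) -> oldValue.isEmpty() ? extra : oldValue + "\n" + extra);
            }
        }
        return fields;
    }
}
```

One limitation of this assumption: a continuation line that itself begins with upper-case letters and a colon would be treated as a new field, so the rule should be adjusted if the invoice content can look like that.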

Many other open-source examples are available in the GitHub repository. You can download the code and quickly run the examples. For more guidance and other ways to use templates for parsing and data extraction in Java, visit the developer guide in the documentation. If you run into any difficulty, you can reach the support team for free at any time on the forum.