In the era of online businesses, the use of digital invoices and receipts has increased considerably, and so has the demand for efficient data extraction from them. In this article, you will learn how to extract data from PDF invoices or receipts programmatically in Java. An earlier post covered extracting invoice data using C#.
Document Parsing and Data Extraction Java API
I will be using GroupDocs.Parser for Java to parse PDF invoices and extract data values within a Java application. This API can also extract text, images, and metadata from documents, images, presentations, archives, emails, and many other supported formats.
Download or Configure
From the downloads section, you can download the JAR file, or get the repository and dependency configurations for the pom.xml of your Maven-based Java applications.
How to Extract PDF Invoice Data in Java
The following steps let you extract data from PDF invoices using Java.
- Create a template.
- Parse the PDF invoice according to the created template.
- Extract the information from the parsed PDF.
Create Template for the Invoice
Below is the template created for the sample invoice. You can also download that invoice from the sample files available in the GitHub repository.
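Judging by the output further down, the template pairs label fields (such as InvoiceNumber) with value fields read from the area next to them (InvoiceNumberValue), alongside fields at fixed positions on the page. The sketch below models that idea in plain Java so the geometry is easy to follow. It is not the GroupDocs.Parser API: the `TemplateSketch`, `Area`, and `Field` types are hypothetical, the coordinates are invented for illustration, and a real parser would typically locate the label by searching the page text rather than by a hard-coded rectangle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a positional template; NOT the GroupDocs.Parser API.
class TemplateSketch {

    // A rectangle on the page: top-left corner plus size, in arbitrary page units.
    static final class Area {
        final double x, y, width, height;

        Area(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ", " + width + " x " + height + ")";
        }
    }

    // A named field: either a fixed area, or an area directly to the right of an anchor field.
    static final class Field {
        final String name;
        final Area fixedArea;    // set for fixed fields, null for linked ones
        final String anchorName; // set for linked fields, null for fixed ones
        final double width, height;

        private Field(String name, Area fixedArea, String anchorName, double width, double height) {
            this.name = name;
            this.fixedArea = fixedArea;
            this.anchorName = anchorName;
            this.width = width;
            this.height = height;
        }

        static Field fixed(String name, Area area) {
            return new Field(name, area, null, area.width, area.height);
        }

        static Field rightOf(String name, String anchorName, double width, double height) {
            return new Field(name, null, anchorName, width, height);
        }
    }

    // Example template; every coordinate here is an invented placeholder.
    static List<Field> sampleTemplate() {
        List<Field> fields = new ArrayList<>();
        fields.add(Field.fixed("FromCompany", new Area(35, 135, 100, 10)));
        fields.add(Field.fixed("InvoiceNumber", new Area(350, 200, 100, 15)));
        fields.add(Field.rightOf("InvoiceNumberValue", "InvoiceNumber", 200, 15));
        return fields;
    }

    // Resolves each field to a concrete area. An anchor must appear before the field linked to it.
    static Map<String, Area> resolve(List<Field> fields) {
        Map<String, Area> resolved = new LinkedHashMap<>();
        for (Field field : fields) {
            if (field.fixedArea != null) {
                resolved.put(field.name, field.fixedArea);
                continue;
            }
            Area anchor = resolved.get(field.anchorName);
            if (anchor == null) {
                throw new IllegalArgumentException("Unknown anchor: " + field.anchorName);
            }
            // The linked area starts at the anchor's right edge, on the same baseline.
            resolved.put(field.name,
                    new Area(anchor.x + anchor.width, anchor.y, field.width, field.height));
        }
        return resolved;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Area> entry : resolve(sampleTemplate()).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Linking a value to its label, rather than fixing both positions, keeps the template usable when the label shifts slightly between invoices from the same sender.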
Parse PDF Invoice/Receipt for Data Extraction
The following code parses the PDF invoice according to the created template and extracts the invoice data.
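To illustrate the parse-and-extract flow without depending on any library, here is a minimal sketch that applies a template to invoice text that has already been extracted from the PDF. It uses only `java.util.regex`. The class name `InvoiceTextExtractor`, the regular expressions, and the sample text are assumptions made for this example; the real API works with page positions rather than plain text, so treat this as an analogy, not as the library's method.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-based stand-in for template parsing; NOT the GroupDocs.Parser API.
class InvoiceTextExtractor {

    // Each entry maps a field name to a regex whose first group captures the value after a label.
    static Map<String, Pattern> template() {
        Map<String, Pattern> template = new LinkedHashMap<>();
        template.put("InvoiceNumberValue", Pattern.compile("Invoice Number\\s+(\\S+)"));
        template.put("InvoiceOrderValue", Pattern.compile("Order Number\\s+(\\S+)"));
        template.put("InvoiceDateValue", Pattern.compile("Invoice Date\\s+(\\w+ \\d{1,2}, \\d{4})"));
        template.put("DueDateValue", Pattern.compile("Due Date\\s+(\\w+ \\d{1,2}, \\d{4})"));
        template.put("TotalDueValue", Pattern.compile("Total Due\\s+(\\$[\\d,]+\\.\\d{2})"));
        return template;
    }

    // Applies every pattern to the text; a field that is not found maps to null.
    static Map<String, String> extract(String text, Map<String, Pattern> template) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : template.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            values.put(entry.getKey(), matcher.find() ? matcher.group(1) : null);
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample text assembled from the values reported in this article's output.
        String text = "Invoice Number NV-3337\n"
                + "Order Number 12345\n"
                + "Invoice Date January 25, 2016\n"
                + "Due Date January 31, 2016\n"
                + "Total Due $93.50\n";

        for (Map.Entry<String, String> entry : extract(text, template()).entrySet()) {
            String value = entry.getValue() == null ? "Not found" : entry.getValue();
            System.out.println(entry.getKey().toUpperCase(Locale.ROOT) + ": " + value);
        }
    }
}
```

Keeping the template as data (a map of names to patterns) means a new invoice layout only needs a new map, not new extraction code.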
The Output
The following is the output of the above code after extracting data from the invoice.
FROMCOMPANY: DEMO - Sliced Invoices
FROMADDRESS: Suite 5A-1204 123 Somewhere Street Your City AZ 12345
FROMEMAIL: admin@slicedinvoices.com
TOCOMPANY: Test Business
TOADDRESS: 123 Somewhere St Melbourne, VIC 3000
INVOICENUMBER: Invoice Number
INVOICENUMBERVALUE: NV-3337
INVOICEORDER: Order Number
INVOICEORDERVALUE: 12345
INVOICEDATE: Invoice Date
INVOICEDATEVALUE: January 25, 2016
DUEDATE: Due Date
DUEDATEVALUE: January 31, 2016
TOTALDUE: Total Due
TOTALDUEVALUE: $93.50
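The extracted values arrive as plain strings, so a common next step is converting them to typed values before storing or comparing them. The following is a minimal sketch using only the Java standard library. The class name `InvoiceValueConverter` is hypothetical, and the sketch assumes dates in the "January 25, 2016" form and US-dollar amounts in the "$93.50" form, as they appear in the output above; other invoices may need different formats.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper for turning extracted invoice strings into typed values.
class InvoiceValueConverter {

    // Matches dates such as "January 25, 2016"; the English locale is assumed.
    private static final DateTimeFormatter INVOICE_DATE =
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH);

    static LocalDate parseDate(String raw) {
        return LocalDate.parse(raw.trim(), INVOICE_DATE);
    }

    // Strips the dollar sign and thousands separators, then parses an exact decimal.
    static BigDecimal parseAmount(String raw) {
        String digits = raw.trim().replace("$", "").replace(",", "");
        return new BigDecimal(digits);
    }

    public static void main(String[] args) {
        LocalDate invoiceDate = parseDate("January 25, 2016");
        LocalDate dueDate = parseDate("January 31, 2016");
        BigDecimal totalDue = parseAmount("$93.50");

        System.out.println("Invoice date: " + invoiceDate);
        System.out.println("Due date: " + dueDate);
        System.out.println("Days until due: " + ChronoUnit.DAYS.between(invoiceDate, dueDate));
        System.out.println("Total due: " + totalDue);
    }
}
```

`BigDecimal` is used instead of `double` so that monetary amounts keep their exact decimal value.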
Many other open-source examples are available in the GitHub repository. You can download the code and quickly run the examples. For more guidance and other ways to use templates for parsing and data extraction in Java, visit the developer guide in the documentation. If you run into any difficulty, you can reach the support team for free on the forum at any time.