Search and Replace Words and Text in PDF in Java

Templates are widely used to generate personalized documents by replacing template keys with their respective values. This article shows how to find and replace words and text in PDF documents in Java. We will separately cover searching for words and phrases, case-sensitive search, and replacing found text using regular expressions. Finally, we will learn how to hide the found text with a colored box in Java.

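The following topics are covered below:

  • Java Redaction API for Replacing Text
  • Find and Replace Word or Phrase in PDF in Java
  • Find and Replace Case-Sensitive Text or Phrase in PDF using Java
  • Replace Text in PDF with Regular Expressions (RegEx) in Java
  • Replace Text with a Colored Box in Java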

Java Redaction API for Replacing Text

GroupDocs.Redaction for Java is an API for applying various types of redactions. It lets you redact, hide, or remove content and even metadata from documents, presentations, spreadsheets, PDF files, and images within your application. For further details about the API, visit its documentation.

Download or Configure

You can download the JAR file from the downloads section, or add the following repository and dependency configuration to the pom.xml of your Maven-based Java application.

<repository>
    <id>GroupDocsJavaAPI</id>
    <name>GroupDocs Java API</name>
    <url>https://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-redaction</artifactId>
    <version>21.12</version>
</dependency>

You do not need to install a PDF editor or any other third-party software to redact PDF files. The examples below use a sample PDF document, and the same approach works for other document formats with hardly any change to the source code.

Find and Replace Word or Phrase in PDF in Java

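You can use this feature to hide private data, or to create a new customized document from a template. The following steps explain how to find a word or phrase in a PDF document and replace it with other text within a Java application.

  • Load the PDF document using the Redactor class.
  • Find the word or phrase using the ExactPhraseRedaction class with ReplacementOptions.
  • Apply the changes to the document using the apply() method.
  • Save the redacted document using the save() method.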

The following code finds and replaces words in a PDF file using Java. More precisely, it hides all occurrences of “John Doe” by replacing them with the text “[censored]”.
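Here is a minimal sketch of these steps. It assumes GroupDocs.Redaction for Java 21.12, the version in the Maven configuration above, is on the classpath. The input file name sample.pdf is hypothetical, so replace it with your own document.

```java
import com.groupdocs.redaction.Redactor;
import com.groupdocs.redaction.redactions.ExactPhraseRedaction;
import com.groupdocs.redaction.redactions.ReplacementOptions;

class ReplacePhraseInPdf {
    // Hypothetical input path; point it at your own PDF.
    static final String INPUT = "sample.pdf";
    static final String PHRASE = "John Doe";
    static final String REPLACEMENT = "[censored]";

    public static void main(String[] args) throws Exception {
        // Load the PDF document.
        final Redactor redactor = new Redactor(INPUT);
        try {
            // Replace every occurrence of the exact phrase with the replacement text.
            redactor.apply(new ExactPhraseRedaction(PHRASE, new ReplacementOptions(REPLACEMENT)));
            // Save the redacted copy using the API's default save options.
            redactor.save();
        } finally {
            // Release the document and any resources the redactor holds.
            redactor.close();
        }
    }
}
```

Calling save() without arguments relies on the API's default save options. The API documentation describes the overloads for choosing the output location and format.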

The output of the above code is as follows.

Find and Replace Case-Sensitive Text or Phrase in PDF using Java

You can also perform a case-sensitive search and redaction. The following code replaces occurrences of “John Doe” that match its case exactly, but not “john doe”, within a PDF document using Java.
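Here is a minimal sketch, again assuming GroupDocs.Redaction for Java 21.12 and a hypothetical sample.pdf. Case sensitivity comes from the ExactPhraseRedaction constructor overload that takes a boolean isCaseSensitive flag as its second argument.

```java
import com.groupdocs.redaction.Redactor;
import com.groupdocs.redaction.redactions.ExactPhraseRedaction;
import com.groupdocs.redaction.redactions.ReplacementOptions;

class ReplaceCaseSensitivePhraseInPdf {
    // Hypothetical input path; point it at your own PDF.
    static final String INPUT = "sample.pdf";
    static final String PHRASE = "John Doe";
    static final String REPLACEMENT = "[censored]";
    // true: "John Doe" is redacted, "john doe" is left untouched.
    static final boolean CASE_SENSITIVE = true;

    public static void main(String[] args) throws Exception {
        final Redactor redactor = new Redactor(INPUT);
        try {
            // The second argument switches the phrase search to case-sensitive matching.
            redactor.apply(new ExactPhraseRedaction(PHRASE, CASE_SENSITIVE, new ReplacementOptions(REPLACEMENT)));
            redactor.save();
        } finally {
            redactor.close();
        }
    }
}
```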

The output of the code is as follows.

Replace Text in PDF with Regular Expressions (RegEx) in Java

Similarly, you can replace any text that matches a pattern by using regular expressions. The following steps show how to search a PDF with a regular expression (RegEx) and redact the matches within your Java application.

  • Load the PDF document using the Redactor class.
  • Find the regex matches using the RegexRedaction class with ReplacementOptions.
  • Apply the changes to the document using the apply() method.
  • Save the redacted document using the appropriate save() method.

The following Java code shows how to find a text pattern in a PDF document using RegEx and then replace it with other text.
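Here is a minimal sketch of the RegEx steps, assuming GroupDocs.Redaction for Java 21.12 and a hypothetical sample.pdf. The pattern is a hypothetical example that matches phone numbers written as 555-123-4567. We also assume the API interprets a simple pattern like this the same way java.util.regex does, so you can try the pattern on sample strings before running it on a PDF.

```java
import com.groupdocs.redaction.Redactor;
import com.groupdocs.redaction.redactions.RegexRedaction;
import com.groupdocs.redaction.redactions.ReplacementOptions;

class ReplaceRegexInPdf {
    // Hypothetical input path; point it at your own PDF.
    static final String INPUT = "sample.pdf";
    // Hypothetical pattern: three digits, dash, three digits, dash, four digits.
    static final String PATTERN = "\\d{3}-\\d{3}-\\d{4}";
    static final String REPLACEMENT = "[censored]";

    public static void main(String[] args) throws Exception {
        final Redactor redactor = new Redactor(INPUT);
        try {
            // Replace every text fragment that matches the pattern.
            redactor.apply(new RegexRedaction(PATTERN, new ReplacementOptions(REPLACEMENT)));
            redactor.save();
        } finally {
            redactor.close();
        }
    }
}
```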

The output of the above code is as follows.

Replace Text with a Colored Box in Java

If you just want to hide confidential information found in your PDF file, you can cover it with a colored box instead of replacing it with other text. The following code places a black rectangle over the specified private text in Java.
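Here is a minimal sketch, assuming GroupDocs.Redaction for Java 21.12 and a hypothetical sample.pdf. As we understand the API, the ReplacementOptions constructor that takes a java.awt.Color covers each match with a filled box of that color instead of inserting replacement text.

```java
import com.groupdocs.redaction.Redactor;
import com.groupdocs.redaction.redactions.ExactPhraseRedaction;
import com.groupdocs.redaction.redactions.ReplacementOptions;
import java.awt.Color;

class CoverPhraseWithBoxInPdf {
    // Hypothetical input path; point it at your own PDF.
    static final String INPUT = "sample.pdf";
    static final String PHRASE = "John Doe";
    // Color of the box drawn over each occurrence of the phrase.
    static final Color BOX_COLOR = Color.BLACK;

    public static void main(String[] args) throws Exception {
        final Redactor redactor = new Redactor(INPUT);
        try {
            // Cover every occurrence of the phrase with a black rectangle.
            redactor.apply(new ExactPhraseRedaction(PHRASE, new ReplacementOptions(BOX_COLOR)));
            redactor.save();
        } finally {
            redactor.close();
        }
    }
}
```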

The output of the above code is as follows.

Get a Free API License

You can get a free temporary license to use the API without evaluation limitations.

Conclusion

To sum up, we learned how to find text in PDF files using different search techniques, and how to redact it by replacing or hiding it within a Java application. More precisely, we searched for words and phrases, performed a case-sensitive search, and matched text with regular expressions. Lastly, we replaced the search results with other text or hid them under a colored box.

For more details about the API, visit the documentation. For queries, contact us via the forum.
