Word Search and Replace Text in Word Documents using Java

In an earlier article, we discussed how to redact words in documents as a .NET developer. Redaction is commonly used to erase sensitive content, or to hide or remove private information such as email addresses or identification numbers. This article discusses how to perform a word search in Word DOC/DOCX documents in Java. We will separately discuss how to find and replace text, words, or phrases with different techniques using the Java API for redaction.

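This article covers the following topics:

  • Java API for Word Search and Replacing Text
  • Find and Replace Words or Phrase using Java
  • Case-Sensitive Word Search and Replace Text in Java
  • Replace Text using Regular Expressions (RegEx) in Java
  • Replace the Text with Colored Box in Java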

Java API for Word Search and Replacing Text

GroupDocs provides a Java redaction API that finds and replaces content in MS Word files and in documents of many other formats. In addition to text redaction and rasterization, the API supports metadata, annotation, spreadsheet, and image redaction. The supported Word document, spreadsheet, presentation, image, and PDF formats are listed in the documentation.

Download or Configure

You may download the JAR file from the downloads section, or add the latest repository and dependency configuration to the pom.xml of your Maven-based Java application.

<repository>
    <id>GroupDocsJavaAPI</id>
    <name>GroupDocs Java API</name>
    <url>https://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-redaction</artifactId>
    <version>21.12</version>
</dependency>

Neither MS Word nor any other third-party software is required for the redaction process. Let’s now look at the different approaches to searching and replacing text. The following is a screenshot of the Word document that is used in the examples below. You can use the same methods for other document formats with little or no change to the source code.

Document to redact text

Find and Replace Words or Phrase using Java

The following steps explain how to find and replace all occurrences of a word or phrase in a Word document within a Java application.

  • Load the DOC/DOCX file using the Redactor class.
  • Specify the exact word or phrase to find and its replacement, using the ExactPhraseRedaction and ReplacementOptions classes.
  • Use the apply method of Redactor to apply the redaction.
  • To save the file to a different location after making changes, use an output stream.
  • Save the redaction changes using the save method.

The following code finds and replaces the phrase “John Doe” in the above Word document using Java. It replaces all occurrences of “John Doe” with the text “[censored]”.

The output of the code is as follows.

Redact using Exact Phrase
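
To make the matching behaviour concrete, here is a minimal sketch in plain Java that mimics an exact-phrase replacement on an in-memory string. It uses only java.util.regex and does not call the redaction API; the class name PhraseRedactionSketch and the method redactPhrase are hypothetical names chosen for illustration. It also assumes that matching ignores letter case by default, since this article treats case-sensitive matching as a separate option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseRedactionSketch {

    /**
     * Replaces every occurrence of a literal phrase, ignoring letter case.
     * Hypothetical helper for illustration; not part of any redaction library.
     */
    public static String redactPhrase(String text, String phrase, String replacement) {
        // Pattern.quote makes the phrase match as literal text, not as a regex.
        Pattern pattern = Pattern.compile(
                Pattern.quote(phrase),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String sample = "Dear John Doe, we confirm that john doe signed the form.";
        System.out.println(redactPhrase(sample, "John Doe", "[censored]"));
    }
}
```

Both “John Doe” and “john doe” in the sample string are replaced, because the sketch ignores case.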

Case-Sensitive Word Search and Replace Text in Java

If letter case matters, you can restrict the replacement to matches that have exactly the same case as your search term. The following code replaces only the exact case-sensitive matches of the phrase “John Doe” in Java.

The output of the code is as follows.

Case sensitive redaction
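
The difference between the two matching modes can be sketched in plain Java as follows. This is an illustration on an in-memory string using java.util.regex only, not the redaction API; CaseSensitiveRedactionSketch and its caseSensitive parameter are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseSensitiveRedactionSketch {

    /**
     * Replaces a literal phrase, optionally requiring an exact case match.
     * Hypothetical helper for illustration only.
     */
    public static String redactPhrase(String text, String phrase,
                                      boolean caseSensitive, String replacement) {
        // With no flags set, java.util.regex matches case-sensitively.
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), flags);
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String sample = "John Doe met john doe.";
        // Only the first occurrence has the same case as the search phrase.
        System.out.println(redactPhrase(sample, "John Doe", true, "[censored]"));
        // Ignoring case replaces both occurrences.
        System.out.println(redactPhrase(sample, "John Doe", false, "[censored]"));
    }
}
```

With caseSensitive set to true, the lower-case “john doe” is left untouched.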

Replace Text using Regular Expressions (RegEx) in Java

If you want to replace not an exact word but a pattern of text that occurs in your document, you can use regular expressions. The following steps allow you to find and replace any pattern of text using regular expressions (RegEx) within your Java applications.

  • Load the document using the Redactor class.
  • Define the regular expression using the RegexRedaction class.
  • Provide the replacement text for the RegEx matches using ReplacementOptions.
  • Use the apply method to replace all the regex matches.
  • Use the save method to get the redacted document.

The following code shows how to perform the word search in a Word file using RegEx and replace it with some other text using Java.

The following is the output of the above code:

RegEx Redaction
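
As a rough illustration of pattern-based replacement, the sketch below applies a regular expression to an in-memory string with java.util.regex. It does not use the redaction API. The class RegexRedactionSketch, the method redactPattern, and the sample pattern (three digits, two digits, four digits separated by hyphens) are all assumptions made for this example and do not come from the sample document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexRedactionSketch {

    /**
     * Replaces every match of a regular expression with fixed text.
     * Hypothetical helper for illustration only.
     */
    public static String redactPattern(String text, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        // quoteReplacement prevents '$' in the replacement from being read as a group reference.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Assumed identifier format used only for this example: 123-45-6789.
        String idPattern = "\\d{3}-\\d{2}-\\d{4}";
        String sample = "IDs on file: 123-45-6789 and 987-65-4321.";
        System.out.println(redactPattern(sample, idPattern, "[censored]"));
    }
}
```

Every substring that fits the pattern is replaced, regardless of the actual digits, which is what makes regular expressions suitable for identification numbers and similar data.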

Replace the Text with Colored Box in Java

If you do not want to replace your content and just want to hide it, the API allows you to cover each text match by drawing a box over it. The following Java code hides the text with a black rectangle.

The output of the above code is as follows.

Hide Text using Box
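
A text-only analogue of this idea is to mask each match with a run of block characters of the same length, so the surrounding layout stays the same while the content becomes unreadable. The sketch below does this in plain Java on an in-memory string. It is not how the API draws a rectangle in a document; BoxRedactionSketch and maskPhrase are hypothetical names, and case-insensitive matching is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoxRedactionSketch {

    // U+2588 FULL BLOCK, used here as a stand-in for a filled box.
    private static final char BLOCK = '\u2588';

    /**
     * Masks every occurrence of a literal phrase with block characters,
     * one per matched character, so the text length is preserved.
     * Hypothetical helper for illustration only.
     */
    public static String maskPhrase(String text, String phrase) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(phrase),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (matcher.find()) {
            // Copy the untouched text before the match.
            out.append(text, last, matcher.start());
            // Emit one block character per matched character.
            for (int i = matcher.start(); i < matcher.end(); i++) {
                out.append(BLOCK);
            }
            last = matcher.end();
        }
        // Copy whatever follows the final match.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskPhrase("Call John Doe now", "John Doe"));
    }
}
```

Unlike replacing with “[censored]”, masking keeps the original length of the hidden text, which roughly corresponds to a box covering the area the text occupied.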

Get a Free API License

You can get a free temporary license in order to use the API without the evaluation limitations.

Conclusion

To sum up, you learned how to find and replace text in Word documents using exact phrase search, case-sensitive search, and regular expressions, and how to hide matched text behind a box instead of replacing it. You can use these techniques to redact content in MS Word documents in different ways.

To learn more about the API, visit the documentation. For queries, contact us via the forum.
