Find and Replace Text in PDF using C#

Templates are widely used to generate customized documents. This article explains how to find and replace words and phrases in PDF documents using C#. We will separately cover replacing words and phrases, case-sensitive search and replacement, and replacement using regular expressions. Finally, we will learn how to hide the found text using C#.

.NET Redaction API for Replacing Text

GroupDocs.Redaction for .NET is an API for redacting, hiding, or removing content, and even metadata, from documents, presentations, spreadsheets, PDF files, and images within .NET applications. For further details about the API, visit its documentation.

You can download the DLLs or MSI installer from the downloads section or install the API in your .NET application via NuGet.

PM> Install-Package GroupDocs.Redaction

You do not need to install a PDF editor or any other third-party software for redaction. The following screenshot shows the PDF document used in the examples below. The same approach works for other document formats with little or no change to the code.

Find and Replace Word or Phrase in PDF using C#

You can use this feature to hide confidential data, or to create a new customized document from a template. This section shows how to find a word or phrase in a PDF document and replace it with other text in a C# application.

The following code finds and replaces a phrase in C#. More precisely, it hides every occurrence of “John Doe” by replacing it with “[censored]”.
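As a minimal sketch, assuming the GroupDocs.Redaction for .NET classes `Redactor`, `ExactPhraseRedaction`, and `ReplacementOptions` described in the API documentation, the replacement could look like this. The class name `PhraseReplacementExample`, the `Run` method, and the `sample.pdf` path are hypothetical placeholders. `Save()` is called with the API's default save options, so check the documentation for the resulting file name and format.

```csharp
using System;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Redactions;

// Hypothetical example class; not part of the GroupDocs API.
public static class PhraseReplacementExample
{
    // Replaces every exact occurrence of "John Doe" with "[censored]"
    // and returns the status reported by the API.
    public static RedactionStatus Run(string inputPath)
    {
        // Redactor loads the document and is disposed when done.
        using (Redactor redactor = new Redactor(inputPath))
        {
            // Exact phrase search; each match is replaced by the given text.
            RedactorChangeLog result = redactor.Apply(
                new ExactPhraseRedaction("John Doe", new ReplacementOptions("[censored]")));

            // Save only if the redaction did not fail.
            if (result.Status != RedactionStatus.Failed)
            {
                redactor.Save();
            }
            return result.Status;
        }
    }

    public static void Main()
    {
        // "sample.pdf" is a placeholder path to the input document.
        Console.WriteLine(Run("sample.pdf"));
    }
}
```

Checking `result.Status` before saving avoids writing an output file when the redaction could not be applied.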

The output of the code is as follows.

Find and Replace Case-Sensitive Text or Phrase in PDF using C#

You can also perform a case-sensitive search and redaction. The following code replaces “John Doe” but leaves “john doe” unchanged in C#.
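A minimal sketch, assuming the `ExactPhraseRedaction` constructor overload that takes an `isCaseSensitive` flag, as shown in the GroupDocs.Redaction documentation. Only the exact casing “John Doe” is replaced. The class name, `Run` method, and `sample.pdf` path are hypothetical placeholders.

```csharp
using System;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Redactions;

// Hypothetical example class; not part of the GroupDocs API.
public static class CaseSensitiveReplacementExample
{
    public static RedactionStatus Run(string inputPath)
    {
        using (Redactor redactor = new Redactor(inputPath))
        {
            // The second argument (isCaseSensitive = true) makes the search
            // match "John Doe" but not "john doe" or "JOHN DOE".
            RedactorChangeLog result = redactor.Apply(
                new ExactPhraseRedaction("John Doe", true, new ReplacementOptions("[censored]")));

            if (result.Status != RedactionStatus.Failed)
            {
                redactor.Save();
            }
            return result.Status;
        }
    }

    public static void Main()
    {
        // "sample.pdf" is a placeholder path to the input document.
        Console.WriteLine(Run("sample.pdf"));
    }
}
```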

The output of the code is as follows.

Replace Text in PDF with Regular Expressions (RegEx) using C#

You can also replace any text that matches a specific pattern by using regular expressions (RegEx). This lets you search a PDF for the pattern and redact every match within your .NET application.

The following code shows how to find text that matches a pattern in a PDF document using RegEx and replace it with other text using C#.
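A minimal sketch, assuming the `RegexRedaction` class from GroupDocs.Redaction for .NET, which takes a .NET regular expression pattern and `ReplacementOptions`. The pattern below (three digits, two digits, four digits separated by dashes, like an SSN-style number) is a hypothetical example, as are the class name, `Run` method, and `sample.pdf` path.

```csharp
using System;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Redactions;

// Hypothetical example class; not part of the GroupDocs API.
public static class RegexReplacementExample
{
    // Hypothetical pattern: "123-45-6789"-style numbers.
    // Replace it with the pattern your documents actually need.
    private const string Pattern = @"\d{3}-\d{2}-\d{4}";

    public static RedactionStatus Run(string inputPath)
    {
        using (Redactor redactor = new Redactor(inputPath))
        {
            // Every match of the regular expression is replaced with "[censored]".
            RedactorChangeLog result = redactor.Apply(
                new RegexRedaction(Pattern, new ReplacementOptions("[censored]")));

            if (result.Status != RedactionStatus.Failed)
            {
                redactor.Save();
            }
            return result.Status;
        }
    }

    public static void Main()
    {
        // "sample.pdf" is a placeholder path to the input document.
        Console.WriteLine(Run("sample.pdf"));
    }
}
```

Using a verbatim string (`@"..."`) keeps the backslashes in the pattern readable, since they do not need to be escaped twice.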

The output of the above code is as follows.

Replace the Text with Colored Box in C#

If you only want to hide the found content (such as private information) in your PDF file, you can simply cover it. The API can hide the matched text behind a colored box; the following C# code places a black rectangle over the specified private text.
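A minimal sketch, assuming the `ReplacementOptions` constructor that accepts a `System.Drawing.Color`, which draws a filled box over each match instead of inserting replacement text. Reusing “John Doe” as the private text is an assumption carried over from the earlier examples; the class name, `Run` method, and `sample.pdf` path are hypothetical placeholders.

```csharp
using System;
using System.Drawing;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Redactions;

// Hypothetical example class; not part of the GroupDocs API.
public static class ColoredBoxExample
{
    public static RedactionStatus Run(string inputPath)
    {
        using (Redactor redactor = new Redactor(inputPath))
        {
            // Passing a Color instead of a string covers each match
            // with a black rectangle rather than replacing its text.
            RedactorChangeLog result = redactor.Apply(
                new ExactPhraseRedaction("John Doe", new ReplacementOptions(Color.Black)));

            if (result.Status != RedactionStatus.Failed)
            {
                redactor.Save();
            }
            return result.Status;
        }
    }

    public static void Main()
    {
        // "sample.pdf" is a placeholder path to the input document.
        Console.WriteLine(Run("sample.pdf"));
    }
}
```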

The output of the above code is as follows.

Get a Free API License

You can get a free temporary license to use the API without evaluation limitations.

Conclusion

To conclude, we learned how to find text in PDF files using different search techniques, and how to redact PDF files in a .NET application using C# by either replacing or hiding that text. More precisely, we searched for words and phrases, with and without case sensitivity, and by using regular expressions. Finally, we either replaced the search results with other text or hid them behind a rectangle.

For more details about the API, visit the documentation. For queries, contact us via the forum.
