Find and Remove Watermarks from Documents in C#

Today, we will look at how to find and remove watermarks from documents in C#. A document can contain both text and image watermarks. We can search for such watermarks and remove them programmatically from supported PDF, Word, Excel, PowerPoint, and Visio documents.

The following topics are covered in this article:

  • .NET API for watermark removal
  • Find watermarks in documents using C#
  • Remove watermarks from documents in C#
  • More search criteria for watermarks

.NET API for Watermark Removal


GroupDocs.Watermark for .NET is a fast and efficient watermarking API that requires no additional software. It lets you add watermarks to documents and images in a way that is hard for third-party tools to remove. It also lets C# developers remove watermarks from many Microsoft and OpenOffice file formats, including word-processing documents, spreadsheets, presentations, Visio drawings, and PDF documents, in .NET applications. All supported file formats are listed in the documentation.

The examples below find and remove watermarks, so it is best to prepare the environment first. One option is to install the API from NuGet using the Package Manager Console:

PM> Install-Package GroupDocs.Watermark

Find Watermarks in Documents using C#

Watermarker and PossibleWatermarkCollection (a collection of PossibleWatermark objects) are the API classes used to find watermarks in documents with various search criteria and to remove them. The following steps perform a basic search for all the watermarks in a document using C#. Ways to refine the search are shown later in this article.

  • Create the Watermarker class object with the source document file.
  • Call the Search method. It will return all the possible watermarks from the document.
  • Traverse the watermarks collection to display data or perform any action on each watermark.
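The steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes the GroupDocs.Watermark package is installed, the file name document.pdf is a placeholder, and the properties printed here (Text, X, Y, Width, Height, RotateAngle, ImageData) should be checked against the API reference for the version you use.

```csharp
using System;
using GroupDocs.Watermark;
using GroupDocs.Watermark.Search;

class FindWatermarks
{
    static void Main()
    {
        // "document.pdf" is a placeholder path; point it at your own file.
        using (Watermarker watermarker = new Watermarker("document.pdf"))
        {
            // Searching without criteria returns every possible watermark.
            PossibleWatermarkCollection possibleWatermarks = watermarker.Search();

            foreach (PossibleWatermark possibleWatermark in possibleWatermarks)
            {
                // Print the image size only when image data is present.
                if (possibleWatermark.ImageData != null)
                {
                    Console.WriteLine("Image bytes: " + possibleWatermark.ImageData.Length);
                }

                Console.WriteLine("Text: " + possibleWatermark.Text);
                Console.WriteLine("Position: " + possibleWatermark.X + ", " + possibleWatermark.Y);
                Console.WriteLine("Size: " + possibleWatermark.Width + " x " + possibleWatermark.Height);
                Console.WriteLine("Rotation: " + possibleWatermark.RotateAngle);
            }
        }
    }
}
```

Printing each candidate first is a cheap way to check what the search actually returns before removing anything.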

Remove Watermarks from Documents in C#

From the search results, we can remove a single watermark or all the watermarks at once. What matters most is whether the search actually found the watermark(s) that you want to delete. What if a document contains many different types of watermarks? The API provides various options to refine the search. The following code removes a watermark from a PDF document by specifying its index in the collection using C#.
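A minimal sketch of removal by index might look like this. The file names document.pdf and output.pdf are placeholders, and index 0 is chosen only for illustration; in practice you would inspect the collection first to decide which entry to remove.

```csharp
using GroupDocs.Watermark;
using GroupDocs.Watermark.Search;

class RemoveWatermarkByIndex
{
    static void Main()
    {
        // Placeholder input path; replace with your own PDF.
        using (Watermarker watermarker = new Watermarker("document.pdf"))
        {
            PossibleWatermarkCollection possibleWatermarks = watermarker.Search();

            // Guard against an empty result before removing by index.
            if (possibleWatermarks.Count > 0)
            {
                // Index 0 is an arbitrary choice for this sketch.
                possibleWatermarks.RemoveAt(0);
            }

            // Write the modified document to a placeholder output path.
            watermarker.Save("output.pdf");
        }
    }
}
```

Calling Clear() on the collection instead of RemoveAt(0) would remove every watermark the search returned.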

More Search Criteria for Watermarks

There are many other ways to find watermarks that match certain criteria. After a selective search, we can remove watermarks from the collection using the Remove, RemoveAt, or Clear method. Here are some of the ways to find watermarks in a document:

  • Find and remove watermarks with specific text
  • Search watermarks with RegEx (Regular Expression) and remove
  • Search watermark with specified text formatting
  • Find and remove hyperlink watermarks

Find and Remove Watermarks with Specific Text

You can search for text watermarks by specifying the exact string using the following C# code:

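// Find possible watermarks containing the specified text
// (watermarker is an already opened Watermarker instance)
TextSearchCriteria textSearchCriterion = new TextSearchCriteria("© 2020");
PossibleWatermarkCollection possibleWatermarks = watermarker.Search(textSearchCriterion);
// Remove all the found watermarks
possibleWatermarks.Clear();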

Search for Watermarks with RegEx and Remove

If the watermark text follows a pattern, you can provide a regular expression (RegEx) to search for such watermarks and then remove them using the following C# code. This code fetches all the watermarks whose text is the © symbol followed by a space and a four-digit year, such as © 2020.

// Search watermarks by regular expression
// (requires: using System.Text.RegularExpressions;)
Regex regex = new Regex(@"^© \d{4}$");
TextSearchCriteria textSearchCriterion = new TextSearchCriteria(regex);
PossibleWatermarkCollection possibleWatermarks = watermarker.Search(textSearchCriterion);
// Remove all the found watermarks
possibleWatermarks.Clear();
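Before running a pattern against real documents, it can help to check what it accepts. The sketch below uses only the standard .NET Regex class with the same pattern; the sample strings are made up for illustration. Because the pattern is anchored with ^ and $, it matches only when the whole text is "© " followed by exactly four digits.

```csharp
using System;
using System.Text.RegularExpressions;

static class CopyrightPattern
{
    // Same pattern as in the search example above.
    public static readonly Regex Pattern = new Regex(@"^© \d{4}$");

    static void Main()
    {
        // Hypothetical watermark texts used only to probe the pattern.
        string[] samples =
        {
            "© 2020",              // matches: symbol, space, four digits
            "©2020",               // no match: the space is missing
            "© 2020 Example Inc.", // no match: extra text after the year
            "© 20"                 // no match: fewer than four digits
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("\"" + sample + "\" -> " + Pattern.IsMatch(sample));
        }
    }
}
```

If the watermark text may contain more than the copyright notice, dropping the anchors (for example @"© \d{4}") would match the notice anywhere in the text.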

Find and Remove Watermarks with Specific Text Formatting

You can also find watermarks that have specific text formatting, such as font name, minimum/maximum font size, or bold/italic/underlined style.

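// Match bold Arial text with a font size from 19 to 42
TextFormattingSearchCriteria criterion = new TextFormattingSearchCriteria()
{
    FontName = "Arial",
    MinFontSize = 19,
    MaxFontSize = 42,
    FontBold = true
};
// Find the watermarks that match the formatting
PossibleWatermarkCollection watermarks = watermarker.Search(criterion);
// Remove all the found watermarks
watermarks.Clear();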

Find and Remove Hyperlink Watermarks

You can use a regular expression to find text watermarks whose content contains a hyperlink. You can then check whether each item in the search result is a hyperlink watermark and remove it with any of the removal methods. The following C# code removes all the hyperlink watermarks that match the given URL pattern.

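// Search for watermarks whose text matches the URL pattern
PossibleWatermarkCollection watermarks = watermarker.Search(new TextSearchCriteria(new Regex(@"anyurl\.com")));
// Iterate backward so that removing an item does not shift the indices still to be visited
for (int i = watermarks.Count - 1; i >= 0; i--)
{
    // Remove the item only if it is a hyperlink watermark
    if (watermarks[i] is HyperlinkPossibleWatermark)
    {
        watermarks.RemoveAt(i);
    }
}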
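The loop above walks the collection from the last index to the first. The sketch below illustrates why, using a plain List<string> as a stand-in: it assumes the watermark collection re-indexes after RemoveAt the way a standard list does, and the helper name RemoveMatching is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class ReverseRemoval
{
    // Removes every item that satisfies the predicate, walking from the end.
    public static void RemoveMatching(List<string> items, Predicate<string> shouldRemove)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(items[i]))
            {
                // Only items after index i shift down, and those were already visited.
                items.RemoveAt(i);
            }
        }
    }

    static void Main()
    {
        // Stand-in data: "link" plays the role of a hyperlink watermark.
        var items = new List<string> { "link", "link", "text", "link" };
        RemoveMatching(items, item => item == "link");
        Console.WriteLine(string.Join(", ", items)); // prints "text"
    }
}
```

A forward loop that increments the index after every removal would skip the element that slides into the freed slot, which here would leave the second "link" in place.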

There are many other ways to refine your search for watermarks. See the documentation for more detail, and visit the forum with any queries.

Conclusion

I believe that you will now be more confident in finding and removing text and image watermarks from Word documents, Excel spreadsheets, PowerPoint presentations, PDF documents, and Visio drawings using C# within your .NET applications.

See Also