Today, we will look at how to find and remove watermarks from documents in C#. A document can contain both text and image-based watermarks. We can search for such watermarks and remove them programmatically from supported PDF, Word, Excel, PowerPoint, and Visio documents.
The following topics are covered in this article:
- .NET API for removing watermarks
- Find watermarks in documents using C#
- Remove watermarks from documents in C#
.NET API for Watermark Removal
GroupDocs.Watermark for .NET is a watermarking API that requires no additional software. It allows adding watermarks to documents and images in a way that makes them hard for third-party tools to remove. It also lets C# developers remove watermarks from many Microsoft and OpenOffice file formats, including word-processing documents, spreadsheets, presentations, Visio drawings, and PDF documents, in .NET applications. All the supported file formats are listed in the documentation.
The examples below find and remove watermarks. To follow along, prepare the environment beforehand using any of these options:
- NuGet
- Direct Download: MSI installer and DLLs
- Package Manager Console:
PM> Install-Package GroupDocs.Watermark
Find Watermarks in Documents using C#
Watermarker and PossibleWatermarkCollection (a collection of PossibleWatermark objects) are the API classes used to find watermarks in documents with various search criteria and to remove them. The following steps perform a basic search for all the watermarks in a document using C#. The search can be refined further, as shown later in this article.
- Create the Watermarker class object with the source document file.
- Call the Search method. It will return all the possible watermarks from the document.
- Traverse the watermarks collection to display data or perform any action on each watermark.
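The steps above can be sketched as follows. This is a minimal sketch, assuming the GroupDocs.Watermark package is installed and that a file named document.pdf exists (a placeholder path, not from the original article). The properties printed here are only a selection and may differ between API versions.

```csharp
using System;
using GroupDocs.Watermark;
using GroupDocs.Watermark.Search;

class FindWatermarks
{
    static void Main()
    {
        // "document.pdf" is a placeholder path; substitute your own file.
        using (Watermarker watermarker = new Watermarker("document.pdf"))
        {
            // Calling Search without criteria returns every possible watermark.
            PossibleWatermarkCollection possibleWatermarks = watermarker.Search();

            // Walk the collection and print a few details of each candidate.
            foreach (PossibleWatermark possibleWatermark in possibleWatermarks)
            {
                Console.WriteLine("Text: {0}", possibleWatermark.Text);
                Console.WriteLine("Position: ({0}, {1})", possibleWatermark.X, possibleWatermark.Y);
                Console.WriteLine("Size: {0} x {1}", possibleWatermark.Width, possibleWatermark.Height);
                Console.WriteLine("Rotation: {0}", possibleWatermark.RotateAngle);
            }
        }
    }
}
```

The using block disposes the Watermarker when the search is finished, which releases the underlying file.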
Remove Watermarks from Documents in C#
Any single watermark, or all of them at once, can be removed from the search results. What matters is whether the search actually found the watermark(s) that you want to delete. If a document contains many different types of watermarks, the API provides several options to refine the search. A watermark can be removed from a PDF document by specifying its index in the collection using C#.
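Removal by index might look like the sketch below. The file names document.pdf and output.pdf are placeholders, and index 0 is chosen only for illustration; in practice, inspect the collection first to confirm which entry is the one you want to delete.

```csharp
using GroupDocs.Watermark;
using GroupDocs.Watermark.Search;

class RemoveWatermarkByIndex
{
    static void Main()
    {
        // Placeholder input path; substitute your own file.
        using (Watermarker watermarker = new Watermarker("document.pdf"))
        {
            PossibleWatermarkCollection possibleWatermarks = watermarker.Search();

            // Guard against an empty result before removing by index.
            if (possibleWatermarks.Count > 0)
            {
                // Removes the first found watermark from the document.
                possibleWatermarks.RemoveAt(0);
            }

            // Write the modified document to a new file (placeholder path).
            watermarker.Save("output.pdf");
        }
    }
}
```

Saving to a separate output file keeps the original document intact in case the wrong item was removed.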
More Search Criteria for Watermarks
There are many other ways to find watermarks that match certain criteria. After a selective search, the matching watermark(s) can be removed from the collection with the Remove, RemoveAt, or Clear method, as appropriate. Here are some of the ways to find watermarks in a document:
- Find and remove watermarks with specific text
- Search watermarks with RegEx (Regular Expression) and remove
- Search watermark with specified text formatting
- Find and remove hyperlink watermarks
Find and Remove Watermarks with Specific Text
You can search for text watermarks by specifying the exact string using the following C# code:
// Find possible watermarks containing the specified text
TextSearchCriteria textSearchCriterion = new TextSearchCriteria("© 2020");
PossibleWatermarkCollection possibleWatermarks = watermarker.Search(textSearchCriterion);
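For completeness, here is a sketch of the whole find-and-remove round trip for a text search. It assumes placeholder file names (document.pdf, output.pdf) and uses Clear, one of the removal methods mentioned earlier, to delete every match at once.

```csharp
using GroupDocs.Watermark;
using GroupDocs.Watermark.Search;
using GroupDocs.Watermark.Search.SearchCriteria;

class RemoveWatermarksByText
{
    static void Main()
    {
        // Placeholder input path; substitute your own file.
        using (Watermarker watermarker = new Watermarker("document.pdf"))
        {
            // Find possible watermarks containing the specified text.
            TextSearchCriteria textSearchCriterion = new TextSearchCriteria("© 2020");
            PossibleWatermarkCollection possibleWatermarks = watermarker.Search(textSearchCriterion);

            // Remove every watermark that matched the criterion.
            possibleWatermarks.Clear();

            // Persist the result to a new file (placeholder path).
            watermarker.Save("output.pdf");
        }
    }
}
```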
Search for Watermarks with RegEx and Remove
If the watermark text follows a pattern, you can provide a regular expression (RegEx) to search for the matching watermarks and then remove them, using the following C# code. The pattern matches watermark text that consists of ©, a space, and a four-digit year, such as © 2020.
// Search watermarks by regular expression
Regex regex = new Regex(@"^© \d{4}$");
TextSearchCriteria textSearchCriterion = new TextSearchCriteria(regex);
PossibleWatermarkCollection possibleWatermarks = watermarker.Search(textSearchCriterion);
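To see what this pattern accepts, it can be tried on plain strings with the standard .NET Regex class alone, without the watermark API. The sample strings below are made up for illustration, and how the library applies the pattern to watermark text internally is not covered here.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexCheck
{
    static void Main()
    {
        // Same pattern as above: "©", one space, exactly four digits, nothing else.
        Regex regex = new Regex(@"^© \d{4}$");

        // Hypothetical sample strings, chosen to show the effect of the space and the anchors.
        string[] samples = { "© 2020", "©2020", "Copyright © 2020", "© 2020 Acme" };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, regex.IsMatch(sample));
        }
        // Only "© 2020" yields True; the other three yield False.
    }
}
```

Because of the ^ and $ anchors, the pattern matches only when the whole string is the copyright notice. Dropping the anchors would also match notices embedded in longer text.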
Find and Remove Watermarks with Specific Text Formatting
You can also find watermarks that have specific text formatting, such as font name, minimum or maximum font size, or bold, italic, or underlined style.
// Search by text formatting, then remove all matches
TextFormattingSearchCriteria criterion = new TextFormattingSearchCriteria()
{
    FontName = "Arial",
    MinFontSize = 19,
    MaxFontSize = 42,
    FontBold = true
};
PossibleWatermarkCollection watermarks = watermarker.Search(criterion);
watermarks.Clear();
Find and Remove Hyperlink Watermarks
You can use a regular expression to find text watermarks that contain hyperlinks. You can then check whether the search result contains hyperlink watermarks and remove them with any of the removal methods. The following C# code removes all the watermarks with hyperlinks.
PossibleWatermarkCollection watermarks = watermarker.Search(new TextSearchCriteria(new Regex(@"anyurl\.com")));
// Iterate backwards so that removing an item does not shift the remaining indices
for (int i = watermarks.Count - 1; i >= 0; i--)
{
    // Remove the watermark only if it is a hyperlink
    if (watermarks[i] is HyperlinkPossibleWatermark)
    {
        watermarks.RemoveAt(i);
    }
}
There are many other ways to refine your search for watermarks. You can visit the documentation for more details. For queries, visit the forum.
Conclusion
I believe that you are now more confident in finding and removing text and image watermarks from Word documents, Excel spreadsheets, PowerPoint presentations, PDF documents, and Visio drawings using C# in your .NET applications.