Repetition can diminish the value of your content. As a writer, you should follow the DRY (don’t repeat yourself) principle. Statistics such as the total word count and the number of occurrences of each word help you analyze content, but gathering them manually across multiple documents is tedious. In this article, I’ll demonstrate how to programmatically count the words, and the occurrences of each word, in PDF, Word, Excel, PowerPoint, eBook, markup, and email documents using C#. To extract text from the documents, I’ll use GroupDocs.Parser for .NET, a powerful document parsing API.
Steps to count words and their occurrences in C#
1. Create a new project.
2. Install GroupDocs.Parser for .NET using NuGet Package Manager.
3. Add the required namespaces, such as GroupDocs.Parser and System.IO.
4. Create an instance of the Parser class and load the document.
5. Extract the text from the document into a TextReader object using the Parser.GetText() method.
6. Split the text into words, store them in a string array, and count them.
7. Order the words by their occurrence count and display the results.
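Steps 6 and 7 can be sketched with the standard library alone, assuming the text has already been extracted into a string. The WordCounter class, the separator list, and the sample sentence below are illustrative assumptions and are not part of the GroupDocs.Parser API. Treating words case-insensitively is also a choice of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Characters treated as word boundaries (an assumption; extend as needed).
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')', '[', ']' };

    // Splits the text into words and counts how often each one occurs, ignoring case.
    public static Dictionary<string, int> CountOccurrences(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string[] words = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // TryGetValue leaves 'current' at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }

        return counts;
    }

    public static void Main()
    {
        // Sample text standing in for the text extracted from a document.
        string text = "The quick brown fox. The lazy dog; the end.";
        Dictionary<string, int> counts = CountOccurrences(text);

        // The total word count is the sum of all per-word counts.
        Console.WriteLine($"Total words: {counts.Values.Sum()}");

        // Most frequent words first.
        foreach (var pair in counts.OrderByDescending(p => p.Value))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

For the sample sentence, the total is 9 words, and "the" is counted three times because the comparison ignores case. If you need case-sensitive counts, drop the StringComparer argument.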
Complete Code
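The listing below is a minimal sketch of how the steps could fit together, not necessarily the exact code this article was written with. It assumes the GroupDocs.Parser NuGet package is installed. The file name sample.pdf, the separator list, and the case-insensitive grouping are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using GroupDocs.Parser;

class Program
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass your own document as the first argument.
        string filePath = args.Length > 0 ? args[0] : "sample.pdf";

        // Load the document and extract its text into a TextReader.
        using (Parser parser = new Parser(filePath))
        using (TextReader reader = parser.GetText())
        {
            // GetText() returns null when text extraction is not supported for the format.
            if (reader == null)
            {
                Console.WriteLine("Text extraction isn't supported for this document.");
                return;
            }

            string text = reader.ReadToEnd();

            // Split the text into words (the separator list is an assumption of this sketch).
            char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };
            string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

            Console.WriteLine($"Total words: {words.Length}");

            // Group identical words (ignoring case) and order them by occurrence count.
            var occurrences = words
                .GroupBy(word => word.ToLowerInvariant())
                .Select(group => new { Word = group.Key, Count = group.Count() })
                .OrderByDescending(item => item.Count);

            foreach (var item in occurrences)
            {
                Console.WriteLine($"{item.Word}: {item.Count}");
            }
        }
    }
}
```

Because GroupDocs.Parser detects the format from the file itself, the same sketch should work unchanged for the other supported document types. Only the path needs to change.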
Results
Read more about the GroupDocs.Parser for .NET API here. Post your questions on our forum.