Taxonomic Classification of Raw Text using C# – (IAB-2 & Document Taxonomy)

Earlier, we discussed how to automate the analysis and classification of complete documents programmatically. Often, however, you only need to classify part of a document or just a few statements. In this article, we will identify the best-matching taxonomic categories for a selected piece of text, and learn how to classify text according to the IAB-2 and Documents taxonomies using C#.

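The following topics are covered below:

  • .NET API for Taxonomic Classification of Text
  • Text Classification with IAB-2 Taxonomy using C#
  • Text Classification with Document Taxonomy using C#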

.NET API for Taxonomic Classification of Text

GroupDocs.Classification for .NET is an API that provides several techniques for classifying text content within .NET applications. In the examples, we will use this API to find the best-matching taxonomic categories for the provided text using C#.

You can download the DLLs or MSI installer from the downloads section or install the API in your .NET application via NuGet.

PM> Install-Package GroupDocs.Classification

Text Classification with IAB-2 Taxonomy using C#

IAB-2 is a taxonomy of content categories from the Interactive Advertising Bureau (IAB). The classifier analyzes the text and assigns it to the predefined categories that match it best. The following are the steps for taxonomic classification of text with the IAB-2 taxonomy using C#.

  • Instantiate the classifier using the Classifier class.
  • Define the text for taxonomic analysis.
  • Set the taxonomy to IAB-2.
  • Optionally, set how many of the best results the classification should return.
  • Get the taxonomic categories of the provided text by calling the Classify method with the defined parameters.
  • Print the BestResults from the response returned by the Classify method.

The following C# source code shows how to classify text using IAB-2 taxonomy and get the top categories with the best match.
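A minimal sketch of these steps might look like the following. It assumes that Classify accepts the text, the number of best results, and a Taxonomy value in that order, and that each entry in BestResults exposes Name and Probability; check these against the API reference for your version. The sample text is a placeholder, not the input used for the results in this article.

```csharp
using System;
using GroupDocs.Classification;

class Iab2TextClassification
{
    static void Main()
    {
        // Instantiate the classifier.
        var classifier = new Classifier();

        // Placeholder sample text (an assumption, not the article's original input).
        string text = "Regular exercise and a balanced diet help keep the heart healthy.";

        // Assumed parameter order: text, number of best results, taxonomy.
        // Request the three best-matching IAB-2 categories.
        var response = classifier.Classify(text, 3, Taxonomy.Iab2);

        // Print each returned category with its probability.
        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"Class: {result.Name}, Probability: {result.Probability}");
        }
    }
}
```

The categories and probabilities you get depend on the input text and the library version.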

 Class: Healthy_Living, Probability: 0.4144087
 Class: Medical_Health, Probability: 0.2108202
 Class: Science, Probability: 0.1584931

Text Classification with Document Taxonomy using C#

The Documents taxonomy classifies content into document classes such as advertisements, invoices, news, resumes, letters, and emails. The following are the steps for taxonomic classification of text with the Documents taxonomy using C#.

  • Instantiate the Classifier.
  • Load the text for taxonomic analysis.
  • Optionally, define how many of the best results the classification should return.
  • Set the taxonomy to Documents.
  • Get the taxonomic categories by calling the Classify method with the parameters defined above.
  • Print the BestResults from the response returned by the Classify method.

The following C# source code shows how to classify text content and get some of its top taxonomic categories using document taxonomy.
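As a rough sketch of the steps above, the only change from an IAB-2 classification is the taxonomy value. The parameter order of Classify and the Name and Probability members are assumptions to verify against the API reference, and the sample text is a placeholder rather than the article's original input.

```csharp
using System;
using GroupDocs.Classification;

class DocumentsTextClassification
{
    static void Main()
    {
        // Instantiate the classifier.
        var classifier = new Classifier();

        // Placeholder sample text (an assumption, not the article's original input).
        string text = "Limited-time offer: save 20% on all annual plans. Order today!";

        // Assumed parameter order: text, number of best results, taxonomy.
        // Request the two best-matching classes from the Documents taxonomy.
        var response = classifier.Classify(text, 2, Taxonomy.Documents);

        // Print each returned document class with its probability.
        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"Class: {result.Name}, Probability: {result.Probability}");
        }
    }
}
```

The classes and probabilities you get depend on the input text and the library version.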

 Class: ADVE, Probability: 0.9999645
 Class: Report, Probability: 3.461805E-05

Get a Free License

You can get a free temporary license to use the API without the evaluation limitations.
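Once you have a license file, applying it might look like the sketch below. It assumes the library exposes a License class with a SetLicense(string) method, as other GroupDocs products do; the file name is a hypothetical placeholder for the file you received.

```csharp
using GroupDocs.Classification;

class ApplyLicense
{
    static void Main()
    {
        // Hypothetical path to the temporary license file.
        string licensePath = "GroupDocs.Classification.lic";

        // Assumed API: License.SetLicense(string path).
        // Apply the license once, before the first classification call.
        var license = new License();
        license.SetLicense(licensePath);
    }
}
```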

Conclusion

To sum up, we learned how to classify raw text using different taxonomies. In the examples, we classified text according to the IAB-2 and Documents taxonomies using C#. Together with the earlier posts in this series, this lets you build your own .NET classification application that classifies both documents and text with different taxonomies and configurations.

For more about the API, visit the documentation. For queries, contact us via the forum.
