Text Classification Systems Essay

Many classification systems are currently in use. Broadly speaking, they fall into two main categories: binary and multiclass systems. Binary classification systems distinguish between just two classes of objects, assigning each document to one of two groups. As Maranis and Bebenko (2009) explain, these systems provide a Yes/No answer to the question: Does this document belong to class X? Such systems are useful, for example, in classifying emails as spam or not spam, or commercial transactions as fraudulent or legitimate. In such applications, it is more likely and easier to use binary …
Word order and compositional semantics, for instance, have no bearing on classification or clustering performance. The focus of this chapter is on the use of statistical methods.
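The point about word order can be illustrated with a minimal sketch. It assumes that the statistical methods referred to here represent a document by its word counts alone (a bag-of-words style representation); the function name bag_of_words and the two sample sentences are hypothetical and do not come from the original text.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it on whitespace, and count each token.

    The position of each word is discarded; only its frequency is kept.
    (Illustrative helper, not a method taken from the essay.)
    """
    return Counter(text.lower().split())


# Two hypothetical sentences with the same words in a different order.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)

# Under this representation the two documents are indistinguishable,
# even though their meanings differ.
same_representation = bow_a == bow_b
print(same_representation)
```

Because both sentences reduce to identical counts, any classifier or clustering method built only on such counts would treat them as the same document, which is the sense in which word order carries no weight in this kind of approach.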
Statistical text clustering
Statistical text clustering developed greatly in the 1990s, when it emerged as a subtask of Information Retrieval (IR) applications (Joachims and Sebastiani, 2002). The hallmark of that development has been a dramatic improvement in the effectiveness of text clustering systems. The last two decades have witnessed an unprecedented revolution in the development of automated solutions for organizing the vast quantity of unstructured digital documents, and in the provision of powerful tools for turning this unstructured repository into a structured one (Sebastiani, 2006).
Recent years have seen a rapid increase in the amount of stored data in all fields of knowledge, owing to the continuous improvement of methods for storing data digitally. In response to this growing information overload, which has made it difficult for many search engines to meet people’s needs, various computer-based clustering and classification methods have been developed. IR researchers and internet users have raised concerns about the poor match between queries and the results that search engines return. IR researchers have worked in consequence
