Title: On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams
Author(s): I. Katakis, G. Tsoumakas, I. Vlahavas.
Availability: PDF file (11 pages).
Keywords: Text Mining, Text Classification, Feature Based Classifiers, Dynamic Feature Space, Dynamic Feature Selection, Data Streams, Concept Drift.
Appeared in: 10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746, pp. 338-348, Volos, Greece, 11-13 November, 2005.
Abstract: In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature based classifier), in order to deal with the above problem. Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms.

This paper has been cited by the following:

1 S. Günal, S. Ergin, M.B. Gülmezoğlu, Ö.N. Gerek, “On Feature Extraction for Spam E-Mail Detection”, Proc. International Workshop on Multimedia Content Representation, Classification and Security, MRCS 2006, Istanbul, Turkey, September 11-13, 2006 (LNCS Vol. 4105/2006, pp. 635-642)
2 B. Wenerstrom, C. Giraud-Carrier, “Temporal Data Mining in Dynamic Feature Spaces”, Proc. Int. Conf. on Data Mining, pp. 1141-1145, Hong Kong, 18-22 December, 2006
3 B. Wenerstrom, “Temporal data mining in a dynamic feature space”, MSc Thesis, Brigham Young University, Provo, UT, USA, May 2006
4 C. Rohr, D. Tjondronegoro, “Aggregated cross-media news visualization and personalization”, Proceeding of the 1st ACM international conference on Multimedia information retrieval, pp. 371-378, Vancouver, British Columbia, Canada, 2008.
5 Anyfantis D., Automatic filtering of unsolicited e-mail using machine learning methods, MSc Thesis, Department of Computer Engineering, University of Patras, 2008 (in Greek).
6 Kalinov, P., Stantic, B., Sattar, A. (2010) "Building a Dynamic Classifier for Large Text Data Collections", Proceedings of the 21st Australasian Database Conference (ADC2010), Brisbane, Australia, January 2010.
7 Bártolo Gomes, J., Gaber, M.M., Sousa, P.A.C., Menasalvas, E. (2011) Context-aware collaborative data stream mining in ubiquitous devices, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7014 LNCS, pp. 22-33