Title: |
On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams |
Author(s): |
I. Katakis, G. Tsoumakas, I. Vlahavas.
|
Availability: |
PDF file (11 pages).
|
Keywords: |
Text Mining, Text Classification, Feature Based Classifiers, Dynamic Feature Space, Dynamic Feature Selection, Data Streams, Concept Drift.
|
Appeared in: |
10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746, pp. 338-348, Volos, Greece, 11-13 November, 2005.
|
Abstract: |
In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature based classifier), in order to deal with the above problem. Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms. |
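The coupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of a chi-squared score for feature ranking, the Bernoulli-style naive Bayes learner, the class name `FeatureBasedNaiveBayes`, the `top_k` parameter, and the toy messages are all assumptions made for illustration, since the abstract names none of them. The sketch keeps counts for every word seen so far and decides only at prediction time which words to use.

```python
import math


class FeatureBasedNaiveBayes:
    """Hypothetical sketch: incremental naive Bayes that stores counts for
    every word seen and selects the top_k words by chi-squared at prediction
    time. Ranking method and learner are assumptions, not taken from the paper."""

    def __init__(self, top_k=3):
        self.top_k = top_k
        self.class_docs = {}  # label -> number of documents seen
        self.word_docs = {}   # word -> {label -> documents containing the word}

    def update(self, words, label):
        """Incrementally absorb one labelled document."""
        self.class_docs[label] = self.class_docs.get(label, 0) + 1
        for w in set(words):
            counts = self.word_docs.setdefault(w, {})
            counts[label] = counts.get(label, 0) + 1

    def chi_squared(self, word):
        """Chi-squared statistic of word presence/absence against the class."""
        n = sum(self.class_docs.values())
        counts = self.word_docs[word]
        n_word = sum(counts.values())
        score = 0.0
        for label, n_c in self.class_docs.items():
            with_word = counts.get(label, 0)
            for observed, row_total in ((with_word, n_word),
                                        (n_c - with_word, n - n_word)):
                expected = row_total * n_c / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        return score

    def selected_features(self):
        """Current top_k words; the ranking changes as the stream evolves."""
        ranked = sorted(self.word_docs, key=self.chi_squared, reverse=True)
        return ranked[:self.top_k]

    def predict(self, words):
        """Classify using only the currently selected feature subset."""
        present = set(words)
        features = self.selected_features()
        n = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_docs.items():
            score = math.log(n_c / n)
            for w in features:
                # Laplace-smoothed probability that a document of this class contains w
                p = (self.word_docs[w].get(label, 0) + 1) / (n_c + 2)
                score += math.log(p if w in present else 1 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream (invented data, for illustration only).
clf = FeatureBasedNaiveBayes(top_k=3)
stream = [
    ("cheap pills now", "spam"),
    ("meeting agenda now", "ham"),
    ("cheap pills offer", "spam"),
    ("project meeting notes", "ham"),
]
for text, label in stream:
    clf.update(text.split(), label)

print(clf.predict("cheap pills today".split()))
print(clf.predict("meeting tomorrow".split()))
```

Because statistics are stored for all words rather than for a fixed vocabulary, a word that first appears late in the stream can enter the selected subset as soon as its score is high enough, without retraining from scratch.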
This paper has been cited by the following:
1 |
S. Günal, S. Ergin, M.B. Gülmezoğlu, Ö.N. Gerek, “On Feature Extraction for Spam E-Mail Detection”, Proc. International Workshop on Multimedia Content Representation, Classification and Security, MRCS 2006, Istanbul, Turkey, September 11-13, 2006, (LNCS Vol. 4105/2006, pp 635-642) |
2 |
B. Wenerstrom, C. Giraud-Carrier, “Temporal Data Mining in Dynamic Feature Spaces”, Proc. Int. Conf. on Data Mining, pp. 1141-1145, Hong Kong, 18-22 December, 2006 |
3 |
B. Wenerstrom, “Temporal data mining in a dynamic feature space”, MSc Thesis, Brigham Young University, Provo, UT, USA, May 2006 |
4 |
C. Rohr, D. Tjondronegoro, "Aggregated cross-media news visualization and personalization", Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, pp. 371-378, Vancouver, British Columbia, Canada, 2008. |
5 |
Ανυφαντής Δ. (D. Anyfantis), "Automatic Filtering of Unsolicited E-mail Using Machine Learning Methods", MSc Thesis, Department of Computer Engineering, University of Patras, 2008. |
6 |
Kalinov, P., Stantic, B., Sattar, A. (2010) "Building a Dynamic Classifier for Large Text Data Collections", Proceedings of the 21st Australasian Database Conference (ADC2010), Brisbane, Australia, January 2010.
|
7 |
Bártolo Gomes, J., Gaber, M.M., Sousa, P.A.C., Menasalvas, E. (2011) Context-aware collaborative data stream mining in ubiquitous devices, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7014 LNCS, pp. 22-33 |
|