Machine Learning / Data Mining
Periodicity Detection in Temporal Sequences
Periodicity is a particularly interesting feature that is often inherent in real-world time series data sets. In this article we propose a data mining technique for detecting multiple partial and approximate periodicities. Our approach is exploratory and follows a filter/refine paradigm. In the filter phase we introduce an autocorrelation-based algorithm that produces a set of candidate partial periodicities; the algorithm is then extended to capture approximate periodicities. In the refine phase we prune invalid periodicities. We conducted a series of experiments with various real-world data sets to test the performance and verify the quality of the results.
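As an illustration of the filter phase, the sketch below computes a naive autocorrelation of one symbol's occurrences in a symbolic sequence and keeps the lags at which the symbol recurs often enough. It is a minimal sketch under our own assumptions, not the algorithm of the article: the function name `candidate_periods`, the `threshold` parameter and the direct (non-FFT) computation are all hypothetical choices made for readability.

```python
def candidate_periods(seq, symbol, max_period, threshold=0.8):
    """Return (period, score) pairs for lags at which `symbol` tends to recur.

    The score for lag p is the fraction of occurrences of `symbol` at
    position i (with i + p still inside the sequence) that are followed
    by another occurrence at position i + p. This is the autocorrelation
    of the symbol's 0/1 indicator sequence, normalised by the number of
    occurrences that could be checked. Illustrative only.
    """
    positions = [i for i, s in enumerate(seq) if s == symbol]
    candidates = []
    for p in range(1, max_period + 1):
        checkable = [i for i in positions if i + p < len(seq)]
        if not checkable:
            continue
        matches = sum(1 for i in checkable if seq[i + p] == symbol)
        score = matches / len(checkable)
        if score >= threshold:
            candidates.append((p, score))
    return candidates


# 'a' occurs at positions 0, 3, 6 and 9, so lags 3 and 6 are candidates.
result = candidate_periods("abcabdabcabe", "a", max_period=6)
periods = [p for p, _ in result]
```

Note that lag 6 is returned alongside lag 3: multiples of a true period also score highly, which is one reason a filter step of this kind needs a subsequent refine step to prune candidates.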
Knowledge Discovery from Biological Data
Recent technological advances have enabled large-scale experiments and research projects, leading to an exponential growth of biological data. Computers are needed to organize, maintain and analyse these data. This is the mission of bioinformatics, an interdisciplinary field at the intersection of biology, computer science and information technology. The analysis of DNA and protein sequences, the study of gene expression and the accurate prediction of protein structure are some of the most important problems in bioinformatics. To analyse these data, scientists often have to use exploratory methods rather than confirm already suspected hypotheses. Data mining is a research area that aims to provide analysts with novel and efficient computational tools to overcome the obstacles and constraints posed by traditional statistical methods. Selection, preprocessing and transformation of the data, selection and application of the data mining algorithm, interpretation of the results and evaluation of the produced knowledge are essential steps in the knowledge discovery process. The goal of this research activity is to provide novel knowledge discovery methods, or extend existing ones, for the analysis of biological data. The application of knowledge discovery to gain new biological insights is also within the scope of this research area.
Learning for Planning
This research activity involves learning domain knowledge for selecting among different planning methods, based on data concerning the performance (speed and quality) of these methods. We have modelled the problem of planning method selection using various approaches and applied it to a) the automatic configuration of the parameters of planning systems and b) the selection among different planning systems.
This research activity involves methods for combining multiple predictive models in order to improve on the accuracy of single models. We have used statistical methods for selecting the best subset from an ensemble of classification algorithms and combined the decisions of the selected models through weighted voting. This approach achieves very high accuracy at low computational complexity compared to other state-of-the-art classifier combination methods.
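The weighted voting step can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_vote` is hypothetical, and the weights are assumed to come from something like each model's validation accuracy. The statistical subset selection mentioned above is not shown.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine class predictions from several models by weighted voting.

    predictions: one predicted label per selected model.
    weights:     one non-negative weight per model (assumed here to
                 reflect how much each model is trusted).
    Returns the label with the largest total weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Two weaker models say "ham", but one stronger model says "spam".
decision = weighted_vote(["spam", "ham", "ham"], [0.9, 0.3, 0.4])
```

With equal weights this reduces to plain majority voting; unequal weights let a single reliable model outvote several unreliable ones.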
Text Learning is the application of Machine Learning techniques to textual data. Thanks to the growth of the Internet and the World Wide Web, this area of Machine Learning now has one of the largest application fields in existence. Basic text learning applications are document organization, author identification, document summarization and text filtering. These applications, along with many others, can be applied in several domains such as the World Wide Web, e-mail messages, newsgroups, blogs or even large e-libraries.
We currently focus our research on textual data streams, such as e-mails and news feeds, and on the problem of concept drift, which occurs in these types of applications. This phenomenon appears in classification problems where the concept of the target class changes over time. In spam filtering, for example, spammers continually change their junk mail messages in order to evade conventional filters. Similarly, when a classifier is assigned to separate interesting from uninteresting messages in a newsgroup, the user's interests may vary over time. We therefore try to develop better incremental algorithms that can follow this change of concept by constantly updating the classifier in use while at the same time selecting the best feature subset for classification.
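To make the idea of an incrementally updated classifier concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier whose counts decay with every new message, so that recent examples outweigh old ones. The class name `DecayingNaiveBayes` and the `decay` parameter are hypothetical, exponential forgetting is just one simple way to follow drift, and the feature selection part of the problem is not addressed here.

```python
import math
from collections import defaultdict


class DecayingNaiveBayes:
    """Incremental Naive Bayes with exponential forgetting (illustrative)."""

    def __init__(self, decay=0.9):
        self.decay = decay                      # 1.0 means never forget
        self.class_counts = defaultdict(float)
        self.word_counts = defaultdict(lambda: defaultdict(float))
        self.vocab = set()

    def update(self, words, label):
        """Fade all old evidence, then add one new labelled message."""
        for c in list(self.class_counts):
            self.class_counts[c] *= self.decay
            for w in self.word_counts[c]:
                self.word_counts[c][w] *= self.decay
        self.class_counts[label] += 1.0
        for w in words:
            self.word_counts[label][w] += 1.0
            self.vocab.add(w)

    def predict(self, words):
        """Return the most probable class, using Laplace smoothing."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[c].get(w, 0.0) + 1.0) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Simulated drift: "offer" messages are first labelled ham, later spam.
clf = DecayingNaiveBayes(decay=0.5)
for _ in range(3):
    clf.update(["offer"], "ham")
before_drift = clf.predict(["offer"])
for _ in range(3):
    clf.update(["offer"], "spam")
after_drift = clf.predict(["offer"])
```

A smaller `decay` adapts to a new concept faster but makes the classifier noisier when the concept is stable, which is the basic trade-off any drift-tracking method has to manage.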
Reinforcement Learning addresses the problem of how a learning system (agent) can learn a behavior through trial-and-error interactions with a dynamic environment. The basic idea is that the agent is evaluated by a reinforcement signal that it receives from the environment as an assessment of its performance. The goal of the agent is to maximize the long-run sum of the values of the reinforcement signal.
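The trial-and-error loop described above can be illustrated with tabular Q-learning on a toy problem. Q-learning is one standard reinforcement learning algorithm, chosen here only as an example. The four-state corridor environment, the function names and the parameter values (`alpha`, `gamma`, `episodes`) are all assumptions made for this sketch, and the discount factor `gamma` is one common way to define the long-run sum of rewards.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the goal (terminal)
ACTIONS = (-1, +1)    # move one step left or one step right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from random exploration of the corridor."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)          # trial and error
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to q."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


policy = greedy_policy(q_learning())
```

The agent is never told that moving right is correct; it only observes the reinforcement signal at the goal, and the learned values propagate that signal back to earlier states.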
We currently focus our research on the problem of combining multiple predictive models into an ensemble using Reinforcement Learning.