
Title: PolyA-iEP: A Data Mining Method for the Effective Prediction of Polyadenylation Sites
Author(s): G. Tzanis, I. Kavakiotis, I. Vlahavas.
Keywords: data mining, machine learning, classification, emerging pattern, bioinformatics, polyadenylation.
Appeared in: Expert Systems with Applications, Elsevier, 38(10), pp. 12398-12408, 2011.
Abstract: This paper presents a study on polyadenylation site prediction, a very important problem in bioinformatics and medicine that promises to provide many answers, especially in cancer research. We describe a method, called PolyA-iEP, that we developed for predicting polyadenylation sites, and we present a systematic study of the problem of recognizing mRNA 3′ ends that contain a polyadenylation site using the proposed method. PolyA-iEP is a modular system consisting of two main components, both of which contribute substantially to the descriptive and predictive potential of the system. Specifically, PolyA-iEP exploits the advantages of emerging patterns, namely their high understandability and discriminating power, together with the strength of a distance-based scoring method that we propose. The extracted emerging patterns may span many elements around the polyadenylation site and can provide novel and interesting biological insights. The outputs of these two components are finally combined by a classifier in a highly effective framework, which in our setup reaches 93.7% sensitivity and 88.2% specificity. PolyA-iEP can be parameterized and used for both descriptive and predictive analysis. We evaluated our method on Arabidopsis thaliana sequences and have drawn important conclusions.
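For readers unfamiliar with emerging patterns, the sketch below illustrates the standard contrast-mining definition only: a pattern's growth rate is its support in the target class divided by its support in the background class, and a pattern is kept as "emerging" when that ratio reaches a threshold. It is not the PolyA-iEP implementation. The k-mer feature encoding, the toy sequences, the thresholds `rho` and `min_support`, and the brute-force enumeration are all hypothetical choices made for illustration, and the paper's distance-based scoring component is not shown.

```python
from itertools import combinations

# Hypothetical toy data: "positive" sequences stand in for regions around a
# polyadenylation site, "negative" ones for background regions.
POSITIVE = ["TTAATAAATT", "GAATAAAGCA", "CAATAAACGT"]
NEGATIVE = ["GCGCGCGCGC", "ATATGCGCTA", "CCGGTTAACC"]


def kmer_items(seq, k=4):
    """Represent a sequence as the set of its k-mers (an assumed feature encoding)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def support(pattern, transactions):
    """Fraction of transactions that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(pattern, background, target):
    """Support in `target` divided by support in `background`.

    By convention the growth rate is infinite when the pattern never occurs
    in the background but does occur in the target, and 0 when it occurs in neither.
    """
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


def emerging_patterns(background, target, rho=2.0, min_support=0.5, max_len=2):
    """Brute-force search for itemsets with growth rate >= rho (toy-sized data only)."""
    items = sorted(set().union(*target))
    found = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            # Skip patterns that are too rare in the target class.
            if support(pattern, target) < min_support:
                continue
            gr = growth_rate(pattern, background, target)
            if gr >= rho:
                found[pattern] = gr
    return found


if __name__ == "__main__":
    pos = [kmer_items(s) for s in POSITIVE]
    neg = [kmer_items(s) for s in NEGATIVE]
    patterns = emerging_patterns(neg, pos, rho=2.0, min_support=0.5, max_len=1)
    for pattern, gr in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pattern), gr)
```

On this toy data, k-mers such as `AATA` occur in every positive sequence and in no negative one, so they receive an infinite growth rate. Real emerging-pattern miners avoid brute-force enumeration, which is only practical here because the example is tiny.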

This paper has been cited by the following:

1 Dong, G., and Bailey, J. (Eds.) (2012). Contrast Data Mining: Concepts, Algorithms, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series.
2 Spits Warnars (2012). Attribute Oriented Induction of High-level Emerging Patterns. 2012 IEEE International Conference on Granular Computing (GrC), pp. 525-530.
3 Wu, X., Ji, G., and Zeng, Y. (2012). In silico prediction of mRNA poly(A) sites in Chlamydomonas reinhardtii. Molecular Genetics and Genomics, Springer-Verlag.
4 Wu, X., Ji, G., Li, Q. Q., and Zhou, S. (2012). Comprehensive recognition of messenger RNA polyadenylation patterns in plants. African Journal of Biotechnology, 11(14), pp. 3215-3234.
5 Rogers, M. F. (2013). From RNA-SEQ to Gene Annotation Using the Splicegrapher Method. PhD Thesis, Department of Computer Science, Colorado State University, Fort Collins, Colorado.
6 Ji, G., Guan, J., Zeng, Y., Li, Q. Q., and Wu, X. (2014). Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes. Brief Bioinform, published online 1 April 2014. doi: 10.1093/bib/bbu011