Improving Long Non-Coding RNA Prediction through Recursive Feature Elimination and XGBoost

dc.authorid 0009-0009-7632-0274
dc.contributor.author Alizada, Freshta
dc.contributor.author Altuntas, Volkan
dc.date.accessioned 2026-02-08T15:15:52Z
dc.date.available 2026-02-08T15:15:52Z
dc.date.issued 2025
dc.department Bursa Teknik Üniversitesi
dc.description.abstract In recent years, advances in high-throughput technologies have uncovered numerous hidden layers known as non-coding ribonucleic acids (ncRNAs), shifting the protein-centric view of genomes. ncRNAs, previously considered insignificant segments of the genome, are now recognized as essential functional components in prokaryotic and eukaryotic organisms. Long non-coding RNAs (lncRNAs) are a distinct category of ncRNAs longer than 200 nucleotides that are instrumental in key biological functions, including cellular differentiation, regulatory mechanisms, and epigenetic modifications. Despite the similarities between lncRNAs and messenger RNAs (mRNAs), there is a fundamental difference: mRNAs encode proteins, whereas lncRNAs do not. This study aims to distinguish these two RNA classes by designing a robust machine learning (ML) pipeline that employs Recursive Feature Elimination (RFE) for dimensionality reduction of the dataset and an XGBoost (XGB) classification model. Whereas previous studies trained and tested machine learning models on the complete set of dataset features, we employ RFE to reduce the number of features, yielding a more compact dataset of relevant features. To evaluate the predictive performance of our pipeline, we used error rate, accuracy, precision, recall, and F1-score. Compared with three existing lncRNA identification tools from the literature, our pipeline demonstrated superior prediction accuracy and precision, at 93.42% and 94.19%, respectively.
dc.identifier.doi 10.2339/politeknik.1627668
dc.identifier.issn 1302-0900
dc.identifier.issn 2147-9429
dc.identifier.uri https://doi.org/10.2339/politeknik.1627668
dc.identifier.uri https://hdl.handle.net/20.500.12885/6005
dc.identifier.wos WOS:001495358600001
dc.identifier.wosquality Q4
dc.indekslendigikaynak Web of Science
dc.language.iso en
dc.publisher Gazi Univ
dc.relation.ispartof Journal of Polytechnic-Politeknik Dergisi
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member
dc.rights info:eu-repo/semantics/openAccess
dc.snmz WOS_KA_20260207
dc.subject Recursive Feature Elimination
dc.subject XGBoost
dc.subject lncRNAs
dc.subject Bioinformatics
dc.subject Machine Learning
dc.title Improving Long Non-Coding RNA Prediction through Recursive Feature Elimination and XGBoost
dc.type Article
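
The abstract names five evaluation measures (error rate, accuracy, precision, recall, and F1-score) without defining them. The following is a minimal sketch of their standard definitions for a binary lncRNA-versus-mRNA classifier. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen purely for illustration; they are not taken from the article and do not reproduce its results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with lncRNA treated as the positive class (an
    assumption made for this sketch).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total  # complement of accuracy
    # Guard against empty denominators when a class is never predicted
    # or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall, written in its
    # count-based form 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    return {
        "error_rate": error_rate,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the article's data).
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision answers how many predicted lncRNAs are truly lncRNAs, while recall answers how many true lncRNAs were found, which is why the abstract reports precision alongside accuracy.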
