Improving Long Non-Coding RNA Prediction through Recursive Feature Elimination and XGBoost
Date
2025
Publisher
Gazi Univ
Access Rights
info:eu-repo/semantics/openAccess
Abstract
In recent years, advances in high-throughput technologies have uncovered numerous hidden layers of the genome known as non-coding ribonucleic acids (ncRNAs), shifting the protein-centric view of genomes. Once considered insignificant segments of the genome, ncRNAs are now recognized as essential functional components in both prokaryotic and eukaryotic organisms. Long non-coding RNAs (lncRNAs) are a distinct category of ncRNAs longer than 200 nucleotides that play key roles in biological functions such as cellular differentiation, regulatory mechanisms, and epigenetic modification. Although lncRNAs and messenger RNAs (mRNAs) are similar in many respects, they differ fundamentally: mRNAs encode proteins, whereas lncRNAs do not. This study aims to distinguish these two RNA classes by designing a robust machine learning (ML) pipeline that uses Recursive Feature Elimination (RFE) to reduce the dimensionality of the dataset and an XGBoost (XGB) model for classification. Whereas previous studies trained and tested ML models on the complete set of dataset features, we apply RFE to reduce the number of features, yielding a more compact dataset that retains only the relevant ones. To evaluate the predictive performance of our pipeline, we used error rate, accuracy, precision, recall, and F1-score. Compared with three existing lncRNA identification tools in the literature, our pipeline achieved higher prediction accuracy and precision, at 93.42% and 94.19%, respectively.
Keywords
Recursive Feature Elimination, XGBoost, lncRNAs, Bioinformatics, Machine Learning
Source
Journal of Polytechnic-Politeknik Dergisi
WoS Quartile
Q4