PWFS: A scalable parallel Python module for wrapper feature selection

dc.contributor.author Eren, Hakan Alp
dc.contributor.author Okyay, Savaş
dc.contributor.author Adar, Nihat
dc.date.accessioned 2026-02-08T15:05:28Z
dc.date.available 2026-02-08T15:05:28Z
dc.date.issued 2025
dc.department Bursa Teknik Üniversitesi
dc.description.abstract In machine learning, feature selection is a crucial step that can significantly affect the performance of predictive models. Although many time-efficient algorithms exist, exhaustive search is the only method guaranteed to find the optimal feature subset, and it carries an enormous computational load: beyond a certain number of features, even a lifetime would not be enough to complete it. This study proposes PWFS, a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset with optimized execution time, particularly for moderate feature counts. PWFS is independent of any particular machine learning algorithm and works with user-defined methods, so machine learning practitioners can benefit directly from its parallel performance and efficiency. The design is built on efficient message-passing communication, and the framework distributes the computational load equally among the parallel agents via feature masking. The module is validated on two workstations, one of which supports hyper-threading. Hyper-threading yields an overall performance gain of 19.77%. Across various scenarios and experiments, the framework achieves a range of speedups and parallel efficiencies of up to 96.74%, validating its flexible design. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/.
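
The abstract describes an exhaustive wrapper search in which candidate subsets are encoded as feature masks and the work is split equally among parallel agents. The sketch below illustrates that idea under stated assumptions. It is not PWFS's code or API. It makes the following assumptions:

- Each subset is an n-bit integer, where bit i selects feature i.
- The 2**n - 1 non-empty masks are dealt out cyclically to the workers.
- The user supplies the scorer, for example the cross-validated accuracy of a chosen model.
- The standard-library `ThreadPoolExecutor` stands in for the message-passing agents.

The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical scorer type: maps selected feature indices to a score (higher is better),
# e.g. the cross-validated accuracy of a user-chosen model on those columns.
Scorer = Callable[[tuple], float]


def mask_to_features(mask: int, n_features: int) -> tuple:
    """Decode a bitmask: bit i set means feature i is selected."""
    return tuple(i for i in range(n_features) if (mask >> i) & 1)


def masks_for_worker(rank: int, n_workers: int, n_features: int) -> range:
    """Cyclic split of the non-empty masks 1 .. 2**n_features - 1.

    Worker `rank` takes masks rank+1, rank+1+n_workers, ..., so worker loads
    differ by at most one subset.
    """
    return range(rank + 1, 2 ** n_features, n_workers)


def best_in_partition(rank, n_workers, n_features, score):
    """Evaluate every subset in one worker's share; keep the best (score, mask)."""
    best_score, best_mask = float("-inf"), 0
    for mask in masks_for_worker(rank, n_workers, n_features):
        s = score(mask_to_features(mask, n_features))
        if s > best_score:  # strict '>' keeps the smallest mask on ties
            best_score, best_mask = s, mask
    return best_score, best_mask


def exhaustive_select(n_features: int, score: Scorer, n_workers: int = 4):
    """Exhaustive wrapper search over all 2**n_features - 1 non-empty subsets."""
    if n_features < 1 or n_workers < 1:
        raise ValueError("n_features and n_workers must be positive")
    # Threads stand in for the parallel agents; each returns only its local best,
    # mimicking a message-passing reduce of (score, mask) pairs.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(
            lambda rank: best_in_partition(rank, n_workers, n_features, score),
            range(n_workers),
        ))
    # Reduce: highest score wins; ties go to the smallest mask.
    best_score, best_mask = min(partials, key=lambda p: (-p[0], p[1]))
    return mask_to_features(best_mask, n_features), best_score


if __name__ == "__main__":
    weights = [0.5, -0.1, 0.3, 0.05]  # toy per-feature usefulness

    def toy_score(features):
        # Reward useful features, penalize subset size.
        return sum(weights[i] for i in features) - 0.1 * len(features)

    print(exhaustive_select(len(weights), toy_score, n_workers=3))
```

The cyclic split balances the number of subsets per worker, not their evaluation cost, since larger subsets usually take longer to fit. The search space doubles with every added feature: 20 features already give 1,048,575 candidate subsets. Threads keep the sketch portable, but CPU-bound estimators would need separate processes or an MPI layer (e.g. mpi4py) to obtain real speedup.
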
dc.identifier.doi 10.61112/jiens.1639780
dc.identifier.endpage 719
dc.identifier.issn 2791-7630
dc.identifier.issue 2
dc.identifier.startpage 704
dc.identifier.uri https://doi.org/10.61112/jiens.1639780
dc.identifier.uri https://hdl.handle.net/20.500.12885/4671
dc.identifier.volume 5
dc.language.iso en
dc.publisher İdris Karagöz
dc.relation.ispartof Yenilikçi Mühendislik ve Doğa Bilimleri
dc.relation.ispartof Journal of Innovative Engineering and Natural Science
dc.relation.publicationcategory Article - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rights info:eu-repo/semantics/openAccess
dc.snmz KA_DergiPark_20260207
dc.subject High Performance Computing
dc.subject Machine Learning Algorithms
dc.subject Data Mining and Knowledge Discovery
dc.subject Computer Software
dc.title PWFS: A scalable parallel Python module for wrapper feature selection
dc.title.alternative PWFS: A scalable parallel Python module for wrapper feature selection
dc.type Article
