PWFS: A scalable parallel Python module for wrapper feature selection
| dc.contributor.author | Eren, Hakan Alp | |
| dc.contributor.author | Okyay, Savaş | |
| dc.contributor.author | Adar, Nihat | |
| dc.date.accessioned | 2026-02-08T15:05:28Z | |
| dc.date.available | 2026-02-08T15:05:28Z | |
| dc.date.issued | 2025 | |
| dc.department | Bursa Teknik Üniversitesi | |
| dc.description.abstract | In machine learning, feature selection is a crucial step that can significantly affect the performance of predictive models. Although various time-efficient algorithms exist, exhaustive search is the only method guaranteed to find the optimal feature subset, and it carries an enormous computational load: beyond a certain feature count, a lifetime would not be enough to complete it. This study proposes a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset in an optimized execution time, particularly for reasonable feature counts. This parallel wrapper feature selection module, PWFS, is independent of the machine learning algorithm and works with user-defined methods. The framework aims to maximize the benefit on the machine learning side by improving parallel performance and efficiency. The system design is built on efficient message-passing communication, and the framework distributes the computational load equally among the parallel agents via feature masking. The module is validated on two workstations, one of which is hyper-threading capable. An overall performance gain of 19.77% is achieved with hyper-threading. Various scenarios and experiments yield different speedups, with efficiencies of up to 96.74%, validating the flexible design of the proposed parallel framework. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/. | |
| dc.identifier.doi | 10.61112/jiens.1639780 | |
| dc.identifier.endpage | 719 | |
| dc.identifier.issn | 2791-7630 | |
| dc.identifier.issue | 2 | |
| dc.identifier.startpage | 704 | |
| dc.identifier.uri | https://doi.org/10.61112/jiens.1639780 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12885/4671 | |
| dc.identifier.volume | 5 | |
| dc.language.iso | en | |
| dc.publisher | İdris Karagöz | |
| dc.relation.ispartof | Yenilikçi Mühendislik ve Doğa Bilimleri | |
| dc.relation.ispartof | Journal of Innovative Engineering and Natural Science | |
| dc.relation.publicationcategory | Article - National Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | KA_DergiPark_20260207 | |
| dc.subject | High Performance Computing | |
| dc.subject | Machine Learning Algorithms | |
| dc.subject | Data Mining and Knowledge Discovery | |
| dc.subject | Computer Software | |
| dc.title | PWFS: A scalable parallel Python module for wrapper feature selection | |
| dc.title.alternative | PWFS: A scalable parallel Python module for wrapper feature selection | |
| dc.type | Article |
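
The abstract describes exhaustive wrapper search in which the set of candidate feature subsets is split equally among parallel agents through feature masking. The sketch below illustrates that idea only; it is not the PWFS API. The names `mask_range`, `mask_to_indices`, `best_in_range`, `exhaustive_search`, and `toy_score` are hypothetical, each subset is assumed to be encoded as an integer bitmask, and a standard-library thread pool stands in for the message-passing agents the abstract mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def mask_range(rank, size, n_features):
    """Half-open range [start, stop) of non-empty feature masks for one worker."""
    total = (1 << n_features) - 1          # number of non-empty subsets
    base, extra = divmod(total, size)      # the first `extra` workers get one more mask
    start = 1 + rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def mask_to_indices(mask, n_features):
    """Decode a bitmask into the list of selected feature indices."""
    return [i for i in range(n_features) if (mask >> i) & 1]


def best_in_range(score, start, stop, n_features):
    """Evaluate every mask in [start, stop) with the user-defined score."""
    best_score, best_mask = float("-inf"), 0
    for mask in range(start, stop):
        s = score(mask_to_indices(mask, n_features))
        if s > best_score:
            best_score, best_mask = s, mask
    return best_score, best_mask


def exhaustive_search(score, n_features, n_workers=4):
    """Split all masks among workers, then reduce the local bests to a global best."""
    ranges = [mask_range(r, n_workers, n_features) for r in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(
            pool.map(lambda r: best_in_range(score, r[0], r[1], n_features), ranges)
        )
    best_score, best_mask = max(results)
    return mask_to_indices(best_mask, n_features), best_score


def toy_score(indices):
    """Hypothetical stand-in for a wrapper score (e.g. cross-validated accuracy):
    features 0 and 2 are useful, and every selected feature costs a little."""
    return len(set(indices) & {0, 2}) - 0.1 * len(indices)


if __name__ == "__main__":
    print(exhaustive_search(toy_score, n_features=5, n_workers=4))
```

In a real wrapper setting, `toy_score` would be replaced by a function that trains and evaluates a model on the selected columns. Because the scoring function is passed in as a plain callable, the search itself stays independent of the learning algorithm, which mirrors the design goal stated in the abstract.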












