PWFS: A scalable parallel Python module for wrapper feature selection

Eren, Hakan Alp; Okyay, Savaş; Adar, Nihat

Date

2025

Authors

Eren, Hakan Alp

Okyay, Savaş

Adar, Nihat

Publisher

İdris Karagöz

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Feature selection is a crucial step in machine learning and can significantly affect the performance of predictive models. Although various time-efficient algorithms exist, exhaustive search is the only method guaranteed to find the optimal feature subset, and it carries an enormous computational load: beyond a certain feature count, a lifetime would not be enough to complete it. This study proposes a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset while minimizing execution time, particularly for reasonable feature counts. The module, PWFS (parallel wrapper feature selection), is independent of the machine learning algorithm and works with user-defined methods. The framework aims to maximize the benefit to the machine learning task through parallel performance and efficiency. Its design is built on efficient message-passing communication, and it distributes the computational load equally among the parallel agents via feature masking. The module is validated on two workstations, one of which supports hyper-threading. Hyper-threading yields an overall performance gain of 19.77%. Across the tested scenarios and experiments, speedups vary and efficiencies reach up to 96.74%, supporting the flexible design of the proposed parallel framework. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/.
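The idea of splitting an exhaustive wrapper search among parallel agents via feature masks can be illustrated with a minimal sketch. It is not the PWFS API: the function names (`mask_to_indices`, `rank_range`, `best_in_range`, `exhaustive_search`) and the contiguous block split are assumptions made for illustration, and the ranks are simulated in a plain loop rather than run over a message-passing library. Each integer from 1 to 2^n - 1 is treated as a bitmask selecting a feature subset, and each agent scores only its own block of masks with a user-defined function.

```python
from typing import Callable, Iterable, List, Tuple

Score = Callable[[List[int]], float]  # user-defined wrapper evaluation (hypothetical signature)


def mask_to_indices(mask: int, n_features: int) -> List[int]:
    """Return the feature indices whose bits are set in the mask."""
    return [i for i in range(n_features) if (mask >> i) & 1]


def rank_range(rank: int, n_ranks: int, n_features: int) -> range:
    """Contiguous block of non-empty masks assigned to one agent.

    The 2**n - 1 masks are split so that block sizes differ by at most one.
    This is an assumed scheme, not necessarily the one used by the module.
    """
    total = 2 ** n_features - 1
    base, extra = divmod(total, n_ranks)
    start = 1 + rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


def best_in_range(masks: Iterable[int], n_features: int, score: Score) -> Tuple[float, int]:
    """Score every mask in one agent's block and keep the local best."""
    best_score, best_mask = float("-inf"), 0
    for mask in masks:
        current = score(mask_to_indices(mask, n_features))
        if current > best_score:
            best_score, best_mask = current, mask
    return best_score, best_mask


def exhaustive_search(n_features: int, n_ranks: int, score: Score) -> Tuple[float, List[int]]:
    """Simulate the agents one after another, then reduce to the global best."""
    # Each loop iteration stands in for one parallel agent; in a real
    # message-passing run, every rank would compute its own block and
    # send its local best to a root process.
    local_bests = [
        best_in_range(rank_range(r, n_ranks, n_features), n_features, score)
        for r in range(n_ranks)
    ]
    best_score, best_mask = max(local_bests)
    return best_score, mask_to_indices(best_mask, n_features)


# Toy scoring function: integer weights stand in for a model's validation score.
toy_weights = [5, -2, 3, -1]


def toy_score(indices: List[int]) -> int:
    return sum(toy_weights[i] for i in indices)
```

Calling `exhaustive_search(4, 3, toy_score)` evaluates all 15 non-empty subsets across three simulated agents. Equal mask counts do not by themselves imply equal run time, because larger subsets usually cost more to evaluate; a real implementation may balance the blocks differently.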

Keywords

High Performance Computing; Machine Learning Algorithms; Data Mining and Knowledge Discovery; Computer Software

Source

Yenilikçi Mühendislik ve Doğa Bilimleri
Journal of Innovative Engineering and Natural Science

Volume

5

Issue

2

Link

https://doi.org/10.61112/jiens.1639780
https://hdl.handle.net/20.500.12885/4671

Collection

Orphan Publications Collection
