A Near-Real Time Automatic Audio Classification: Special Case for Hacivat and Karagoz Shadow Play

Date

2025

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Speaker identification plays a key role in applications such as security, biometrics, and human-computer interaction. As a specific task within audio classification, it aims to recognize individuals from their voice characteristics. This paper compares three widely adopted neural network architectures and evaluates their performance as classifiers for a real-time speaker identification system. A custom dataset was collected from publicly shared YouTube videos of a single speaker imitating multiple characters from the traditional Turkish shadow play Karagoz and Hacivat. Both MFCC and Log-Mel filterbank energy features were used to train the CRNN, 2D-CNN, and Bi-LSTM architectures. Among these, the 2D-CNN achieved the highest accuracy, 94.4%, and was approximately 2.7 times faster than its closest follower, the Bi-LSTM, during real-time testing on an RTX 4070 Super GPU. © 2025 IEEE.
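
For readers unfamiliar with the two feature types named in the abstract, the following is a minimal NumPy sketch of how Log-Mel filterbank energies and MFCCs are commonly computed. It is not the authors' pipeline: the sample rate (16 kHz), FFT size (512), hop length (160 samples), number of mel bands (40), number of cepstral coefficients (13), and all function names are illustrative assumptions, since the abstract does not state them.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Assumed parameters: frame, window, power spectrum, mel projection, log.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def mfcc(log_mel_feats, n_mfcc=13):
    # Unnormalized DCT-II along the mel axis, keeping the first n_mfcc terms.
    n_mels = log_mel_feats.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel_feats @ basis.T

# One second of a 440 Hz tone as a stand-in for real speech audio.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = log_mel(sig)
coeffs = mfcc(feats)
```

Either output is a 2D time-by-feature array, which is the kind of input a 2D-CNN, CRNN, or Bi-LSTM classifier can consume.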

Description

2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 -- 2025-09-10 through 2025-09-12 -- Bursa -- 214381

Keywords

Audio Classification, Bidirectional Long Short Term Memory, Convolutional Neural Networks, Convolutional Recurrent Neural Networks, Speaker Identification

Scopus Q Value

N/A
