Browsing by Author "Hanilçi, Cemal"

Showing 1 - 20 of 30
  • A comparative analysis of magnitude and phase spectrum-based features for replay attack detection by using deep learning
    (Gumushane University, 2025) Bekiryazıcı, Sule; Hanilçi, Cemal; Ozcan, Neyir
    The rapid advancement of digital technologies and the growing demand for security have significantly increased interest in biometric authentication systems. These systems authenticate individuals based on their physical or behavioral traits, offering a high level of security. Among them, Automatic Speaker Verification (ASV) systems stand out due to their user-friendly design and natural interaction capabilities. However, these systems remain vulnerable to spoofing attacks, particularly replay attacks. Such attacks involve deceiving the system by playing back a previously recorded speech sample and are considered a major threat due to their low cost and practical feasibility. This study systematically investigates the effectiveness of amplitude and phase-based spectral features extracted from speech signals in detecting such replay attacks. A total of eight amplitude-based and three phase-based features were derived and evaluated on the ASVspoof-2017, ASVspoof-2019 (Physical Access), and ASVspoof-2021 (Physical Access) datasets. Each feature set represents the spectral or phase characteristics of speech from different perspectives, aiming to capture artifacts introduced by reverberation, distortion, and re-recording—common indicators of replay attacks. Two deep learning-based classifiers, ResNet and LCNN architectures, were employed for the classification task. System performance was assessed using Equal Error Rate (EER) and tandem Detection Cost Function (t-DCF) metrics. Experimental results demonstrate that spectral features, particularly those in higher frequency bands, provide strong discriminatory power in identifying spoofed speech signals. The findings contribute a comprehensive comparison of both feature diversity and model performance, offering a valuable perspective to the existing literature on robust countermeasures against replay attacks in Automatic Speaker Verification systems. © 2025, Gumushane University. All rights reserved.
  • A Study on Turkish Text-Dependent Speaker Recognition
    (IEEE, 2017) Çeliktaş, Havva; Hanilçi, Cemal
    Speaker recognition is a pattern recognition task which has long been studied, but the accuracies are still far from the desired levels. The majority of the studies on speaker recognition report results obtained from databases of English speech. Since there are very few studies on Turkish speech, the performance of the known successful methods on Turkish speech is uncertain. Therefore, in this study, the performance of a Turkish text-dependent system is investigated using the Gaussian Mixture Model - Universal Background Model (GMM-UBM) method, which is a well-known method in speaker recognition systems. In the experimental studies, a Turkish speaker recognition database consisting of 46 speakers (36 males and 10 females) is used. Equal error rate (EER) is used to measure system performance. The equal error rate for the GMM-UBM method was found to be 5.73%. The experiments show that the speaker verification performance of the GMM-UBM classifier on the Turkish database is encouraging.
  • AN EXPERIMENTAL STUDY ON AUDIO REPLAY ATTACK DETECTION USING DEEP NEURAL NETWORKS
    (IEEE, 2018) Bakar, Bekir; Hanilçi, Cemal
    Automatic speaker verification (ASV) systems can be easily spoofed by previously recorded speech, synthesized speech, and speech signals artificially generated by voice conversion techniques. To increase the reliability of ASV systems, detecting spoofing attacks, that is, deciding whether a given speech signal is genuine or spoofed, plays an important role. In this paper, we consider the detection of replay attacks, which are the most accessible attack type against ASV systems. To this end, we utilize a deep neural network (DNN) based classifier using features extracted from the long-term average spectrum. The experiments are conducted on the latest edition of the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017) database. The results are compared with the ASVspoof 2017 baseline system, which consists of a Gaussian mixture model (GMM) classifier with a constant-Q transform cepstral coefficients (CQCC) front-end, as well as the GMM with standard mel-frequency cepstrum coefficients (MFCC) features. Experimental results reveal that the DNN considerably outperforms the well-known and successful GMM classifier. It is found that long-term average spectrum (LTAS) based features are superior to CQCC and MFCC in terms of equal error rate (EER). Finally, we find that high-frequency components convey much more discriminative information for replay attack detection, independent of features and classifiers.
  • An Experimental Study on Audio Replay Attack Detection Using Deep Neural Networks
    (Institute of Electrical and Electronics Engineers Inc., 2018) Bakar, Bekir; Hanilçi, Cemal
    Automatic speaker verification (ASV) systems can be easily spoofed by previously recorded speech, synthesized speech, and speech signals artificially generated by voice conversion techniques. To increase the reliability of ASV systems, detecting spoofing attacks, that is, deciding whether a given speech signal is genuine or spoofed, plays an important role. In this paper, we consider the detection of replay attacks, which are the most accessible attack type against ASV systems. To this end, we utilize a deep neural network (DNN) based classifier using features extracted from the long-term average spectrum. The experiments are conducted on the latest edition of the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017) database. The results are compared with the ASVspoof 2017 baseline system, which consists of a Gaussian mixture model (GMM) classifier with a constant-Q transform cepstral coefficients (CQCC) front-end, as well as the GMM with standard mel-frequency cepstrum coefficients (MFCC) features. Experimental results reveal that the DNN considerably outperforms the well-known and successful GMM classifier. It is found that long-term average spectrum (LTAS) based features are superior to CQCC and MFCC in terms of equal error rate (EER). Finally, we find that high-frequency components convey much more discriminative information for replay attack detection, independent of features and classifiers. © 2018 IEEE.
  • Angular Margin Softmax Loss and Its Variants for Double Compressed AMR Audio Detection
    (Association for Computing Machinery, Inc, 2021) Büker, Aykut; Hanilçi, Cemal
    Double compressed (DC) adaptive multi-rate (AMR) audio detection is an important but challenging audio forensic task which has received great attention over the last decade. Although the majority of the existing studies extract hand-crafted features and classify these features using traditional pattern matching algorithms such as support vector machines (SVM), a convolutional neural network (CNN) based DC AMR audio detection system was recently proposed which yields very promising detection performance. Like any traditional CNN based classification system, the CNN based DC AMR recognition system uses standard softmax loss as the training criterion. In this paper, we propose to use angular margin softmax loss and its variants for the DC AMR detection problem. Although angular margin softmax was originally proposed for face recognition, we adapt it to the CNN based end-to-end DC audio detection system. The angular margin softmax introduces a margin between two classes so that the system can learn more discriminative embeddings for the problem. Experimental results show that adding an angular margin penalty to the traditional softmax loss increases the average DC AMR audio detection rate from 95.83% to 100%. It is also found that the angular margin softmax loss functions boost the DC AMR audio detection performance when there is a mismatch between training and test datasets.
  • ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge
    (Ieee-Inst Electrical Electronics Engineers Inc, 2017) Wu, Zhizheng; Yamagishi, Junichi; Kinnunen, Tomi; Hanilçi, Cemal; Sahidullah, Mohammed; Sizov, Aleksandr
    Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets has hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology and fosters technological progress. This paper also describes the ASVspoof 2015 dataset, evaluation, and results with detailed analyses. A review of postevaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation. Priority future research directions are presented in the scope of the next ASVspoof evaluation planned for 2017.
  • Classifiers for Synthetic Speech Detection: A Comparison
    (Isca-Int Speech Communication Assoc, 2015) Hanilçi, Cemal; Kinnunen, Tomi; Sahidullah, Md; Sizov, Aleksandr
    Automatic speaker verification (ASV) systems are highly vulnerable against spoofing attacks, also known as imposture. With recent developments in speech synthesis and voice conversion technology, it has become important to detect synthesized or voice-converted speech for the security of ASV systems. In this paper, we compare five different classifiers used in speaker recognition to detect synthetic speech. Experimental results conducted on the ASVspoof 2015 dataset show that support vector machines with generalized linear discriminant kernel (GLDS-SVM) yield the best performance on the development set with the EER of 0.12 % whereas Gaussian mixture model (GMM) trained using maximum likelihood (ML) criterion with the EER of 3.01 % is superior for the evaluation set.
  • Deep convolutional neural networks for double compressed AMR audio detection
    (John Wiley and Sons Inc, 2021) Büker, Aykut; Hanilçi, Cemal
    Detection of double compressed (DC) adaptive multi-rate (AMR) audio recordings is a challenging audio forensic problem and has received great attention in recent years. Here, the authors propose to use convolutional neural networks (CNN) for DC AMR audio detection. The CNN is used as (i) an end-to-end DC AMR audio detection system and (ii) a feature extractor. The end-to-end system receives the audio spectrogram as the input and returns the decision whether the input audio is single compressed (SC) or DC. As a feature extractor in turn, it is used to extract discriminative features and then these features are modelled using support vector machines (SVM) classifier. Our extensive analysis conducted on four different datasets shows the success of the proposed system and provides new findings related to the problem. Firstly, double compression has a considerable impact on the high frequency components of the signal. Secondly, the proposed system yields great performance independent of the recording device or environment. Thirdly, when previously altered files are used in the experiments, 97.41% detection rate is obtained with the CNN system. Finally, the cross-dataset evaluation experiments show that the proposed system is very effective in case of a mismatch between training and test datasets.
  • Double Compressed AMR Audio Detection Using Long-Term Features and Deep Neural Networks
    (IEEE, 2019) Büker, Aykut; Hanilçi, Cemal
    Detecting double compressed audio files is an important problem for audio forensics applications such as audio forgery detection and determining the authenticity of an audio file presented as evidence. In this paper, we focus on detecting double compressed adaptive multi-rate (AMR) audio using a deep neural network (DNN) classifier with long-term average spectrum (LTAS) and long-term average cepstrum (LTAC) features. Experiments conducted on the TIMIT database show that compression rate has a significant impact on the performance. LTAS and LTAC features yield similar performance with slight differences. Removing unvoiced audio frames is found to reduce the detection accuracy, and multi-condition training does not bring any performance improvement.
  • Double Compressed AMR Audio Detection Using Spectral Features With Temporal Segmentation
    (Institute of Electrical and Electronics Engineers Inc., 2021) Büker, Aykut; Hanilçi, Cemal
    Double compressed (DC) AMR audio detection is an important audio forensic problem which is used to authenticate the originality of an audio recording. The majority of the existing studies use audio features extracted from the AMR encoder parameters such as linear prediction (LP) coefficients. Recently, we proposed to use the long-term average spectrum (LTAS) features for DC AMR audio detection and promising results were achieved. In this paper, we propose a novel feature extraction technique which does not require any prior knowledge about the details of the encoding and decoding processes of the AMR codec. The proposed features are extracted from the temporal segmentation of the short-term Fourier transform (STFT) representation of the audio signal. The proposed features are then classified using a deep neural network (DNN) classifier. Experimental results on two different databases show that the proposed features considerably outperform the LTAS features. The average detection rate is improved from 92.44% to 96.48% on the MDSVC dataset and from 80.95% to 83.67% on the TIMIT database with the proposed features.
  • EFFECT OF LANGUAGE MISMATCH ON TURKISH SPEAKER VERIFICATION
    (2017) Hanilçi, Cemal
    This study examines speaker recognition performance for Turkish speech when there is a language mismatch between the background data and the evaluation data. A Gaussian mixture model - universal background model classifier was used, with mel-frequency cepstral coefficients as the speaker-specific features. Experiments on a Turkish database of 47 male and 26 female speakers show that speaker verification performance drops dramatically when the language of the speech used to train the background model differs from the language used in the speaker verification experiments. For example, for male speakers, an equal error rate of 1.73% is obtained when the background model is trained with Turkish speech data, whereas an equal error rate of 12.34% is obtained when it is trained with English speech.
  • Enhancing Audio Replay Attack Detection with Silence-Based Blind Channel Impulse Response Estimation
    (Springer Science and Business Media Deutschland GmbH, 2026) Bekiryazıcı, Sule; Hanilçi, Cemal; Ozcan, Neyir
    Replay attacks pose a major threat to automatic speaker verification (ASV) systems, considerably degrading performance. Since replayed utterances are captured and reproduced using external microphones and speakers, they inherently reflect these acoustic influences. Such acoustic distortions serve as valuable cues for differentiating between genuine and spoofed speech, provided they can be effectively extracted and modeled. In this context, blind channel impulse response estimation has been shown to be an effective approach in replay attack detection, as it enables the characterization of the acoustic path through which the signal has propagated without requiring explicit knowledge of the original source or environment. Furthermore, prior studies have highlighted the importance of silence segments in this task, noting that these regions, being free of speech content, primarily capture the characteristics of the transmission channel. As such, silence segments offer a unique and robust opportunity for extracting channel-related features that are less influenced by speaker variability and phonetic content, thereby improving the discriminability between bonafide and replayed signals. In this paper, we argue that channel impulse response estimates derived from silence parts contain more discriminative information than those obtained from the entire signal or voiced parts. To exploit this insight, we propose to use log-magnitude channel frequency response estimated from the silence parts for replay attack detection. Experiments on ASVspoof 2019 and 2021 datasets show that utilizing silence-based channel response features reduces the EER from 4.21% to 3.17% and from 29.16% to 24.43%, respectively, compared to using the entire signal. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
  • Features and Classifiers for Replay Spoofing Attack Detection
    (IEEE, 2017) Hanilçi, Cemal
    Automatic speaker verification (ASV) systems are known to be highly vulnerable against spoofing attacks. Various successful countermeasures have recently been proposed to detect spoofing attacks originating from speech synthesis (SS) and voice conversion (VC). However, detecting replay attacks, the most easily implementable spoofing attacks against ASV systems, has gained less attention. Thus, in this paper we present an experimental comparison of various feature extraction techniques and classifiers for replay attack detection. In total, six magnitude spectrum and three phase spectrum based features are used for feature extraction. For classification in turn, four different techniques are utilized. Experiments are conducted on the recently released ASVspoof 2017 replay attack detection challenge data. Experimental results reveal that magnitude spectrum features considerably outperform phase based features independent of the classifier. Comparative results using four different classifiers indicate that i-vector cosine scoring yields lower equal error rates (EERs) than other methods.
  • Features and Regression Techniques for Crowd Density Estimation: A Comparison
    (IEEE, 2019) Kurnaz, Oğuzhan; Hanilçi, Cemal
    Crowd density estimation is an important problem for security applications, and it is a regression task consisting of feature extraction and regression steps. In this paper, we compare different features and regression techniques for crowd density estimation. To this end, 200 images randomly selected from the UCSD pedestrian dataset are used in the experiments. Experimental results show that features extracted from the gray level co-occurrence matrix (GLCM) give the best performance; however, the selection of the regression technique depends on the performance criterion. Applying perspective normalization as a pre-processing step and feature elimination as a post-processing step considerably improves the performance.
  • Fine-Tuning ECAPA-TDNN For Turkish Speaker Verification
    (Institute of Electrical and Electronics Engineers Inc., 2024) Demirtaş, Selim Can; Hanilçi, Cemal
    Compared to Turkish speech databases, English speech databases are significantly larger, featuring many more speakers. This creates a trade-off between data adequacy and language for Turkish ASV systems. This paper explores this trade-off by comparing three different approaches using the state-of-the-art ECAPA-TDNN model: utilizing the pre-trained English ECAPA-TDNN model, training the ECAPA-TDNN model from scratch with the Turkish Common Voice dataset, and fine-tuning the pre-trained English ECAPA-TDNN model with Turkish data. Experimental results reveal that the pre-trained English ECAPA-TDNN model outperforms the model trained from scratch on Turkish data and the fine-tuned model in terms of the equal error rate (EER) criterion. However, the fine-tuning approach demonstrates the best performance according to the minimum detection cost function (min-DCF) metric when security is prioritized over user convenience. © 2024 IEEE.
  • Detection of Hidden Information in Audio Using Gaussian Mixture Model and Magnitude Spectrum Features (original title: Gauss Karışım Modeli ve Genlik Spektrumu Öznitelikleri ile Sesteki Gizli Bilginin Sezimlenmesi)
    (2017) Hanilçi, Cemal
    In recent years, with the considerable increase in the use of digital data, the use of digital media for covert communication has become quite widespread. Accordingly, work on detecting hidden messages in digital data (steganalysis) has gained importance to the same extent. The aim of this study is to determine the presence of hidden messages in digital audio (speech) files using the Gaussian mixture model (GMM) classifier and Mel-frequency cepstral coefficient (MFCC) features, both widely used in speech processing applications in the literature. Experiments using 4380 speech signals show that MFCC features with the GMM classifier give better results than the support vector machine (SVM) classifier, which is commonly used for the hidden message detection problem.
  • Attack Detection in Adverse Conditions for Reliable Speaker Verification (original title: Güvenilir Konuşmacı Doğrulama Için Elverişsiz Durumlarda Saldırı Tespiti)
    (2019) Hanilçi, Cemal
    The project titled Attack Detection in Adverse Conditions for Reliable Speaker Verification addressed the problem of automatically detecting spoofing attacks that can be mounted against speaker verification systems through speech synthesis and voice conversion. The main motivation for choosing this problem is that almost all automatic person recognition systems in wide use today face spoofing attacks and that existing systems are vulnerable to them. Speaker verification systems, one of the common biometric person recognition systems, are therefore highly vulnerable to artificial/synthetic speech produced with today's rapidly developing speech synthesis and voice conversion methods, many of which are free and open source. This project used the ASVspoof 2015 database, prepared for the international Automatic Speaker Verification Spoofing and Countermeasures challenge first held in 2015, whose organizing team included the project's principal investigator, together with the ASVspoof 2017 database created for the challenge of the same name held in 2017. Different feature extraction and classification algorithms were examined in order to detect spoofing attacks against speaker verification systems. The research found that, for detecting speech synthesis and voice conversion attacks, phase-based features perform far better than features derived from the magnitude spectrum. Under additive/convolutional noise, however, magnitude spectrum features proved more robust than phase features. At the classification stage, the classical Gaussian Mixture Model (GMM) showed much better attack detection performance than the more complex and modern i-vector and SVM classifiers.
    When the i-vector classifier was redesigned for the spoofing detection problem, with the i-vector extractor trained on synthetic and genuine speech together, substantial improvements in its performance were observed. Finally, detection of attacks carried out by replaying recorded speech was examined using deep neural networks, and deep neural networks were shown to be quite powerful at detecting replay attacks.
  • Lightweight CNN-Based Intrusion Detection for Automotive CAN Bus in Light Commercial Vehicles
    (Bursa Teknik Üniversitesi, 2025) Tüfekcioğlu, Emre; Hanilçi, Cemal; Gürkan, Hakan
    With the rapid advancement of digitalization and automation, modern vehicles, especially in the light commercial segment, have evolved into complex, interconnected platforms resembling mobile computing systems. This transformation has increased the dependency on in-vehicle communication networks and, as a result, exposed them to a wider range of cybersecurity threats. A fundamental aspect of the proposed method is the use of a lightweight CNN model designed specifically for deployment in embedded automotive environments with limited computational resources and optimized for efficiency. Unlike conventional deep learning architectures, which demand high processing power and memory, the tiny device developed in this study works effectively on low-power hardware platforms such as edge ECUs. Despite its minimal computational footprint, the model is capable of accurately distinguishing between legitimate and spoofed communication traffic, as well as detecting a variety of attack forms that target different CAN protocol components. The performance metrics of the model further highlight its effectiveness: a ROC AUC score of 0.9887, an accuracy of 0.9887, a precision of 0.9825, a recall of 0.9952, and an F1-score of 0.9888. This balance between performance and efficiency makes the approach especially relevant for real-time on-vehicle intrusion detection systems. Just as important is the introduction of a purpose-built hybrid dataset, which is fundamental for system evaluation and training. The dataset combines synthetically generated attack scenarios with real-world spoofing, injection, and denial-of-service (DoS) conditions, using actual CAN traffic acquired from a J1939-compliant light commercial vehicle. Standard 11-bit identifiers combined with industrial communication protocols help the dataset reflect real-world vehicle dynamics across several ECUs under various scenarios.
    By means of an image-like transformation of CAN messages that preserves bit-level and temporal information, the model can learn fine-grained patterns often missed by conventional rule-based or manually engineered approaches. In intelligent transportation systems, the lightweight CNN architecture and the strong dataset combine to create a scalable and deployable IDS framework that can improve in-vehicle cybersecurity.
  • Linear prediction residual features for automatic speaker verification anti-spoofing
    (Springer, 2018) Hanilçi, Cemal
    Automatic speaker verification (ASV) systems are highly vulnerable against spoofing attacks. Anti-spoofing, determining whether a speech signal is natural/genuine or spoofed, is very important for improving the reliability of the ASV systems. Spoofing attacks using the speech signals generated using speech synthesis and voice conversion have recently received great interest due to the 2015 edition of Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015). In this paper, we propose to use linear prediction (LP) residual based features for anti-spoofing. Three different features extracted from LP residual signal were compared using the ASVspoof 2015 database. Experimental results indicate that LP residual phase cepstral coefficients (LPRPC) and LP residual Hilbert envelope cepstral coefficients (LPRHEC) obtained from the analytic signal of the LP residual yield promising results for anti-spoofing. The proposed features are found to outperform standard Mel-frequency cepstral coefficients (MFCC) and Cosine Phase (CosPhase) features. LPRPC and LPRHEC features give the smallest equal error rates (EER) for eight spoofing methods out of ten spoofing attacks in comparison to MFCC and CosPhase features.
  • Mixture Linear Prediction in Speaker Verification Under Vocal Effort Mismatch
    (Ieee-Inst Electrical Electronics Engineers Inc, 2014) Pohjalainen, Jouni; Hanilçi, Cemal; Kinnunen, Tomi; Alku, Paavo
    This paper describes an approach to robust signal analysis using iterative parameter re-estimation of a mixture autoregressive (AR) model. The model's focus can be adjusted by initialization of the target and non-target states. The variant examined in this study uses an i.i.d. mixture AR model and is designed to tackle the spectral biasing effect caused by the voice excitation in speech signals with variable fundamental frequency. In our speaker verification experiments, this method performed competitively against standard spectrum analysis techniques in non-mismatch conditions and showed significant improvements in vocal effort mismatch conditions.
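
Many of the abstracts listed above report results as an equal error rate (EER), the operating point where the false acceptance rate and the false rejection rate are equal. As a rough illustration only, the sketch below estimates the EER from two lists of detection scores. It assumes that higher scores mean "more likely genuine"; the function name `compute_eer` and the toy scores are hypothetical and are not taken from any of the listed papers, which may use interpolation or other estimation details.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Estimate the equal error rate from raw scores.

    Assumption: a higher score means the trial is more likely genuine.
    Returns (eer, threshold) at the point where the two error rates
    are closest to each other.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two curves cross (closest point here).
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy example with made-up scores: one genuine trial falls below the
# best spoofed trial, so one in four trials of each class is misclassified.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

This closest-point estimate is coarse when there are few trials; evaluation toolkits typically interpolate between neighbouring thresholds, so published figures may differ slightly from what this sketch would return on the same scores.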


This site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

