Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise

Hanilçi, Cemal; Kinnunen, Tomi; Sahidullah, Md; Sizov, Aleksandr

dc.authorid	0000-0002-9174-0367	en_US
dc.contributor.author	Hanilçi, Cemal
dc.contributor.author	Kinnunen, Tomi
dc.contributor.author	Sahidullah, Md
dc.contributor.author	Sizov, Aleksandr
dc.date.accessioned	2021-03-20T20:14:22Z
dc.date.available	2021-03-20T20:14:22Z
dc.date.issued	2016
dc.department	BTÜ, Faculty of Engineering and Natural Sciences, Department of Electrical and Electronics Engineering	en_US
dc.description	Sahidullah, Md/0000-0002-0624-2903	en_US
dc.description.abstract	Automatic speaker verification (ASV) technology is recently finding its way to end-user applications for secure access to personal data, smart services or physical facilities. Similar to other biometric technologies, speaker verification is vulnerable to spoofing attacks where an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS) or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC that represent the most flexible, high-end spoofing attacks. Most of the prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, our study provides a comparative analysis of existing state-of-the-art, off-the-shelf synthetic speech detectors under additive noise contamination with a special focus on front-end processing that has been found critical. Our comparison includes eight acoustic feature sets, five related to spectral magnitude and three to spectral phase information. All the methods contain a number of internal control parameters. Except for feature post-processing steps (deltas and cepstral mean normalization) that we optimized for each method, we fix the internal control parameters to their default values based on literature, and compare all the variants using the exact same dimensionality and back-end system. In addition to the eight feature sets, we consider two alternative classifier back-ends: Gaussian mixture model (GMM) and i-vector, the latter with both cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. Our extensive analysis on the recent ASVspoof 2015 challenge provides new insights into the robustness of the spoofing detectors.
Firstly, unlike in most other speech processing tasks, all the compared spoofing detectors break down even at relatively high signal-to-noise ratios (SNRs) and fail to generalize to noisy conditions even if performing excellently on clean data. This indicates both the difficulty of the task and the potential to over-fit the methods easily. Secondly, speech enhancement preprocessing is not found helpful. Thirdly, GMM back-end generally outperforms the more involved i-vector back-end. Fourthly, concerning the compared features, the Mel-frequency cepstral coefficient (MFCC) and subband spectral centroid magnitude coefficient (SCMC) features perform the best on average, though the winning method depends on SNR and noise type. Finally, a study with two score fusion strategies shows that combining different feature based systems improves recognition accuracy for known and unknown attacks in both clean and noisy conditions. In particular, simple score averaging fusion, as opposed to weighted fusion with logistic loss weight optimization, was found to work better, on average. For clean speech, it provides 88% and 28% relative improvements over the best standalone features for known and unknown spoofing techniques, respectively. If we consider the best score fusion of just two features, then RPS serves as a complementary agent to one of the magnitude features. To sum up, our study reveals a significant gap in the performance of state-of-the-art spoofing detectors between clean and noisy conditions. (C) 2016 Elsevier B.V. All rights reserved.	en_US
dc.description.sponsorship	Academy of Finland [253120, 283256]; Research European Agency (REA) of the European Commission [647850]; Scientific and Technological Research Council of Turkey (TUBITAK) [115E916]	en_US
dc.description.sponsorship	This project has been primarily supported by the Academy of Finland (projects 253120 and 283256). The paper also reflects some results from the OCTAVE Project (#647850), funded by the Research European Agency (REA) of the European Commission, in its framework programme Horizon 2020. The views expressed in this paper are those of the authors and do not engage any official position of the European Commission. The work of Cemal Hanilci is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project #115E916	en_US
dc.identifier.doi	10.1016/j.specom.2016.10.002
dc.identifier.endpage	97	en_US
dc.identifier.issn	0167-6393
dc.identifier.issn	1872-7182
dc.identifier.scopus	2-s2.0-84996931198
dc.identifier.scopusquality	Q1
dc.identifier.startpage	83	en_US
dc.identifier.uri	http://doi.org/10.1016/j.specom.2016.10.002
dc.identifier.uri	https://hdl.handle.net/20.500.12885/1041
dc.identifier.volume	85	en_US
dc.identifier.wos	WOS:000390507000008
dc.identifier.wosquality	Q2
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.institutionauthor	Hanilçi, Cemal
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.relation.ispartof	Speech Communication	en_US
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Speaker recognition	en_US
dc.subject	Anti spoofing	en_US
dc.subject	Countermeasures	en_US
dc.subject	Additive noise	en_US
dc.title	Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise
dc.type	Article

Files

Original bundle

Name: Hanilci-2016-Spoofing-detection-goes-noisy-an-an.pdf
Size: 738.29 KB
Format: Adobe Portable Document Format
Description: Full Text

Collections

WoS Indexed Publications Collection
Department of Electrical and Electronics Engineering Publication Collection
Scopus Indexed Publications Collection