Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise

dc.authorid | 0000-0002-9174-0367 | en_US
dc.contributor.author | Hanilçi, Cemal
dc.contributor.author | Kinnunen, Tomi
dc.contributor.author | Sahidullah, Md
dc.contributor.author | Sizov, Aleksandr
dc.date.accessioned | 2021-03-20T20:14:22Z
dc.date.available | 2021-03-20T20:14:22Z
dc.date.issued | 2016
dc.department | BTÜ, Faculty of Engineering and Natural Sciences, Department of Electrical and Electronics Engineering | en_US
dc.description | Sahidullah, Md/0000-0002-0624-2903 | en_US
dc.description.abstract | Automatic speaker verification (ASV) technology has recently been finding its way into end-user applications for secure access to personal data, smart services or physical facilities. Similar to other biometric technologies, speaker verification is vulnerable to spoofing attacks, in which an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS) or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC, which represent the most flexible, high-end spoofing attacks. Most prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, our study provides a comparative analysis of existing state-of-the-art, off-the-shelf synthetic speech detectors under additive noise contamination, with a special focus on front-end processing, which has been found to be critical. Our comparison includes eight acoustic feature sets, five related to spectral magnitude and three to spectral phase information. All the methods contain a number of internal control parameters. Except for the feature post-processing steps (deltas and cepstral mean normalization), which we optimized for each method, we fix the internal control parameters to their default values based on the literature and compare all the variants using exactly the same dimensionality and back-end system. In addition to the eight feature sets, we consider two alternative classifier back-ends: a Gaussian mixture model (GMM) and an i-vector back-end, the latter with both cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. Our extensive analysis on the recent ASVspoof 2015 challenge provides new insights into the robustness of the spoofing detectors.
Firstly, unlike in most other speech processing tasks, all the compared spoofing detectors break down even at relatively high signal-to-noise ratios (SNRs) and fail to generalize to noisy conditions even when they perform excellently on clean data. This indicates both the difficulty of the task and how easily the methods can over-fit. Secondly, speech enhancement preprocessing is not found to be helpful. Thirdly, the GMM back-end generally outperforms the more involved i-vector back-end. Fourthly, concerning the compared features, the Mel-frequency cepstral coefficient (MFCC) and subband spectral centroid magnitude coefficient (SCMC) features perform best on average, although the winning method depends on the SNR and noise type. Finally, a study of two score fusion strategies shows that combining systems based on different features improves recognition accuracy for known and unknown attacks in both clean and noisy conditions. In particular, simple score averaging fusion was found to work better on average than weighted fusion with logistic-loss weight optimization. For clean speech, it provides 88% and 28% relative improvements over the best standalone features for known and unknown spoofing techniques, respectively. When only two features are fused, the best combination pairs the relative phase shift (RPS) features, as a complementary agent, with one of the magnitude features. To sum up, our study reveals a significant gap in the performance of state-of-the-art spoofing detectors between clean and noisy conditions. (C) 2016 Elsevier B.V. All rights reserved. | en_US
dc.description.sponsorship | Academy of Finland [253120, 283256]; Research European Agency (REA) of the European Commission [647850]; Scientific and Technological Research Council of Turkey (TUBITAK) [115E916] | en_US
dc.description.sponsorship | This project has been primarily supported by the Academy of Finland (projects 253120 and 283256). The paper also reflects some results from the OCTAVE Project (#647850), funded by the Research European Agency (REA) of the European Commission under its Horizon 2020 framework programme. The views expressed in this paper are those of the authors and do not represent any official position of the European Commission. The work of Cemal Hanilci is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project #115E916. | en_US
dc.identifier.doi | 10.1016/j.specom.2016.10.002 | en_US
dc.identifier.endpage | 97 | en_US
dc.identifier.issn | 0167-6393
dc.identifier.issn | 1872-7182
dc.identifier.scopusquality | Q1 | en_US
dc.identifier.startpage | 83 | en_US
dc.identifier.uri | http://doi.org/10.1016/j.specom.2016.10.002
dc.identifier.uri | https://hdl.handle.net/20.500.12885/1041
dc.identifier.volume | 85 | en_US
dc.identifier.wos | WOS:000390507000008 | en_US
dc.identifier.wosquality | Q2 | en_US
dc.indekslendigikaynak | Web of Science | en_US
dc.indekslendigikaynak | Scopus | en_US
dc.institutionauthor | Hanilçi, Cemal
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.relation.ispartof | Speech Communication | en_US
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | Speaker recognition | en_US
dc.subject | Anti spoofing | en_US
dc.subject | Countermeasures | en_US
dc.subject | Additive noise | en_US
dc.title | Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise | en_US
dc.type | Article | en_US
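The abstract rests on two simple operations: contaminating clean speech with additive noise at a target SNR, and fusing several detectors' scores by plain averaging. A minimal sketch of both is given below, assuming whole-signal average power for the SNR (the paper may measure speech level differently, for example over active speech only) and assuming all detectors' scores share the same polarity and a comparable scale before averaging. The function names `add_noise_at_snr`, `average_fusion` and `relative_improvement` are hypothetical illustrations, not the authors' implementation. The last helper shows how a figure such as "88% relative improvement" is read: the fraction by which an error metric (for example an equal error rate) drops relative to the best standalone system.

```python
import math
from collections.abc import Sequence


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> list[float]:
    """Return speech + g * noise, with gain g chosen so that the speech-to-noise
    power ratio equals snr_db. Assumption: power = whole-signal mean square."""
    if not speech:
        raise ValueError("speech signal is empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech signal")
    noise = list(noise[: len(speech)])  # truncate noise to the speech length
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    # SNR_dB = 10*log10(P_s / (g^2 * P_n))  =>  g = sqrt(P_s / (P_n * 10^(SNR_dB / 10)))
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(speech, noise)]


def average_fusion(system_scores: Sequence[Sequence[float]]) -> list[float]:
    """Simple score-averaging fusion: each trial's fused score is the unweighted
    mean of that trial's scores across all systems (no learned weights)."""
    if not system_scores:
        raise ValueError("need at least one system")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("every system must score the same trials")
    n_systems = len(system_scores)
    return [sum(s[t] for s in system_scores) / n_systems for t in range(n_trials)]


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction relative to a baseline (0.88 means 88%)."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    clean = [math.sin(0.05 * i) for i in range(400)]
    hypothetical_noise = [((i * 7919) % 13) - 6 for i in range(400)]  # stand-in noise
    noisy = add_noise_at_snr(clean, hypothetical_noise, snr_db=10.0)
    print(average_fusion([[1.0, -2.0], [3.0, 0.0]]))  # prints [2.0, -1.0]
```

Averaging needs no development data to train fusion weights, which is one plausible reason it can generalize better than logistic-loss weighted fusion when test conditions (noise types, unknown attacks) differ from training; the abstract reports the outcome, not this explanation.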

Files

Original bundle
Name: Hanilci-2016-Spoofing-detection-goes-noisy-an-an.pdf
Size: 738.29 KB
Format: Adobe Portable Document Format
Description: Full Text