Toward robust replay attack detection in Automatic Speaker Verification: A study of spectrum estimation and channel magnitude response modeling

dc.contributor.authorBekiryazici, Sule
dc.contributor.authorHanilci, Cemal
dc.contributor.authorOzcan, Neyir
dc.date.accessioned2026-02-08T15:15:11Z
dc.date.available2026-02-08T15:15:11Z
dc.date.issued2026
dc.departmentBursa Teknik Üniversitesi
dc.description.abstractAutomatic Speaker Verification (ASV) systems are increasingly adopted for biometric authentication but remain highly vulnerable to spoofing, particularly replay attacks. Existing countermeasures (CMs) for replay attack detection rely predominantly on discrete Fourier transform (DFT)-based spectral features, which are sensitive to the noise and channel distortions common in physical access (PA) scenarios. This work presents the first comprehensive study of Channel Magnitude Response (CMR) representations for replay detection, explicitly analyzing the impact of spectrum estimation and feature design. The contributions of this work are fourfold: (i) CMR estimation is generalized beyond MFCCs to LFCC and CQCC features, with LFCC-based CMRs offering superior discrimination; (ii) alternative spectrum estimators, namely linear prediction (LP) and multitaper (MT), are integrated into the CMR pipeline, yielding substantial gains over the conventional DFT; (iii) robustness is investigated under silence-free (voiced-only) conditions, mitigating known biases in ASVspoof datasets; and (iv) a systematic evaluation of CMR is provided on the recently released ReplayDF corpus, a challenging benchmark combining replay and synthetic speech variability. Experiments on ASVspoof 2017, 2019, 2021, and ReplayDF using both baseline classifiers (ResNet18 and LCNN) and stronger models (Res2Net50 and SE-Res2Net50) show that the proposed approach consistently outperforms conventional features. In particular, LFCC-CMR features with LP spectra achieve an Equal Error Rate (EER) as low as 1.34% on ASVspoof 2019 (PA), representing considerable relative improvements over traditional methods. Moreover, CMR-based systems retain high performance even when silent segments are removed, unlike conventional approaches. These results establish CMR with principled spectral modeling as a robust and generalizable framework for replay attack detection, opening new directions for resilient spoofing countermeasures.
dc.description.sponsorshipScientific and Technological Research Council of Turkiye (TUBITAK) [123E384]
dc.description.sponsorshipThis study was supported by the Scientific and Technological Research Council of Turkiye (TUBITAK) under Grant Number 123E384. The authors thank TUBITAK for its support.
dc.identifier.doi10.1016/j.csl.2025.101906
dc.identifier.issn0885-2308
dc.identifier.issn1095-8363
dc.identifier.scopus2-s2.0-105023580455
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1016/j.csl.2025.101906
dc.identifier.urihttps://hdl.handle.net/20.500.12885/5652
dc.identifier.volume98
dc.identifier.wosWOS:001632372400001
dc.identifier.wosqualityQ2
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherAcademic Press Ltd- Elsevier Science Ltd
dc.relation.ispartofComputer Speech and Language
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Academic Staff
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzWOS_KA_20260207
dc.subjectSpoofing countermeasures
dc.subjectReplay attack detection
dc.subjectBlind channel magnitude response
dc.subjectSpectrum estimation
dc.titleToward robust replay attack detection in Automatic Speaker Verification: A study of spectrum estimation and channel magnitude response modeling
dc.typeArticle
