Enhancing Audio Replay Attack Detection with Silence-Based Blind Channel Impulse Response Estimation

dc.contributor.author: Bekiryazıcı, Sule
dc.contributor.author: Hanilçi, Cemal
dc.contributor.author: Ozcan, Neyir
dc.date.accessioned: 2026-02-08T15:11:03Z
dc.date.available: 2026-02-08T15:11:03Z
dc.date.issued: 2026
dc.department: Bursa Teknik Üniversitesi
dc.description: 27th International Conference on Speech and Computer, SPECOM 2025 -- 2025-10-13 through 2025-10-15 -- Szeged -- 340939
dc.description.abstract: Replay attacks pose a major threat to automatic speaker verification (ASV) systems, considerably degrading performance. Since replayed utterances are captured and reproduced using external microphones and speakers, they inherently reflect these acoustic influences. Such acoustic distortions serve as valuable cues for differentiating between genuine and spoofed speech, provided they can be effectively extracted and modeled. In this context, blind channel impulse response estimation has been shown to be an effective approach in replay attack detection, as it enables the characterization of the acoustic path through which the signal has propagated without requiring explicit knowledge of the original source or environment. Furthermore, prior studies have highlighted the importance of silence segments in this task, noting that these regions, being free of speech content, primarily capture the characteristics of the transmission channel. As such, silence segments offer a unique and robust opportunity for extracting channel-related features that are less influenced by speaker variability and phonetic content, thereby improving the discriminability between bonafide and replayed signals. In this paper, we argue that channel impulse response estimates derived from silence parts contain more discriminative information than those obtained from the entire signal or voiced parts. To exploit this insight, we propose to use log-magnitude channel frequency response estimated from the silence parts for replay attack detection. Experiments on ASVspoof 2019 and 2021 datasets show that utilizing silence-based channel response features reduces the EER from 4.21% to 3.17% and from 29.16% to 24.43%, respectively, compared to using the entire signal. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
dc.description.sponsorship: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK, (123E384)
dc.identifier.doi: 10.1007/978-3-032-07956-5_24
dc.identifier.endpage: 344
dc.identifier.isbn: 9789819698936
dc.identifier.isbn: 9789819698042
dc.identifier.isbn: 9789819698110
dc.identifier.isbn: 9789819698905
dc.identifier.isbn: 9783032004949
dc.identifier.isbn: 9789819512324
dc.identifier.isbn: 9783032026019
dc.identifier.isbn: 9783032008909
dc.identifier.isbn: 9783031915802
dc.identifier.isbn: 9789819698141
dc.identifier.issn: 0302-9743
dc.identifier.scopus: 2-s2.0-105020240587
dc.identifier.scopusquality: Q3
dc.identifier.startpage: 333
dc.identifier.uri: https://doi.org/10.1007/978-3-032-07956-5_24
dc.identifier.uri: https://hdl.handle.net/20.500.12885/5215
dc.identifier.volume: 16187 LNCS
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.relation.ispartof: Lecture Notes in Computer Science
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: Scopus_KA_20260207
dc.subject: ASVspoof 2019
dc.subject: ASVspoof 2021
dc.subject: Replay attack detection
dc.subject: ResNet
dc.title: Enhancing Audio Replay Attack Detection with Silence-Based Blind Channel Impulse Response Estimation
dc.type: Conference Object
