A Ground-Truth-Free Framework for Validating Emotions in Generative AI Speech Synthesis

dc.contributor.author: Özcan, Ahmet Remzi
dc.date.accessioned: 2026-02-08T15:11:11Z
dc.date.available: 2026-02-08T15:11:11Z
dc.date.issued: 2026
dc.department: Bursa Teknik Üniversitesi
dc.description.abstract: Evaluating emotional expressivity in synthetic speech is challenging due to the absence of ground-truth affective labels and the reliance on costly human perceptual studies. This paper introduces a prototype-based framework that integrates affect-specialized Emotion2Vec embeddings with general-purpose acoustic and linguistic representations from WavLM to enable scalable and system-agnostic evaluation. Embeddings are projected into a shared latent space where each emotion category is represented by a learnable prototype, supporting both categorical classification and a continuous similarity-based metric, the Emotion Adherence Score (EAS). While categorical performance varied across systems, EAS remained consistently high, highlighting its robustness in capturing graded affective fidelity. On a 1,400-utterance corpus spanning four heterogeneous TTS systems, the proposed method achieved substantial improvements over a strong embedding baseline, increasing accuracy from 51.43% to 77.50% and macro-F1 from 0.5109 to 0.7736. Human ratings further supported EAS, showing a moderate positive correlation with human judgments. Overall, the proposed framework provides a principled and scalable approach for benchmarking emotional expressivity in TTS, bridging categorical and continuous perspectives and reducing reliance on ground-truth labels and large-scale listening tests. © 2013 IEEE.
dc.identifier.doi: 10.1109/ACCESS.2026.3656800
dc.identifier.scopus: 2-s2.0-105028225336
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2026.3656800
dc.identifier.uri: https://hdl.handle.net/20.500.12885/5294
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: IEEE Access
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: Scopus_KA_20260207
dc.subject: Emotion Adherence Score
dc.subject: Emotional Text-to-Speech
dc.subject: Prototype-based Learning
dc.subject: Self-supervised Speech Representations
dc.subject: Speech Emotion Recognition
dc.title: A Ground-Truth-Free Framework for Validating Emotions in Generative AI Speech Synthesis
dc.type: Article

Files