A Ground-Truth-Free Framework for Validating Emotions in Generative AI Speech Synthesis
| dc.contributor.author | Özcan, Ahmet Remzi | |
| dc.date.accessioned | 2026-02-08T15:11:11Z | |
| dc.date.available | 2026-02-08T15:11:11Z | |
| dc.date.issued | 2026 | |
| dc.department | Bursa Teknik Üniversitesi | |
| dc.description.abstract | Evaluating emotional expressivity in synthetic speech is challenging due to the absence of ground-truth affective labels and the reliance on costly human perceptual studies. This paper introduces a prototype-based framework that integrates affect-specialized Emotion2Vec embeddings with general-purpose acoustic and linguistic representations from WavLM to enable scalable and system-agnostic evaluation. Embeddings are projected into a shared latent space where each emotion category is represented by a learnable prototype, supporting both categorical classification and a continuous similarity-based metric, the Emotion Adherence Score (EAS). While categorical performance varied across systems, EAS remained consistently high, highlighting its robustness in capturing graded affective fidelity. On a 1,400-utterance corpus spanning four heterogeneous TTS systems, the proposed method achieved substantial improvements over a strong embedding baseline, increasing accuracy from 51.43% to 77.50% and macro-F1 from 0.5109 to 0.7736. A perceptual study further supported EAS, which showed a moderate positive correlation with listener judgments. Overall, the proposed framework provides a principled and scalable approach for benchmarking emotional expressivity in TTS, bridging categorical and continuous perspectives and reducing reliance on ground-truth labels and large-scale listening tests. | |
| dc.identifier.doi | 10.1109/ACCESS.2026.3656800 | |
| dc.identifier.scopus | 2-s2.0-105028225336 | |
| dc.identifier.scopusquality | Q1 | |
| dc.identifier.uri | https://doi.org/10.1109/ACCESS.2026.3656800 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12885/5294 | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | en | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.relation.ispartof | IEEE Access | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | Scopus_KA_20260207 | |
| dc.subject | Emotion Adherence Score | |
| dc.subject | Emotional Text-to-Speech | |
| dc.subject | Prototype-based Learning | |
| dc.subject | Self-supervised Speech Representations | |
| dc.subject | Speech Emotion Recognition | |
| dc.title | A Ground-Truth-Free Framework for Validating Emotions in Generative AI Speech Synthesis | |
| dc.type | Article |
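
The abstract's prototype-based scoring can be illustrated with a minimal sketch. This is not the paper's implementation: the prototype vectors, the exact projection into the shared latent space, and the precise EAS formula are not given in this record, so everything below is a hypothetical stand-in that assumes EAS is cosine similarity between an utterance embedding and the learnable prototype of the intended emotion, with categorical classification taken as the nearest prototype.

```python
import numpy as np

# Hypothetical emotion inventory; the paper's categories may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

# Stand-in for learned prototypes in a shared latent space
# (in the paper these would be trained jointly with the projection).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(len(EMOTIONS), 8))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify(embedding: np.ndarray) -> str:
    """Categorical prediction: emotion of the nearest prototype."""
    sims = [cosine(embedding, p) for p in prototypes]
    return EMOTIONS[int(np.argmax(sims))]


def emotion_adherence_score(embedding: np.ndarray, target: str) -> float:
    """Continuous score in [-1, 1]: similarity to the intended
    emotion's prototype (assumed EAS definition)."""
    return cosine(embedding, prototypes[EMOTIONS.index(target)])


# Example: a synthetic-utterance embedding lying near the "happy" prototype
utt = prototypes[1] + 0.1 * rng.normal(size=8)
print(classify(utt))  # expected to be "happy" given the small perturbation
print(round(emotion_adherence_score(utt, "happy"), 3))
```

The point of the continuous score is visible here: two utterances can both be classified "happy" while receiving different EAS values, which is how the framework captures graded affective fidelity rather than a hard label alone.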