A Ground-Truth-Free Framework for Validating Emotions in Generative AI Speech Synthesis


Date

2026

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Evaluating emotional expressivity in synthetic speech is challenging due to the absence of ground-truth affective labels and the reliance on costly human perceptual studies. This paper introduces a prototype-based framework that integrates affect-specialized Emotion2Vec embeddings with general-purpose acoustic and linguistic representations from WavLM to enable scalable and system-agnostic evaluation. Embeddings are projected into a shared latent space where each emotion category is represented by a learnable prototype, supporting both categorical classification and a continuous similarity-based metric, the Emotion Adherence Score (EAS). While categorical performance varied across systems, EAS remained consistently high, highlighting its robustness in capturing graded affective fidelity. On a 1,400-utterance corpus spanning four heterogeneous TTS systems, the proposed method achieved substantial improvements over a strong embedding baseline, increasing accuracy from 51.43% to 77.50% and macro-F1 from 0.5109 to 0.7736. Human ratings further supported EAS, showing a moderate positive correlation with human judgments. Overall, the proposed framework provides a principled and scalable approach for benchmarking emotional expressivity in TTS, bridging categorical and continuous perspectives and reducing reliance on ground-truth labels and large-scale listening tests. © 2013 IEEE.
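The prototype-and-similarity scheme summarized in the abstract (nearest-prototype classification plus a continuous adherence score) can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the function names, the cosine-similarity choice, and the mapping of EAS to [0, 1] are placeholders standing in for the learned latent space and trained prototypes.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

def nearest_prototype(embedding, prototypes):
    """Categorical decision: assign the emotion whose prototype is most
    similar to the utterance embedding. `prototypes` maps label -> vector."""
    return max(prototypes, key=lambda label: cosine(embedding, prototypes[label]))

def emotion_adherence_score(embedding, prototypes, target_emotion):
    """Illustrative EAS: similarity between the utterance embedding and the
    prototype of the *intended* emotion, rescaled from [-1, 1] to [0, 1]."""
    return 0.5 * (1.0 + cosine(embedding, prototypes[target_emotion]))
```

In this sketch the categorical label and the graded score share the same geometry, which mirrors the abstract's point that the framework bridges categorical classification and continuous affective fidelity.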

Description

Keywords

Emotion Adherence Score, Emotional Text-to-Speech, Prototype-based Learning, Self-supervised Speech Representations, Speech Emotion Recognition

Source

IEEE Access

WoS Q Value

Scopus Q Value

Q1

Volume

Issue

Citation