Hierarchical multi-head attention LSTM for polyphonic symbolic melody generation

dc.authorid0000-0003-1443-1779
dc.authorid0000-0003-0959-2930
dc.contributor.authorKasif, Ahmet
dc.contributor.authorSevgen, Selcuk
dc.contributor.authorOzcan, Alper
dc.contributor.authorCatal, Cagatay
dc.date.accessioned2026-02-08T15:14:59Z
dc.date.available2026-02-08T15:14:59Z
dc.date.issued2024
dc.departmentBursa Teknik Üniversitesi
dc.description.abstractCreating symbolic melodies with machine learning is challenging because it requires an understanding of musical structure and the handling of inter-dependencies and long-term dependencies. Learning the relationship between events that occur far apart in time in music poses a considerable challenge for machine learning models. Another notable feature of music is that notes must account for several inter-dependencies, including melodic, harmonic, and rhythmic aspects. Baseline methods, such as RNNs, LSTMs, and GRUs, often struggle to capture these dependencies, resulting in the generation of musically incoherent or repetitive melodies. As such, in this study, a hierarchical multi-head attention LSTM model is proposed for creating polyphonic symbolic melodies. This enables our model to generate more complex and expressive melodies than previous methods, while still remaining musically coherent. The model allows learning of long-term dependencies at different levels of abstraction, while retaining the ability to form inter-dependencies. The study has been conducted on two major symbolic music datasets, MAESTRO and Classical-Music MIDI, which feature musical content encoded in MIDI. The artistic nature of music makes evaluating the generated content difficult, and quantitative analysis alone is often not enough; thus, human listening tests are conducted to strengthen the evaluation. Quantitative analysis of the generated melodies shows significantly improved MSE loss scores over baseline methods, and the model is able to generate melodies that are both musically coherent and expressive. The listening tests, conducted using a Likert scale, support the quantitative results and yield better statistical scores than the baseline methods.
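The abstract describes applying multi-head attention on top of LSTM hidden states. As a rough, hypothetical illustration only (not the authors' implementation; the dimensions, head count, and weight shapes below are assumptions), multi-head scaled dot-product attention over a sequence of recurrent hidden states can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product attention over LSTM outputs h.

    h: (seq_len, d_model) hidden states from an LSTM layer.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    Returns a (seq_len, d_model) attended representation.
    """
    seq_len, d_model = h.shape
    d_head = d_model // n_heads

    def split(x):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(h @ Wq), split(h @ Wk), split(h @ Wv)
    # Scaled dot-product attention, computed per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v                      # (n_heads, seq_len, d_head)
    # Concatenate the heads and apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage: 16 timesteps of 32-dim hidden states, 4 attention heads
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 32))
Wq, Wk, Wv, Wo = (rng.standard_normal((32, 32)) * 0.1 for _ in range(4))
attended = multi_head_attention(h, Wq, Wk, Wv, Wo, n_heads=4)
print(attended.shape)  # (16, 32)
```

In a hierarchical variant such as the one the abstract names, one would stack several such attention layers over LSTM outputs at different temporal resolutions; the sketch above shows only the single-level attention building block.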
dc.description.sponsorshipQatar University
dc.description.sponsorshipThe authors sincerely thank their universities for their support and essential research infrastructure, contributing to the success of this study.
dc.identifier.doi10.1007/s11042-024-18491-7
dc.identifier.endpage30317
dc.identifier.issn1380-7501
dc.identifier.issn1573-7721
dc.identifier.issue10
dc.identifier.scopus2-s2.0-85184399495
dc.identifier.scopusqualityQ1
dc.identifier.startpage30297
dc.identifier.urihttps://doi.org/10.1007/s11042-024-18491-7
dc.identifier.urihttps://hdl.handle.net/20.500.12885/5544
dc.identifier.volume83
dc.identifier.wosWOS:001157545000005
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofMultimedia Tools and Applications
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzWOS_KA_20260207
dc.subjectSymbolic music
dc.subjectMusic generation
dc.subjectRecurrent networks
dc.subjectMulti-head attention
dc.titleHierarchical multi-head attention LSTM for polyphonic symbolic melody generation
dc.typeArticle

Files