Hierarchical multi-head attention LSTM for polyphonic symbolic melody generation

dc.authorid0000-0003-1443-1779
dc.authorid0000-0003-0959-2930
dc.contributor.authorKasif, Ahmet
dc.contributor.authorSevgen, Selcuk
dc.contributor.authorOzcan, Alper
dc.contributor.authorCatal, Cagatay
dc.date.accessioned2026-02-08T15:14:59Z
dc.date.available2026-02-08T15:14:59Z
dc.date.issued2024
dc.departmentBursa Teknik Üniversitesi
dc.description.abstractCreating symbolic melodies with machine learning is challenging because it requires an understanding of musical structure and the handling of inter-dependencies and long-term dependencies. Learning the relationship between events that occur far apart in time in music poses a considerable challenge for machine learning models. Another notable feature of music is that notes must account for several inter-dependencies, including melodic, harmonic, and rhythmic aspects. Baseline methods, such as RNNs, LSTMs, and GRUs, often struggle to capture these dependencies, resulting in the generation of musically incoherent or repetitive melodies. As such, in this study, a hierarchical multi-head attention LSTM model is proposed for creating polyphonic symbolic melodies. This enables our model to generate more complex and expressive melodies than previous methods, while still remaining musically coherent. The model allows learning of long-term dependencies at different levels of abstraction, while retaining the ability to form inter-dependencies. The study has been conducted on two major symbolic music datasets, MAESTRO and Classical-Music MIDI, which feature musical content encoded in MIDI. The artistic nature of music makes evaluating the generated content difficult, and quantitative analysis alone is often not enough; thus, human listening tests are conducted to strengthen the evaluation. Quantitative analysis of the generated melodies shows significantly improved MSE loss scores over baseline methods, and the model is able to generate melodies that are both musically coherent and expressive. The listening tests, conducted using a Likert scale, support the quantitative results and yield better statistical scores than the baseline methods.
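The abstract describes applying multi-head attention on top of LSTM hidden states. As a rough, hypothetical illustration only (not the authors' implementation; the dimensions, head count, and weight shapes below are assumptions), multi-head scaled dot-product attention over a sequence of recurrent hidden states can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product attention over LSTM outputs h.

    h: (seq_len, d_model) hidden states from an LSTM layer.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    Returns a (seq_len, d_model) attended representation.
    """
    seq_len, d_model = h.shape
    d_head = d_model // n_heads

    def split(x):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(h @ Wq), split(h @ Wk), split(h @ Wv)
    # Scaled dot-product attention, computed per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v                      # (n_heads, seq_len, d_head)
    # Concatenate the heads and apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage: 16 timesteps of 32-dim hidden states, 4 attention heads
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 32))
Wq, Wk, Wv, Wo = (rng.standard_normal((32, 32)) * 0.1 for _ in range(4))
attended = multi_head_attention(h, Wq, Wk, Wv, Wo, n_heads=4)
print(attended.shape)  # (16, 32)
```

In a hierarchical variant such as the one the abstract names, one would stack several such attention layers over LSTM outputs at different temporal resolutions; the sketch above shows only the single-level attention building block.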
dc.description.sponsorshipQatar University
dc.description.sponsorshipThe authors sincerely thank their universities for their support and essential research infrastructure, contributing to the success of this study.
dc.identifier.doi10.1007/s11042-024-18491-7
dc.identifier.endpage30317
dc.identifier.issn1380-7501
dc.identifier.issn1573-7721
dc.identifier.issue10
dc.identifier.scopus2-s2.0-85184399495
dc.identifier.scopusqualityQ1
dc.identifier.startpage30297
dc.identifier.urihttps://doi.org/10.1007/s11042-024-18491-7
dc.identifier.urihttps://hdl.handle.net/20.500.12885/5544
dc.identifier.volume83
dc.identifier.wosWOS:001157545000005
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofMultimedia Tools and Applications
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzWOS_KA_20260207
dc.subjectSymbolic music
dc.subjectMusic generation
dc.subjectRecurrent networks
dc.subjectMulti-head attention
dc.titleHierarchical multi-head attention LSTM for polyphonic symbolic melody generation
dc.typeArticle

Files