Corpus-based symbolic music generation: Data, representation, models, evaluation

Thesis type
(Thesis) Ph.D.
Author: Ens, Jeffrey
Enabled by advances in artificial intelligence, research exploring the computational simulation of creative behaviours has produced human-competitive generative systems in a variety of creative domains. This thesis focuses on developing generative music systems using machine learning and on evaluating these systems using statistical methods. After introducing related work in the area of generative music systems, we describe the MetaMIDI Dataset, which comprises over 440,000 MIDI files and which we use for training. We adapt a pre-existing Audio-MIDI matching technique to match files in our MIDI dataset with audio previews of tracks available via the Spotify public API, since each Spotify track is associated with a rich set of metadata, including the artist, genre, arousal, valence and danceability. Furthermore, we assess the accuracy of the Audio-MIDI matching technique, highlighting areas for future improvement. We then describe two analytic evaluation methods that we developed: CAEMSI, which is domain-agnostic, and StyleRank, which is designed specifically for generative music systems. CAEMSI is a Cross-domain Analytic Evaluation Methodology for Style-Imitation systems, that is, generative systems trained to produce artifacts in a particular style. When evaluating style-imitation systems, we are often interested in determining whether there is a statistically significant difference or equivalence between the training data and a set of artifacts generated by the system. To this end, we outline a statistical method to measure the equivalence and difference between two sets of artifacts, given an arbitrary similarity or distance measure.
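As a rough illustration of this kind of test, the sketch below compares two sets of artifacts using normalized compression distance (NCD), approximated here with zlib, and a permutation test on the difference between the mean between-set distance and the mean within-set distance. The statistic, the permutation procedure and the helper names (`ncd`, `difference_test`, `random_bytes`) are assumptions made for illustration and are not necessarily the procedure CAEMSI uses. Testing for equivalence would additionally require choosing an equivalence margin (for example, with two one-sided tests), which is omitted here.

```python
import random
import zlib
from itertools import combinations
from statistics import mean


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib standing in for the compressor C."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def _statistic(d, ia, ib):
    """Mean between-set distance minus mean within-set distance (hypothetical statistic)."""
    within = [d[i][j] for s in (ia, ib) for i, j in combinations(s, 2)]
    between = [d[i][j] for i in ia for j in ib]
    return mean(between) - mean(within)


def difference_test(a, b, dist, n_perm=999, seed=0):
    """Permutation test: are sets a and b further apart than their members are from each other?

    Each set needs at least two items. Returns (observed statistic, one-sided p-value).
    """
    items = list(a) + list(b)
    n = len(items)
    # Compute every pairwise distance once; permutations only relabel the items.
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dist(items[i], items[j])
    idx = list(range(n))
    observed = _statistic(d, idx[:len(a)], idx[len(a):])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # random reassignment of items to the two sets
        if _statistic(d, idx[:len(a)], idx[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def random_bytes(seed, n=200):
    """Hypothetical helper producing incompressible toy 'artifacts'."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


# Toy usage: repetitive byte strings versus random byte strings.
set_a = [f"C D E F G {i} ".encode() * 20 for i in range(5)]
set_b = [random_bytes(i) for i in range(5)]
observed, p = difference_test(set_a, set_b, ncd)
print(f"statistic={observed:.3f}, p={p:.3f}")
```

Because the distance is passed in as a function, any other similarity or distance measure can be substituted for `ncd` without changing the test itself.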
Using normalized compression distance, we conduct experiments demonstrating that CAEMSI frequently detects a significant difference between the work of two different visual artists, and a significant equivalence between two disjoint sets of work by the same visual artist. Repeating the test for music composers yields similar results. StyleRank is a system for ranking symbolic musical excerpts based on their similarity to a style. Here, a style is simply the set of stylistic characteristics delineated by an arbitrary collection of symbolic musical excerpts, which we refer to as the corpus. Musical excerpts are represented using a variety of features, and a Random Forest is trained to discriminate between the corpus and the set of excerpts we wish to rank. An embedding is extracted from the trained Random Forest, and the rankings are derived directly from this embedding. We outline two experiments demonstrating that StyleRank can proficiently distinguish between the musical styles of different composers, and that StyleRank is congruent with human perception, using data collected from thousands of participants in an online listening study. We anticipate that this system will be useful for researchers who wish to evaluate the performance of many systems, or to investigate the effects of various hyper-parameters, using experimental designs that would be incompatible with a listening test. Motivated by a lack of consensus within the research community, we also make recommendations for the design of listening experiments, examining the role of two parameters: the proportion of questions and the proportion of participants, both measured relative to the total number of observations. Using experimental data collected from previous studies, we compare the power and reliability of various experimental designs to arrive at substantiated recommendations regarding these proportions.
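One way to turn a trained Random Forest into rankings is random-forest proximity: two excerpts are considered similar when they fall into the same leaf in many trees. The sketch below assumes a leaf-index matrix is already available (scikit-learn's `RandomForestClassifier.apply`, for instance, returns the index of the leaf each sample reaches in each tree) and scores each candidate excerpt by its mean proximity to the corpus. This proximity-based scoring and the function names are illustrative assumptions; StyleRank's actual embedding and similarity computation may differ.

```python
from statistics import mean


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two excerpts land in the same leaf."""
    same = sum(1 for la, lb in zip(leaves_a, leaves_b) if la == lb)
    return same / len(leaves_a)


def rank_by_style(corpus_leaves, candidate_leaves):
    """Score each candidate by mean proximity to the corpus; return (order, scores).

    Each row holds one excerpt's leaf index in every tree of the forest, e.g. a row
    of the array returned by scikit-learn's RandomForestClassifier.apply(X).
    `order` lists candidate indices from most to least corpus-like.
    """
    scores = [mean(proximity(c, k) for k in corpus_leaves) for c in candidate_leaves]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy leaf indices for a hypothetical forest of 4 trees.
corpus = [[1, 3, 5, 7], [1, 3, 6, 7], [2, 3, 5, 7]]
candidates = [[1, 3, 5, 7], [2, 4, 6, 8], [1, 4, 5, 8]]
order, scores = rank_by_style(corpus, candidates)
print(order)  # → [0, 2, 1]
```

Under this assumed scoring, excerpts that the forest struggles to separate from the corpus tend to share leaves with corpus excerpts and therefore rank higher.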
Finally, we propose the Multi-Track Music Machine (MMM), a generative system trained on the MetaMIDI Dataset and designed to support co-creative music composition workflows. MMM supports infilling musical material at the track and bar level, and can condition generation on particular attributes, including instrument type, note density, polyphony level, and note duration. To integrate these features, we employ a different representation for musical material: rather than using a single time-ordered sequence in which the musical events of different tracks are interleaved, we create a time-ordered sequence of musical events for each track and concatenate these track sequences into a single sequence. We present experimental results demonstrating that MMM consistently avoids duplicating the musical material it was trained on and generates music that is stylistically similar to the training dataset (as measured using StyleRank), and that attribute controls can be used to enforce various constraints on the generated material. We also outline several real-world applications of MMM, including the production of musical albums and collaborations with industry partners that explore integrating MMM into real-world products.
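The difference between track-wise concatenation and interleaving can be sketched as follows. Each track is serialized on its own (instrument, then bars, then note events separated by time deltas), and the tracks are appended one after another. Because each bar then forms a contiguous span, it can be swapped for a placeholder and its content moved to the end of the sequence, which frames bar-level infilling as ordinary left-to-right generation. The token vocabulary (`PIECE_START`, `TRACK_START`, `INST=n`, `FILL_PLACEHOLDER` and so on), the tick-based timing and the function names are hypothetical choices for this illustration, not necessarily MMM's exact format.

```python
def bar_tokens(notes):
    """Serialize one bar; notes are (onset, duration, pitch) tuples in ticks."""
    events = []
    for onset, dur, pitch in notes:
        events.append((onset, 1, f"NOTE_ON={pitch}"))
        events.append((onset + dur, 0, f"NOTE_OFF={pitch}"))
    events.sort()  # by time; at equal times NOTE_OFF (0) precedes NOTE_ON (1)
    tokens, now = ["BAR_START"], 0
    for t, _, tok in events:
        if t > now:
            tokens.append(f"TIME_DELTA={t - now}")
            now = t
        tokens.append(tok)
    tokens.append("BAR_END")
    return tokens


def multitrack_sequence(tracks, fill=None):
    """Concatenate per-track sequences; optionally move bar `fill`=(track, bar) to the end."""
    seq, target = ["PIECE_START"], None
    for ti, track in enumerate(tracks):
        seq += ["TRACK_START", f"INST={track['instrument']}"]
        for bi, bar in enumerate(track["bars"]):
            if (ti, bi) == fill:
                seq.append("FILL_PLACEHOLDER")
                target = bar_tokens(bar)[1:-1]  # drop BAR_START / BAR_END
            else:
                seq += bar_tokens(bar)
        seq.append("TRACK_END")
    if fill is not None:
        if target is None:
            raise IndexError(f"no bar at {fill}")
        seq += ["FILL_START", *target, "FILL_END"]
    return seq


# Toy two-track piece (hypothetical General MIDI programs: 0 = piano, 33 = bass).
tracks = [
    {"instrument": 0, "bars": [[(0, 12, 60)], [(0, 24, 64)]]},
    {"instrument": 33, "bars": [[(0, 48, 36)], [(0, 48, 38)]]},
]
print(multitrack_sequence(tracks))
print(multitrack_sequence(tracks, fill=(0, 1)))  # ask the model to infill bar 2 of track 1
```

In an interleaved encoding, the events belonging to one track's bar are mixed with other tracks' events, so they do not form a single contiguous span that can be cut out and moved in this way.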
201 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Pasquier, Philippe