Application and evaluation of affective adaptive generative music for video games

Resource type
Thesis type
(Thesis) Ph.D.
Date created
Author: Plut, Cale
Music is an element in almost every video game. Because games are interactive and music is generally linear, writing music that matches the actions and events of gameplay presents a unique challenge. Adaptive music partially addresses this, but creating adaptive music requires extra labour and restricts elements of the composition. Generative music, created with some degree of autonomy from its input, presents a possible tool for addressing these drawbacks. Depending on their design, generative systems can very quickly create large amounts of music that fit a given set of constraints. Additionally, while training and creating generative models can be an expensive process that requires large amounts of computing power, the generation of music is generally computationally lightweight. Theoretically, generative music systems may be capable of creating highly adaptive music with far less labour cost than manual composition. We design, implement, and evaluate an application of affective adaptive generative video game music. We first survey uses of generative music in games. While academic approaches generally create novel algorithms for real-time affective adaptive music composition, there is a large gap between the integration of academic systems in games and common industry approaches to using music in games. We therefore focus on the application of generative music in common game music frameworks. Academic approaches to generative music generally use a model of emotion to control the affective expression of the accompanying score. We investigate this approach, and find that audience members perceive affective musical adaptivity. To use emotion as an intermediary between gameplay and music, we split this mediation into two tasks: we describe the perceived emotion of gameplay in an emotional model, and we control the perceived emotion of music to match the emotional model.
To describe the perceived emotion of gameplay, we create the Predictive Gameplay-based Layered Affect Model (PreGLAM). PreGLAM is inspired by research in affective non-player characters, and uses a cognitive appraisal model to respond to gameplay events. PreGLAM essentially acts as an audience member, watching the gameplay and modeling perceived valence, arousal, and tension values. We empirically evaluate PreGLAM, and find that it significantly outperforms a random walk time series in matching ground-truth annotations of perceived gameplay emotion. To create our generative score, we use the Multi-track Music Machine (MMM) transformer model to generate variation stems from a composed adaptive musical score. Because MMM generates variations based on an input clip of music, we control the emotional expression of MMM's output by controlling the emotional expression of the input. To do so, we create a parametric composition guide, titled the "IsoVAT" guide, to compose an adaptive score that expresses three levels of affective perception in a three-dimensional Valence-Arousal-Tension (VAT) model of emotion. The IsoVAT guide is based on a collation of multiple cross-discipline surveys of empirical music-emotion research (MER), and describes how alterations in musical features affect the listener's perceived affect in a VAT model. We empirically evaluate the IsoVAT guide by following it to compose a corpus of 90 clips, which are evaluated across 3 different study designs. We expand our adaptive score using MMM. We adaptively re-sequence individual tracks from our generative variations, creating almost 14 trillion unique musical arrangements in our generative adaptive score. We also write a linear score to serve as a baseline, which is produced using the same synthesis and performance techniques as the adaptive and generative scores. We empirically evaluate our musical scores using real-time annotations of perceived emotion, as well as with a post-hoc questionnaire.
Our findings indicate that our application of generative music in games maintains perceived emotional congruency at a level comparable to previous applications, while outperforming previous applications in perceptions of immersion.
228 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Pasquier, Philippe
Attachment Size
etd21917.pdf 9.26 MB