Advances in soundscape and music emotion recognition

Author: 
Date created: 
2020-07-02
Identifier: 
etd20932
Keywords: 
Affective Computing, Soundscape Recording, Music, Sound Design, Perceived Emotion, Machine Learning
Abstract: 

A soundscape is an acoustic environment as perceived in context by human beings. A soundscape recording captures the sound present at a given location at a given time, obtained with one or more fixed or moving microphones. Soundscape recordings play essential roles in the experience of video games, virtual reality, and film. Artificial soundscapes created by professional sound designers can evoke a specific emotion in target audiences to better immerse them in multimedia content. Research in soundscape emotion recognition (SER) investigates computational systems that recognize the perceived emotion of soundscape recordings; similarly, music emotion recognition builds computational systems that recognize the perceived emotion of music recordings.

We concentrate on using novel artificial intelligence algorithms to analyze soundscape and music recordings from the perspective of affective computing. The contributions of this thesis are as follows. First, we conduct empirical studies demonstrating that listeners agree with each other regarding the perceived emotion of soundscapes and music, and that it is possible to build a human-competitive model to predict the perceived emotion. Second, we curate and collect a soundscape dataset and multiple music datasets annotated with perceived emotion using crowdsourcing techniques. Third, we experiment with SER algorithms based on deep learning techniques; an evaluation of our SER models demonstrates that they outperform both individual listeners and state-of-the-art models. Fourth, we investigate quantifiable trends in the effect of mixing on the perceived emotion of soundscape recordings. Fifth, we build a music emotion recognition model for experimental music to investigate the ranking-based emotion recognition task. Finally, we utilize models built for SER and sound event detection to analyze and compare Chinese and Western classical music; certain similarities between Chinese classical music and soundscape recordings permit transferability between deep learning models. These contributions present methods for automating the soundscape and music emotion recognition tasks.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Supervisor(s): 
Philippe Pasquier
Department: 
Communication, Art & Technology: School of Interactive Arts and Technology
Thesis type: 
(Thesis) Ph.D.