Advances in soundscape and music emotion recognition

Thesis type
(Thesis) Ph.D.
Date created
Author: Fan, Jianyu
A soundscape is an acoustic environment as perceived in context by human beings. A soundscape recording is a recording of the sound present at a given location at a given time, obtained with one or more fixed or moving microphones. Soundscape recordings play an essential role in the experience of video games, virtual reality, and film. Artificial soundscapes created by professional sound designers can evoke specific emotions in target audiences to better immerse them in multimedia content. Research in soundscape emotion recognition (SER) investigates computational systems that recognize the perceived emotion of soundscape recordings. Similarly, music emotion recognition investigates computational systems that recognize the perceived emotion of music recordings.

We concentrate on using novel artificial intelligence algorithms to analyze soundscape recordings and music recordings from the perspective of affective computing. The contributions of this thesis are as follows. First, we conduct empirical studies to demonstrate that listeners agree with each other regarding the perceived emotion of soundscapes and music, and that it is possible to build a human-competitive model to predict the perceived emotion. Second, we collect and curate a soundscape dataset and multiple music datasets annotated with perceived emotion using crowdsourcing techniques. Third, we experiment with SER algorithms based on deep learning techniques. An evaluation shows that our SER models outperform both individual listeners and state-of-the-art models. Fourth, we investigate quantifiable trends in the effect of mixing on the perceived emotion of soundscape recordings. Fifth, we build a music emotion recognition model for experimental music to investigate the ranking-based emotion recognition task. Finally, we use models built for SER and sound event detection to analyze and compare Chinese and Western classical music. Certain similarities between Chinese classical music and soundscape recordings permit transferability between deep learning models. Together, these contributions present methods for automating the soundscape and music emotion recognition tasks.
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Simon Fraser University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to to learn how to obtain a License from RightsLink.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Pasquier, Philippe
Attachment: etd20932.pdf
Size: 16.74 MB