A multi-modal perceptual system for social robots: a brain-inspired architecture

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2018-11-02
Authors/Contributors
Abstract
The main theme of this research study is machine perception. We are particularly interested in developing a perceptual system for social robots. These robots are designed to communicate with humans the same way humans interact with each other. We argue that, in order to meet this stringent criterion, such robots should be capable of perceiving their environment in a similar fashion to humans. The thesis focusses on developing a framework for designing a human-oriented perceptual system for social robots. The research is intrinsically interdisciplinary and requires integrating ideas about human perception from psychology, psychophysics, and neuroscience with robotics engineering. First, the skeleton of the architecture is developed, motivated by the understanding of the hierarchical structure of the primate sensory cortex. The key sub-systems of the architecture and the interrelationships among them are shaped by insights from biological, computational, and psychological understanding of human perception, in particular the multi-modal sensory information processing in the sensory cortex, the spatial-temporal binding criteria, and the limited channel capacity of humans for processing information. The system encapsulates the parallel distributed processing of real-world stimuli through several sensor modalities and encodes them into feature vectors, which in turn are processed by a number of dedicated processing units through hierarchical paths. The proposed perceptual system is context independent and can be applied to many on-going problems in social robotics. Next, a customized version of the system is developed to address the problem of person recognition in social settings. The system utilizes information from the visual and auditory modalities via a non-invasive methodology, as opposed to reported person recognition systems that are generally invasive.
We adopt a spiking neural network to integrate information from the available sensor modalities, providing a plausible and realistic computational model that facilitates a real-time response in various challenging scenarios of person recognition in social settings. In the last stage, a robust speaker recognition system is designed in light of the above framework. The system exploits prosodic features to reduce the population size and integrates the advantages of multi-feature and multi-classifier systems to overcome the challenges of speaker recognition in noisy environments and with only short utterances available.
Document
Extent
124 pages.
Identifier
etd19925
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Rad, Ahmad
Language
English
Attachment Size
etd19925.pdf 8.56 MB