With recent advances, robots have become more affordable and intelligent, which expands their application domains and the number of consumers. Having robots around us in our daily lives creates a demand for interaction systems that communicate humans' intentions and commands to robots. We are interested in interactions that are easy, intuitive, and do not require the human to use any additional equipment. We present a robust real-time system for visual detection of hands and faces in RGB and gray-scale images, based on a Deep Convolutional Neural Network. The system is designed to meet the requirements of a hands-free interface to UAVs, described below, and could also be used to communicate with other robots equipped with a monocular camera, using only hand and face gestures and no extra instruments. This work is accompanied by a novel hands-and-faces detection dataset, gathered and labelled from a wide variety of sources including our own Human-UAV interaction videos and several third-party datasets. By training our model on all of these data, we obtain qualitatively good detection results in terms of both accuracy and speed on a commodity GPU. The same detector gives state-of-the-art accuracy and speed on a hand-detection benchmark and competitive results on a face-detection benchmark. To demonstrate its effectiveness for Human-Robot Interaction, we describe its use as the input to a novel, simple but practical gestural Human-UAV interface that detects static gestures from the position of the hands relative to the face. A small vocabulary of hand gestures is used to demonstrate our end-to-end pipeline for un-instrumented Human-UAV interaction, which is useful for entertainment or industrial applications. All software, training data, and test data produced for this thesis are released as an Open Source contribution.
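As a rough illustration of static gesture detection from hand position relative to the face, the sketch below assigns a detected hand to a coarse zone around a detected face. It is a minimal sketch under stated assumptions, not the thesis implementation: the `Box` type, the `hand_zone` function, the zone names, and the 0.75 threshold are all hypothetical, and bounding boxes are assumed to be given in pixels as (left, top, width, height) with y growing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge (y grows downward in image coordinates)
    w: float  # width
    h: float  # height

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def hand_zone(face: Box, hand: Box, threshold: float = 0.75) -> str:
    """Return a coarse zone label for the hand relative to the face.

    Offsets are divided by the face size, so the label is roughly
    independent of how far the person is from the camera. The threshold
    is an assumed value chosen only for illustration.
    "left" and "right" refer to the image, not to the person.
    """
    dx = (hand.cx - face.cx) / face.w
    dy = (hand.cy - face.cy) / face.h

    if dx < -threshold:
        horizontal = "left"
    elif dx > threshold:
        horizontal = "right"
    else:
        horizontal = "center"

    if dy < -threshold:
        vertical = "above"
    elif dy > threshold:
        vertical = "below"
    else:
        vertical = "level"

    return f"{vertical}-{horizontal}"


if __name__ == "__main__":
    face = Box(100, 100, 50, 50)
    raised_hand = Box(200, 40, 40, 40)
    print(hand_zone(face, raised_hand))
```

A gesture vocabulary could then map zone labels, or pairs of labels for two hands, to commands. Normalizing by the face size is one simple way to make such a mapping tolerant of changes in distance between the person and the camera.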
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Vaughan, Richard