Constructions of high-performance face recognition pipeline and embedded deep learning framework

Date created: 
Face Recognition
Deep Learning
Convolutional Neural Network
CNN Hardware Implementation
Receptive Field
Discriminative Features Learning

Face recognition is widely studied in both research and commercial settings. Because human faces are unique, a robust face recognition system can serve as an alternative to other biometrics, such as fingerprint or iris recognition, in security systems. Recent advances in deep learning have driven much of the progress on difficult computer vision tasks, including face recognition. This thesis presents a thorough study of the construction of a robust face recognition pipeline and evaluates the components at each stage. The pipeline consists of four modules: face detection, face alignment, metric-space face feature extraction, and feature identification. Different implementations of each module are presented and compared, and the performance of each configuration is evaluated on multiple datasets. The combination of coarse-to-fine convolutional neural network (CNN) based face detection, geometric face alignment, and discriminative feature learning with an additive angular margin is found to achieve the highest accuracy on all datasets. One drawback of this face recognition pipeline is its heavy computational cost, which makes it difficult to deploy on embedded hardware. A method that allows advanced deep learning algorithms to run on resource-limited hardware would let many existing devices become intelligent at low cost. This thesis therefore develops a novel lapped CNN (LCNN) architecture suited to resource-limited embedded systems. The LCNN uses a divide-and-conquer approach to apply convolution to a high-resolution image on embedded hardware: it first applies convolution to sub-patches of the image, then merges the resulting outputs to form the full convolution. The merged output is identical to applying the convolution to the entire high-resolution image, except that the convolutions on the sub-patches can be processed sequentially or in parallel by resource-limited hardware.
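The patch-then-merge idea can be illustrated with a minimal sketch. This is not the thesis's LCNN implementation, only a toy demonstration of the underlying property it relies on: a 2-D valid convolution computed on overlapping sub-patches (overlap of kernel height minus one) and stacked back together equals the convolution of the whole image. All function names and the single-channel, vertical-split setup are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(img, k):
    # Direct 2-D valid convolution (cross-correlation form) of a
    # single-channel image with kernel k.
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def lapped_conv2d(img, k, patch_rows):
    # Split the image into vertically overlapping patches (each patch
    # extends kh-1 extra rows so its valid output aligns with the full
    # convolution), convolve each patch independently, then stack.
    # The per-patch calls could run sequentially or in parallel.
    H = img.shape[0]
    kh = k.shape[0]
    outs = []
    r = 0
    while r < H - kh + 1:
        bottom = min(r + patch_rows + kh - 1, H)
        outs.append(conv2d_valid(img[r:bottom], k))
        r += patch_rows
    return np.vstack(outs)
```

Because each patch carries the kernel's support overlap, no output rows are lost or duplicated at patch boundaries, so `lapped_conv2d(img, k, p)` matches `conv2d_valid(img, k)` exactly for any patch size; only peak memory and scheduling change.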

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Jie Liang
Applied Sciences: School of Engineering Science
Thesis type: 
(Thesis) M.A.Sc.