Dameh, Taher

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2012-06-11

Authors/Contributors

Author (aut): Dameh, Taher

Abstract

We propose a distributed method to compute similarity matrices (also known as kernel or Gram matrices), which are used in various kernel-based machine learning algorithms. Current methods for computing similarity matrices have quadratic time and space complexity, which prevents them from scaling to large data sets. To reduce this quadratic cost, the proposed method first partitions the data into smaller subsets using various families of locality sensitive hashing, including random projection and spectral hashing. The method then computes similarity values only among points within the same subset, yielding an approximate similarity matrix. We show analytically that the time and space complexities of the proposed method are subquadratic. We implemented the proposed method using the Message Passing Interface (MPI) framework and ran it on a cluster. Our results on real large-scale data sets show that the proposed method does not significantly reduce the accuracy of the computed similarity matrices, and that it achieves substantial savings in running time and memory requirements.
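The partition-then-compute idea described in the abstract can be illustrated with a minimal single-process sketch. It is not the thesis implementation: it does not use MPI, it assumes a sign-random-projection hash as the locality sensitive hashing family and a Gaussian kernel as the similarity function, and the names `lsh_signature`, `gaussian_kernel`, `approximate_gram`, `num_bits`, and `sigma` are hypothetical choices made for illustration only.

```python
import math
import random
from collections import defaultdict


def lsh_signature(point, hyperplanes):
    """Sign-random-projection hash: one bit per random hyperplane (assumed LSH family)."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) similarity between two points (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def approximate_gram(points, num_bits=2, sigma=1.0, seed=0):
    """Bucket points with LSH, then compute similarities only inside each bucket.

    Returns a sparse matrix as a dict {(i, j): similarity} plus the buckets.
    Pairs that fall in different buckets are simply absent (treated as zero).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random hyperplanes through the origin, one per hash bit.
    hyperplanes = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
    ]

    # Step 1: partition point indices by their hash signature.
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_signature(point, hyperplanes)].append(idx)

    # Step 2: compute the kernel only among points sharing a bucket.
    gram = {}
    for members in buckets.values():
        for i in members:
            for j in members:
                gram[(i, j)] = gaussian_kernel(points[i], points[j], sigma)
    return gram, dict(buckets)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.1, 0.9), (-1.0, -1.0), (-0.9, -1.1)]
    gram, buckets = approximate_gram(data)
    print(len(gram), "stored entries out of", len(data) ** 2)
```

The number of stored entries is the sum of the squared bucket sizes, so it shrinks as the points spread over more buckets. Because each bucket is processed independently, one plausible way to distribute the work is to assign buckets to different processes, although this sketch makes no claim about how the thesis maps buckets to MPI ranks.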

Keywords

Identifier

etd7248

Copyright statement

Copyright is held by the author.

Permissions

The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.

Scholarly level

Graduate student (Masters)

Supervisor or Senior Supervisor

Thesis advisor (ths): Hefeeda, Mohamed

Member of collection

Computing Science Theses

Download file	Size
etd7248_TDameh.pdf	1.01 MB

Distributed Kernel Matrix approximation and implementation using MPI.
