Decipherment of Evasive or Encrypted Offensive Text

Author: 
Date created: 
2016-07-20
Identifier: 
etd9692
Keywords: 
Natural Language Processing
Decipherment
Spelling Correction
Malicious Words Filtering
Abstract: 

Automated filters are commonly used in online chat to stop users from sending malicious messages, such as age-inappropriate language, bullying, and requests that users expose personal information. Rule-based filtering systems are the most common way to deal with this problem, but people invent increasingly subtle ways to disguise their malicious messages and bypass such filters. Machine learning classifiers can also be used to identify and filter malicious messages, but they rely on training data that rapidly becomes out of date, so new forms of malicious text cannot be classified accurately. In this thesis, we model the disguised messages as a cipher and apply automatic decipherment techniques to decrypt corrupted malicious text back into plain text, which can then be filtered using rules or a classifier. We provide experimental results on three different data sets and show that decipherment is an effective tool for this task.
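
The pipeline described in the abstract (decipher first, then filter) can be illustrated with a toy sketch. This is not the method used in the thesis: it assumes a small hand-written table of character substitutions and a tiny word-count table in place of a learned decipherment model, and every name in it (CANDIDATES, WORD_COUNTS, BLOCKLIST, decipher_token, decipher, is_blocked) is hypothetical.

```python
from itertools import product

# Hypothetical substitution table: cipher symbol -> possible plaintext letters.
CANDIDATES = {
    "1": "il",
    "!": "i",
    "@": "a",
    "$": "s",
    "0": "o",
    "3": "e",
}

# Hypothetical word counts standing in for a real language model.
WORD_COUNTS = {"idiot": 5, "stupid": 8, "hello": 50, "you": 100, "are": 90, "an": 80}

# Hypothetical rule-based filter applied after decipherment.
BLOCKLIST = {"idiot", "stupid"}


def decipher_token(token):
    """Return the most frequent known word reachable by substitution,
    or the lowercased token if no candidate is in the vocabulary."""
    token = token.lower()
    # Each position offers one or more plaintext letters.
    options = [CANDIDATES.get(ch, ch) for ch in token]
    best, best_count = token, 0
    for letters in product(*options):
        word = "".join(letters)
        count = WORD_COUNTS.get(word, 0)
        if count > best_count:
            best, best_count = word, count
    return best


def decipher(message):
    """Decipher a whitespace-tokenized message token by token."""
    return " ".join(decipher_token(tok) for tok in message.split())


def is_blocked(message):
    """Apply the blocklist to the deciphered text rather than the raw text."""
    return any(word in BLOCKLIST for word in decipher(message).split())
```

With these assumed tables, is_blocked("y0u are an 1d!ot") is true even though no blocklisted word appears literally in the input, because the filter runs on the deciphered text. A realistic system would learn the substitutions from data instead of listing them by hand.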

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.