Malicious URL detection is an important task in Internet security intelligence. Existing work relies on inspecting web page content and URL text to determine whether a URL is malicious. Because many new malicious URLs emerge on the web every day, scanning URLs one by one with traditional methods is inefficient and does not scale. In this thesis, we harness the power of big data to detect unknown malicious URLs from known ones with the help of Internet access logs. Our method finds not only related malicious URLs but also possibly infected devices. We also discuss how to scale up our method to huge data sets, up to hundreds of gigabytes in our experiments. Our extensive empirical study on real data sets from Fortinet, a leader in the Internet security industry, shows the effectiveness and efficiency of our method.
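The abstract does not spell out the detection algorithm itself. As a rough illustration of how access logs can link unknown URLs to known malicious ones, the Python sketch below applies a simple guilt-by-association heuristic. This is an assumption made for illustration, not the method developed in the thesis. It assumes the access log is a list of (device, url) pairs. The function `score_urls`, the `min_devices` threshold, and the example hostnames are hypothetical. Devices that accessed a known malicious URL are flagged as possibly infected. Every other URL is then scored by the fraction of its visitors that are flagged devices.

```python
from collections import defaultdict


def score_urls(log, known_malicious, min_devices=2):
    """Hypothetical guilt-by-association scoring over (device, url) access-log pairs.

    Returns (suspect_devices, scores), where scores maps each unknown URL to the
    fraction of its distinct visitors that also accessed a known malicious URL.
    """
    # Index the log: URL -> set of distinct devices that accessed it.
    visitors = defaultdict(set)
    for device, url in log:
        visitors[url].add(device)

    # Devices that touched any known malicious URL are treated as possibly infected.
    suspect_devices = set()
    for url in known_malicious:
        suspect_devices |= visitors.get(url, set())

    # Score the remaining URLs; skip rarely visited ones to reduce noise.
    scores = {}
    for url, devices in visitors.items():
        if url in known_malicious or len(devices) < min_devices:
            continue
        hits = len(devices & suspect_devices)
        if hits:
            scores[url] = hits / len(devices)
    return suspect_devices, scores


# Small illustrative log (hypothetical devices and hostnames).
log = [
    ("d1", "bad.example/a"), ("d2", "bad.example/a"),
    ("d1", "new.example/x"), ("d2", "new.example/x"), ("d3", "new.example/x"),
    ("d1", "news.example"), ("d3", "news.example"),
]
suspects, scores = score_urls(log, {"bad.example/a"})
print(sorted(suspects))
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy log, `new.example/x` ranks above `news.example` because two of its three visitors also accessed the known malicious URL. At the data scale the abstract mentions, the same per-URL grouping would have to be distributed across machines rather than held in one in-memory dictionary.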
Copyright is held by the author.
The author has not granted permission for the file to be printed or for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact firstname.lastname@example.org.
Thesis advisor: Pei, Jian