Mining page farms and its application in link spam detection

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2007
Authors/Contributors
Author: Zhou, Bin
Abstract
Understanding the general relations between Web pages and their environments is important and has several interesting applications, such as Web spam detection. In this thesis, we study the novel problem of page farm mining and its application to link spam detection. A page farm is the set of Web pages that contribute to (a major portion of) the PageRank score of a target page. We show that extracting page farms is computationally expensive and propose heuristic methods. Based on page farms, we propose the concept of link spamicity, which measures the degree to which a Web page is link spam. Using a real sample of more than 3 million Web pages, we analyze the statistics of page farms. We evaluate the effectiveness of our spamicity-based link spam detection methods on a newly available real data set of spam pages. The empirical results strongly indicate that our methods are effective.
Copyright statement
Copyright is held by the author.
Permissions
The author has not granted permission for the file to be printed or for its text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.
Scholarly level
Language
English
Member of collection
Download file
etd2804.pdf
Size
2.27 MB
