Resource type
Thesis type
(Thesis) M.Sc.
Date created
2007
Authors/Contributors
Author: Zhou, Bin
Abstract
Understanding the general relations between Web pages and their environments is important and has interesting applications such as Web spam detection. In this thesis, we study the novel problem of page farm mining and its application to link spam detection. A page farm is the set of Web pages that contribute (a major portion of) the PageRank score of a target page. We show that extracting page farms is computationally expensive, and we propose heuristic methods. Based on page farms, we propose the concept of link spamicity to measure the degree to which a Web page is link spam. Using a real sample of more than 3 million Web pages, we analyze the statistics of page farms. We examine the effectiveness of our spamicity-based link spam detection methods using a newly available real data set of spam pages. The empirical results strongly indicate that our methods are effective.
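To make the page farm definition concrete, here is a minimal illustrative sketch. It is not the thesis's algorithm. It assumes a simplified PageRank without redistribution of dangling-page mass, measures a set's contribution as the drop in the target's score when the out-links of that set are removed, and grows the farm greedily until it covers a fraction `theta` of the total contribution. The names `pagerank`, `extract_page_farm`, and `theta`, and the toy graph, are all hypothetical.

```python
def pagerank(out_links, nodes, damping=0.85, iters=100):
    """Simplified PageRank by power iteration (assumption: mass of pages
    without out-links is dropped, not redistributed)."""
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank


def extract_page_farm(out_links, target, theta=0.8, damping=0.85, iters=100):
    """Greedy heuristic (an assumption, not the thesis's method): add the
    page whose removal lowers the target's score most, until the farm
    accounts for at least `theta` of the total contribution."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    others = [u for u in nodes if u != target]

    def score(voided):
        # Score of the target when the out-links of `voided` pages are removed.
        pruned = {u: ([] if u in voided else out_links.get(u, [])) for u in nodes}
        return pagerank(pruned, nodes, damping, iters)[target]

    full = score(set())           # score in the unmodified graph
    floor = score(set(others))    # score when no other page contributes
    needed = theta * (full - floor)

    farm, current = set(), full
    while full - current < needed:
        best, best_score = None, current
        for u in others:
            if u in farm:
                continue
            s = score(farm | {u})
            if s < best_score:
                best, best_score = u, s
        if best is None:          # no remaining page lowers the score
            break
        farm.add(best)
        current = best_score
    return farm


# Toy graph (hypothetical): a and b link to t, c links to a, d and e link to each other.
graph = {"a": ["t"], "b": ["t"], "c": ["a"], "d": ["e"], "e": ["d"], "t": []}
print(sorted(extract_page_farm(graph, "t", theta=0.8)))  # prints ['a', 'b']
```

Each greedy step recomputes PageRank once per candidate page, so even this toy version is costly on large graphs, which is consistent with the abstract's remark that exact extraction is expensive and heuristics are needed.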
Copyright statement
Copyright is held by the author.
Language
English
Download file | Size |
---|---|
etd2804.pdf | 2.27 MB |