Aspect-based opinion mining in online reviews

Date created: 
2013-03-22
Identifier: 
etd7688
Keywords: 
Opinion Mining
Aspect Extraction
Rating Prediction
Topic Modeling
Abstract: 

Other people's opinions are an important piece of information for making informed decisions. Today the Web has become an excellent source of consumer opinions. However, as the volume of opinionated text is growing rapidly, it is becoming impossible for users to read all reviews to make a good decision. Reading different and possibly even contradictory opinions written by different reviewers makes them even more confused. In the same way, monitoring consumer opinions is getting harder for manufacturers and providers. These needs have inspired a new line of research on mining customer reviews, or opinion mining. Aspect-based opinion mining is a relatively new sub-problem that has attracted a great deal of attention in the last few years. Extracted aspects and estimated ratings clearly provide more detailed information for users to make decisions and for suppliers to monitor their consumers. In this thesis, we address the problem of aspect-based opinion mining and seek novel methods to address the limitations and weaknesses of current techniques. We first propose a method, called Opinion Digger, that takes advantage of syntactic patterns to improve the accuracy of frequency-based techniques. We then move on to model-based approaches and propose an LDA-based model, called ILDA, to jointly extract aspects and estimate their ratings. In our next work, we compare ILDA with a series of increasingly sophisticated LDA models representing the essence of the major published methods in the literature. A comprehensive evaluation of these models indicates that while ILDA works best for items with a large number of reviews, it performs poorly when the size of the training dataset is small, i.e., for cold start items. The cold start problem is critical, as in real-life data sets around 90% of items are cold start. We address this problem in our last work and propose an LDA-based model, called FLDA. It models items and reviewers by a set of latent factors and learns them using reviews of an item category. Experimental results on real-life data sets show that FLDA achieves a significant gain for cold start items compared to the state-of-the-art models.

Document type: 
Thesis
Rights: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.