
SFU Opinion and Comments Corpus

The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains comments together with the articles on which they were posted. All of the articles are opinion articles, not hard news. The corpus is larger than any other currently available comment corpus, and it was collected with attention to preserving reply structures and other metadata. In addition to the raw corpus, we present annotations for four phenomena: constructiveness, toxicity, negation and its scope, and appraisal. The data is divided into two main parts: raw data and annotated data. The raw data consists of three CSV files: gnm_articles.csv, gnm_comments.csv, and gnm_comment_threads.csv. The annotated data contains annotations for constructiveness, negation, and appraisal. Details of the different corpora and how to use them are on the following GitHub page. To access this data, please contact
Ethics approval
None required