Big Data and quality data for fake news and misinformation detection

Torabi Asr, Fatemeh; Taboada, Maite

doi:10.1177/2053951719843310

Resource type

Article

Date created

2019-05-23

Authors/Contributors

Author: Torabi Asr, Fatemeh

Author: Taboada, Maite

Abstract

Fake news has become an important topic of research in a variety of disciplines including linguistics and computer science. In this paper, we explain how the problem is approached from the perspective of natural language processing, with the goal of building a system to automatically detect misinformation in news. The main challenge in this line of research is collecting quality data, i.e., instances of fake and real news articles on a balanced distribution of topics. We review available datasets and introduce the MisInfoText repository as a contribution of our lab to the community. We make available the full text of the news articles, together with veracity labels previously assigned based on manual assessment of the articles’ truth content. We also perform a topic modelling experiment to elaborate on the gaps and sources of imbalance in currently available datasets to guide future efforts. We appeal to the community to collect more data and to make it available for research purposes.

Published as

Torabi Asr, F., & Taboada, M. (2019). Big Data and quality data for fake news and misinformation detection. Big Data & Society. DOI: 10.1177/2053951719843310.

Publication details

Publication title

Big Data & Society

Document title

Big Data and quality data for fake news and misinformation detection

Date

2019

Publisher DOI

10.1177/2053951719843310
