Although social media and online forums bring people, communities, and groups together, their use has also given rise to problems such as hate speech and abusive behavior. These platforms can be used to bully, harass, assault, or even plan kinetic action against others. Most of the data from these sources is noisy, unstructured, and unlabeled, so designing supervised classifiers requires substantial human effort to read through the data, label it, and determine the severity of its toxicity. This work also takes a human toll: reading a potentially large amount of such data can have negative psychological effects. For these reasons, our goal is to provide a framework for exploring such unstructured data that identifies the important topics, features, sentiment, and entities involved without the need to manually read all the text, and that can automatically redact toxic terminology. The net result is a less harmful environment, with reduced exposure, in which people who need to analyze this data can explore the documents and identify those of interest. We use state-of-the-art natural language processing and machine learning techniques to design a pipeline that takes in unstructured, noisy data and converts it into actionable structured data with accompanying visualization. We also design a simple, modifiable scoring scheme that combines all the features of the multidimensional analysis into a single score, which can be used as a filtering metric for information retrieval over the documents, thus prioritizing those that require human intervention. We then evaluate the resulting system against a range of objective and subjective criteria.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Popowich, Fred