Linguistics, Department of

On the Difficulty of Defining “Difficult” in Second-Language Vowel Acquisition

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2021-08-05
Abstract: 

Hierarchies of difficulty in second-language (L2) phonology have long played a role in the postulation and evaluation of learning models. In L2 pronunciation teaching, hierarchies are assumed to be helpful in the development of instructional strategies based on anticipated areas of difficulty. This investigation addressed the practicality of defining a pedagogically useful hierarchy of difficulty for English tense and lax close vowels (/i ɪ u ʊ/) produced by Cantonese speakers. Unlike their English counterparts, Cantonese close tense-lax pairs are allophonic variants, with [i u] occurring before alveolars and [ɪ ʊ] before velars. Each tense-lax pair represents a “phonemic split” in which members of a single L1 category are realized contrastively in L2. Despite evidence that English tense-lax distinctions are challenging for Cantonese speakers, no previous empirical work has closely considered the problem from the standpoint of vowel intelligibility across multiple phonetic contexts and in different words sharing the same rhyme. In a picture-based word-elicitation task, 18 Cantonese-speaking participants produced 31 high-frequency CV and CVC words. Vowels were evaluated for intelligibility by phonetically trained judges. A series of mixed-effects binary logistic models was fitted to the scores, with vowel quality, phonetic context (rhyme) and word as factors, and length of Canadian residence and daily use of English as co-variates. As expected, the general hierarchy of difficulty for vowels that emerged (/i/ > /u/ > /ʊ/ > /ɪ/) was complicated by large differences across phonetic contexts. Results were not readily explicable in terms of transfer; moreover, different words with the same rhyme were not produced with equal intelligibility. The most serious modeling complication was the sizeable inter-speaker variability in difficulties, which could not be accounted for by model co-variates. Although some difficulties were roughly systematic at the group level, it is argued that establishing a pedagogically useful hierarchy on such data would prove intractable. Rather, L2 learners might be better served by assessment and instructional targeting of their individual problem areas than by a focus on errors predicted from hierarchies of difficulty.

Document type: 
Article

The Gender Gap Tracker: Using Natural Language Processing To Measure Gender Bias in Media

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2021-01-29
Abstract: 

We examine gender bias in media by tallying the number of men and women quoted in news text, using the Gender Gap Tracker, a software system we developed specifically for this purpose. The Gender Gap Tracker downloads and analyzes the online daily publication of seven English-language Canadian news outlets and enhances the data with multiple layers of linguistic information. We describe the Natural Language Processing technology behind this system, the curation of off-the-shelf tools and resources that we used to build it, and the parts that we developed. We evaluate the system in each language processing task and report errors using real-world examples. Finally, by applying the Tracker to the data, we provide valuable insights about the proportion of people mentioned and quoted, by gender, news organization, and author gender. Data collected between October 1, 2018 and September 30, 2020 shows that, in general, men are quoted about three times as frequently as women. While this proportion varies across news outlets and time intervals, the general pattern is consistent. We believe that, in a world with about 50% women, this should not be the case. Although journalists naturally need to quote newsmakers who are men, they also have a certain amount of control over who they approach as sources. The Gender Gap Tracker relies on the same principles as fitness or goal-setting trackers: By quantifying and measuring regular progress, we hope to motivate news organizations to provide a more diverse set of voices in their reporting.
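The tallying step described in this abstract can be illustrated with a minimal sketch. It is not the Gender Gap Tracker's code: the record layout, the outlet names, the gender labels, and the function name `quote_share_by_gender` are all assumptions made for illustration, and the sample data is invented so that the ratio comes out near the "about three times" figure reported above.

```python
from collections import Counter


def quote_share_by_gender(quotes):
    """Return each gender's share of all quoted sources.

    `quotes` is an iterable of (outlet, speaker_gender) pairs, one per
    extracted quote. This record layout is a hypothetical simplification.
    """
    counts = Counter(gender for _, gender in quotes)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}


# Invented sample records; real input would come from quote extraction
# and gender prediction steps that are not shown here.
sample_quotes = [
    ("Outlet A", "male"), ("Outlet A", "male"), ("Outlet A", "female"),
    ("Outlet B", "male"), ("Outlet B", "male"), ("Outlet B", "male"),
    ("Outlet B", "female"), ("Outlet A", "male"),
]

shares = quote_share_by_gender(sample_quotes)
ratio = shares["male"] / shares["female"]  # men quoted per woman quoted
```

Grouping the same records by outlet or by time interval before counting would give the per-outlet and per-period breakdowns the abstract mentions.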

Document type: 
Article

Evaluation in Political Discourse Addressed to Women: Appraisal Analysis of Cosmopolitan's Coverage of the 2014 US Midterm Elections

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2017-08
Abstract: 

Before the US midterm elections of November 2014, the well-known women’s magazine Cosmopolitan decided to include politics in its contents. The editorial board stated that their aim was to encourage readers to vote and to be engaged with women’s rights advocacy in the election process. To that end, Cosmopolitan created a new website, CosmoVotes, with content ranging from discussion of political issues to endorsement of specific candidates who were believed to advance women’s issues. Topics include labour rights, abortion, contraception, health, minimum wage and social equity.

This paper evaluates the discourse of this new section of the Cosmopolitan website, together with readers’ responses, concentrating on evaluative language. In particular, we are concerned with differences between the editorial position and readers’ responses as viewed through the Appraisal framework (Martin & White, 2005), and the role that verbal processes play in the expression of evaluative meanings. The corpus used for the analysis consists of a selection of articles and readers’ opinions from CosmoVotes. The methodology is based on annotation of Appraisal features and processes related to the interpersonal dimension of meaning. Those features reveal how attitudes are evaluated and capture ideological positionings in this discourse. Our results show that CosmoVotes has special characteristics, such as a predominance of high intensification in the readers’ opinions, and strong negative judgements and expressions, while the magazine’s pieces on political issues are more nuanced and eschew intensification.

Document type: 
Article

Big Data and Quality Data for Fake News and Misinformation Detection

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2019-05-23
Abstract: 

Fake news has become an important topic of research in a variety of disciplines including linguistics and computer science. In this paper, we explain how the problem is approached from the perspective of natural language processing, with the goal of building a system to automatically detect misinformation in news. The main challenge in this line of research is collecting quality data, i.e., instances of fake and real news articles on a balanced distribution of topics. We review available datasets and introduce the MisInfoText repository as a contribution of our lab to the community. We make available the full text of the news articles, together with veracity labels previously assigned based on manual assessment of the articles’ truth content. We also perform a topic modelling experiment to elaborate on the gaps and sources of imbalance in currently available datasets to guide future efforts. We appeal to the community to collect more data and to make it available for research purposes.

Document type: 
Article

History of Language Teaching Methods

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2018-07-28
Abstract: 

The earliest European written accounts of language teaching methods are from the 5th century AD, referring specifically to Latin. For centuries the language of the Romans was the primary foreign code throughout much of Europe, functioning as the language of scholarship, trade, and government. The founding of universities in the latter Middle Ages led to the development of the Grammar-Translation Method, based on the centuries-long tradition of reading learned texts in Latin and Greek. In the 15th century, Europeans began shifting from Latin to wider use of the continent’s modern languages. By the 19th century, the Direct Method had been developed, modeled on first language acquisition and addressing the greater need for speaking skills in languages such as French, German, and English. In the early 20th century, research largely in educational psychology led to the development of the Audio-lingual Method in the 1940s. On the belief that language use was a matter of stimulus and response, teaching methods emphasized repetition and dialogue memorization. A decade later, Chomsky’s landmark research on cognitive aspects of language acquisition recognized that children do not acquire an inventory of linguistic stimuli and responses. Instead, deep processing in the brain enables them to generate sentences they have never heard before. This led to modernizing the Direct Method by incorporating cognitive dimensions of language learning. Since the 1970s, language has further been recognized as a social phenomenon that inherently entails expressing, interpreting, and negotiating meaning. To foster such competence, the current approach of Communicative Language Teaching emphasizes having learners do meaningful activities involving the exchange of new information.

Document type: 
Article

Quinlingualism in the Maghreb? English Use in Moroccan Outdoor Advertising

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2019-05-27
Document type: 
Article

RST Signalling Corpus: A Corpus of Signals of Coherence Relations

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2018-03
Abstract: 

We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10, 2015), a corpus annotated for signals of coherence relations. The corpus is developed over the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07, 2002), which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers, which are considered to be the most typical (or sometimes the only type of) signals in discourse, but also for a wide array of other signals such as reference, lexical, semantic, syntactic, graphical and genre features as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.

Document type: 
Article

On Being Negative

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2017-04-01
Abstract: 

This paper investigates the pragmatic expressions of negative evaluation (negativity) in two corpora: (i) comments posted online in response to newspaper opinion articles; and (ii) online reviews of movies, books and consumer products. We propose a taxonomy of linguistic resources that are deployed in the expression of negativity, with two broad groups at the top level of the taxonomy: resources from the lexicogrammar or from discourse semantics. We propose that rhetorical figures can be considered part of the discourse semantic resources used in the expression of negativity. Using our taxonomy as a starting point, we carry out a corpus analysis, and focus on three phenomena: adverb + adjective combinations; rhetorical questions; and rhetorical figures. Although the analysis in this paper is corpus-assisted rather than corpus-driven, the final goal of our research is to make it quantitative, by extracting patterns and resources that can be detected automatically.

Document type: 
Article

The Semantics of Evaluational Adjectives: Perspectives from Natural Semantic Metalanguage and Appraisal

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2017
Abstract: 

We apply the Natural Semantic Metalanguage (NSM) approach (Goddard & Wierzbicka 2014) to the lexical-semantic analysis of English evaluational adjectives and compare the results with the picture developed in the Appraisal Framework (Martin & White 2005). The analysis is corpus-assisted, with examples mainly drawn from film and book reviews, and supported by collocational and statistical information from WordBanks Online. We propose NSM explications for 15 evaluational adjectives, arguing that they fall into five groups, each of which corresponds to a distinct semantic template. The groups can be sketched as follows: “First-person thought-plus-affect”, e.g. wonderful; “Experiential”, e.g. entertaining; “Experiential with bodily reaction”, e.g. gripping; “Lasting impact”, e.g. memorable; “Cognitive evaluation”, e.g. complex, excellent. These groupings and semantic templates are compared with the classifications in the Appraisal Framework’s system of Appreciation. In addition, we are particularly interested in sentiment analysis, the automatic identification of evaluation and subjectivity in text. We discuss the relevance of the two frameworks for sentiment analysis and other language technology applications.

Document type: 
Article

Evaluative Language Beyond Bags of Words: Linguistic Insights and Computational Applications

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2017
Abstract: 

The study of evaluation, affect, and subjectivity is a multidisciplinary enterprise, including sociology, psychology, economics, linguistics, and computer science. A number of excellent computational linguistics and linguistic surveys of the field exist. Most surveys, however, do not bring the two disciplines together to show how methods from linguistics can benefit computational sentiment analysis systems. In this survey, we show how incorporating linguistic insights, discourse information, and other contextual phenomena, in combination with the statistical exploitation of data, can result in an improvement over approaches that take advantage of only one of these perspectives. We first provide a comprehensive introduction to evaluative language from both a linguistic and computational perspective. We then argue that the standard computational definition of the concept of evaluative language neglects the dynamic nature of evaluation, in which the interpretation of a given evaluation depends on linguistic and extra-linguistic contextual factors. We thus propose a dynamic definition that incorporates update functions. The update functions allow for different contextual aspects to be incorporated into the calculation of sentiment for evaluative words or expressions, and can be applied at all levels of discourse. We explore each level and highlight which linguistic aspects contribute to accurate extraction of sentiment. We end the review by outlining what we believe the future directions of sentiment analysis are, and the role that discourse and contextual information need to play.
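The idea of update functions mentioned in this abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the survey's formal definition: the prior polarity lexicon, the score range of -1.0 to 1.0, the choice to model negation as a sign flip, and the names `negate`, `intensify`, and `contextual_sentiment` are all hypothetical.

```python
def negate(score):
    """Update function for negation, modelled here as a simple sign flip.

    Other conventions (e.g. shifting the score toward the opposite pole)
    are equally possible; this choice is an assumption.
    """
    return -score


def intensify(factor):
    """Build an update function that scales a score and clamps to [-1, 1]."""
    def update(score):
        return max(-1.0, min(1.0, score * factor))
    return update


def contextual_sentiment(prior, updates):
    """Apply context-dependent update functions, in order, to a prior score."""
    score = prior
    for update in updates:
        score = update(score)
    return score


# Hypothetical prior polarity for an evaluative word.
PRIOR = {"good": 0.5}

# "not very good": the intensifier applies to "good" first,
# then negation takes scope over the intensified phrase.
score = contextual_sentiment(PRIOR["good"], [intensify(1.5), negate])
```

Because each contextual factor is a separate function, further updates (for example, ones driven by discourse relations or extra-linguistic context) could be appended to the list without changing how the prior score is stored.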

Document type: 
Article