Parallel corpora, often exploited for machine translation, have recently been used for monolingual purposes. Annotation projection is a technique that borrows annotations from resource-rich languages for resource-scarce languages, using parallel corpora and word alignment to transfer the annotations. It was introduced as an alternative to the tedious and time-consuming task of building hand-annotated corpora for new languages. The technique is especially effective for semantic annotations such as named entities, since they are less affected by translation. In this work we test the applicability of annotation projection to named entity recognition (NER) through two paradigms: one generates new German data and annotates it using annotated English data; the other adds new annotations to existing German text and uses them as training features. We combine machine translation with annotation projection, which not only removes the restriction to parallel corpora and extends the methodology but also allows the use of monolingual hand-annotated corpora, relieving the bottleneck of English-side annotation quality. We develop four training corpora by applying the two paradigms to two different corpora: one parallel and one singular (non-parallel). We train an NER model on each corpus and compare model quality with a baseline. The results show that the projected annotations can be noisy and inconsistent. Using them as target annotations therefore reduces corpus and model quality, whereas using them as features alongside the original annotations significantly improves quality.
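To make the projection step concrete, the following is a minimal sketch of label transfer through a word alignment, together with the two ways of using the result that the abstract contrasts. It is illustrative only and is not the implementation used in this work: the function names, the flat label scheme (PER, LOC, O), the example sentence pair, and the hand-written alignment are all assumptions. In practice the alignment would come from an automatic word aligner.

```python
def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Copy each labelled source token's tag to the target tokens aligned to it.

    alignment is a list of (source_index, target_index) pairs.
    Unaligned target tokens keep the default tag; the first entity
    tag to reach a target token wins.
    """
    tgt_labels = [default] * tgt_len
    for src_idx, tgt_idx in alignment:
        label = src_labels[src_idx]
        if label != default and tgt_labels[tgt_idx] == default:
            tgt_labels[tgt_idx] = label
    return tgt_labels


def as_feature_rows(tokens, gold_labels, projected_labels):
    """Second paradigm: keep the original tags as targets and
    attach the projected tag to each token as an extra feature."""
    return [
        {"token": tok, "projected": proj, "target": gold}
        for tok, gold, proj in zip(tokens, gold_labels, projected_labels)
    ]


# Hypothetical English sentence with entity tags.
english = ["Yesterday", "Angela", "Merkel", "visited", "Paris"]
english_labels = ["O", "PER", "PER", "O", "LOC"]

# Hypothetical German translation and a hand-written word alignment.
german = ["Gestern", "besuchte", "Angela", "Merkel", "Paris"]
alignment = [(0, 0), (1, 2), (2, 3), (3, 1), (4, 4)]

# First paradigm: projected tags serve directly as target annotations.
projected = project_labels(english_labels, alignment, len(german))

# Second paradigm: projected tags become features next to existing tags.
german_gold = ["O", "O", "PER", "PER", "LOC"]  # assumed hand annotation
rows = as_feature_rows(german, german_gold, projected)
```

In the first usage, an alignment error directly corrupts the training target, which is one way projected annotations can lower corpus quality. In the second, the same error only affects one input feature while the original annotation remains the target.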
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.