Born, Logan Orion

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2023-11-16

Authors/Contributors

Author: Born, Logan Orion

Abstract

In this thesis, we describe the first-ever large-scale computational analysis of the partially deciphered proto-Elamite (PE) script. This script was used to write economic accounts which follow a very regular "spreadsheet" structure incorporating many numerals. This sets PE apart from the prose corpora considered in prior decipherment work, in ways that both enable and require exploration of new models and methodologies. In close collaboration with domain experts, we provide a thorough survey of this corpus and answer longstanding questions about its content. We describe novel approaches to multi-modal representation learning, which combine visual information from a VAE-inspired encoder with contextual features from a neural language model. We apply these models to evaluate hypotheses about the script's underlying character inventory, which remains very uncertain. By analyzing the representations learned by these models, we also deepen our understanding of the relationships between a set of visually complex signs known as complex graphemes, and discover a strict grammar which appears to govern their construction. We apply a novel variant of the bootstrapping classification algorithm to disambiguate numeric notations with uncertain magnitudes. This enables the first-ever statistical analysis of the corpus's numeric content, and of the relationships between the numeric and linguistic parts of these documents. Given that numeral notations comprise more than half of the attested corpus, this represents a significant advance in our understanding of the script. By applying sequence models to study the internal structure of these documents, we independently replicate claims about a structure called the "header", and adduce new evidence about the size of headers and their distribution across the corpus.
In addition to these main results, we also describe a number of small, focused investigations into word order, the presence of affixal morphology, and other minor features of the texts.

Extent

234 pages.

Identifier

etd22782

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Sarkar, Anoop

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22782.pdf	11 MB

Title

Applications of natural language processing to archaeological decipherment: A survey of proto-Elamite
