Text Mining the Coffin Texts

Text Mining the Coffin Texts (TM-CT) is a project funded by the Spanish Ministry for Science, Innovation and Universities and hosted at the University of Alcalá (2024-2026).

Objectives

The main objective of the TM-CT project is to link the MORTEXVAR database with the images from the reference publication of the hieroglyphic version of this corpus (CT I-VIII). TM-CT will provide open access to the full database of the Coffin Texts corpus, with all text variants expected to be available in transliteration (alphabetic chain) and in translation into modern language(s). More specifically, TM-CT will seek to:

1. Trace the textual variability of the whole corpus by linking the material philology information already in place (transliteration, translation, in-document location, geographical and chronological distribution of the texts) with the original publication of the hieroglyphic texts (images).
2. Refine the study of variation across the different witnesses by using the database to identify potential textual, dialectal, diachronic and grammatical indicators of change.
3. Generate inventories of spellings/signs using the OCR toolkit designed by the subproject OCR-PT-CT.

Approach

Earlier Ancient Egyptian mortuary texts constitute a privileged showcase for the profound ideological and material changes that occurred during the Middle Kingdom, especially between the second half of the Eleventh Dynasty and the first half of the Twelfth Dynasty. A corpus-driven approach appears to be the most efficient and reliable method for providing a comprehensive assessment of the complex situation of the period. In such an approach, the corpus is not merely the object of study but also a control group; most importantly, this implies that the evidence of the corpus itself takes precedence over any external model. Corpus size and access to the corpus are fundamental to corpus-driven approaches. Earlier Ancient Egyptian mortuary texts are an ideal candidate because they include two large corpora: the authoritative editions of the Pyramid Texts (PT) and Coffin Texts (CT), which extend over six and seven volumes, respectively, plus one volume for the copies of PT on Middle Kingdom coffins.

TM-CT will focus on the CT, a substantial part of which is now available in the beta version of the MORTEXVAR database. This dataset will be enriched with the image dataset from the OCR-PT-CT and TTAE projects using a pre-trained artificial neural network based on YOLOv3 and Natural Language Processing techniques.

The resulting database is expected to allow, for the first time, the analysis of variability across the complete corpus and on the original text, in parallel with the transliteration and translation, plus the material philology metadata, much of which is already in place.
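As a rough illustration of what such linking could look like, the sketch below pairs each text variant with a region of a scanned page and derives a simple per-witness inventory of spellings. It is not the MORTEXVAR schema or the project's code: the class names, field names, identifiers and sample values are all hypothetical placeholders chosen for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRegion:
    """Hypothetical location of a passage on a scanned page of the edition."""
    volume: str                       # placeholder volume label
    page: int
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in pixels


@dataclass(frozen=True)
class Variant:
    """One witness's version of a passage, linked to its image region."""
    spell: str            # placeholder spell identifier
    witness: str          # placeholder witness siglum
    transliteration: str
    translation: str
    region: ImageRegion


def spelling_inventory(variants):
    """Count how often each transliterated word occurs in each witness."""
    inventory = defaultdict(Counter)
    for v in variants:
        inventory[v.witness].update(v.transliteration.split())
    return inventory


def group_by_reading(variants, spell):
    """Group the witnesses of one spell by their transliterated reading."""
    groups = defaultdict(list)
    for v in variants:
        if v.spell == spell:
            groups[v.transliteration].append(v.witness)
    return dict(groups)


# Placeholder data only: these are not real readings or sigla.
sample = [
    Variant("S1", "W1", "a b a", "(translation)", ImageRegion("V1", 1, (0, 0, 10, 10))),
    Variant("S1", "W2", "a c", "(translation)", ImageRegion("V1", 1, (0, 20, 10, 10))),
    Variant("S1", "W3", "a b a", "(translation)", ImageRegion("V1", 2, (0, 0, 10, 10))),
]

inventory = spelling_inventory(sample)
readings = group_by_reading(sample, "S1")
```

Keeping the image region on the variant record means that any difference found in the transliteration can be traced back to the corresponding place in the hieroglyphic publication.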

Team

One Egyptologist, one engineer, one database manager, and one database assistant will be in charge of the annotation, computer vision, and resulting database. The PI will coordinate, and the collaborators will provide feedback.

PI: Carlos Gracia Zamacona

Egyptologist: Jorke Grotenhuis

Engineer: Víctor Prado Sánchez

Database manager: César Guerra Méndez

Database assistant: Hamza Akoudad Ekajouan

Collaborators

Gersande Eschenbrenner Diemer, Universidad de Jaén: Egyptologist (wood analysis).

David Fuentes Jiménez, Universidad de Alcalá: Engineer (computer vision).

Álvaro Hernández Alonso, Universidad de Alcalá: Engineer (electronic design).

Anne Landborg, Uppsala Universitet: Egyptologist (Coffin Texts).

Leah Mascia, Universität Hamburg: Egyptologist (text materiality).

Antonio J. Morales, Universidad de Alcalá: Egyptologist (Pyramid Texts).

Sira Palazuelos Cagigas, Universidad de Alcalá: Engineer (natural language processing).

Daniel Pizarro Pérez, Universidad de Alcalá: Engineer (computer vision).

Funded by the European Union.