Blog Post on the STSM “Lexical semantic change detection in Latin: a use-case on medical Latin”, by Paola Marongiu at King’s College London, United Kingdom
The STSM (Short Term Scientific Mission) “Lexical semantic change detection in Latin: a use-case on medical Latin” took place from May 8, 2023, to May 29, 2023, at King’s College London (UK), under the supervision of Dr. Barbara McGillivray. The main objective of this STSM was to contribute to the work carried out within Working Group 4 of the Action (Lexical Semantic Change Detection), specifically to the use case on Humanities and Social Sciences (4.2.1).
The aim was to provide new insights into the application of word embedding algorithms to historical languages by evaluating their results on a Latin corpus, focusing on the semantic field of the Latin medical lexicon.
We chose the lexical field of medicine and anatomy because it contains several documented cases of lexical semantic change. For example, the Latin word patella denotes a small bowl in the non-specialist lexicon, whereas in the specialist lexicon it refers to the kneecap. We performed the study on the LatinISE corpus (http://hdl.handle.net/11234/1-2506).
The main results of the STSM can be summarised as follows (the code, report and supporting documents are available at https://github.com/paoma370/Semantic-change-medical-Latin):
- We provided an overview of previous applications of word embeddings to historical languages, i.e., Latin and Ancient Greek, which could complement the state of the art for UC 4.2.1.
- I prepared a Gold Standard (GS) specific to the medical Latin lexicon, which can be used to evaluate word embedding algorithms on this use case. The methodology can be replicated to create other GSs for different use cases or for other historical languages, such as Ancient Greek. The GS contains 25 lexical items that the literature describes as having specialised their meaning in the medical domain through various types of semantic change (e.g. metaphor, in the case of patella).
- The tests on the LatinISE corpus allowed me to identify the combination of parameters that gave the most accurate results. The tests were run both on the entire corpus and on two subcorpora (medical vs. non-medical).
- I performed a qualitative analysis of the list of 25 words. Although the limited size of the corpus of medical texts and the rarity of medical words limit the study, the analysis of the nearest neighbours shows that for some words the algorithm captures relevant semantic shifts. For example, the word malum ‘a bad thing’ acquires the meaning ‘disease’ in medical contexts. Among the nearest neighbours of malum in the medical subcorpus is sanesco ‘to recover’, which points to the shift of malum towards the notion of ‘disease’.
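The nearest-neighbour analysis described above can be sketched roughly as follows. This is a minimal illustration, not the code in the project repository. The vectors are invented three-dimensional toy values, not embeddings trained on LatinISE. The helper names (`nearest_neighbours`, `neighbour_overlap`) are hypothetical. Using the Jaccard overlap of neighbour sets to compare a word across the medical and non-medical spaces is an assumption made here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbours(space, target, k=3):
    """Return the k words closest to `target` by cosine similarity (hypothetical helper)."""
    query = space[target]
    scored = [(w, cosine(query, vec)) for w, vec in space.items() if w != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]

def neighbour_overlap(space_a, space_b, target, k=3):
    """Jaccard overlap of the target's k nearest neighbours in two spaces.
    A low value suggests the word is used differently in the two subcorpora."""
    a = set(nearest_neighbours(space_a, target, k))
    b = set(nearest_neighbours(space_b, target, k))
    return len(a & b) / len(a | b)

# Toy vectors, invented for illustration only (NOT trained on LatinISE).
medical = {
    "malum": [0.9, 0.1, 0.0],
    "morbus": [0.85, 0.15, 0.05],
    "sanesco": [0.8, 0.2, 0.1],
    "dolor": [0.7, 0.3, 0.0],
    "bonum": [0.0, 0.2, 0.9],
    "scelus": [0.1, 0.9, 0.2],
}
non_medical = {
    "malum": [0.1, 0.9, 0.1],
    "scelus": [0.15, 0.85, 0.2],
    "bonum": [0.2, 0.7, 0.3],
    "dolor": [0.3, 0.6, 0.0],
    "morbus": [0.9, 0.1, 0.0],
    "sanesco": [0.8, 0.0, 0.3],
}

print(nearest_neighbours(medical, "malum"))      # ['morbus', 'sanesco', 'dolor']
print(nearest_neighbours(non_medical, "malum"))  # ['scelus', 'bonum', 'dolor']
print(neighbour_overlap(medical, non_medical, "malum"))  # 0.2
```

Comparing neighbour lists rather than raw vectors sidesteps the need to align the two spaces. Embeddings trained separately on the medical and non-medical subcorpora live in different coordinate systems, so a direct cosine between `medical["malum"]` and `non_medical["malum"]` would not be meaningful.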
The results of this STSM were presented at the Digital Classicist Summer Seminar, and the recording is available on YouTube: https://www.youtube.com/watch?v=wDAMEVHKBmA
The period at King’s College London was an extremely enriching experience. It allowed me to work closely with two members of the COST Action, Dr. Barbara McGillivray and Dr. Fahad Khan. In this context, Dr. Khan, Dr. McGillivray and I began discussing a future collaboration on a proposal for modelling lexical semantic change in ontologies, building on work already carried out in the context of Nexus Use Case 4.2.1.