Blog Post on the STSM “Relation Acquisition from Large Language Models: Towards the population of LLOD cloud with Deep learning approaches”, by Hugo Gonçalo Oliveira and Lucía Pitarch at the Zentrum für Translationswissenschaft, Vienna

The STSM (Short Term Scientific Mission) “Population of LLOD cloud with Deep learning approaches: Metaphor conceptualization and multilingual lexical relation acquisition” took place in September 2023 at the Zentrum für Translationswissenschaft (Austria), under the supervision of Dr. Dagmar Gromann.

The main objective of this STSM was to contribute to task 4.3.1: BATS dataset translation. BATS stands for Bigger Analogy Test Set and was created by Gladkova et al. [1] to test the analogical reasoning of Language Models. More recently, researchers have extended its use to other tasks, such as probing the semantic knowledge encoded in Language Models. In Nexus Linguarum task UC.4.3.1, BATS was translated into over 15 languages, providing a resource to explore cross-lingual knowledge in Language Models, which could then be used to populate ontologies. During the STSM we worked towards three outcomes: finalizing the translation, validating the resulting dataset through relation acquisition and cross-lingual transfer experiments, and preparing the results for submission to the LREC-COLING 2024 conference. Completing these tasks contributes to the following Nexus Linguarum tasks: 1.3 Cross-lingual data interlinking, access, and retrieval in LLOD (RDF representation of the dataset); 1.5 Development of the LLOD cloud for under-resourced languages and domains; 2.2 LLOD in Machine Translation (“translation-based analogy”); 3.3 Multilingual approaches; 2.1 Knowledge extraction; 3.2 Deep learning; and 4.1.3 Multilingual BATS dataset creation.
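To make the idea of an analogy test concrete, here is a minimal sketch of the vector-offset approach often used with word embeddings for questions of the form "a is to b as c is to ?". It is only an illustration, not the method used in the STSM experiments: the vectors are invented toy values, and the names `TOY_VECTORS` and `solve_analogy` are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# They do not come from BATS or from any real language model.
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [0.0, 1.0, 0.1],
    "king":  [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, 0.5, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    The target point is b - a + c; the answer is the vocabulary word
    closest to it by cosine similarity, excluding the three question words.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(solve_analogy("man", "woman", "king", TOY_VECTORS))
```

With a translated dataset, the same kind of question can in principle be asked in each language, which is what makes cross-lingual comparison possible.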

The future use of the translated BATS dataset, as well as its modeling as structured data, namely as RDF, was also discussed during the STSM. With an RDF representation we aim to link the dataset with other resources, such as typological databases, which might enable the study of other linguistic phenomena such as lexical gaps or typological comparisons.
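As a rough idea of what such a representation could look like, the sketch below writes a few word pairs as RDF triples in N-Triples syntax. Everything specific in it is an assumption made for illustration: the `http://example.org/bats/` namespace is a placeholder, the relation name and word pairs are invented, and the actual vocabulary chosen for the dataset may differ.

```python
from urllib.parse import quote

# Placeholder namespace, assumed for illustration; not the dataset's real one.
EX = "http://example.org/bats/"


def word_iri(lang, word):
    """Build an IRI for a word in a given language, e.g. <.../en/dog>."""
    return f"<{EX}{lang}/{quote(word)}>"


def to_ntriples(pairs):
    """Serialize (lang, source, relation, target) tuples as N-Triples lines."""
    lines = []
    for lang, source, relation, target in pairs:
        subject = word_iri(lang, source)
        predicate = f"<{EX}{relation}>"
        obj = word_iri(lang, target)
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example pairs, not taken from the translated dataset.
    example_pairs = [
        ("en", "dog", "hypernym", "animal"),
        ("es", "perro", "hypernym", "animal"),
    ]
    print(to_ntriples(example_pairs))
```

Once the words are identified by IRIs like these, they can be connected to entries in other linked resources by adding further triples.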

The STSM was a great opportunity to foster research and to learn, not only from Dr. Dagmar Gromann, Dr. Hugo Gonçalo Oliveira and Lucía Pitarch, but also at the Language Data and Knowledge conference, where we could meet many other researchers. Overall, this month was wonderful. We discussed the possibilities of combining symbolic and neural approaches, we thought about the semantic knowledge encoded in language models and how we could prove it, we had several brainstorming sessions about ideas we would love to explore, and we were even able to taste the famous Sacher cake.

[1] Anna Gladkova, Aleksandr Drozd, and Satoshi Matsuoka. Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL Student Research Workshop, 8–15. 2016.