Blog Post on the STSM “Domain Labels in Linked Lexicographic Resources” by Fahad Khan at Lisbon, Portugal
The name of my short-term scientific mission (STSM) was ‘Domain Labels in Linked Lexicographic Resources’. It took place in late March, between 20/03/2023 and 30/03/2023, in the beautiful, sunny city of Lisbon, and included a one-day visit to the historic university town of Coimbra.
During my STSM I was hosted by Professor Rute Costa at the Linguistics Research Centre of NOVA University Lisbon (CLUNL). My plan was to work with Professor Costa and her team of experts on modelling domain labels in computational lexicographic resources, and in particular in linked data lexicographic resources. Domain labels are usage labels frequently found in dictionaries (and other kinds of lexicographic resources) that mark whole lexical entries or specific senses as belonging to a specialised domain of discourse, with the understanding that these are not necessarily part of everyday linguistic usage. These domains can themselves be organised in taxonomies, which makes lexicographic resources easier to navigate and query (as the PhD thesis of one of my CLUNL colleagues, Ana Salgado, has recently demonstrated) and would also seem to make them good candidates for publication on the Semantic Web using standards such as SKOS or OWL together with OntoLex-Lemon. Indeed, the core aim of my STSM was to see how domain labels could be represented in linked lexical resources using these standards by modelling numerous real-life dictionary entries, which places this work within the range of topics dealt with by WG1 of Nexus Linguarum.
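To give a flavour of the kind of modelling involved, here is a minimal sketch in Python that writes out a tiny domain taxonomy as SKOS concepts and attaches one domain to an OntoLex-Lemon lexical sense, serialised as Turtle. It is an illustration only and not the encoding proposed in the draft guidelines: the `ex:` namespace, the two domains, the sample entry ‘femur’ and, in particular, the choice of `dct:subject` to link a sense to its domain are all assumptions made for the example.

```python
# Hypothetical mini-taxonomy of domains: child -> parent (None for a top concept).
DOMAIN_PARENTS = {
    "medicine": None,
    "anatomy": "medicine",
}

# Namespace declarations; ex: is a placeholder namespace for this sketch.
PREFIXES = """\
@prefix ex: <http://example.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
"""


def domain_to_turtle(name, parent):
    """Serialise one domain as a skos:Concept, with skos:broader if it has a parent."""
    lines = [
        f"ex:domain_{name} a skos:Concept ;",
        f'    skos:prefLabel "{name}"@en ;',
    ]
    if parent is not None:
        lines.append(f"    skos:broader ex:domain_{parent} ;")
    lines.append("    skos:inScheme ex:domains .")
    return "\n".join(lines)


def sense_to_turtle(lemma, sense_no, domain):
    """Serialise a lexical entry with one sense labelled with a domain.

    Assumption: dct:subject is used as the sense-to-domain link.
    """
    entry = f"ex:{lemma}_entry"
    sense = f"ex:{lemma}_sense{sense_no}"
    return "\n".join([
        f"{entry} a ontolex:LexicalEntry ;",
        f'    ontolex:canonicalForm [ ontolex:writtenRep "{lemma}"@en ] ;',
        f"    ontolex:sense {sense} .",
        f"{sense} a ontolex:LexicalSense ;",
        f"    dct:subject ex:domain_{domain} .",
    ])


def broader_chain(domain):
    """Return the ancestors of a domain, nearest first, by walking the taxonomy."""
    chain = []
    parent = DOMAIN_PARENTS[domain]
    while parent is not None:
        chain.append(parent)
        parent = DOMAIN_PARENTS[parent]
    return chain


def build_document():
    """Assemble the concept scheme, the domains and one sample entry as Turtle."""
    parts = [PREFIXES, "ex:domains a skos:ConceptScheme ."]
    parts += [domain_to_turtle(n, p) for n, p in DOMAIN_PARENTS.items()]
    parts.append(sense_to_turtle("femur", 1, "anatomy"))
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_document())
```

Because the domains form a hierarchy, a sense labelled ‘anatomy’ can also be retrieved by a query for ‘medicine’; `broader_chain("anatomy")` walks up the toy taxonomy in the same way that a query over `skos:broader` would.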
I collaborated with Professor Costa’s team on the conversion of dictionary entries in TEI-XML into OntoLex-Lemon, making use of an XSLT stylesheet developed by John McCrae and Laurent Romary and modifying the resulting output in light of the very fruitful discussions I had with my CLUNL colleagues. This joint work has become the basis for a draft set of guidelines on how to encode domain labels in linked data lexical resources in RDF, which can be found here (https://github.com/
Moreover, these draft guidelines are intended to contribute to a forthcoming Nexus Linguarum deliverable, D1.2, on the production of such materials, guidelines and best practices for a number of important linguistic linked data tasks. Professor Costa also very kindly invited me to give a talk at NOVA FCSH on modelling dictionaries as complex objects using top-level ontologies, a broader topic than that of my STSM but one which provides its more general context.
It was also my great pleasure to give a presentation at the University of Coimbra entitled ‘Unlocking Dictionaries with Ontologies’, with the subtitle ‘How to Make Lexicographic Resources More Accessible to Computers by Adding More Semantics’, at the invitation of two professors of that august institution (the University of Coimbra is the oldest university in Portugal and one of the oldest in the world), namely Hugo Gonçalo Oliveira and Manuel Portela. All in all, my visit to Portugal was a great success, especially from the scientific point of view. I was able to take part in many interesting discussions on leveraging Semantic Web technologies and standards to make lexicographic resources more accessible to users, both scholarly and non-scholarly.
My one regret is that I didn’t practise my Portuguese enough, but that (along with my colleagues and the pastéis de nata) is a wonderful excuse to go back.