Dates: 2021-09-06 to 2021-10-15
Duration: 6 weeks
Applicant: David Lindemann
Venue: Athens, Greece (held as a Virtual Mobility grant)
Host Institution: Institute for Language and Speech Processing (ILSP)
Host: Penny Labropoulou
Involved WGs: WG1
The objective of this VM grant was to prepare the ground for the integration of metadata for printed and electronic dictionaries in LexBib, a digital bibliography and Knowledge Graph project for the domain of Lexicography and Dictionary Research, which currently stores metadata for lexicography-related publications. To this end, we worked on the definition of a Dictionary Metadata (DM) model, re-using and extending relevant vocabularies. Our main sources are:
- the META-SHARE ontology (MS-OWL), an RDF vocabulary for the description of Language Resources (LR), including lexical resources, so far used mainly for Natural Language Processing and Language Technology LR;
- the LexVoc vocabulary for lexicographic terms;
- the FRBR model and BIBO ontology for bibliographic citations.
We have determined the core classes of the new DM model by defining correspondences between FRBR and MS-OWL. We have concluded that frbr:Expression is equivalent to ms:LexicalConceptualResource and frbr:Manifestation to ms:DatasetDistribution.
Combining the two approaches may pose problems: library catalogues hold records for manifestations/distributions, with all metadata categories in a flat list, whereas dataset catalogues hold records for expressions/resources, under which the manifestations/distributions are represented, each with its own set of properties. On the other hand, the two-level structure of the dataset approach is well suited to our model, because some properties, such as type of dictionary or microstructure contents, describe the resource irrespective of its physical representation(s). At the same time, the dictionary metadata needed for citation (BIBO) describe the manifestation, irrespective of whether other manifestations embody the same lexicographical expression.
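The two-level structure, and the flat record a library catalogue would expect instead, can be illustrated with a minimal Python sketch. Only the two class correspondences are taken from this report; the field names (`identifier`, `medium`, `year`, `dictionary_scope`), the `flatten` helper and the example values are hypothetical and do not reflect the actual LexBib data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Class correspondences stated in the report (prefixed names only).
CLASS_CORRESPONDENCES = {
    "frbr:Expression": "ms:LexicalConceptualResource",
    "frbr:Manifestation": "ms:DatasetDistribution",
}


@dataclass
class Distribution:
    """Manifestation level: properties a citation needs (hypothetical fields)."""
    identifier: str
    medium: str
    year: int


@dataclass
class Resource:
    """Expression level: content description shared by all distributions."""
    title: str
    dictionary_scope: str
    distributions: List[Distribution] = field(default_factory=list)


def flatten(resource: Resource) -> List[Dict[str, object]]:
    """Build one flat, catalogue-style record per distribution.

    Resource-level properties are copied into every record, which is
    where the two-level and the flat approach diverge.
    """
    return [
        {
            "title": resource.title,
            "dictionary_scope": resource.dictionary_scope,
            "identifier": d.identifier,
            "medium": d.medium,
            "year": d.year,
        }
        for d in resource.distributions
    ]


# Invented example: one dictionary with a print and an online distribution.
example = Resource(
    title="Example Dictionary",
    dictionary_scope="general dictionary",
    distributions=[
        Distribution("id-print", "print", 2001),
        Distribution("id-web", "online", 2015),
    ],
)
flat_records = flatten(example)
```

In this sketch the content description is stored once at resource level, while each flat record repeats it next to the citation-relevant properties of a single distribution.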
We have organized specific properties and values in a new structure. Parts of the LexVoc vocabulary, developed by the grantee, will be re-used, as will properties defined in the ontology underlying LexBib. Properties describing dictionary content (attached at resource level) point to items in a range defined by a LexVoc top-level concept, or “facet”. For example, the “dictionary scope” property points to items defined as narrower concepts of the top-level concept (facet) “dictionary scope”.
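The facet-based range restriction can be sketched as a simple check over "broader" links. This is an illustrative sketch only: apart from the facet name “dictionary scope”, all concept labels below are invented, the `BROADER` and `PROPERTY_FACET` tables are hypothetical, and the sketch assumes that narrower concepts count transitively, which the report does not state.

```python
# Hypothetical SKOS-style "broader" links (concept -> its broader concept).
# Only the facet name "dictionary scope" comes from the report.
BROADER = {
    "general dictionary": "dictionary scope",
    "specialised dictionary": "dictionary scope",
    "medical dictionary": "specialised dictionary",
    "headword": "microstructure contents",
}

# Hypothetical mapping from a content property to the facet defining its range.
PROPERTY_FACET = {"dictionary scope": "dictionary scope"}


def facet_of(concept: str) -> str:
    """Follow broader links upwards and return the top-level concept reached."""
    seen = set()
    while concept in BROADER and concept not in seen:
        seen.add(concept)  # guard against accidental cycles
        concept = BROADER[concept]
    return concept


def is_valid_value(prop: str, value: str) -> bool:
    """True if `value` is a (transitively) narrower concept of the facet of `prop`."""
    facet = PROPERTY_FACET.get(prop)
    if facet is None or value == facet:
        return False  # unknown property, or the facet itself used as a value
    return facet_of(value) == facet


ok = is_valid_value("dictionary scope", "medical dictionary")
wrong_facet = is_valid_value("dictionary scope", "headword")
```

Here `ok` is `True` because the invented concept "medical dictionary" sits below the “dictionary scope” facet, while `wrong_facet` is `False` because "headword" belongs to a different facet.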
We have discussed how to map the properties appearing in the MS-OWL, FRBR and BIBO entity schemata to each other, and decided which class each should be attached to, i.e. domain Resource (expression) vs. Distribution (manifestation). For most properties, this was straightforward. For properties used for bibliographic citations, though, the two-level structure poses problems. We have discussed these in more detail, looking at identifier, distribution medium, format and access URL, which are potential sources of conflict. We have also proposed mappings of controlled vocabulary terms, both in MS-OWL and LexVoc, and we have created a set of items modeled according to our new DM model in the LexBib Wikibase (see LexBib).
We are now preparing a report to be published on LexBib, a detailed documentation of our new DM model, and a dedicated full paper, for dissemination in the lexicographic community.
[David Lindemann at Wikidata: http://www.wikidata.org/entity/Q57694630.]
The image shows the buildings hosting the Computer Science faculty in Madrid and AthenaRC in Athens. Nexus Linguarum does not sleep even during the dog days in the hot summers of Madrid and Athens.