The main aim of this Action will be to promote synergies across Europe between linguists, computer scientists, terminologists, language professionals, and other stakeholders in industry and society, in order to investigate and extend the area of linguistic data science.

The Action pays special attention to the multilingual dimension of data as a key aspect in supporting the cross-lingual and cross-cultural study and application of linguistic data science across Europe and worldwide. The objectives of the Action are defined with respect to advancements in the state of the art, research coordination, and capacity building.

Advancements in the State of the Art

NexusLinguarum will provide significant progress beyond the state of the art in several aspects:
  • Promotion of linked open data-based models for linguistic data, such as Ontolex-lemon, to de facto, community, and official standards
  • Development of extensions of current models to support domains dealing with diachronic and social language variation
  • Discussion, validation and consolidation of existing best practices on Linguistic Linked Open Data (LLOD)
  • Definition of clear pipelines to convert linguistic data and language resources (LRs) into LLOD
  • Implementation of community-supported benchmarking and quality assessment procedures for the continuous monitoring of LLOD quality
  • Development of novel collaborative methodologies for the creation, linkage and improvement of LRs along their lifecycle in a cost-effective fashion
  • Development of extensions to methods based on machine learning/deep learning for the discovery of linguistic data features in large amounts of various types of multilingual textual data
These advancements will contribute to breaking language barriers in Europe and worldwide, which in turn:
  • benefits the Digital Single Market,
  • benefits cross-border e-commerce,
  • benefits Public Sector Information (PSI) across Europe,
  • supports interdisciplinary research on multilingual challenges,
  • increases cross-border cultural exchange, and
  • enhances support for minority languages.

Research Coordination

To achieve the main objective described at the top of this section, the following specific objectives shall be accomplished:
  • To propose, agree upon and disseminate best practices and standards for linking data and services across languages, which involve
    • the quality of language resources,
    • the generation, publication, and discovery of LLOD,
    • cross-lingual linking of linguistic data,
    • LLOD lifecycle value chains, and
    • the analysis of linguistic data at a large scale.
  • To organize activities to foster collaboration and communication across communities, such as scientific workshops involving broader pan-European communities to reach agreement on best practices, and standardization activities in the context of W3C, OKFN, and the International Organization for Standardization (ISO).
  • To collect and analyze relevant use cases for linguistic data science and to develop prototypes and demonstrators that will address some prototypical cases. The Action’s community-building mechanisms will serve to support this, along with the organization of technical hackathons and datathons.
  • To work out a curriculum for a Europe-wide master's degree that the participating institutions could adopt to train a new generation of researchers in the area, thus introducing linguistic data science into a cross-disciplinary academic infrastructure.

Capacity Building

  • To include experts from COST Member Countries, Near Neighbour Countries (NNC) and International Partner Countries (IPC) to consolidate the network of experts and to maximise the language diversity covered by the consortium.
  • To support Short-Term Scientific Missions (STSMs) of early-career researchers from a variety of knowledge areas to foster collaboration and to learn new methods and techniques not available at their institutions.
  • To participate in and collaborate with international fora and organisations relevant to the aims of the Action (e.g., META-NET, CLARIN, ELRA, BDVA, W3C, ISO, DARIAH) and to collaborate with new projects and initiatives (e.g., European Language Grid, Prêt-à-LLOD, ELEXIS).
  • To organise workshops, meetings and dissemination activities in order to stimulate knowledge sharing across national boundaries and among experts, educators, policy-makers, industry members, and civil society in general.
  • To explicitly address numerous low-resource and minority languages and communities in Europe and beyond. To that end, experts on such languages will be invited to join the Action's activities.