The main aim of NexusLinguarum is to promote synergies across Europe between linguists, computer scientists, terminologists, and other stakeholders in industry and society, in order to investigate and extend the area of linguistic data science.


We understand linguistic data science as a subfield of the emerging field of data science, which focuses on the systematic analysis and study of the structure and properties of data at a large scale, along with methods and techniques to extract new knowledge and insights from it. Linguistic data science is a specific case, concerned with providing a formal basis for the analysis, representation, integration and exploitation of language data (syntax, morphology, lexicon, etc.). The specificities of linguistic data remain largely unexplored in a big data context.


The activities of the Action aim to foster collaboration and knowledge-sharing between the Action members, and include Short-Term Scientific Missions (STSMs), WG meetings, conferences and workshops, training schools, and other dissemination events.

Second Plenary

The second plenary meeting of NexusLinguarum was held on 26-27 October 2020 in Lisbon, at the Universidade Nova de Lisboa. The meeting took place in a hybrid setting, as many participants…



NexusLinguarum is composed of five working groups (WGs 1-5), which interoperate and provide feedback to one another. The core group is responsible for the coordination and management of the whole network, and for the dissemination of its results.


The results of the Action include reports, publications, pointers to relevant systems and resources, as well as collaborations and bridges to related initiatives.

7 documents
  • Giedre Valunaite Oleskeviciene, Chaya Liebeskind. (June 2021). Multiword expressions as discourse markers in Hebrew and Lithuanian. Zenodo.
  • Christian Chiarcos, Tomas Mikolov et al. (January 2021). Lemmatized English Word2Vec data (Version 2020-06-03). Zenodo.
  • Jorge Gracia, Christian Fäth, Matthias Hartung, Max Ionov, Julia Bosque-Gil, Susana Veríssimo, Christian Chiarcos, Matthias Orlikowski. (November 2020). Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain (pre-published version). Zenodo.
  • Aivaras Rokas, Sigita Rackevičienė, Andrius Utka. (September 2020). Automatic Extraction of Lithuanian Cybersecurity Terms Using Deep Learning Approaches. Zenodo.
  • Maxim Ionov, John McCrae, Christian Chiarcos, Thierry Declerck, Julia Bosque-Gil, Jorge Gracia. (May 2020). Proceedings of the LREC 2020 7th Workshop on Linked Data in Linguistics (Version 1.0). Zenodo.
  • Thierry Declerck, John McCrae, Matthias Hartung, Jorge Gracia, Christian Chiarcos, Elena Montiel, Philipp Cimiano, Artem Revenko, Deidre Lee, Stefania Racioppa, Jamal Nasir, Matthias Orlikowski, Marta Lanau-Coronas, Christian Fäth, Mariano Rico, Mohammad Fazleh Elahi, Maria Khvalchik, Roser Sauri, Meritxell Gonzalez, Katharine Cooney. (May 2020). Recent Developments for the Linguistic Linked Open Data Infrastructure (Version 2.0). Zenodo.
  • Thierry Declerck. (May 2020). Towards an Extension of the Linking of the Open Dutch WordNet with Dutch Lexicographic Resources (Version 1). Zenodo.