Dates: August 30, 2021 to September 9, 2021

Duration: 11 days

Applicant: Kostadin Mishev

Venue: Sofia, Bulgaria

Host Institution: Mozaika Ltd

Host: Dr. Mariana Damova

Involved WGs: WG4


Kostadin Mishev, from the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, carried out an STSM at Mozaika Ltd, Sofia, working together with Dr. Mariana Damova. The aim was to exchange experience and know-how on using Machine Learning (ML) methods, such as contextual word embeddings from transformer models and distributed word representations, for detecting and interpreting language phenomena in texts from survey data, TED talks, and social media in English, Lithuanian, Bulgarian, German, Macedonian, and Portuguese.

During the STSM, the grantee had the opportunity to meet and interact not only with Dr. Mariana Damova but also with several senior researchers from the host institution.

The outcome of the STSM was a set of Natural Language Processing methodologies for evaluating language-agnostic and cross-lingual methods for discourse marker detection in multilingual datasets. In addition, methods from eXplainable AI were used to explain the models' decisions when identifying discourse markers in a sentence.

The main contribution of the STSM to the scientific objectives of the NexusLinguarum Action was a framework for linguistic data science that addresses some prototypical cases of interpreting language phenomena with social significance, using the latest advances in semantic representation, transfer learning, and eXplainable AI.