Dates: February 22, 2020 to March 5, 2020

Duration: 12 days

Applicant: Giedre Valunaite Oleskeviciene

Venue: Jerusalem, Israel

Host Institution: Jerusalem College of Technology

Host: Dr. Chaya Liebeskind 

Involved WGs: WG1, WG4


Giedre Valunaite Oleskeviciene from Mykolas Romeris University carried out an STSM (Creating a multilingual corpus for formulaic language (multiword expressions) research) at the Jerusalem College of Technology, working together with Chaya Liebeskind, from February 22 to March 5, 2020.

The purpose of the STSM was to extend the available resources and provide linguistic processing for several languages by creating a multilingual parallel corpus (English, Lithuanian and Hebrew) based on social media texts, and to work on multiword expressions in such texts. First, parallel texts in English, Lithuanian and Hebrew were extracted from TED talk transcripts, and the sentences were then aligned to form a parallel corpus for further research. The resulting "TED-ELH Parallel Corpus" contains 87230 aligned sentences and is published in the LINDAT/CLARIN-LT repository. We then narrowed our research to multiword expressions that are used as discourse markers to ensure textual cohesion. We expect to filter, classify and analyze the translations of these multiword expressions and prepare a publication.
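
The filtering step mentioned above could look roughly like the following. This is only a minimal sketch, not the procedure actually used during the STSM: the `AlignedSentence` type, the `DISCOURSE_MARKERS` list and the placeholder translations are all hypothetical and chosen for illustration. It assumes the corpus is available as aligned English, Lithuanian and Hebrew sentence triples, and it selects the triples whose English side contains a multiword discourse marker, keeping the aligned translations for later analysis.

```python
import re
from typing import NamedTuple


class AlignedSentence(NamedTuple):
    """One aligned sentence triple (hypothetical layout, for illustration)."""
    en: str
    lt: str
    he: str


# Hypothetical, illustrative list of multiword discourse markers.
DISCOURSE_MARKERS = ("in other words", "on the other hand", "as a matter of fact")


def find_markers(sentence: str, markers=DISCOURSE_MARKERS) -> list[str]:
    """Return the markers that occur as whole phrases in the sentence."""
    lowered = sentence.lower()
    return [m for m in markers
            if re.search(r"\b" + re.escape(m) + r"\b", lowered)]


def filter_by_marker(corpus):
    """Keep only triples whose English side contains at least one marker."""
    hits = []
    for row in corpus:
        found = find_markers(row.en)
        if found:
            hits.append((found, row))
    return hits


# Toy data: the Lithuanian and Hebrew sides are placeholders, not real translations.
corpus = [
    AlignedSentence("In other words, the idea is simple.", "<LT translation 1>", "<HE translation 1>"),
    AlignedSentence("The talk lasted ten minutes.", "<LT translation 2>", "<HE translation 2>"),
    AlignedSentence("On the other hand, it takes time.", "<LT translation 3>", "<HE translation 3>"),
]

for found, row in filter_by_marker(corpus):
    print(found, "|", row.lt, "|", row.he)
```

Matching only on the English side keeps the sketch simple; how each marker is rendered in Lithuanian and Hebrew would then be examined in the retained translations.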

