Dates: February 22, 2020 to March 5, 2020
Duration: 12 days
Applicant: Giedre Valunaite Oleskeviciene
Venue: Jerusalem, Israel
Host Institution: Jerusalem College of Technology
Host: Dr. Chaya Liebeskind
Involved WGs: WG1, WG4
DESCRIPTION
Giedre Valunaite Oleskeviciene of Mykolas Romeris University carried out an STSM, "Creating a multilingual corpus for formulaic language (multiword expressions) research", at Jerusalem College of Technology, working with Chaya Liebeskind, from February 22 to March 5, 2020.
The purpose of the STSM was to extend the available resources and provide linguistic processing for several languages by creating a multilingual parallel corpus (English, Lithuanian and Hebrew) based on social media texts, and to work on multiword expressions in such texts. First, parallel texts in English, Lithuanian and Hebrew were extracted from TED talk transcripts, and the sentences were then aligned to form a parallel corpus for further research. The resulting "TED-ELH Parallel Corpus" contains 87,230 aligned sentences and is published in the LINDAT/CLARIN-LT repository: http://hdl.handle.net/20.500.11821/34. We then narrowed our research to multiword expressions used as discourse markers to ensure textual cohesion. We expect to filter, classify and analyze the translations of these discourse markers and to prepare a publication.
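The filtering step could be approached roughly as sketched below. This is only an illustration under stated assumptions, not the procedure used during the STSM: the marker list, the AlignedSentence record and the function names are hypothetical, and the corpus is assumed to be already loaded as aligned English, Lithuanian and Hebrew sentence triples.

```python
import re
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class AlignedSentence(NamedTuple):
    """One aligned triple (hypothetical record layout)."""
    en: str
    lt: str
    he: str


# Hypothetical sample of English multiword discourse markers;
# the list actually used in the research is not given in the report.
MARKERS = ("you know", "I mean", "in other words", "of course")


def compile_marker_pattern(markers: Sequence[str]) -> "re.Pattern[str]":
    # Longer markers first, so a longer expression is preferred
    # over a shorter one that overlaps with it.
    ordered = sorted(markers, key=len, reverse=True)
    alternatives = "|".join(re.escape(m) for m in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def find_marker_sentences(
    corpus: Iterable[AlignedSentence],
    markers: Sequence[str] = MARKERS,
) -> List[Tuple[List[str], AlignedSentence]]:
    """Keep the aligned triples whose English side contains a marker."""
    pattern = compile_marker_pattern(markers)
    hits = []
    for row in corpus:
        found = [m.group(0).lower() for m in pattern.finditer(row.en)]
        if found:
            hits.append((found, row))
    return hits
```

Each hit keeps the Lithuanian and Hebrew sentences next to the matched English marker, so the translations can then be inspected and classified manually. Plain string matching cannot tell a discourse-marker use from a literal use of the same words, so the output would only be a candidate list.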
LOCATIONS
College of Technology, Jerusalem, Israel