Abusive language dataset annotation workshop

Meeting Dates: 28th September 2021

Venue: Skopje, North Macedonia (& online)

Organizing Institutions: Ľ. Štúr Institute of Linguistics at the Slovak Academy of Sciences & Centre for Translation Studies at the University of Vienna

Local Organizers: 

Organizing committee: Barbara Lewandowska-Tomaszczyk, Slavko Žitnik, Anna Bączkowska, Giedre Valunaite Oleskevicienė, Chaya Liebeskind, Jelena Mitrovič, Kristina Despot



Image above by Julian Nyča – no changes. Licence: CC BY-SA 3.0. https://de.wikipedia.org/wiki/Skopje#/media/Datei:Skopje_%E2%80%93_Vardar_River.jpg


NexusLinguarum organized a workshop on abusive language dataset annotation on 28 September 2021. It was held as a hybrid event in Skopje, North Macedonia.




9:00 – 11:00 Barbara Lewandowska-Tomaszczyk 

Presentation of insights into, and needs for, a new annotation of offensive language corpora – new taxonomy.

Annotation guidelines with annotation examples – each participant will receive printed guidelines (prepared by Barbara Lewandowska-Tomaszczyk, Slavko Žitnik, Anna Bączkowska, Giedre Valunaite Oleskevicienė, Chaya Liebeskind, Jelena Mitrovič).

Finding translations of the offensive language categories from English into the participants' native languages – Google Sheet.


11:00 – 11:30 Coffee break
11:30 – 13:30 Slavko Žitnik

Offensive language corpora overview and preparation of data for the annotation campaign.

General introduction to annotation campaigns (annotation tools, process, annotators, curators, monitoring, inter-rater agreement).

Hands-on tutorial on how to use the INCEpTION annotation tool for the task of offensive language annotation – each participant will receive a username and password.

13:30 – 14:30 Lunch break
15:00 – 17:00 Barbara Lewandowska-Tomaszczyk, Anna Bączkowska (online)

Giedre Valunaite Oleskevicienė, Slavko Žitnik

Live annotation of the prepared corpora using the INCEpTION tool.

Guidance and support during the annotation, and discussion of different annotation options – may result in an update of the annotation guidelines.

17:00 – 17:30 Coffee break
17:30 – 18:30 Slavko Žitnik

Review of annotation activities during the workshop.

Gathering interest and plans from interested participants – how much time they are prepared to invest in annotation after the workshop. Decision on deadlines and means of collaborative communication – mailing list.

Barbara Lewandowska-Tomaszczyk

Open discussion and closing.