Taxonomy and Annotation of Offensive Language Workshop,
Jerusalem, May 23, 2022
On 23 May 2022, the workshop "Taxonomy and Annotation of Offensive Language: Implicitness" was organized by Use Case 4.1.1 (Barbara Lewandowska-Tomaszczyk, Slavko Žitnik, Anna Bączkowska, Giedrė Valūnaitė Oleškevičienė, Chaya Liebeskind, Kristina Despot, Ana Ostroški Anić) within the framework of COST Action 18209 NexusLinguarum. The workshop was a hybrid event, with the on-site part hosted at the Jerusalem College of Technology, Israel. It attracted close to 30 participants from both WG4 and other NexusLinguarum WGs.
Figure 1: Jerusalem, May 2022.
Figure 2: Jerusalem College of Technology.
Figure 3: Jerusalem College of Technology yard.
The first part of the workshop was devoted to presenting and discussing the curation results of the explicit offense annotation data. Barbara and Slavko summarized the results of the annotation campaign on explicit forms of offensiveness in social media, which followed the previous workshop held in September 2021 in Skopje, North Macedonia. The group annotated 500 text documents. Each document was annotated by two annotators and reviewed by a curator. The agreement scores showed that the two annotators mostly marked different parts of the offensive language, which resulted in rather low inter-annotator agreement, while the agreement between one of the annotators and the curator was high. We concluded the discussion with the following action plan: (a) to adapt and clarify the offensive category definitions and annotation guidelines, and (b) to plan alternatives for a more comprehensive annotation campaign of explicit offensive language.
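The agreement pattern described above (low agreement between the two annotators, high agreement between one annotator and the curator) can be illustrated with a minimal sketch. It assumes token-level binary labels (1 = token inside an offensive span, 0 = outside) and uses Cohen's kappa as the pairwise measure; the report does not say which measure or label granularity the campaign actually used, and the label sequences below are invented toy data, not results from the dataset.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of tokens on which both raters chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each rater's own label distribution.
    labels = set(a) | set(b)
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level labels for one short document (1 = offensive span).
# The two annotators mark different spans; the curator largely follows annotator A.
annotations = {
    "annotator_A": [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "annotator_B": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    "curator":     [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}

# Print kappa for every pair of raters.
for (name1, labels1), (name2, labels2) in combinations(annotations.items(), 2):
    print(f"{name1} vs {name2}: kappa = {cohen_kappa(labels1, labels2):.2f}")
```

With this toy data the script prints kappa = -0.43 for annotator_A vs annotator_B, 0.78 for annotator_A vs curator, and -0.52 for annotator_B vs curator. Kappa corrects raw agreement for chance, so disjoint spans can even yield negative values. For span annotation, measures such as Krippendorff's alpha or F1 over matched spans are common alternatives to token-level kappa.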
A short overview of existing approaches to implicitness in NLP research was given by Kristina and Ana, who focused on different definitions of implicit offensive language and discussed the two most recent typologies of implicit offensive language. They reviewed these typologies and definitions and proposed an inclusive taxonomy of implicit offensive language categories that could be used for machine recognition.
Figure 4: Review of research on implicitness.
After their presentation, the speakers analyzed the sentences that had been marked as implicitly offensive in our dataset and categorized them by type of implicit offense and by the types of figurative expressions and markers used. The discussion of several examples from the dataset served as an introduction to the presentation of the annotation taxonomy of implicit offensive language, developed for further fine-grained annotation of implicit offense.
Anna then presented the taxonomy of implicit offensive language, illustrated with tagged examples covering various implicit forms of offensive language, such as metaphors, irony, understatements and overstatements, rhetorical questions and similes, as well as some conflated forms, for example metaphorical irony. In addition to these main categories of implicit offense proposed by the workshop organizers, some additional categories ("aspects") were posited at a fine-grained level, including vulgarisms, hateful speech, threats and racist language.

The participants of the workshop, both online and on-site, had a chance to apply their annotation skills to authentic texts prepared by the workshop organizers. The sample texts were retrieved from social media, mainly from blogs and Twitter. Slavko presented the INCEpTION annotation software, which is used in the UC annotation activities and has been proposed for the new annotation campaign planned for implicit offensive language in social media. All participants discussed the proposed annotations of selected documents, which gave useful insight into how annotators understand the task and the categories. The selected examples were rich in implicitly offensive semantics, and their final annotations were accepted by a quorum.
The concluding recommendations on the annotation guidelines and the taxonomy of implicit offensive language pointed to possible modifications that would make the annotation task more explicit and clearer for annotators.
Official website of the workshop: