Researching integrated explicit and implicit offensive language taxonomy

Dates: July 5, 2022 to July 10, 2022

Duration: 5 days

Applicant: Giedre Valunaite Oleskeviciene

Venue: Porto, Portugal

Host Institution: Centre of Linguistics of the University of Porto

Host: Purificação Silvano

Involved WGs: WG4

DESCRIPTION

The aim of the STSM was to extend the available resources and to provide a taxonomy for the linguistic processing of offensive language identification. The work focused on refining the annotation model for tagging explicit and implicit offensive language, in cooperation with the team of researchers at the Centre of Linguistics of the University of Porto, whose research covers lexical resource construction, lexicons, text and speech, translation, computational linguistics, and related areas. The Centre of Linguistics of the University of Porto joined the initiative of creating an integrated model of explicit and implicit offensive language taxonomy and carried out annotation experiments on offensive discourse in selected English social media materials that the research team is working on in Use Case 4.1.1. The aim of the STSM was closely related to the main aim of the NexusLinguarum Action stated in the Memorandum of Understanding: the construction of multilingual and semantically interoperable linguistic data, extending the set of available resources. It was also related to the objectives of Working Group 4, which works on the use cases and focuses especially on offensive language in media texts, and of Working Group 2, which is oriented towards low-resourced languages.

The aim of the STSM was achieved by analysing offensive discourse in 25 publicly available web-based hate speech datasets and performing a semantic annotation experiment in the INCEpTION tool. The research proceeded in the following steps. First, the integrated model of explicit and implicit offensive language taxonomy was applied to semantically annotate the web-based hate speech datasets in the INCEpTION tool. Then, the experiment data were processed in order to refine our annotation categories. The results led to more refined criteria for categorizing offensive explicitness and implicitness.