Legal and Ethical Issues in Human Language Technologies

full-day Workshop at LREC 2022, Marseille, June 24, 2022

merged with the Workshop on Multilingual de-Identification of (sensitive) LRs

The workshop is expected to take place on-site, but hybrid participation is possible.


The proceedings of the workshop can be browsed here.

The PDF proceedings are available here and the BibTeX entries are available under this link.

About the Workshop

LEGAL 2022

In recent years, the use of deep learning technologies for language resources on the one hand and the demand for high-quality data interactions on the other have raised the need for data collections of interaction resources.

These data, despite their intangible nature, may be subject to legal constraints which need to be addressed in order to guarantee lawful access to the data. The legal framework takes time to reform, and therefore law is always lagging behind the technology. In recent years, considerable efforts have been made to adapt the legal framework to the advancements in technology while taking into account the interests of various stakeholders.

From a technological perspective, strict consideration of legal aspects raises further questions beyond pure recording technology and participant consent. How can identifying information be removed or anonymised, and how reliable are predictions and models based on these data? What is the IPR status of models derived from language data, and is the very process of deriving such models within the ambit of IPR? What impact does this have on usability and computational costs?

The purpose of this workshop is to build bridges between technology and legal experts and to discuss current legal and ethical issues in the human language technology sector.

What to submit?

Extended abstracts of 1500-2000 words are required for the initial submission (by 8 April 2022). Full papers will be published by ELRA as workshop proceedings alongside the LREC main conference and must follow the LREC instructions.

Topics of interest include:

Privacy and data protection
Ethics in multi-modal, sensorial data collection
Ethics in annotation (crowd-sourced) of private data
Copyright and other intellectual property rights
Public Sector Information
Open Science Policies
Industry 4.0 and Language Resources Technologies
Legal and technical issues around anonymisation and pseudonymisation
Transparency and explicability in Machine Learning
Ethics in Affective, Behavioral, and Social Computing
Nudging and manipulation with machines
Human-machine interaction for vulnerable populations
Freedom of information

Important Dates

LEGAL2022 Workshop Day: 24 June 2022

14 April 2022

Deadline for submission of extended abstracts

3 May 2022

Notification of acceptance

9 May 2022

Deadline for early bird registration

20 May 2022

Submission of final version of accepted papers


Workshop Programme

09:00 - 09:15 Opening Session: Introduction by Joint Workshop Chairs
09:15 - 10:10 Introduction Talk: Major developments in the legal framework concerning language resources

Pawel Kamocki

10:10 - 10:30 Session A: COVID Issues and Policy Amendments
Sentiment Analysis and Topic Modeling for Public Perceptions of Air Travel: COVID Issues and Policy Amendments Avery Field, Aparna Varde and Pankaj Lal, virtual
10:30 - 11:00 Coffee Break
11:00 - 12:00 Session B: GDPR and Legal Aspects
Data Protection, Privacy and US Regulation Denise DiPersio
Pseudonymisation of Speech Data as an Alternative Approach to GDPR Compliance Pawel Kamocki and Ingo Siegert
Categorizing legal features in a metadata-oriented task: defining the conditions of use Mickaël Rigault, Victoria Arranz, Valérie Mapelli, Penny Labropoulou and Stelios Piperidis
12:00 - 13:00 Session C1: Data Protection: Anonymisation, De-Identification and Legal Aspects
About Migration Flows and Sentiment Analysis on Twitter data: building a bridge between technical and legal data protection approaches Thilo Gottschalk and Francesca Pichierri, virtual
Transparency and Explainability of a Machine Learning Model in the Context of Human Resource Management Sebastien Delecraz, Loukman Eltarr and Olivier Oullier
Public Interactions with Voice Assistants – Discussion of Different One-Shot Solutions to Preserve Speaker Privacy Ingo Siegert, Yamini Sinha, Gino Winkelmann, Oliver Jokisch and Andreas Wendemuth
13:00 - 14:00 Lunch Break
14:00 - 15:00 Keynote: Voice anonymisation and the GDPR

Brij Mohan Lal Srivastava, Inria Nancy - Grand Est

15:00 - 16:00 Session C2: Data Protection: Anonymisation, De-Identification and Legal Aspects
Cross-Clinic De-Identification of Swedish Electronic Health Records: Nuances and Caveats Olle Bridal, Thomas Vakili and Marina Santini
Generating Realistic Synthetic Curricula Vitae for Machine Learning Applications under Differential Privacy Andrea Bruera, Francesco Aldà and Francesco Di Cerbo
MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents Victoria Arranz, Khalid Choukri, Montse Cuadros, Aitor García Pablos, Lucie Gianola, Cyril Grouin, Manuel Herranz, Patrick Paroubek and Pierre Zweigenbaum
16:00 - 16:30 Coffee Break
16:30 - 17:30 Session D: Privacy and Ethical Challenges in Data
PriPA: A Tool for Privacy-Preserving Analytics of Linguistic Data Jeremie Clos, Emma McClaughlin, Pepita Barnard, Elena Nichele, Dawn Knight, Derek McAuley and Svenja Adolphs
Legal and Ethical Challenges in Recording Air Traffic Control Speech Mickaël Rigault, Claudia Cevenini, Khalid Choukri, Martin Kocour, Karel Veselý, Igor Szoke, Petr Motlicek, Juan Pablo Zuluaga-Gomez, Alexander Blatt, Dietrich Klakow, Allan Tart, Pavel Kolčárek and Jan Černocký
It is not Dance, is Data: Gearing Ethical Circulation of Intangible Cultural Heritage practices in the Digital Space Jorge Yánez and Amel Fraisse, virtual
17:30 - 18:00 Closing Ceremony
18:00 After-workshop on-site gathering

Organizers and Contact of the LEGAL Workshop:

Ingo Siegert, Otto-von-Guericke-Universität Magdeburg, Germany

Khalid Choukri, ELRA/ELDA, France

Mickaël Rigault, ELRA/ELDA, France

Pawel Kamocki, IDS Mannheim, Germany

Andreas Witt, IDS Mannheim, Germany

Krister Linden, University of Helsinki, Finland

Claudia Cevenini, University of Bologna, Italy

Organizers and Contact of the Multilingual de-Identification Workshop:

Victoria Arranz (ELDA/ELRA, France)

Montse Cuadros (Vicomtech, Spain)

Aitor García Pablos (Vicomtech, Spain)

Cyril Grouin (LISN-CNRS, France)

Manuel Herranz (Pangeanic, Spain)