| Creators: |
Phan, Thuy Linh and Boyce, James and Xie, Hetiao and Namvar, Morteza and Risius, Marten |
| Title: |
Same Same but Different: Evaluating Hate Speech Detoxification through an LLM-based Agentic Framework |
| Item Type: |
Conference or Workshop Item |
| Event Title: |
(Proceedings of the) 46th International Conference on Information Systems (ICIS) “Achieving Digital Integration in the Age of AI” |
| Event Location: |
Nashville, Tennessee, USA |
| Event Dates: |
December 14-17, 2025 |
| Projects: |
IDI |
| Type of Paper / Paper No.: |
ICIS2025-2633 (Short Paper) |
| Date: |
December 2025 |
| Divisions: |
Informationsmanagement |
| Abstract (ENG): |
Evaluating the effectiveness of hate speech detoxification is an emerging challenge, particularly as large language models (LLMs) become central to content moderation. While text detoxification (TD) presents a promising alternative to deletion or banning, current evaluation methods remain limited. Human evaluation is costly and inconsistent, and existing automatic metrics often fail to capture social sensitivity. We introduce SAFE-TD, a Structured Agentic Framework for Evaluation of TD, which simulates three agent roles to assess detoxified outputs from multiple perspectives. Our preliminary analysis reveals four outcome types and identifies a critical risk: the generation of implicit hate speech that appears neutral but retains harmful meaning. These findings expose underexplored trade-offs in TD and limitations in existing evaluation practices. SAFE-TD contributes a scalable, socially grounded approach to evaluating LLM-based TD, offering a foundation for more ethical and nuanced AI development for online safety. |
| Forthcoming: |
No |
| Language: |
English |
| Uncontrolled Keywords: |
Text Detoxification, Large Language Models (LLMs), Multi-Agent Evaluation, Generative AI, Hate Speech Moderation, Ethical AI |
| Citation: |
Phan, Thuy Linh and Boyce, James and Xie, Hetiao and Namvar, Morteza and Risius, Marten
(2025)
Same Same but Different: Evaluating Hate Speech Detoxification through an LLM-based Agentic Framework.
In: (Proceedings of the) 46th International Conference on Information Systems (ICIS) “Achieving Digital Integration in the Age of AI”, December 14-17, 2025, Nashville, Tennessee, USA, Paper ICIS2025-2633 (Short Paper).
|