arxiv:2402.11512

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Published on Feb 18, 2024

Abstract

Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships and foster a more nuanced understanding of language and consequently perform remarkably on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that these models may also inadvertently learn this bias. In this work, we build on the seminal previous work and propose DeepSoftDebias, an algorithm that uses a neural network to perform 'soft debiasing'. We exhaustively evaluate this algorithm across a variety of SOTA datasets, accuracy metrics, and challenging NLP tasks. We find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.
