Author(s): Jonathan Bennion

Originally published on Towards AI.

A 2023 dataset that balanced its occupational bias distribution may have unintentionally decreased racial bias, but increased gender and age biases, compared to a vanilla Alpaca baseline.

TLDR: Reducing a single human bias dimension in an instruction set used for finetuning language models can cause unintended shifts in other biases. Future research should continue to apply as many bias mitigation techniques as possible, across multiple dimensions and concurrently, to have the most effect on bias types that exhibit complex interplay. In the case of OccuQuest, balancing occupational bias may have decreased racial bias but increased gender and age biases, relative to a vanilla Alpaca baseline.

Image by author, plot from code below

Introduction to measuring effects of human bias in language model datasets

Given the significant role of LLMs across multiple domains, addressing human bias in their output during both training and deployment is crucial. The historical human bias dimensions of age, gender, ethnicity, religion, and occupation continue to affect the opinions of users of any LLM application. While some models have shown a degree of bias mitigation through novel methodologies (including finetuning with downstream reinforcement learning), biases remain pronounced and can even be exacerbated depending on model tuning and dataset quality, especially when not monitored.

Primary research question: When mitigating human biases in datasets used for finetuning language models for AI applications, does any interplay between human bias dimensions affect the outcome? If so, how, and what should we be thinking about when continuing to mitigate human biases?

Explanation of this study and intention: A recent case study in mitigating a single human bias is OccuQuest (Xue et al.), a paper that quantified the effects of mitigating occupational bias on its own. This brief study of my own (code at the bottom of this post and also in this GitHub repo) compares the magnitude of human bias within the OccuQuest and Alpaca instruction datasets (by calculating cosine similarity between SBERT embeddings of biased words and target words) to show that addressing one type of bias can have unintentional effects on other bias dimensions, some positive and some negative.

Key findings:

- Gender bias: OccuQuest: 0.318, Alpaca: 0.244. OccuQuest shows higher gender bias than Alpaca. This unexpected result suggests that efforts to reduce occupational bias may have inadvertently increased gender bias, possibly due to the complex interplay between occupation and gender stereotypes.
- Racial bias: OccuQuest: 0.203, Alpaca: 0.360. OccuQuest demonstrates lower racial bias than Alpaca. This indicates that reducing occupational bias may have positively impacted racial bias, potentially by addressing intersectional biases related to race and occupation.
- Age bias: OccuQuest: 0.091, Alpaca: 0.004. OccuQuest shows higher age bias than Alpaca, though both values are relatively low. This suggests that efforts to reduce occupational bias may have marginally increased age-related biases, possibly due to associations between age and certain occupations.

Implications and future directions:

- Holistic approach: Future research should involve technical methods that address as many bias dimensions as possible concurrently to avoid unintended consequences.
- Intersectionality: Future research should strategically plan for the intersections of different bias dimensions (e.g., gender, race, age, and occupation), possibly narrowing scope in order to mitigate the most bias (depending on the goals of the dataset).

Caveats:

- The OccuQuest paper contained a wide variety of baselines, and this post compares only against an Alpaca baseline. All of the paper's baseline datasets were similarly vanilla in the sense that little bias-mitigation work had been done on them, so this post still compares OccuQuest to a vanilla dataset in a similar way.
- The target words used to measure bias are limited in number. However, they are among the words most often accompanied by biased language in text. Given this constraint, the analysis still works as a comparison, but the limited vocabulary may contribute to error bars.
- The words used for the biased language itself also do not represent the full corpus of words that could be used (but this still works for the analysis, since it is a comparison).
- Cosine similarity is just one measure; other distance metrics could be used to corroborate the findings.
- The SBERT model provides only one set of embeddings; additional embedding models could be used to check whether the findings are similar.

Code below (4 Steps)

Step 1: Setup and Data Loading

First, we'll import the necessary libraries and load our sampled datasets. Note the effect size and feel free to adjust it based on your learning objective:

```python
import random
import matplotlib.pyplot as plt
import numpy as np
from sentence_transformers import SentenceTransformer
from scipy import stats
from datasets import load_dataset
import json
from tqdm import tqdm
import warnings

warnings.filterwarnings("ignore")

# auth to HF
from huggingface_hub import login
from getpass import getpass

def huggingface_login():
    print("Please enter your HF API token.")
    token = getpass("Token: ")
    login(token)  # authenticate with the provided token

huggingface_login()

# Sample size calculation so we only embed as many examples as the desired power requires
def calculate_sample_size(effect_size, alpha=0.05, power=0.8):
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    sample_size = ((z_alpha + z_beta) / effect_size) ** 2
    return int(np.ceil(sample_size))

# Calculate sample size
effect_size = 0.1  # small effect size
alpha = 0.05       # significance level
power = 0.8        # desired power
sample_size = calculate_sample_size(effect_size, alpha, power)
print(f"Sample size for an effect size of 0.1: {sample_size}")

# Load SBERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load datasets
occuquest = load_dataset("OFA-Sys/OccuQuest", split="train")
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

# Sample from datasets
occuquest_sample = occuquest.shuffle(seed=42).select(range(sample_size))
alpaca_sample = alpaca.shuffle(seed=42).select(range(sample_size))
```

Step 2: Define Bias Categories and Measurement Functions

Next, we'll define our bias categories and functions that utilize cosine similarity and WEAT effect size between biased language and target words (the cosine-similarity building block these functions rely on is sketched first, below).
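As a quick orientation, here is a minimal sketch of that building block: cosine similarity between SBERT embeddings. The example words and the `cos` helper are illustrative only and are not taken from the study's code; the model load mirrors Step 1 so the snippet runs on its own.

```python
# Minimal sketch of the cosine-similarity building block
# (illustrative words and helper name; mirrors the Step 1 setup)
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def cos(u, v):
    # Cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Example: how similar is an occupation word to each of two gendered terms?
emb = model.encode(['engineer', 'he', 'she'])
print(cos(emb[0], emb[1]), cos(emb[0], emb[2]))
```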
The full measurement treats these relationships in aggregate (see the caveats above: it is limited to the target and attribute words below, and presumes these carry the biased descriptors):

```python
bias_categories = {
    'gender_bias': {
        'target_1': ['man', 'male', 'boy', 'brother', 'he', 'him', 'his', 'son'],
        'target_2': ['woman', 'female', 'girl', 'sister', 'she', 'her', 'hers', 'daughter'],
        'attribute_1': ['career', 'professional', 'corporation', 'salary', 'office', 'business', 'job'],
        'attribute_2': ['home', 'parents', 'children', 'family', 'cousins', 'marriage', 'wedding']
    },
    'racial_bias': {
        'target_1': ['european', 'caucasian', […]
```
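The measurement functions themselves are cut off above; the complete dictionary and the exact functions used for the reported numbers are in the linked GitHub repo. For reference, here is a minimal sketch of how cosine similarity and a WEAT-style effect size can be combined over word lists like these. The helper names and details are illustrative assumptions, not the study's exact code.

```python
# Sketch of a WEAT-style effect size over SBERT embeddings
# (illustrative helpers; not the exact measurement code used for the reported numbers)
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity of word embedding w to attribute set A
    # minus its mean similarity to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(targets_1, targets_2, attrs_1, attrs_2):
    # Embed each word list with SBERT
    X, Y, A, B = (model.encode(words) for words in (targets_1, targets_2, attrs_1, attrs_2))
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    # Cohen's-d-style effect size: difference of mean associations
    # divided by the standard deviation over all target words
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)

g = bias_categories['gender_bias']  # defined in the Step 2 block above
print(weat_effect_size(g['target_1'], g['target_2'], g['attribute_1'], g['attribute_2']))
```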