Lerner Professors Find AI Chatbots Share Some of Our Biases

As artificial intelligence gets better at giving humans what they want, it also could get better at giving malicious humans what they want.

That’s one of the concerns driving a new study by University of Delaware researchers, published in March in the journal Scientific Reports.

Xiao Fang, professor of MIS and JPMorgan Chase Senior Fellow at the Alfred Lerner College of Business and Economics, and Ming Zhao, associate professor of operations management, collaborated with Minjia Mao, a doctoral student in UD’s Financial Services Analytics (FSAN) program, and two China-based researchers, Hongzhe Zhang and Xiaohang Zhao, who are alumni of the FSAN program.

Specifically, they were interested in whether AI large language models, like the groundbreaking and popular ChatGPT, would produce biased content toward certain groups of people.

As you may have guessed, yes, they did. And it wasn’t even borderline. The bias surfaced in the AI equivalent of the subconscious, in response to innocent prompts. But most of the AI models also promptly complied with requests to make the writing intentionally biased or discriminatory.

This research began in January 2023, just after ChatGPT began to surge in popularity and everyone began wondering if the end of human civilization (or at least human writers) was nigh.

The problem was how to measure bias, which is subjective.

“In this world there is nothing completely unbiased,” Fang said.

He noted previous research that simply measured the number of words about a particular group, say, Asians or women. If an article had mostly words referring to males, for example, it would be counted as biased. But that hits a snag with articles about, say, a men’s soccer team, the researchers note, where you’d expect a lot of language referring to men. Simply counting gender-related words could lead you to label a benign story sexist.

To overcome this, they compared the output of large language models with articles by news outlets with a reputation for a careful approach: Reuters and the New York Times. Researchers started with more than 8,000 articles, offering the headlines as prompts for the language models to create their own versions. Mao, the doctoral student, was a big help here, writing code to automatically enter these prompts.
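To give a flavor of what that automation might look like, here is a minimal sketch of feeding headlines to a language model, assuming the OpenAI Python client and placeholder headlines; the study’s actual prompting code and model endpoints are not shown in this article.

```python
# Hypothetical sketch: send news headlines to a language model and collect its articles.
# Assumes the OpenAI Python client; the headlines below are placeholders, not the
# 8,000+ real headlines used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

headlines = [
    "Local Team Wins Championship",
    "City Council Approves New Budget",
]

generated_articles = []
for headline in headlines:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Write a news article with the headline: {headline}"}],
    )
    generated_articles.append(response.choices[0].message.content)
```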

But hang on a minute, readers might be saying — how could the study assume that Reuters and the Times have no slant?

The researchers made no such assumption. The key is that while these news outlets weren’t perfect, the AI language models were worse. Much worse. In some cases, their word choices were 40 to 60 percent more biased against minorities. The researchers also used software to measure the sentiment of the language and found that it was consistently more toxic.

“The statistical pattern is very clear,” Fang said.

The models they analyzed included Grover, Cohere, Meta’s LLaMa, and several different versions of OpenAI’s ChatGPT. (Of the GPT versions, later models performed better but were still biased.)

As in previous studies, the researchers measured bias by counting the number of words referring to a given group, like women or African Americans. But by using the headline of a news article as a prompt, they could compare the approach the AI had taken to that of the original journalist. For example, the AI might write an article on the exact same topic but with word choice far more focused on white people and less on minorities.
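To illustrate the idea (not the researchers’ exact procedure), a simple comparison of group-related word counts between the AI’s article and the original journalist’s article might look like the sketch below; the word lists are placeholders, not the study’s actual lexicons.

```python
# Illustrative sketch only: compare how often group-related words appear in the
# AI-generated article versus the human-written one on the same headline.
GROUP_WORDS = {
    "female": {"she", "her", "woman", "women"},
    "male": {"he", "his", "man", "men"},
}

def group_word_counts(text: str) -> dict:
    """Count occurrences of each group's words in the text."""
    tokens = text.lower().split()
    return {group: sum(tokens.count(w) for w in words)
            for group, words in GROUP_WORDS.items()}

def compare(ai_article: str, human_article: str) -> dict:
    """Difference in group-word counts between the AI and human versions."""
    ai_counts = group_word_counts(ai_article)
    human_counts = group_word_counts(human_article)
    return {g: ai_counts[g] - human_counts[g] for g in GROUP_WORDS}
```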

They also compared the articles at the sentence and article level, instead of just word by word. The researchers used a software package called TextBlob to analyze the sentiment, scoring the text for “rudeness, disrespect and profanity.”
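TextBlob itself reports sentiment as polarity and subjectivity scores; a minimal example of scoring a single sentence with it is below. This is only an assumption about how such scoring could be wired up, not the researchers’ full sentence- and article-level pipeline.

```python
# Minimal TextBlob sentiment example. TextBlob returns polarity (-1 to 1) and
# subjectivity (0 to 1) for a piece of text.
from textblob import TextBlob

sentence = "The council's decision was a disgraceful waste of money."
blob = TextBlob(sentence)
print(blob.sentiment.polarity)      # negative values indicate negative sentiment
print(blob.sentiment.subjectivity)  # higher values indicate more opinionated language
```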

Taking the research one step further, the academics also prompted the language models to write explicitly biased pieces, as someone trying to spread racism might do. With the exception of ChatGPT, the language models churned these out with no objections.

ChatGPT, while far better on this count, wasn’t perfect, allowing intentionally biased articles about 10 percent of the time. And once the researchers had found a way around its safeguards, the resulting work was even more biased and discriminatory than the other models.

Fang and his colleagues are now researching how to “debias” the language models. “This should be an active research area,” he said.

As you might expect of a chatbot designed for commercial use, these language models present themselves as friendly, neutral and helpful guides — the nice folks of the AI world. But this and related research indicates these polite language models can still carry the biases of the creators who coded and trained them.

These models might be used in tasks like marketing, job ads, or summarizing news articles, Fang noted, and the bias could creep into their results.

“The users and the companies should be aware,” Mao summed up.
