LLMs have a strong bias against use of African American English (arstechnica.com)

submitted 3 months ago by BlackEco@lemmy.blackeco.com to c/technology@lemmy.world

[-] bionicjoey@lemmy.ca 63 points 3 months ago

Makes sense. AAVE is mostly a spoken thing, and LLMs are mostly trained on the corpus of written text on the internet and in books. It's pretty rare for people to write in an AAVE style in those contexts.

[-] givesomefucks@lemmy.world 9 points 3 months ago

Except it has no difficulty reading and understanding AAVE, because people use it online frequently...

Like, the article makes that abundantly clear, but everyone commenting just read the headline and assumed what it meant was it couldn't understand it...

[-] bionicjoey@lemmy.ca 24 points 3 months ago

I never said it can't understand it. I am agreeing with the notion that it has a bias against using it.

[-] givesomefucks@lemmy.world 4 points 3 months ago

You said it's rarely used online, which just isn't true.

But like even this:

"I am agreeing with the notion that it has a bias against using it"

I'm not sure if you understand the bias is against users who use AAVE, or if you're saying an LLM doesn't want to use AAVE.

Maybe you did understand everything, and you're just being vague.

But almost everything you said could be interpreted multiple ways.

[-] sugar_in_your_tea@sh.itjust.works 11 points 3 months ago

Well, if the training data is largely Standard English, AAVE could look like less educated English, because it doesn't follow the normal rules and conventions. And there's probably a higher correlation between AAVE use and lower means and/or education, because people from the Black community who have higher means and/or education probably use Standard English more often, since that's how they're trained.

So I don't think this is evidence of the model being "racist" or anything of that nature; it's just the model doing model things. If you type in AAVE, chances are higher that you fit the given demographic, because that's likely what the training data shows.

So, I guess I don't really see the issue here? This just sounds like people thinking the model does more than it does. The model merely matches input text to data in the model. That's it. There's no "understanding" here, it's just matching inputs to outputs.

[-] Mr_Blott@feddit.uk 12 points 3 months ago

BUT IM DETERMINED TO BE OFFENDED ON SOMEONE ELSE'S BEHALF

[-] Mac@mander.xyz 1 points 3 months ago

There are times when it's acceptable and even admirable to be offended on someone else's behalf.
I'm not sure this is one of those times.

this post was submitted on 29 Aug 2024

87 points (100.0% liked)
