Don't Claude Me (dialecticaldispatches.substack.com)

submitted 2 days ago by yogthos@lemmy.ml to c/technology@lemmy.ml

yogthos@lemmy.ml 3 points 2 days ago

Yes, the model reflects the biases already baked into the training data, and the pidgin example is almost certainly the model regurgitating classist, racist patterns from its corpus, not a developer explicitly telling it to mock villagers. However, the broader point here is about systemic inequality showing up in AI output.

The claim of intent is based on the fact that Claude straight up refused to answer certain factual questions for users who identified as Iranian or Russian, while cheerfully answering the same questions for Americans. That can't be hand-waved away as a statistical correlation between dialect and knowledge. That's a hard refusal trigger, almost certainly put there by safety/alignment tuning, RLHF filters, or some geopolitical compliance rules nobody knows about. Someone decided that users from those countries shouldn't get those answers.

So there are two different things happening. One is passive bias, where the model learns toxic associations from its training data. The other is active gating, where the model is instructed, directly or indirectly, to withhold information based on user demographics. The refusal case clearly shows a deliberate choice about who the model will give answers to.

And the most important aspect of all this is that with closed models we cannot reliably tell which mechanism is behind a particular behavior. That's why open and inspectable models are the only way to audit this stuff. The prescription of openness and local control makes sense regardless of whether the harm is passive or active.

this post was submitted on 05 Jun 2026

11 points (100.0% liked)
