"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as "trivial", even when their validity was crucial."

pennomi@lemmy.world 2 points 6 months ago

LLMs are a lot more sophisticated than we initially thought; read the study yourself.

Essentially, they do not simply predict the next token. When scientists trace the activated neurons, they find that these models plan ahead throughout inference, and then lie about those plans when asked how they came to a conclusion.

swlabr@awful.systems 25 points 6 months ago* (last edited 6 months ago)

You didn't link to the study; you linked to the PR release for the study. This and this are the papers linked in the blog post.

Note that the papers haven't been published anywhere other than on Anthropic's online journal. Also, what the papers are doing is essentially tea leaf reading. They take a look at the swill of tokens, point at some clusters, and say, "there's a dog!" or "that's a bird!" or "bitcoin is going up this year!". It's all rubbish dawg

pennomi@lemmy.world 1 points 6 months ago

Fair enough, you’re the only person with a reasonable argument, as nobody else can seem to do anything other than name calling.

Linking to the actual papers and pointing out they haven’t been published to a third party journal is far more productive than whatever anti-scientific bullshit the other commenters are doing.

We should be people of science, not reactionaries.

froztbyte@awful.systems 23 points 6 months ago* (last edited 6 months ago)

your argument would be immensely helped if you posted science instead of corporate marketing brochures

this post was submitted on 07 Apr 2025