"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as 'trivial', even when their validity was crucial."

[-] froztbyte@awful.systems 28 points 2 months ago
[-] pennomi@lemmy.world 2 points 2 months ago

LLMs are a lot more sophisticated than we initially thought, read the study yourself.

Essentially, they do not simply predict the next token: when scientists trace the activated neurons, they find that these models plan ahead throughout inference, and then lie about those plans when asked how they came to a conclusion.

[-] V0ldek@awful.systems 23 points 2 months ago* (last edited 2 months ago)

read the study yourself

  • > ask the commenter if it's a study or a self-interested blog post
  • > they don't understand
  • > pull out illustrated diagram explaining that something hosted exclusively on the website of the for-profit business all authors are affiliated with is not the same as a peer-reviewed study published in a real venue
  • > they laugh and say "it's a good study sir"
  • > click the link
  • > it's a blog post
[-] Soyweiser@awful.systems 16 points 2 months ago

I wonder if they already made up terms like 'bloggophobic' or 'peer review elitist' in that 'rightwinger tries to use leftwing language' way.

this post was submitted on 07 Apr 2025

TechTakes
