this post was submitted on 03 Aug 2025
12 points (100.0% liked)
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
founded 2 years ago
Wait just how bad is 4? 30% accurate? Did they train it wrong as a joke? Also hatless 5 worse than 3?
Yeah, o3 (the model that was RL'd to a crisp and hallucinated like crazy) was very strong on math and coding benchmarks. GPT-5 (I guess without tools/extra compute?) is worse. Nevertheless...
The one big cope I'm seeing is in the METR graph ofc. Tiny bump with massive error bars above Grok 4 so they can claim the exponential is continuing while the models stagnate in all material ways.
50% success rate? Sorry, all this for a coin flip?