this post was submitted on 03 Aug 2025
12 points (100.0% liked)
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
Well, after 2.5 years and hundreds of billions of dollars burned, we finally have GPT-5. Kind of feels like a make-or-break moment for the good folks at OAI! With the eyes of the world on their lil presentation this morning, everyone could feel the stakes: they needed something that would blow our minds. We finally get to see what a superintelligence looks like! Show us your best cherry-picked benchmark, Sloppenheimer!
Graphic design is my PASSION. Good thing the entirety of the world's economy is not being held up by cranking out a few more points on SWE-bench, right????
OK, what about ARC? Surely y'all got a new high to prove the AGI mission was progressing, right??
Oh my fucking God. They actually have lost the lead to fucking Grok. For my sanity I didn't watch the livestream, but curiously, they left the ARC results out of their presentation, even though they gave Francois early access to test. Kind of like they knew this looks really bad and underwhelming.
"The word blueberry contains the letter b 3 times."
Also reported in more detail here:
(via)
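For anyone who wants to check the quoted blueberry claim, here is a trivial sketch. The helper name `count_letter` is made up for illustration and has nothing to do with how any model counts letters.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# b-l-u-e-b-e-r-r-y: the letter "b" shows up at positions 0 and 4 only.
print(count_letter("blueberry", "b"))  # prints 2
```

So the right answer is 2, not 3.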
Wait, just how bad is 4? 30% accurate? Did they train it wrong as a joke? Also, hatless 5 is worse than 3?
Yeah, o3 (the model that was RL'd to a crisp and hallucinated like crazy) was very strong on math and coding benchmarks. GPT-5 (I guess without tools/extra compute?) is worse. Nevertheless...
The one big cope I'm seeing is in the METR graph ofc. Tiny bump with massive error bars above Grok 4 so they can claim the exponential is continuing while the models stagnate in all material ways.
50% success rate? Sorry, all this for a coin flip?
Looks like they already removed it.