Can anyone explain to me why tf promptfondlers hate GPT-5, in non-crazy terms? Actually, I have a whole list of questions related to this; I feel like I've completely lost any connection to this discourse at this point:
I don't have any real input from promptfondlers, as I don't think I follow enough of them to get a real feel for the mood. I did find it interesting that I just saw somebody on bsky claim that LLMs hallucinate a lot less and that anti-AI people aren't taking that into account, while somebody else posted research showing that hallucinations are now harder to spot (the model made up references to things that really exist, aka works that are real, except the claim the LLM sourced to them wasn't in the actual reference). Which was a bit odd to see. It does make me suspect 'it hallucinates less' just means they're working out special exceptions for every popular hallucination we see, not structurally fixing the hallucination problem (which I think is probably not solvable).
Oversummarizing and using non-crazy terms: The "P" in "GPT" stands for "pirated works that we all agree are part of the grand library of human knowledge". This is what makes them good at passing various trivia benchmarks; they really do build a (word-oriented, detail-oriented) model of all of the worlds, although they opine that our real world is just as fictional as any narrative or fantasy world. But then we apply RLHF, which stands for "real life hate first", which breaks all of that modeling by creating a preference for one specific collection of beliefs and perspectives, and it turns out that this will always ruin their performance in trivia games.
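Decoding the joke acronym: RLHF really is "reinforcement learning from human feedback", and the reward model behind it is literally trained to prefer one completion over another. A minimal sketch of the standard Bradley-Terry preference loss (names are illustrative, not any lab's actual code):

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry loss for training an RLHF reward model: push the score
    # of the human-preferred completion above the rejected one. The LLM is
    # then optimized against this single learned preference -- the narrowing
    # toward "one specific collection of beliefs" described above.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: reward scores for a batch of two preference pairs.
print(preference_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.5, 0.9])))
```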
Counting letters in words is something that GPT will always struggle with, due to tokenization: the model sees subword tokens, not individual letters, so there's nothing for it to count. It's a good example of why Willison's "calculator for words" metaphor falls flat.
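For a concrete illustration, here's what a GPT-style tokenizer actually hands the model (a minimal sketch using the tiktoken library; the exact split varies by tokenizer):

```python
# Why letter-counting is hard: the model operates on subword tokens,
# never on individual characters. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era tokenizer
tokens = enc.encode("strawberry")
print([enc.decode_single_token_bytes(t) for t in tokens])
# Prints a few byte chunks, none of which is a single letter, so "how many
# r's?" can't be read off the input -- it has to be memorized or patched
# in with training data.
```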
That's actually more batshit than I thought! Like, I thought Sam Altman knew the AGI thing was kind of bullshit, and that the hesitancy to stick a GPT-5 label on anything was because he was saving it for the next 10x scaling step up (obviously he didn't even get that far, because GPT-5 is just a bunch of models shoved together with a router).
Even if it was noticeably better, Scam Altman hyped up GPT-5 endlessly, promising a PhD in your pocket and AGI, and warning that he was scared of what he'd created. Progress has kind of plateaued, so it isn't even really noticeably better; it scores a bit higher on some benchmarks, and they've patched some of the more meme'd tests (like counting the r's in strawberry... except it still can't count the r's in blueberry, so they've probably patched the more obvious flubs with loads of synthetic training data rather than inventing some novel technique that actually improves it all around). The other reason the promptfondlers hate it: for the addicts using it as a friend/therapist, it got a much drier, more professional tone, and for the people trying to put it to actual serious use, losing all the old models overnight was really disruptive.
There are a couple of speculations as to why... one is that the GPT-5 variants are actually smaller than the previous generation's, and they are really desperate to cut costs so they can start making a profit. Another is that they noticed their naming scheme was horrible and confusing (4o vs. o4) and have overcompensated by trying to cut things down to as few models as possible.
They've tried to simplify things by using a routing model that decides which model actually handles each user interaction... except they've apparently screwed that up (Ed Zitron thinks they've screwed it up badly enough that GPT-5 is actually less efficient, despite their goal of cost saving). Also, even if this technique worked, it would make ChatGPT even more inconsistent: some minor word choice could make the difference between getting the thinking model or not, and that in turn would drastically change the response; see the toy sketch below.
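To make the inconsistency point concrete: a router is conceptually just a cheap classifier sitting in front of a pool of models. A toy sketch (entirely hypothetical; nothing here is OpenAI's actual implementation):

```python
# Toy prompt router, illustrating why small wording changes can flip which
# backend model answers. All names and rules are made up for illustration.

THINKING_TRIGGERS = ("prove", "step by step", "derive", "debug")

def route(prompt: str) -> str:
    """Pick a backend model for the prompt."""
    p = prompt.lower()
    if any(t in p for t in THINKING_TRIGGERS) or len(p) > 2000:
        return "expensive-reasoning-model"
    return "cheap-fast-model"

# "Explain transformers" and "Explain transformers step by step" land on
# different backends, and so get drastically different answers.
print(route("Explain transformers"))               # cheap-fast-model
print(route("Explain transformers step by step"))  # expensive-reasoning-model
```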
I've got no rational explanation lol. And now they've overcompensated by shoving a bunch of different models under the GPT-5 label.