Want to wade into the sandy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.

Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Credit and/or blame to David Gerard for starting this.)

[-] YourNetworkIsHaunted@awful.systems 6 points 13 hours ago

I've never understood how these things are supposed to gain their abilities from statistical analysis of all kinds of random writing online (social media, fanfic, Reddit, etc.) and yet end up as experts rather than a much faster and more agreeable dumbass. Like, the training data may include all the great works of literature, all the scrapable scientific studies and textbooks they could steal, and so on. But it also includes every moron who ever shared conspiracy theories on Twitter, every confident-sounding business idiot on LinkedIn, and every stupid word that Scott or Yud ever wrote. Surely the bullshit has to exceed the expertise by raw volume, and if they took the time and energy to curate it out the way they would need to in order to correct that, they wouldn't be left with a large enough sample to actually scale off of.

Basically, either I'm dramatically misunderstanding something or the best we can hope for is the Average Joe on Reddit, who may not be a complete dumbass but definitely isn't a team of PhDs.

[-] scruiser@awful.systems 4 points 11 hours ago* (last edited 10 hours ago)

LLMs generate the next most probable token given the previous context of tokens they have (not an average of the entire internet). And post-training shifts the odds a bit further in a relatively useful direction. So given the right context the LLM will mostly consistently regurgitate content stolen from PhDs and academic papers, maybe even managing to shuffle it around in a novel way that is marginally useful.
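To make the "conditional, not average" point concrete, here is a toy sketch. Everything in it is made up for illustration: the `corpus`, the two context labels, and the "junk"/"expert" tokens are hypothetical stand-ins, and plain counting is nothing like a real transformer. It only shows that the odds given a context can look very different from the odds over the whole pile.

```python
from collections import Counter

# Hypothetical toy corpus of (context, next_token) pairs.
# Junk outnumbers expertise overall, as in the parent comment's worry.
corpus = (
    [("reddit thread", "junk")] * 80
    + [("reddit thread", "expert")] * 5
    + [("journal abstract", "expert")] * 14
    + [("journal abstract", "junk")] * 1
)

def marginal(pairs):
    """Next-token odds averaged over the whole corpus, ignoring context."""
    counts = Counter(tok for _, tok in pairs)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def conditional(pairs, context):
    """Next-token odds given one specific context."""
    counts = Counter(tok for ctx, tok in pairs if ctx == context)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

overall = marginal(corpus)
in_paper = conditional(corpus, "journal abstract")
in_thread = conditional(corpus, "reddit thread")
```

In this toy, junk dominates `overall` but `in_paper` is mostly "expert", which is the sense in which the right context gets you the stolen PhD content. Note that "junk" still has nonzero probability in `in_paper`.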

Of course, that is only the general trend given the right^tm^ prompt. Even with a prompt that looks mostly right, one seemingly innocuous word in the wrong place might nudge the odds, and you get the answer of a moron from /r/hypotheticalphysics in response to a physics question. Or asking for a recipe gets you Elmer's glue on your mozzarella pizza from a reddit joke answer.

> if they took the time and energy to curate it out the way they would need to to correct that they wouldn’t be left with a large enough sample to actually scale off of

They do it in steps: first train the model generally on the desired languages with all the random internet bullshit, then fine-tune it on the actually curated stuff. So that shifts the odds, but again, not enough to actually guarantee anything.
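A rough way to picture "shifts the odds but guarantees nothing" is below. Big assumption, flagged loudly: real fine-tuning updates model weights, it does not add a constant to one score. The `shift` helper, the `base_logits` values, and the `3.0` bump are all hypothetical, chosen only to show the shape of the effect under a softmax.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def shift(logits, token, delta):
    """Hypothetical stand-in for fine-tuning: bump one token's score."""
    out = dict(logits)
    out[token] += delta
    return out

# Made-up scores where junk starts out more likely than expertise.
base_logits = {"expert": 1.0, "junk": 2.0}

before = softmax(base_logits)
after = softmax(shift(base_logits, "expert", 3.0))
```

With these numbers the junk probability drops a lot between `before` and `after`, but a softmax never assigns exactly zero to anything, so the glue pizza stays on the menu.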

So, tl;dr: you're right, but since it is possible to get somewhat better than average internet junk with curating and post-training and prompting, LLM boosters and labs have convinced themselves they are just a few more iterations of data curation and training approaches and prompting techniques away from entirely eliminating the problem, when the best they can do is make it less likely.

this post was submitted on 12 Apr 2026

TechTakes
