overview for diz

Apple: ‘Reasoning’ AIs fail hard if they actually have to think by diz in c/techtakes@awful.systems

diz@awful.systems 13 points 1 week ago* (last edited 1 week ago)

Yeah, any time it’s regurgitating an IMO problem it’s proof that it’s almost superhuman, but any time it actually faces a puzzle with an unknown answer, this is not what it is for.

OpenAI engineers are flocking to its rival Anthropic. “They let us huff our own farts,” says one by diz in c/techtakes@awful.systems

diz@awful.systems 12 points 1 week ago

making LLMs not say racist shit

That is so 2024. The new big thing is making LLMs say racist shit.

Stubsack: weekly thread for sneers not worth an entire post, week ending 1st June 2025 by diz in c/techtakes@awful.systems

diz@awful.systems 15 points 3 weeks ago* (last edited 3 weeks ago)

I was trying out the free GitHub Copilot to see what the buzz is all about:

It doesn't even know its own settings. This one little useful thing that isn't plagiarism, providing a natural language interface to its own bloody settings, it couldn't do.

AI resorts to robot blackmail! — because Anthropic asked for a story of robot blackmail by diz in c/techtakes@awful.systems

diz@awful.systems 17 points 3 weeks ago* (last edited 3 weeks ago)

All joking aside, there is something thoroughly fucked up about this.

What's fucked up is that we let these rich fucks threaten us with extinction to boost their stock prices.

Imagine if some cold fusion scammer was permitted to gleefully boast that his experimental cold fusion plant in the middle of a major city could blow it up. Setting up little hydrogen explosions, setting up a neutron source just to make it spicier, etc.

Where Scoot makes the case about how an AGI could build an army of terminators in a year if it wanted. by diz in c/sneerclub@awful.systems

diz@awful.systems 13 points 1 month ago* (last edited 1 month ago)

It is as if there were people fantasizing about automaton mouths and lips and tongues and vocal cords for some reason, and coming up with all these fantasies of how it'll be when automatons can talk.

And then Edison invents the phonograph.

And then they stick their you know what in the gearing between the cylinder and the screw.

Except somehow more stupid, because these guys are worried about AI apocalypse while boosting AI hype that pays for this supposed apocalypse.

edit: If someone said in the 1850s "automatons won't be able to talk for another 150 years or longer because the vocal tract is too intricate", and some automaton fetishist said that they will be able to talk in 20 years, the phonograph shouldn't lend any credence whatsoever to the latter. What is different this time is that the phonograph was genuinely extremely useful for what it is, while generative AI is not quite as useful and they're going for the automaton fetishist money.

Where Scoot makes the case about how an AGI could build an army of terminators in a year if it wanted. by diz in c/sneerclub@awful.systems

diz@awful.systems 11 points 1 month ago

is somewhere between 0 and 100%.

That really pins it down, doesn't it?

Latest AI-hallucinated legal filing, from AI vendor Anthropic by diz in c/techtakes@awful.systems

diz@awful.systems 12 points 1 month ago* (last edited 1 month ago)

When confronted with a problem like “your search engine imagined a case and cited it”, the next step is to wonder what else it might be making up, not to just quickly slap a bit of tape over the obvious immediate problem and declare everything to be great.

Exactly. Even if you ensure the cited cases or articles are real, it will misrepresent what said articles say.

Fundamentally it is just blah blah blah ing until the point comes when a citation would be likely to appear, then it blah blah blahs the citation based on the preceding text that it just made up. It plain should not be producing real citations. That it can produce real citations is deeply at odds with it being able to pretend at reasoning, for example.

Ensuring the citation is real, RAG-ing the articles in there, having AI rewrite drafts, none of these hacks do anything to address any of the underlying problems.

Gemini seems to have "solved" my duck river crossing, lol. by diz in c/techtakes@awful.systems

diz@awful.systems 11 points 2 months ago* (last edited 2 months ago)

Yeah I think the best examples are everyday problems that people solve all the time but don't explicitly write out solutions step by step for, or not in the puzzle-answer form.

It's not even a novel problem at all, I'm sure there are even plenty of descriptions of solutions to it as part of stories and such. Just not as "logical puzzles", due to triviality.

What really annoys me is when they claim high performance on benchmarks consisting of fairly difficult problems. This is basically fraud, since they know full well it is still entirely "knowledge" reliant, and even take steps to augment it with generated problems and solutions.

I guess the big sell is that it could use bits and pieces of logic gleaned from other solutions to solve a "new" problem. Except it cannot.

[long] Some tests of how much AI "understands" what it says (spoiler: very little) by diz in c/sneerclub@awful.systems

diz@awful.systems 12 points 11 months ago

I feel like letter counting and other letter manipulation problems kind of under-sell the underlying failure to count - LLMs work on tokens, not letters, so they are expected to have difficulty with letters.
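
To illustrate the tokens-vs-letters point, here is a minimal sketch with a toy tokenizer. The vocabulary, the IDs and the greedy longest-match splitting are all made up for illustration; real tokenizers learn their vocabularies from data and will split words differently.

```python
# Hypothetical toy vocabulary: piece -> token ID (invented for this sketch).
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(word):
    """Split `word` into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in TOY_VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


word = "strawberry"
pieces = toy_tokenize(word)                  # what the text is chopped into
token_ids = [TOY_VOCAB[p] for p in pieces]   # what the model is actually fed
letter_count = word.count("r")               # trivial when you can see letters

print(pieces, token_ids, letter_count)
```

In this toy setup the model's input is two opaque IDs, and nothing in those IDs says how many times "r" occurs, which is why letter problems are a weak test of counting in general.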

The inability to count is of course wholly general - in a river crossing puzzle an LLM cannot keep track of what's on either side of the river, for example, and sometimes misreports how many steps it output.
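
For comparison, the bookkeeping in question is tiny. Below is a minimal sketch of tracking both banks and the step count; the function name, the item names and the rule that the boat simply alternates sides are assumptions for illustration, not the exact puzzle from the post.

```python
def apply_crossings(items, crossings):
    """Replay boat trips and return (left bank, right bank, step count).

    Everything starts on the left bank. Each entry in `crossings` is the
    group that rides the boat on that trip; the boat alternates sides.
    """
    left = set(items)
    right = set()
    boat_on_left = True
    for cargo in crossings:
        source, dest = (left, right) if boat_on_left else (right, left)
        cargo = set(cargo)
        # You cannot move something that is not on the boat's side.
        if not cargo <= source:
            raise ValueError(f"{sorted(cargo - source)} not on the boat's side")
        source -= cargo   # in-place, so `left`/`right` are updated too
        dest |= cargo
        boat_on_left = not boat_on_left
    return left, right, len(crossings)


# Hypothetical example: farmer ferries a duck, comes back, ferries the grain.
left, right, steps = apply_crossings(
    ["farmer", "duck", "grain"],
    [("farmer", "duck"), ("farmer",), ("farmer", "grain")],
)
print(left, right, steps)
```

A checker like this can be run against the steps an LLM outputs to see whether each move is even possible and whether the reported step count matches.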

[long] Some tests of how much AI "understands" what it says (spoiler: very little) by diz in c/sneerclub@awful.systems

diz@awful.systems 14 points 11 months ago

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb you’ve already accepted the premise.

How am I accepting the premise, though? I do call it an Absolute Imbecile, but that's more of a word play on the "AI" moniker.

What I do accept is the unfortunate fact that they did get their "AIs" to score very highly on various "reasoning" benchmarks (some of their own design), standardized tests, and so on and so forth. They work correctly across most simple variations, such as changing the numbers in a problem or the word order.

They really did a very good job at faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.

[long] Some tests of how much AI "understands" what it says (spoiler: very little) by diz in c/sneerclub@awful.systems

diz@awful.systems 17 points 11 months ago

Yeah I think that's why we need an Absolute Imbecile Level Reasoning Benchmark.

Here's what the typical PR from AI hucksters looks like:

https://www.anthropic.com/news/claude-3-family

Fully half of their claims about performance are for "reasoning", with names like "Graduate Level Reasoning". OpenAI is even worse - recall them claiming to have gotten 90th percentile on the LSAT?

On top of it, LLMs are fine tuned to convince some dumb ass CEO who "checks it out". Even though you can pay for the subscription, you're neither the customer nor the product, you're just collateral eyeballs on the ad.

"Google Gemini tried to kill me" by diz in c/techtakes@awful.systems

diz@awful.systems 11 points 1 year ago

YOU CAN DO THAT WITHOUT AI.

Can they, though? Sure, in theory Google could hire millions of people to write overviews that are equally idiotic, but obviously that is not something they would actually do.

I think there's an underlying ethical theory at play here, which goes something like: it is fine to fill the internet with half-plagiarized nonsense, as long as nobody dies, or at least, as long as Google can't be held culpable.