[-] diz@awful.systems 7 points 2 months ago* (last edited 2 months ago)

Even to the extent that they are "prompting it wrong", it's still on the AI companies for calling this shit "AI". LLMs fundamentally do not even attempt to do cognitive work (the way a chess engine does, by iterating over possible moves).

Also, LLM tools do not exist. All you can get is a sales demo for the company stock (the actual product being sold), built to give an impression of how close the company is to AGI. You have to creatively misuse these things to get any value out of them.

The closest they get to tools is "AI coding", but even then, these things plagiarize code you don't even want plagiarized (because it's MIT licensed and you'd rather keep up with upstream fixes).

[-] diz@awful.systems 7 points 4 months ago

Yeah, I'm thinking that people who think their brains work like an LLM may be somewhat correct. Still wrong in some ways, since even their brains learn from several orders of magnitude less data than LLMs do, but close enough.

[-] diz@awful.systems 7 points 4 months ago

They're also very gleeful about finally having one-upped the experts with one weird trick.

Up until AI they were the people who were inept and late at adopting new technology, and now they get to feel that they're ahead (because this time the new half-assed technology was pushed onto them and they didn't figure out they needed to opt out).

[-] diz@awful.systems 7 points 4 months ago* (last edited 4 months ago)

If it was a basement dweller with a chatbot that could be mistaken for a criminal co-conspirator, he would've gotten arrested and his computer seized as evidence, and then it would be a crapshoot if he would even be able to convince a jury that it was an accident. Especially if he was getting paid for his chatbot. Now, I'm not saying that this is right, just stating how it is for normal human beings.

It may not be explicitly illegal for a computer to do something, but you are liable for what your shit does. You can't just make a robot lawnmower and run over a neighbor's kid. If you are using random numbers to steer your lawnmower... yeah.

But because it's OpenAI with a 300 billion dollar "valuation", absolutely nothing can happen whatsoever.

[-] diz@awful.systems 7 points 4 months ago* (last edited 4 months ago)

Yeah, I'm thinking this one may be special-cased; perhaps they wrote a generator of river crossing puzzles with a corresponding conversion to "is_valid_state" or some such. I should see if I can get it to write something really ridiculous into "is_valid_state".
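
A minimal sketch of what I imagine such a special-cased checker could look like, for the classic cannibals/missionaries family; the function name `is_valid_state` is the only thing taken from above, everything else is my guess:

```python
# Hypothetical auto-generated validity check for an N cannibals / M missionaries
# puzzle. State is (missionaries_left, cannibals_left); everyone else is on the
# right bank. Pure speculation about what such generated code might look like.

def is_valid_state(m_left, c_left, m_total=3, c_total=3):
    m_right = m_total - m_left
    c_right = c_total - c_left

    # counts must stay within bounds
    if not (0 <= m_left <= m_total and 0 <= c_left <= c_total):
        return False

    # missionaries must not be outnumbered on either bank (unless none are there)
    if 0 < m_left < c_left:
        return False
    if 0 < m_right < c_right:
        return False
    return True
```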

The other thing is that in real life it's like "I need to move 12 golf carts, one has a low battery, I probably can't tow more than 3 uphill, I can ask Bob to help but he will be grumpy...": just a tremendous amount of information (most of it irrelevant) with tremendous^tremendous^ possible moves (most of them possible to eliminate by actual thinking).

[-] diz@awful.systems 7 points 4 months ago

Yeah, I'd also bet on the latter. They also added a fold-out button that shows you the code it wrote (folded by default), but you have to unfold it, or notice that it is absent.

[-] diz@awful.systems 7 points 4 months ago

Yeah, I think it is almost undeniable that chatbots trigger some low-level brain thing. ELIZA has a 27% Turing Test pass rate. And long before that, humans attributed weather and random events to sentient gods.

This makes me think of Langford’s original BLIT short story.

And also of rove beetles that parasitize ant hives. These bugs are not ants, but they pass the Turing test for ants: they tap antennae with an ant, the handshake is correct, and they get identified as ants from this colony rather than as unrelated bugs or ants from another colony.

[-] diz@awful.systems 7 points 5 months ago

It would have to be more than just river crossings, yeah.

Although I'm also dubious that their LLM is good enough for universal river crossing puzzle solving using a tool. It's not that simple: the constraints have to be translated into the format the tool understands, and the answer translated back. I got told that o3 solves my river crossing variant, but the chat log they gave had incorrect code being run and then a correct answer magically appearing, so I think it wasn't anything quite as general as that.
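
To make concrete what "using a tool" would actually require, here's a rough sketch of a generic brute-force solver. This is not anything o3 ran; it just illustrates that the hard part is the LLM translating the puzzle's entities, moves, and constraints into `start_state`, `moves()`, `is_valid()` and `is_goal()`, and translating the answer back into prose:

```python
from collections import deque

def solve(start_state, is_goal, moves, is_valid):
    """Breadth-first search over puzzle states; returns the path to a goal state."""
    queue = deque([(start_state, [start_state])])
    seen = {start_state}
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        for nxt in moves(state):
            if is_valid(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no solution under the given constraints
```

The solver itself is trivial; the failure point is the translation step on either side of it.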

[-] diz@awful.systems 7 points 5 months ago

Chatbots ate my cult.

[-] diz@awful.systems 7 points 6 months ago* (last edited 6 months ago)

> Did you use any of that kind of notation in the prompt? Or did some poor squadron of task workers write out a few thousand examples of this notation for river crossing problems in an attempt to give it an internal structure?

I didn't use any notation in the prompt, but gemini 2.5 pro seems to always represent the state of the problem after every step in some such notation. When asked whether it actually does anything with it, it says doing so is "very important", so it may be that there's some huge invisible prompt saying it's very important to do this.

It also mentioned N cannibals and M missionaries.

My theory is that they wrote a bunch of little scripts that generate puzzles and solutions in that format. Since river crossing is one of the most popular puzzles, it would be on the list (and N cannibals / M missionaries is easy to generate variants of), although their main focus would have been the puzzles in the benchmarks they are trying to cheat on.
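
For what it's worth, here's roughly what I picture one of those little scripts looking like; the names and the notation are made up by me, this is just to illustrate the kind of thing that's easy to mass-generate:

```python
import random

# Speculative training-data generator: emit many variants of the same puzzle
# plus the per-step state notation the model is supposed to reproduce.
def make_puzzle():
    c = random.randint(2, 5)            # cannibals
    m = random.randint(c, 6)            # missionaries
    boat = random.choice([2, 3])        # boat capacity
    prompt = (f"{m} missionaries and {c} cannibals must cross a river in a boat "
              f"that holds {boat}. Cannibals may never outnumber missionaries "
              f"on either bank. How do they all get across?")
    start = f"L: {m}M {c}C B | R: 0M 0C"   # initial state in the step-by-step notation
    return prompt, start

if __name__ == "__main__":
    for _ in range(3):
        prompt, start = make_puzzle()
        print(prompt)
        print(start)
        print()
```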

edit: here's one of the logs:

https://pastebin.com/GKy8BTYD

Basically it keeps trying to brute-force the problem. It gets the first 2 moves correct, but in a stopped-clock manner: if there's 2 people and 1 boat, they both take the boat; if there's 2 people and >=2 boats, each of them takes a boat.

It keeps doing the same shit until eventually its state tracking fails, or its reading of the state fails, and then it outputs the failure as a solution. Sometimes it deems it impossible:

https://pastebin.com/Li9quqqd

All tests were done with gemini 2.5 pro. I can post links if you need them, but the links don't include the "thinking" log, and I also suspect that if >N people come through a link they just look at it. Nobody really shares botshit unless it's funny or stupid. As for a lot of people independently asking the same problem: that also happens whenever there's a new homework question, so they can't use it as a signal so easily.

[-] diz@awful.systems 7 points 6 months ago* (last edited 6 months ago)

It's Google though: if nobody uses their shit they just put it inside their search.

It's only gonna go away when they run out of cash.

edit: whoops replied to the wrong comment

[-] diz@awful.systems 7 points 7 months ago* (last edited 7 months ago)

Not really. Here's the chain-of-word-vomit that led to the answers:

https://pastebin.com/HQUExXkX

Note that in "its impossible" answer it correctly echoes that you can take one other item with you, and does not bring the duck back (while the old overfitted gpt4 obsessively brought items back), while in the duck + 3 vegetables variant, it has a correct answer in the wordvomit, but not being an AI enthusiast it can't actually choose the correct answer (a problem shared with the monkeys on typewriters).

I'd say it clearly isn't ignoring the prompt or differences from the original river crossings. It just can't actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.
