[-] diz@awful.systems 7 points 1 week ago

I think I figured it out.

He fed his post to AI and asked it to list the fictional universes he’d want to live in, and that’s how he got Dune. Precisely the information he needed, just as his post describes.

[-] diz@awful.systems 7 points 1 week ago

Yeah, I think it's almost undeniable that chatbots trigger some low-level brain thing. Eliza had a 27% Turing Test pass rate. And long before that, humans attributed weather and random events to sentient gods.

This makes me think of Langford’s original BLIT short story.

And also of rove beetles that parasitize ant hives. These bugs are not ants, but they pass the Turing test for ants - they tap antennae with an ant, the handshake comes out correct, and they get identified as ants from this colony rather than as unrelated bugs or ants from another colony.

[-] diz@awful.systems 7 points 1 week ago

It would have to be more than just river crossings, yeah.

Although I'm also dubious that their LLM is good enough for universal river crossing puzzle solving by calling a tool. It's not that simple: the constraints have to be translated into a format the tool understands, and the answer translated back. I was told that o3 solves my river crossing variant, but the chat log I was given had incorrect code being run and then a correct answer magically appearing, so I don't think it was anything quite that general.
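
For reference, here's roughly what "solving it with a tool" actually demands. The search itself is trivial; it's encoding the variant's constraints and translating the answer back that the LLM flubs. A minimal sketch (my own code, not anything o3 runs) - BFS over the classic missionaries-and-cannibals state space:

```python
# Minimal BFS solver for the classic missionaries/cannibals puzzle.
# The hard part for an LLM-with-tools setup is writing the `safe`
# function correctly for whatever *variant* was asked, then turning
# the state list back into prose.
from collections import deque

def solve(m=3, c=3, boat_cap=2):
    # State: (missionaries on start bank, cannibals on start bank, boat side)
    start, goal = (m, c, 0), (0, 0, 1)

    def safe(mm, cc):
        # Missionaries may never be outnumbered on either bank.
        return (mm == 0 or mm >= cc) and (m - mm == 0 or m - mm >= c - cc)

    prev = {start: None}
    q = deque([start])
    while q:
        state = q.popleft()
        if state == goal:
            path = []
            while state:
                path.append(state)
                state = prev[state]
            return path[::-1]
        mm, cc, side = state
        sign = -1 if side == 0 else 1  # boat carries people off its own bank
        for dm in range(boat_cap + 1):
            for dc in range(boat_cap + 1 - dm):
                if dm + dc == 0:
                    continue
                nm, nc = mm + sign * dm, cc + sign * dc
                nxt = (nm, nc, 1 - side)
                if 0 <= nm <= m and 0 <= nc <= c and safe(nm, nc) and nxt not in prev:
                    prev[nxt] = state
                    q.append(nxt)
    return None

print(len(solve()) - 1)  # 11 crossings for the classic 3/3 setup
```

The `safe` function is the whole ballgame: every variant (ducks, boat capacities, my version) needs its own constraint encoding, and getting that translation wrong while still printing a plausible-looking answer is exactly what that chat log showed.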

[-] diz@awful.systems 7 points 3 weeks ago

Chatbots ate my cult.

[-] diz@awful.systems 7 points 2 months ago* (last edited 2 months ago)

Did you use any of that kind of notation in the prompt? Or did some poor squadron of task workers write out a few thousand examples of this notation for river crossing problems in an attempt to give it an internal structure?

I didn't use any notation in the prompt, but gemini 2.5 pro seems to always represent the state of the problem after every step in some way. When asked if it does anything with it, it says doing this is "very important", so there may be some huge invisible prompt telling it that it's very important to do this.

It also mentioned N cannibals and M missionaries.

My theory is that they wrote a bunch of little scripts that generate puzzles and solutions in that format. Since river crossing is one of the most popular puzzles, it would be on the list (and N cannibals / M missionaries is easy to generate variants of), although their main focus would have been the puzzles in the benchmarks they are trying to cheat.
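
Purely speculatively, the kind of generator script I mean - stamp out parameterized puzzle text plus a templated "reasoning" trace that reprints the state after every step (all names and formats here are my invention, not their pipeline):

```python
# Hypothetical data-augmentation sketch: randomized river-crossing
# variants paired with the tell-tale step format that restates the
# full state after every move.
import random

def make_puzzle(seed):
    rng = random.Random(seed)   # seeded, so each puzzle is reproducible
    n = rng.randint(2, 5)       # cannibals
    m = rng.randint(n, 6)       # missionaries (>= cannibals, so it's plausible)
    cap = rng.randint(2, 3)
    return (f"{m} missionaries and {n} cannibals must cross a river "
            f"using a boat that carries at most {cap} people. "
            f"Cannibals may never outnumber missionaries on either bank.")

def render_step(i, move, left, right):
    # The format gemini kept producing: restate both banks after each step.
    return f"Step {i}: {move}. Left bank: {left}. Right bank: {right}."

print(make_puzzle(0))
print(render_step(1, "Two cannibals cross", "3M 1C", "0M 2C"))
```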

edit: here's one of the logs:

https://pastebin.com/GKy8BTYD

Basically it keeps trying to brute force the problem. It gets the first 2 moves correct, but in a stopped-clock manner - if there are 2 people and 1 boat, they both take the boat; if there are 2 people and >=2 boats, each of them takes a boat.

It keeps doing the same shit until eventually its state tracking fails, or its reading of the state fails, and then it outputs the failure as a solution. Sometimes it deems it impossible:

https://pastebin.com/Li9quqqd

All tests were done with gemini 2.5 pro. I can post links if you need them, but links don't include the "thinking" log, and I also suspect that if >N people come through a link, someone just looks at it. Nobody really shares botshit unless it's funny or stupid. A lot of people independently asking the same question is also what happens when there's a new homework problem, so they can't use that as a signal so easily.

[-] diz@awful.systems 7 points 2 months ago* (last edited 2 months ago)

It's google though, if nobody uses their shit they just put it inside their search.

It's only gonna go away when they run out of cash.

edit: whoops replied to the wrong comment

[-] diz@awful.systems 6 points 2 months ago* (last edited 2 months ago)

Yeah it really is fascinating. It follows some sort of recipe to try to solve the problem, like it's trained to work a bit like an automatic algebra system.

I think they employed a lot of people to write generators for variants of select common logical puzzles - e.g. river crossings with varying boat capacities and constraints - generating both the puzzle and the corresponding step-by-step solution, complete with "reasoning" and a reprint of the item state on every step, and all that.

It seems to me that their thinking is that successive parroting can amount to reasoning, if it's parroting well enough. I don't think it can. They have this one-path approach, where it just tries doing steps and representing state, always trying the same thing.

What they need for this problem is to take a different kind of step: reduction (the duck can not be left unsupervised -> the duck must be taken along on every trip -> rewrite the problem without the duck and with boat capacity reduced by 1 -> solve -> rewrite the solution with "take the duck with you" added to every trip).
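
That reduction can be written out mechanically; this is my own toy formulation of it, not anything the models do:

```python
# Sketch of the escort reduction: if one item must accompany you on
# every trip, delete it from the problem, shrink the boat by one seat,
# solve the smaller puzzle, then splice the item back into every move.
def solve_with_escort(items, boat_cap, escort, solve):
    """`solve` is any solver returning a list of moves, each a list of items."""
    reduced = [x for x in items if x != escort]
    moves = solve(reduced, boat_cap - 1)
    return [[escort] + move for move in moves]

# Toy solver for a constraint-free puzzle: ferry items one per trip,
# returning empty-handed in between (cap unused; just exercises the wrapper).
def naive_solve(items, cap):
    moves = []
    for x in items:
        moves.append([x])  # take x across
        moves.append([])   # come back
    moves.pop()            # no need to return after the last trip
    return moves

print(solve_with_escort(["duck", "corn", "beans"], 2, "duck", naive_solve))
# [['duck', 'corn'], ['duck'], ['duck', 'beans']]
```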

But if they add this, then there's two possible paths it can take on every step, and this thing is far too slow to brute force the right one. They may get it to solve my duck variant, but at the expense of making it fail a lot of other variants.

The other problem is that even seemingly elementary reasoning involves very many applications of basic axioms. This is what doomed symbol-manipulation "AI" in the past, and it is what is dooming it now.

[-] diz@awful.systems 7 points 2 months ago* (last edited 2 months ago)

Not really. Here's the chain-of-word-vomit that led to the answers:

https://pastebin.com/HQUExXkX

Note that in its "it's impossible" answer it correctly echoes that you can take one other item with you, and does not bring the duck back (whereas the old overfitted gpt4 obsessively brought items back), while in the duck + 3 vegetables variant it has a correct answer in the word vomit but, not being an AI enthusiast, it can't actually pick the correct answer out (a problem shared with monkeys on typewriters).

I'd say it clearly isn't ignoring the prompt or differences from the original river crossings. It just can't actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.

[-] diz@awful.systems 7 points 8 months ago

Maybe if the potato casserole is exploded in the microwave by another physicist, on his way to start a resonance cascade...

(i'll see myself out).

[-] diz@awful.systems 6 points 1 year ago* (last edited 1 year ago)

I tried the same prompt a lot of times and saw "chain of thought" attempts complete with the state modeling... they must be augmenting the training dataset with some sort of script-generated crap.

I have to say those are so far the absolute worst attempts.

Day 16 (Egg 3 on side A; Duck 1, Duck 2, Egg 1, Egg 2 on side B): Janet takes Egg 3 across the river.

"Now, all 2 ducks and 3 eggs are safely transported across the river in 16 trips."

I kind of feel that this undermines the whole point of using a transformer architecture instead of a recurrent neural network. Machine learning sucks at recurrence.

[-] diz@awful.systems 7 points 1 year ago

The counting failure in general is even clearer and lacks the excuse of unfavorable tokenization. The AI hype would have you believe just an incremental improvement in multi-modality or scaffolding will overcome this, but I think they need to make more fundamental improvements to the entire architecture they are using.

Yeah.

I think the failure could be extremely fundamental - maybe local optimization of a highly parametrized model is fundamentally unable to properly learn counting (other than via memorization).

After all, there's a very large number of ways a highly parametrized model can do a good job of predicting the next token without involving actual counting. What makes counting special vs memorization is that it's a relatively compact representation, but there's no reason for a neural network to favor compact representations.

The "correct" counting may just be a very tiny local minimum, with a tall hill all around it and no valley leading there. If that's the case, then local optimization will never find it.

[-] diz@awful.systems 6 points 1 year ago

Well, the problem is it not having any reasoning, period.

It's not clear what symbolic reasoning would entail, but puzzles generally require you to think through several approaches to solve them, too. That requires a world model, a search, etc. - the kind of stuff that actual AIs, even a tic-tac-toe AI, have, but LLMs don't.
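
For contrast, here's the entire "world model plus search" for tic-tac-toe (my sketch) - an explicit state, a legal-move generator, and lookahead over alternatives, i.e. exactly the machinery a pure next-token predictor has no dedicated place for:

```python
# Tiny minimax tic-tac-toe "AI": board = list of 9 cells ("X", "O", or None).
def winner(b):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    # Returns (score, best move): +1 if X wins, -1 if O wins, 0 for a draw.
    w = winner(b)
    if w:
        return (1 if w == "X" else -1), None
    if all(b):
        return 0, None  # board full, draw
    best = None
    for i in range(9):
        if not b[i]:
            b[i] = player                      # try the move...
            score, _ = minimax(b, "O" if player == "X" else "X")
            b[i] = None                        # ...and undo it (search!)
            # X maximizes the score, O minimizes it.
            if best is None or (player == "X") == (score > best[0]):
                best = (score, i)
    return best

score, move = minimax([None] * 9, "X")
print(score)  # 0 - perfect play from an empty board is a draw
```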

On top of that, this all works through machine learning, which produces the resulting network weights through very gradual improvement at next-word prediction, tiny step by tiny step. Even if some sort of discrete model (like, say, an account of what's on either side of the river) could help it predict the next token, there isn't a tiny fraction of a discrete "model" that would help, and so it simply does not go down that path at all.

