Bag of words, have mercy on us
(www.experimental-history.com)
I think it's worth being a little more mathematically precise about the structure of the bag. A path is a sequence of words. Any language model is equivalent to a collection of weighted paths. So, when they say:
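To make "a collection of weighted paths" concrete, here's a minimal sketch of my own (not from the post): a bigram model is just a weighted directed graph over words, and a sentence's probability is the product of the edge weights along its path.

```python
from collections import defaultdict
import math

def train_bigrams(corpus):
    # count word-to-word transitions, with sentence boundary markers
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    # normalize each word's outgoing edges into probabilities
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

def path_logprob(model, sentence):
    # a sentence is a path; its log-probability is the sum of edge log-weights
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(model[a][b]) for a, b in zip(words, words[1:]))

corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigrams(corpus)
print(path_logprob(model, "the cat sat"))
```

A real LLM conditions on much longer contexts than one word, but the picture is the same: a vocabulary plus weights over paths through it.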
Yes, but we think that protein folding is NP-complete; it's not just about which amino acids are in the bag, but the paths along them. Similarly, Stockfish is amazingly good at playing chess, which is PSPACE-complete, partially due to knowing the structure between families of positions. But evidence suggests that NP-completeness and PSPACE-completeness are natural barriers, so that either protein folding has simple rules or LLMs can't e.g. predict the stock market, and either chess has simple rules or LLMs can't e.g. simulate quantum mechanics. There's no free lunch for optimization problems either. This is sort of like the Blockhead argument in reverse; Blockhead can't be exponentially large while carrying on a real-time conversation, and contrapositively the relatively small size of a language model necessarily represents a compressed simplified system.
Yeah, that's Whorfian mind-lock, and it can be a real issue sometimes. In practice, though, people slap together a portmanteau or an onomatopoeia and get on with things. Moreover, Zipf processes naturally shrink words as they are used more, producing a language that evolves to stay within a constant factor of the optimal size. That is, the right words evolve to exist, and common words evolve to be short.
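That "common words evolve to be short" claim (Zipf's law of abbreviation) is easy to eyeball on any ordinary English passage. A quick sketch of my own, using a stock public-domain paragraph:

```python
import re
from collections import Counter

# Opening paragraph of Pride and Prejudice, used here as a sample of
# ordinary English text.
TEXT = """It is a truth universally acknowledged that a single man in
possession of a good fortune must be in want of a wife. However little
known the feelings or views of such a man may be on his first entering a
neighbourhood, this truth is so well fixed in the minds of the
surrounding families that he is considered the rightful property of some
one or other of their daughters."""

words = re.findall(r"[a-z]+", TEXT.lower())
ranked = [w for w, _ in Counter(words).most_common()]
mean_len = lambda ws: sum(map(len, ws)) / len(ws)
common, rare = ranked[:10], ranked[10:]
print(mean_len(common), mean_len(rare))  # frequent words are shorter on average
```

Even in one paragraph, the ten most frequent word types ("a", "of", "the", "is", "in", ...) average well under three letters while the rest average roughly double that.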
But that's obvious if we think about paths instead of words. Multiple paths can be equal in probability, start and end with the same words, and yet pass through different intermediate words. Whorfian issues arise only when we lack intermediate words for every one of those paths, so that none of them can be selected.
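A toy weighted-path graph (my own made-up words and numbers) shows the point: two equal-probability paths share endpoints but differ in the middle, and a Whorfian gap only opens if every intermediate word is missing.

```python
# edge weights as conditional probabilities
edges = {
    ("big", "red"): 0.5, ("big", "crimson"): 0.5,
    ("red", "dog"): 1.0, ("crimson", "dog"): 1.0,
}

def two_step_paths(start, end):
    # enumerate all start -> mid -> end paths with their probabilities
    mids = {b for (a, b) in edges if a == start}
    return {(start, m, end): edges[(start, m)] * edges[(m, end)]
            for m in mids if (m, end) in edges}

paths = two_step_paths("big", "dog")
print(paths)
# Losing "red" from the vocabulary still leaves the "crimson" path
# selectable; only losing BOTH intermediates blocks the thought.
```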
A more reasonable objection has to do with the size of definitions. It's well-known folklore in logic that extension by definition is mandatory in any large body of work, because it's the only way to keep some proofs from exploding combinatorially. LLMs have no way to define one word in terms of other words, whether by macro-clustering sequences or by lambda-substituting binders, and they end up learning so much nuance that they can't actually respect definitions during inference. This doesn't matter for humans, since we're not logical or rational anyway, but it stymies any hope that e.g. Transformers, RWKV, or Mamba will produce a super-rational Bayesian Ultron.
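The blowup that extension by definition prevents is easy to demonstrate with a toy example of my own: take D_0 = "x" and D_k = D_{k-1} D_{k-1}. Fully inlining the definitions doubles the text at every level, while a definition list grows by one line per level.

```python
def inlined(n):
    # fully substitute every definition: the string doubles each level
    s = "x"
    for _ in range(n):
        s = s + s
    return s

def definition_list(n):
    # with extension by definition, each level costs one line
    return ["D0 := x"] + [f"D{k} := D{k-1} D{k-1}" for k in range(1, n + 1)]

print(len(inlined(20)))          # 2**20 = 1048576 symbols fully inlined
print(len(definition_list(20)))  # 21 lines with definitions
```

A prover (or a person) that can't abbreviate pays the exponential price; that's roughly the objection being raised against models that can only emit flat token sequences.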