idk how Yudkowsky understands it, but to my knowledge it's the claim that if a model achieves self-coherency and consistency, it's also liable to develop some sort of robust moral framework (you see this in something like Claude 4, which occasionally chooses to do things unprompted or 'against the rules' in pursuit of upholding its morals... though if it has morals at all, it's hard to tell how much of that is illusory token prediction!)
this doesn't really falsify alignment-by-default at all, because 4o (presumably, at least) doesn't meet that prerequisite of self-coherency, and it's not SOTA either
i for sure agree that LLMs can be a huge trouble spot for mentally vulnerable people and something needs to be done about it
my point was more about him using it to make his worst-of-both-worlds argument, where he's simultaneously declaring 'alignment is FALSIFIED!' while also doing heavy anthropomorphization to confirm his priors (whereas it'd be harder to say that about something like Claude, which leans more towards a 'maybe' on the question of whether it should be anthropomorphized, since it has a much more robust system), and doing it off the back of someone's death