this post was submitted on 03 Mar 2025
TechTakes
A classic example of the "AI can't be dumb because humans are dumb too" trope, Pokemon Red edition:
https://www.lesswrong.com/posts/HyD3khBjnBhvsp8Gb/so-how-well-is-claude-playing-pokemon?commentId=muqLXFGXJLbdStaNm
this is so embarrassing. "you say Claude is less capable than a typical six year old? yeah well what if the six year old is notably stupid? did you think of that?"
Instead of increasing the capabilities of LLMs, a lot of work is done in the field of downplaying human capabilities to make LLMs look better in comparison. You would assume that the 'be aware of biases, and learn to think rationally' place would notice this trap. But nope, nobody reads the sequences anymore. (E: for the people not in the know, the sequences is the Rationalist bible written by Yud (extremely verbose; the new bits are not good and the good bits are not new), used here as a joke. Reading it (and saying you should) used to be part of the cultic milieu of LW.)
Wait even LWers aren't reading the sequences anymore? Or rather aren't pretending to have done so?
Why put in the work when you can ask Claude to summarize them for you and reap those sweet sweet internet points?
It wasn't really done that much during the era when Scott A was called the new leader of LessWrong, so I'm not sure if it has increased again. I assume a lot still do, as I assume a lot also pretend to have read it. I never looked into any stats, or whether those stats are public. I know they put them all on a specific site in 2015 (https://www.readthesequences.com/). The bibliography is a treat (esp. as it starts with pop sci books and an SSC blog post, but also: "Banks, Iain. The Player of Games. Orbit, 1989.", and not one but three of the Doc E. E. Smith Lensman books).
LW subjected me to a CAPTCHA which I find pretty funny for reasons I CBA to articulate right now.
Claude couldn't exit the house at the beginning of Pokémon Red, an incredibly popular and successful game for children, therefore it's dumber than an average child? Sounds dubious. I couldn't figure out how to do that either and look at how intelligent I am!
Without a captcha the AIs would be trained on cutting edge AI alignment research like "videogames are unintuitive sometimes", and this would greatly increase the chance of P(Doom).
The CAPTCHA failed to load properly for me at first, and then was mega slow. Quality custom implementation of a (wrapper around a) CAPTCHA, millions of EA money well spent.
I mean, it's obviously true that games have their own internal structures and languages that aren't always obvious without knowledge or context, and the FireRed comparison is a neat case where you can see that language improving as designers have both more tools (here meaning colors and pixels) and more experience in using them. But even in the LW thread they mention that when humans run into that kind of problem they don't just act randomly for 6 hours. Either they come up with some systematic approach for solving the problem, walk away from the game to ask for help, or do something else. Humans also have the metacognition to easily understand "that rug at the bottom marks the exit" once it's explained, which I'm pretty sure the LLM doesn't have the ability to process. It's not even like a particularly dumb 6-year-old. Even if the 6-year-old is prone to similar levels of over-matching and pattern-recognition errors, it has an actual conscious brain to help solve those problems. The whole thing shows once again that pattern recognition and reproduction can get you impressively far in terms of imitating thought, but there's a world of difference between that imitation and the real deal.