871

ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic (www.tomshardware.com)

submitted 5 months ago by Lifecoach5000@lemmy.world to c/technology@lemmy.world

168 comments

top 50 comments

[+] FMT99@lemmy.world 297 points 5 months ago* (last edited 4 months ago)

[deleted]

[-] spankmonkey@lemmy.world 232 points 5 months ago

AI, including ChatGPT, is being marketed as super awesome at everything, which is why it and similar AI are being forced into absolutely everything and sold as a replacement for people.

Something marketed as AGI should be treated as AGI when proving it isn't AGI.

[-] pelespirit@sh.itjust.works 16 points 5 months ago

Not to help the AI companies, but why don't they program them to look up math programs and outsource chess to other programs when they're asked for that stuff? It's obvious they're shit at it, so why do they answer anyway? It's because they're programmed by know-it-all programmers, isn't it?
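A minimal sketch of the kind of routing this comment asks for, assuming a hypothetical `route` function sitting in front of the model: plain arithmetic goes to a deterministic evaluator, prompts tagged `chess:` go to a stand-in engine, and everything else falls through to a stubbed language-model call. `fake_llm`, `fake_chess_engine`, and the `chess:` prefix are all made up for illustration; this is not how any particular product does it.

```python
import ast
import operator

# Operators the toy arithmetic evaluator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(expr: str) -> float:
    """Evaluate +, -, *, / on numbers by walking the AST (no eval())."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("not plain arithmetic")
    return _eval(ast.parse(expr, mode="eval"))


def fake_chess_engine(position: str) -> str:
    # Hypothetical stand-in for a real engine such as Stockfish.
    return f"[engine move for: {position}]"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a language-model call.
    return f"[LLM reply to: {prompt}]"


def route(prompt: str) -> str:
    """Send each prompt to the tool that can actually answer it."""
    if prompt.startswith("chess:"):
        return fake_chess_engine(prompt[len("chess:"):].strip())
    try:
        return str(eval_arithmetic(prompt))
    except (ValueError, SyntaxError):
        return fake_llm(prompt)


print(route("2 + 3 * 4"))  # 14
```

The design choice is that the language model is the fallback, not the default: anything a dedicated tool can answer exactly never reaches it.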

[-] rebelsimile@sh.itjust.works 28 points 5 months ago

Because they’re fucking terrible at designing tools to solve problems. They are obviously less and less good at pretending this is an omnitool that can do everything with perfect coherency (and if it isn’t working right, it’s because you’re not believing or paying hard enough).

[-] ImplyingImplications@lemmy.ca 25 points 5 months ago

why don't they program them

AI models aren't programmed traditionally. They're generated by machine learning. Essentially the model is given test prompts and then given a rating on its answer. The model's calculations will be adjusted so that its answer to the test prompt will be closer to the expected answer. You repeat this a few billion times with a few billion prompts and you will have generated a model that scores very high on all test prompts.
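The loop described above can be sketched with a toy model that has a single adjustable number instead of billions. This uses plain gradient descent on squared error, an assumption chosen for clarity; real LLM training shares the basic idea (measure how far off the output is, nudge the parameters to reduce that) but at vastly larger scale and with different objectives.

```python
def train(examples, lr=0.01, epochs=500):
    """Fit answer = w * x by nudging w after every example."""
    w = 0.0  # the model's single adjustable "calculation"
    for _ in range(epochs):
        for x, expected in examples:
            answer = w * x
            error = answer - expected  # the "rating": how far off the answer was
            # Move w against the gradient of the squared error (error ** 2).
            w -= lr * 2 * error * x
    return w


examples = [(1, 2), (2, 4), (3, 6)]  # expected answer is always 2 * x
w = train(examples)
print(round(w, 3))  # 2.0
```

In this toy, teaching the model a new case means running the loop again over the whole example set, which mirrors the retraining cost the comment describes.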

Then someone asks it how many R's are in strawberry and it gets the wrong answer. The only way to fix this is to add that as a test prompt and redo the machine learning process which takes an enormous amount of time and computational power each time it's done, only for people to once again quickly find some kind of prompt it doesn't answer well.

There are already AI models that play chess incredibly well. Using machine learning to solve a complex problem isn't the issue. It's trying to get one model to be good at absolutely everything.

[-] PixelatedSaturn@lemmy.world 9 points 5 months ago

I don't think AI is being marketed as awesome at everything. It's got obvious flaws. Right now it's not good for stuff like chess, probably not even tic-tac-toe. It's a language model; it's hard for it to calculate the playing field. But AI is in development, and it might not need much to start playing chess.

[-] vinnymac@lemmy.world 29 points 5 months ago

What the tech is being marketed as and what it’s capable of are not the same, and likely never will be. In fact, things are very rarely marketed as they truly behave, and that's intentional.

Everyone is still trying to figure out what these Large Reasoning Models and Large Language Models are even capable of. Apple, one of the largest companies in the world, just released a white paper this past week describing “the illusion of thinking”. If it takes a scientific paper to understand what these models are and are not capable of, I assure you they’ll be selling snake oil for years after we fully understand every nuance of their capabilities.

TL;DR: Rich folks want them to be everything, so they’ll be sold as capable of everything until we repeatedly refute the claim that they can do it.

[-] BassTurd@lemmy.world 18 points 5 months ago

Marketing does not mean functionality. AI is absolutely being sold to the public and enterprises as something that can solve everything. Obviously it can't, but it's being sold that way. I would bet the average person would be surprised by this headline solely on what they've heard about the capabilities of AI.

[-] malwieder@feddit.org 30 points 5 months ago

Google Maps doesn't pretend to be good at chess. ChatGPT does.

[-] iAvicenna@lemmy.world 16 points 5 months ago

Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "Can ChatGPT prove the Riemann hypothesis?"

[-] Broken@lemmy.ml 12 points 5 months ago

I agree with your general statement, but in theory, since all ChatGPT does is regurgitate information and a lot of chess is memorization of historical games and types, it might actually perform well. No, it can't think, but it can remember everything, so at some point that might tip the results in its favor.
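A minimal sketch of what pure memorization looks like, assuming a hypothetical opening book stored as a dictionary from the moves played so far to a stored reply. The lines in it are standard openings (Ruy Lopez, Sicilian Defence); the point is that lookup only works while the game stays on a memorized path.

```python
from typing import Optional, Sequence

# Hypothetical tiny opening book: moves so far (SAN) -> memorized next move.
BOOK = {
    (): "e4",
    ("e4", "e5"): "Nf3",
    ("e4", "e5", "Nf3", "Nc6"): "Bb5",  # Ruy Lopez
    ("e4", "c5"): "Nf3",                # Sicilian Defence
}


def book_move(history: Sequence[str]) -> Optional[str]:
    """Return the memorized reply, or None once the game leaves the book."""
    return BOOK.get(tuple(history))


print(book_move(["e4", "e5", "Nf3", "Nc6"]))  # Bb5
print(book_move(["e4", "a6"]))                # None
```

Past the last memorized position, something has to actually evaluate the board, and that is where a recall-only strategy runs out.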

[-] Objection@lemmy.ml 83 points 5 months ago

Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

[-] bier@feddit.nl 34 points 5 months ago

Yeah, it's like judging how great a fish is at climbing a tree. But it does show that it's not real intelligence or reasoning.

[-] 13igTyme@lemmy.world 13 points 5 months ago

Don't call my fish stupid.

[-] Zenith@lemm.ee 15 points 5 months ago

I forget which airline it was, but one of the onboard games on the seatback TV was a game called “Beginners Chess”, which was notoriously difficult to beat. So it was tested against other chess engines, and it ranked in, like, the top five most powerful chess engines ever.

[-] andallthat@lemmy.world 13 points 5 months ago* (last edited 5 months ago)

Machine learning has existed for many years now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as "AI", and attributing every ML win ever to "AI".

ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in the news that "AI helps cure cancer", it sounds like a lone researcher spent a few minutes engineering the right prompt for Copilot.

Yes, a specifically designed and finely tuned ML program can now beat the best human chess player, but calling it "AI" and bundling it together with the latest Gemini or Claude iteration's "reasoning capabilities" is intentionally misleading. That's why articles like this one are needed. ML is a useful tool, but far from the "super-human general intelligence" that is meant to replace half of human workers by the power of wishful prompting.

[-] NeilBru@lemmy.world 76 points 5 months ago* (last edited 5 months ago)

An LLM is a poor computational/predictive paradigm for playing chess.

[-] surph_ninja@lemmy.world 29 points 5 months ago

This just in: a hammer makes a poor screwdriver.

[-] Takapapatapaka@lemmy.world 11 points 5 months ago

Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
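For a sense of what a rating like 1700 means, the standard Elo expected-score formula gives the share of points player A should score against player B: E_A = 1 / (1 + 10^((R_B - R_A) / 400)). A short sketch of that formula:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A against player B (1 = win, 0.5 = draw)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


print(expected_score(1700, 1700))            # 0.5
print(round(expected_score(2100, 1700), 3))  # 0.909
```

So a 400-point gap means the stronger player is expected to score roughly 91% of the points, and the two players' expected scores always sum to 1.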

[-] AlecSadler@sh.itjust.works 65 points 5 months ago

ChatGPT has been, hands down, the worst AI coding assistant I've ever used.

It regularly suggests code that doesn't compile or isn't even for the language.

It generally suggests autocompletions that are just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

[-] j4yt33@feddit.org 17 points 5 months ago

I find it really hit-and-miss. Easy, standard operations are fine, but if you have an issue with code you wrote and ask it to fix it, you can forget it.

[-] nutsack@lemmy.dbzer0.com 9 points 5 months ago

My favorite thing is it constantly implementing libraries that don't exist.

[-] Blackmist@feddit.uk 12 points 5 months ago

You're right. That library was removed in ToolName [PriorVersion]. Please try this instead.

*makes up entirely new fictitious library name*
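One cheap guard against invented library names, assuming the suggestion is a Python module: ask the import system whether it can find the module before trusting the code. `importlib.util.find_spec` is a real standard-library function; `module_exists` is a hypothetical helper name, and this only checks the local environment, not whether a package of that name exists on PyPI.

```python
import importlib.util


def module_exists(name: str) -> bool:
    """True if `name` can be imported in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent package of a dotted name; a missing parent lands here.
        return False


print(module_exists("json"))                     # True
print(module_exists("totally_made_up_lib_123"))  # False
```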

[-] nednobbins@lemm.ee 49 points 5 months ago

Sometimes it seems like most of these AI articles are written by AIs with bad prompts.

Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.

LLMs on the other hand, are very good at producing clickbait articles with low information content.

[-] nova_ad_vitum@lemmy.ca 24 points 5 months ago

GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble, it straight up cheats. Telling it to follow the rules of chess doesn't help.

This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
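A toy illustration of "statistically looks like an answer": a bigram model that always emits whichever token most often followed the previous one in its training text. Real LLMs are transformers that condition on far longer context, so this is a deliberate oversimplification; the function names and the training text are hypothetical. Note that nothing in this toy tracks the board or checks whether a move is legal.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(tokens: list[str]) -> dict[str, Counter]:
    """Count which token follows which in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token(counts: dict[str, Counter], prev: str) -> Optional[str]:
    """Greedy prediction: the most frequent follower, with no idea what it means."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical training text: three short game openings, concatenated.
games = "e4 e5 Nf3 Nc6 e4 c5 Nf3 d6 e4 e5 Nf3 Nc6".split()
model = train_bigrams(games)
print(next_token(model, "e4"))   # e5
print(next_token(model, "Nf3"))  # Nc6
```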

For some things it even works. But calling this intelligence is dubious at best.

[-] Halosheep@lemm.ee 49 points 5 months ago

I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."

[-] drspod@lemmy.ml 42 points 5 months ago

It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.

[-] inconel@lemmy.ca 18 points 5 months ago

It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.

[-] floofloof@lemmy.ca 45 points 5 months ago* (last edited 5 months ago)

I suppose it's an interesting experiment, but it's not that surprising that a word prediction machine can't play chess.

[-] otp@sh.itjust.works 16 points 5 months ago

Because people want to feel superior because they ~~don't know how to use a ChatBot~~ can count the number of "r"s in the word "strawberry", lol

[-] electricyarn@lemmy.world 15 points 5 months ago

Yeah, just because I can't count the number of r's in the word strawberry doesn't mean I shouldn't be put in charge of the US nuclear arsenal!

[-] MonkderVierte@lemmy.zip 40 points 5 months ago

LLMs are not built for logic.

[-] PushButton@lemmy.world 19 points 5 months ago

And yet everybody is selling them to write code.

The last time I checked, coding required logic.

[-] jj4211@lemmy.world 10 points 5 months ago

To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.

So LLM can do some code completion, filling out a bunch of boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages like "does this language want join as a method on a list with a string argument, or vice versa?"
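The `join` example is a real one. In Python, `join` is a method on the separator string and takes the iterable of strings as its argument, whereas JavaScript, for instance, puts `join` on the array and passes the separator in (`["a", "b"].join(", ")`). The Python side:

```python
parts = ["a", "b", "c"]
# The separator string owns join(); the list is the argument.
print(", ".join(parts))  # a, b, c

# parts.join(", ") would raise AttributeError: Python lists have no join method.
```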

The problem is that this can sometimes be more trouble than it's worth, as miscompletions are annoying.

[-] anubis119@lemmy.world 36 points 5 months ago

A strange game. How about a nice game of Global Thermonuclear War?

[-] ada@piefed.blahaj.zone 17 points 5 months ago

No, thank you. The only winning move is not to play.

[-] Furbag@lemmy.world 28 points 5 months ago

Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
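Failures like taking your own pieces or moving pieces that aren't there are the kind of thing an external referee can catch mechanically. A deliberately partial sketch, assuming a hypothetical board stored as a dict from square names to piece letters (uppercase white, lowercase black): it rejects those cases and checks knight geometry, but does not implement the full rules (check, pins, castling, en passant, other pieces' movement). A real harness would use a complete rules implementation, such as the third-party python-chess library.

```python
FILES = "abcdefgh"


def coords(square: str) -> tuple[int, int]:
    """'g1' -> (6, 0): file index and rank index, both zero-based."""
    return FILES.index(square[0]), int(square[1]) - 1


def passes_basic_checks(board: dict[str, str], src: str, dst: str, white_to_move: bool) -> bool:
    piece = board.get(src)
    if piece is None:
        return False  # no piece on the source square: it would appear from thin air
    if piece.isupper() != white_to_move:
        return False  # trying to move the opponent's piece
    target = board.get(dst)
    if target is not None and target.isupper() == white_to_move:
        return False  # capturing one of your own pieces
    if piece.upper() == "N":
        (fx, fy), (tx, ty) = coords(src), coords(dst)
        return {abs(fx - tx), abs(fy - ty)} == {1, 2}  # knights move in an L
    return True  # movement rules for other pieces are not modelled in this sketch


board = {"e1": "K", "g1": "N", "f2": "P", "e8": "k"}
print(passes_basic_checks(board, "g1", "f3", True))  # True
print(passes_basic_checks(board, "g1", "f2", True))  # False: own pawn on f2
print(passes_basic_checks(board, "h1", "h3", True))  # False: nothing on h1
```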

[-] ToastedRavioli@midwest.social 9 points 5 months ago

ChatGPT must adhere honorably to the rules that it's making up on the spot. That's Dallas.

[-] cley_faye@lemmy.world 24 points 5 months ago

Ah, you used logic. That's the issue. They don't do that.

[-] arc99@lemmy.world 22 points 5 months ago

Hardly surprising. LLMs aren't *thinking*; they're just shitting out the next token for any given input of tokens.

[-] vane@lemmy.world 15 points 5 months ago

It's not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy to sell you ads or product-place some shit for you in the future.

[-] capuccino@lemmy.world 14 points 5 months ago

This made my day

[-] jsomae@lemmy.ml 13 points 4 months ago

Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.
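One form the scaffolding mentioned here could take, sketched with a hypothetical `ask_model` callable standing in for the LLM: show it the legal moves, accept its reply only if it is one of them, and otherwise tell it the move was illegal and ask again, giving up after a few tries. The set of legal moves is assumed to come from a real rules engine.

```python
from typing import Callable, Optional


def scaffolded_move(
    ask_model: Callable[[str], str],
    legal_moves: set[str],
    max_tries: int = 3,
) -> Optional[str]:
    """Ask for a move, accept only legal ones, and retry with feedback."""
    prompt = "Your move. Legal moves: " + ", ".join(sorted(legal_moves))
    for _ in range(max_tries):
        move = ask_model(prompt).strip()
        if move in legal_moves:
            return move
        # Feed the mistake back instead of silently accepting an illegal move.
        prompt += f"\n{move} is illegal; pick one of the listed moves."
    return None  # the caller decides what happens next (e.g. resign or fall back)


# Hypothetical model that first tries an illegal move, then a legal one.
replies = iter(["Qxh7", "e4"])
print(scaffolded_move(lambda prompt: next(replies), {"e4", "d4", "Nf3"}))  # e4
```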

[-] seven_phone@lemmy.world 12 points 5 months ago

You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

[-] Endymion_Mallorn@kbin.melroy.org 12 points 5 months ago

I mean, that 2600 Chess was built from the ground up to play a good game of chess with variable difficulty levels. I bet there were days or games when Fischer couldn't have beaten it. Just because a thing is old and less capable than what we have today does not mean it's bad.
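One common way for classic game programs to offer difficulty levels is to limit how far ahead they search; whether the 2600 cartridge works exactly this way is an assumption here, not something the comment or article states. The sketch shows depth-limited negamax on a much smaller game than chess (players alternately take 1 to 3 stones, and whoever takes the last stone wins) so it stays short; `negamax` and `best_move` are illustrative names.

```python
TAKES = (1, 2, 3)


def negamax(stones: int, depth: int) -> int:
    """+1 if the side to move can force a win, -1 if it loses, 0 if depth ran out."""
    if stones == 0:
        return -1  # the opponent just took the last stone and won
    if depth == 0:
        return 0  # search horizon reached: no opinion
    return max(-negamax(stones - t, depth - 1) for t in TAKES if t <= stones)


def best_move(stones: int, depth: int) -> int:
    """Pick the take whose resulting position is worst for the opponent."""
    legal = [t for t in TAKES if t <= stones]
    return max(legal, key=lambda t: -negamax(stones - t, depth - 1))


print(best_move(6, depth=10))  # 2: leaves 4 stones, a lost position for the opponent
print(best_move(6, depth=1))   # 1: too shallow to see that 5 stones is a win for the opponent
```

The same program plays worse simply by looking fewer moves ahead, which is why search depth (or thinking time) makes a natural difficulty knob.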

[-] Sidhean 10 points 5 months ago

Can I fistfight ChatGPT next? I bet I could kick its ass, too :p

this post was submitted on 09 Jun 2025

871 points (100.0% liked)

Technology

76648 readers

3477 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech-related news or articles.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts; OK to post as comments.
Only approved bots from the list below are allowed; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws