406

Do you like (AI) clocks? (clocks.brianmoore.com)

submitted 1 week ago* (last edited 1 week ago) by LegoBrickOnFire@lemmy.world to c/programmer_humor@programming.dev

58 comments

[-] BlackEco@lemmy.blackeco.com 91 points 1 week ago* (last edited 1 week ago)

Is it just me, or do the clocks frequently break or change appearance without the page being refreshed?

Edit: never mind, I skipped past the sentence explaining that every minute, the site prompts LLMs for a new solution. This is hilariously sad how LLMs aren't able to be consistent from one prompt to another.

[-] Carighan@piefed.world 42 points 1 week ago

It's the expected result if your big ol' artificial intelligence wannabe is ultimately just a stochastic word combinator.

[-] vrighter@discuss.tchncs.de 16 points 1 week ago

if every single token is, at the end, chosen by random dice roll (and they are) then this is exactly what you'd expect.

[-] kersplomp@piefed.blahaj.zone 5 points 1 week ago

that’s a massive oversimplification

[-] vrighter@discuss.tchncs.de 2 points 1 week ago

not really. If the system outputs a probability distribution, then by definition, you're picking somewhat randomly. So not really a simplification
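To make the disagreement above concrete, here is a minimal sketch of the two ways a next token can be picked from a probability distribution. The distribution and the function names are made up for illustration; this is not how any particular model or the clock site is implemented.

```python
import random

def pick_greedy(probs):
    """Always return the most likely token: same input, same output."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up next-token distribution, for illustration only.
probs = {"12": 0.6, "11": 0.3, "1": 0.1}

rng = random.Random(0)  # seeded so the run is repeatable
greedy = {pick_greedy(probs) for _ in range(100)}
sampled = {pick_sampled(probs, rng) for _ in range(100)}

print(greedy)   # a single token, every time
print(sampled)  # the set of tokens that came up across 100 draws
```

Greedy picking is deterministic, while weighted sampling can return a less likely token on any given draw, which is one plausible reason two runs of the same prompt diverge.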

[-] null@piefed.nullspace.lol 4 points 1 week ago

This is hilariously sad how LLMs aren’t able to be consistent from one prompt to another.

Typically that's configurable. For a chatbot, you'd want it to give the same or similar results for a given question, whereas with a character creator, you might want the results to vary so you can re-run until you get something you like.

Of course that wouldn't be as funny here.
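The knob usually meant here is a sampling temperature. A rough sketch, assuming a plain softmax over made-up scores (the function name and numbers are hypothetical, and real services name and implement this setting in their own ways):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all the probability lands on the top candidate, so repeated runs tend to agree; at a higher one the candidates are closer together and the results vary more.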

[-] criticon@lemmy.ca 70 points 1 week ago

This is my favorite

[-] Shaper@lemmy.world 23 points 1 week ago

gaslight clock

[-] NeatNit@discuss.tchncs.de 58 points 1 week ago

The last one, Kimi K2, has been consistently good as long as I've been looking at it. That's pretty impressive.

The rest are hilarious!

[-] reseller_pledge609@lemmy.dbzer0.com 25 points 1 week ago

Deepseek has a recognisable clock now and then, too. They both mix up the current time, though.

[-] huppakee@piefed.social 20 points 1 week ago

By far the best, but still off. These three were loaded in the same order as I posted them:

[-] SlurpingPus@lemmy.world 3 points 1 week ago* (last edited 1 week ago)

I dig the square clock, and am now sad that the numbers can't be put into the corners on a real clock. Unless they're shifted from the usual position.
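A quick check of why the corners don't work, assuming the usual layout where hour numerals sit every 30 degrees clockwise from 12 and the corners of a square face are seen at 45, 135, 225 and 315 degrees from the centre. The helper name is made up for this sketch.

```python
# Hour numerals sit every 30 degrees, measured clockwise from 12.
hour_angles = {hour: (hour % 12) * 30 for hour in range(1, 13)}

# Seen from the centre of a square face, the corners are at these angles.
corner_angles = [45, 135, 225, 315]

def nearest_hours(angle):
    """Return the hours closest to the given angle, and how far off they are."""
    def gap(hour):
        diff = abs(hour_angles[hour] - angle) % 360
        return min(diff, 360 - diff)
    best = min(gap(hour) for hour in hour_angles)
    return sorted(hour for hour in hour_angles if gap(hour) == best), best

for corner in corner_angles:
    print(corner, nearest_hours(corner))
```

Every corner falls exactly halfway between two numerals, 15 degrees from each, so a numeral only lands in a corner if it is shifted from its usual position.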

[-] Hazzard@lemmy.zip 13 points 1 week ago

Haha, I found myself thinking the same thing, and then caught myself, realizing all the other LLMs on this page had lowered the bar immensely for what I'm considering impressive.

[-] Enkrod@feddit.org 12 points 1 week ago

I thought the same and then Kimi K2 came up with a clock that has two 12s and no 11...

[-] huppakee@piefed.social 44 points 1 week ago

[removed by mod]

[-] LegoBrickOnFire@lemmy.world 20 points 1 week ago

that's scary how dementia works :'(

[-] LaLuzDelSol@lemmy.world 11 points 1 week ago

This kinda freaked me out: AI models fed their own outputs as training data will quickly start making distorted images that look spookily like human paintings made under the progression of mental illness or drugs.

https://www.nature.com/articles/d41586-024-02420-7

[-] HeyThisIsntTheYMCA@lemmy.world 4 points 1 week ago

well that was terrifying

[-] BatmanAoD@programming.dev 10 points 1 week ago* (last edited 1 week ago)

Thanks for sharing this! I really think that when people see LLM failures and say that such failures demonstrate how fundamentally different LLMs are from human cognition, they tend to overlook how humans actually do exhibit remarkably similar failure modes. Obviously dementia isn't really analogous to generating text while lacking the ability to "see" a rendering based on that text. But it's still pretty interesting that whatever feedback loops did get corrupted in these patients led to such a variety of failure modes.

As an example of what I'm talking about, I appreciated and generally agreed with this recent Octomind post, but I disagree with the list of problems that "wouldn’t trip up a human dev"; these are all things I've seen real humans do, or could imagine a human doing.

[-] huppakee@piefed.social 4 points 1 week ago

such a variety of failure modes

What I find interesting is that in both cases there is a certain consistency in the mistakes too: basically every dementia patient still understands that a clock is something with a circle and numbers, and not a square with letters, for example. LLMs can tell you complete bullshit, but they still understand it has to be done with perfect grammar in a consistent language. So much so that they struggle to respond outside of this box; ask one to insert spelling errors to look human, for example.

the ability to "see"

This might be the true problem in both cases: neither the patient nor the model can comprehend the bigger picture (a circle is divided into 12 segments, because that is how we deconstructed the time it takes for the earth to spin around its axis). Things that seem logical to us are logical because of these kinds of connections with other things we know and comprehend.

[-] NeatNit@discuss.tchncs.de 3 points 1 week ago

... what

[-] monotremata@lemmy.ca 8 points 1 week ago

Basically this: https://www.psychdb.com/cognitive-testing/clock-drawing-test

[-] NeatNit@discuss.tchncs.de 4 points 1 week ago* (last edited 1 week ago)

Thanks.

What I still didn't figure out about the comment I replied to is:

What is each row? They're labeled I, II, III, IV. What's being counted?
Why did they link to a home interior design website under "via"?

[-] crunchy@lemmy.dbzer0.com 38 points 1 week ago

It's funny how GPT-5 is consistently the worst one, and it's not even close.

[-] snooggums@piefed.world 17 points 1 week ago

qwen 2.5 is absolutely pants on head ridiculous compared to gpt5 when I'm looking at it right now.

[-] dimjim@sh.itjust.works 22 points 1 week ago

Some of these are absolutely hilarious

[-] sheepishly@fedia.io 21 points 1 week ago

Given that the AI models are basically constructing these "blindly"- using the language model to string together html and javascript without really being able to check how it looks- some of these are actually pretty impressive. But also making the AI do things it's bad at is funny. Reminds me of all the AI ASCII art fails...

[-] TrickDacy@lemmy.world 12 points 1 week ago

What is this obsession with clocks recently?

[-] Panties@lemmy.ca 25 points 1 week ago* (last edited 1 week ago)

I don't know if it's actually related, but I've read that asking people to draw a clock face is a simple way to identify some brain problems

Quick screening for dementia, according to this

Edit: I guess this means most of the AI has 'Conceptual Deficits', pretty accurate lol

[-] Trainguyrom@reddthat.com 7 points 1 week ago

Would be funny if AI models are generating such wildly useless "clocks" because they ingested too many dementia screening tests in their training data

[-] altkey@lemmy.dbzer0.com 5 points 1 week ago

There is someone training the biggest, bestest model to draw clock faces to pass that test as we speak.

[-] tomiant@piefed.social 10 points 1 week ago

I'm guessing it's an easy metric to compare benchmarks. "Write a clock".

[-] QuinnyCoded@sh.itjust.works 12 points 1 week ago* (last edited 1 week ago)

qwen is trying her best 😭😔

[-] zerofk@lemmy.zip 6 points 1 week ago* (last edited 1 week ago)

So far, I’d give qwen the prize for most artistic impression of a clock.

Kimi K2 appears to consistently get it right.

[-] zerofk@lemmy.zip 4 points 1 week ago

And just as I typed that, Kimi made one where 9 and 10, and 11 and 12 overlapped.

[-] ICastFist@programming.dev 10 points 1 week ago

I don't even

I was surprised that both Grok and Gemini 2.5 got it right once, only to fuck it up on the refresh

[-] ChaoticNeutralCzech@feddit.org 10 points 1 week ago

Not really world clocks, they just try to use JavaScript to display the device's time.

[-] TechLich@lemmy.world 10 points 1 week ago

No JavaScript I think, it's just html and CSS. The initial time is provided in the prompt every minute according to the description. I wonder if they'd be any better if they could use js. Probably not.
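For a sense of what a model has to work out from the time in the prompt, here is a small sketch of the hand angles an HTML/CSS clock would need as its starting rotation. The function name is hypothetical, and this is only the standard clock arithmetic, not anything taken from the site.

```python
def hand_angles(hour, minute, second=0):
    """Clockwise degrees from 12 for the hour, minute, and second hands."""
    second_angle = second * 6.0                  # 360 degrees / 60 seconds
    minute_angle = minute * 6.0 + second * 0.1   # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5 + second / 120.0
    return hour_angle, minute_angle, second_angle

h, m, s = hand_angles(10, 10)
print(f"hour hand:   transform: rotate({h}deg)")
print(f"minute hand: transform: rotate({m}deg)")
print(f"second hand: transform: rotate({s}deg)")
```

For 10:10 this gives 305 degrees for the hour hand and 60 degrees for the minute hand. With no JavaScript, those numbers would have to be baked into the CSS as the initial position, with a CSS animation carrying the hands on from there.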

[-] tomiant@piefed.social 3 points 1 week ago* (last edited 1 week ago)

"Time is a relative of mine" / Eisenstein

[-] tomiant@piefed.social 7 points 1 week ago

You know, I don't, and what the fuck?

[-] Bazell@lemmy.zip 5 points 1 week ago

Well, KIMI K2 seems to have created the working one. The others failed. I suppose that this model was optimized for this while the others were not.

[-] SolarBoy@slrpnk.net 8 points 1 week ago

The clocks change every minute. I've seen some from deepseek and qwen that looked ok. But kimi seems to be the most consistent

[-] kersplomp@piefed.blahaj.zone 3 points 1 week ago* (last edited 1 week ago)

Really cool idea, but the site seems a bit biased for the Chinese models, or is otherwise set up weird. I’m not able to reproduce how consistently bad the others are in web dev arena, which is generally accepted as the gold standard for testing AI web dev ability.

[-] AppleTea@lemmy.zip 5 points 1 week ago

Each model is allowed 2000 tokens to generate its clock. Here is its prompt: Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

are you using the same prompt?
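For anyone who wants to try the same prompt, a minimal sketch of filling in the `${time}` placeholder. The prompt text is the one quoted above; the time format and the example date are guesses, since the site's description doesn't say how the time is rendered, and `build_prompt` is a made-up helper name.

```python
from datetime import datetime
from string import Template

# Prompt text as quoted above; Template understands the ${time} placeholder.
PROMPT = Template(
    "Create HTML/CSS of an analog clock showing ${time}. "
    "Include numbers (or numerals) if you wish, and have a CSS animated second hand. "
    "Make it responsive and use a white background. "
    "Return ONLY the HTML/CSS code with no markdown formatting."
)

def build_prompt(now):
    """Fill in the current time. The format here is an assumption."""
    return PROMPT.substitute(time=now.strftime("%I:%M %p"))

# Arbitrary example moment, chosen only for illustration.
prompt = build_prompt(datetime(2025, 1, 1, 10, 10))
print(prompt)
```

The 2000-token limit mentioned above would be set on whatever API call sends this prompt, which is left out here because it differs per provider.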
