When AI is tested on questions it can't model from pre-existing answers on the internet, it only scores 10% in the test. (qz.com)

submitted 6 days ago by Lugh@futurology.today to c/futurology@futurology.today

[-] Lugh@futurology.today 7 points 6 days ago* (last edited 6 days ago)

The dataset consists of 3,000 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held out questions to assess model overfitting.

They say they've addressed this issue.

[-] hendrik@palaver.p3x.de 2 points 6 days ago

I still don't get it. Under "Future Model Performance" they say benchmarks quickly get saturated. Maybe it's going to be the same for this one, and models could achieve 50% by the end of this year, which doesn't really sound like the "last exam" to me. But maybe it's more about the approach of coming up with good science questions, and not the exact dataset?

[-] Lugh@futurology.today 2 points 6 days ago

I think the easiest way to explain this is to say they are testing the ability to reason your way to an answer to a question so unique that it doesn't exist anywhere on the internet.

this post was submitted on 24 Jan 2025

66 points (100.0% liked)
