this post was submitted on 28 Sep 2025
28 points (100.0% liked)
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
Links to the METR tasks w/ massive error bars at the 50% level, lmao.
Someone in the comments rightly points out that the comparison with covid isn’t apt. With covid, an underlying mechanism caused the exponential growth in its spread.
With LLMs, the exponential trend is being driven by exponentially increasing spending and a healthy dose of benchmark targeting, which is why people are calling the top. The money literally doesn’t exist for this shit to go on just so you can create your 50% accurate mechanical turk.
Edit: idk the more I think about this the more it irks me. Like if I was allowed to pick and choose benchmarks that agree with my biases I would post something like this…
… and claim model performance is actually getting worse over time.
https://xcancel.com/sayashk/status/1966144670561612202#m
The second screenshot goes to a chart where the Y axis is labelled
So they're just extrapolating an exponential, not actually measuring it.
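For what it's worth, here's a minimal sketch of why extrapolating isn't the same as measuring. Everything in it is made up for illustration: the numbers are synthetic (not METR data), and `logistic`, `fit_exponential`, `extrapolate` and their parameters are hypothetical names. It generates points from an S-curve that saturates, fits an exponential to only the early points, and then compares the extrapolation against the curve that actually produced the data.

```python
import math

def logistic(t, cap=100.0, rate=1.0, midpoint=10.0):
    """Synthetic S-curve (an assumption for this demo): near-exponential early, saturates at `cap`."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

def fit_exponential(ts, ys):
    """Ordinary least squares on (t, ln y); returns (a, b) such that y ~ a * exp(b * t)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    slope = (
        sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
        / sum((t - t_mean) ** 2 for t in ts)
    )
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), slope

# "Measured" region: only the early part of the curve is observed.
early_ts = list(range(0, 6))
early_ys = [logistic(t) for t in early_ts]
a, b = fit_exponential(early_ts, early_ys)

def extrapolate(t):
    """Value predicted by the exponential fitted to the early points only."""
    return a * math.exp(b * t)

# Compare the fit against the generating curve, inside and beyond the observed range.
for t in (5, 10, 15, 20):
    print(f"t={t:2d}  generating curve={logistic(t):10.3f}  extrapolated={extrapolate(t):14.3f}")
```

Inside the observed range the exponential fit is nearly indistinguishable from the S-curve; past it, the two diverge by orders of magnitude. The early data can't tell you which one you're on, which is the whole problem with drawing the line forward and calling it a measurement.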