this post was submitted on 14 Sep 2025
20 points (100.0% liked)
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
Accidentally posted in an old thread:
This bullshit again
Math competitions need to start assigning problems that require counting the letters in fruit names.
Word problems referring to aliens from cartoons. "Bobby on planet Glorxon has four strawberries, which are similar to but distinct from earth strawberries, and Kleelax has seven..."
I also wonder if you could create context breaks, or if they've hit a point where that isn't as much of a factor. "A train leaves Athens, KY traveling at 45 mph. Another train leaves Paris, FL traveling at 50 mph. If the track is 500 miles long, how long is a train trip from Athens to Paris?"
LLM's ability to fake solving word problems hinges on being able to crib the answer, so using aliens from cartoons (or automatically-generating random names for objects/characters) will prove highly effective until AI corps can get the answers into their training data.
As for context breaks, those will remain highly effective against LLMs pretty much forever - successfully working around a context break requires reasoning, which LLMs are categorically incapable of doing.
Constantly and subtly twiddling with questions (ideally through automatic means) should prove effective as well - Apple got "reasoning" text extruders to flounder and fail at simple logic puzzles through such a method.
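To make the "automatic twiddling" idea concrete, here's a minimal sketch of how randomized word-problem variants could be generated. The template, name lists, and `make_variant` helper are all hypothetical illustrations, not anything the comment describes directly; the point is just that the surface form changes on every run while the correct answer stays computable.

```python
import random

# Hypothetical template: blanks get filled with random alien names and
# numbers, so the exact wording never matches a crib in a training corpus.
TEMPLATE = ("{a} on planet {p} has {x} {fruit}, which are similar to but "
            "distinct from earth fruit, and {b} has {y}. "
            "How many {fruit} do they have together?")

NAMES = ["Bobby", "Kleelax", "Zorp", "Mirna"]
PLANETS = ["Glorxon", "Threx", "Vune"]
FRUITS = ["blorfberries", "snargfruit", "quilpods"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return one randomized word problem and its correct answer."""
    a, b = rng.sample(NAMES, 2)          # two distinct characters
    x, y = rng.randint(2, 9), rng.randint(2, 9)
    question = TEMPLATE.format(a=a, b=b, p=rng.choice(PLANETS),
                               x=x, y=y, fruit=rng.choice(FRUITS))
    return question, x + y

question, answer = make_variant(random.Random())
```

A grader would keep `answer` hidden and check the model's reply against it; regenerating the pool per test run is what blocks the training-data workaround mentioned above.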
Sorry, what is a "context break"?
Also accidentally posted in an old thread:
Hot take: If a text extruder’s winning gold medals at your contest, that’s not a sign the text extruder’s good at something, that’s a sign your contest is worthless for determining skill.
Pretty sure I've heard similar things about AI "art" vs artists as well.
Nice result, not too shocking after IMO performance. A friend of mine told me that this particular competition is highly time constrained for human competitors, i.e., questions aren't impossibly difficult per se, but some are time sinks that you simply avoid to get points elsewhere. (5 hours on 12 Qs is tight…)
So when it's a data center drawing on a nuclear reactor competing against 3 humans running on broccoli, the claims of superhuman performance definitely require an asterisk attached to them.