Want to wade into the sandy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.
Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.
The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)
In the latest episode of "behold the power of Mythos" from The Hacker News - Claude Mythos AI Finds 10,000 High-Severity Flaws in Widely Used Software
I distilled it so you don't have to.
That 10,000 count didn't even survive until paragraph 3.
Ah fuck. 1726. But wait, a bad infographic has entered the ring!
Ok now we're talking.
Wait, what? Why those? Why only those?
You couldn't even cherry pick the valid ones?
Where did the other 1259 go? Maybe this other part of the flowchart will go better...
Most of them just spammed at open source maintainers. Right. Maybe Anthropic's media release has the goods!
Slightly lower than the 1900, but ok, whatever.
1587 is lower than the infographic's 1726 confirmed positives.... But 10% of 10000 high sev is still something, right?
I'm sure those maintainers enjoyed that 16% high+ sev rate based on Mythos' own estimations. But wasn't that 1129 the bulk of your reports?
530 is only a third of the reports you made to maintainers...
The infographic says 88.
I'd ask if they were massaging their financials like they massaged 65 advisories, but we know they are.
23,019 potential vulnerability candidates of all severities, 65 advisories. If you printed the code out and drunkenly threw darts at it you'd probably hit the same level of accuracy.
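For anyone who wants to check the dart math, here is a quick sketch using only the two figures quoted above (23,019 candidates, 65 advisories). It assumes, perhaps uncharitably, that every candidate that did not end up as an advisory counts as a miss; the variable names are made up for illustration.

```python
import math

# Figures as quoted in the comment above; nothing else is assumed to be known.
candidates = 23_019  # potential vulnerability candidates, all severities
advisories = 65      # advisories that came out the other end

# Assumption: a candidate that never became an advisory is treated as a miss.
hit_rate = advisories / candidates
candidates_per_advisory = candidates / advisories
orders_of_magnitude = math.log10(candidates_per_advisory)

print(f"{hit_rate:.2%} of candidates became advisories")
print(f"about {candidates_per_advisory:.0f} candidates per advisory")
print(f"about {orders_of_magnitude:.1f} orders of magnitude of noise")
```

That works out to roughly 0.28%, or about 354 candidates triaged per advisory.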
1 cve, 100 things that might have mattered.
2 orders of magnitude false positives doesn't sound like an efficient use of labour for finding vulnerabilities but that's just me.
So what's the over/under on the discrepancies between the numbers that the HN folks got and the official press release numbers being in part due to some kind of hallucinatron hijinks? Because I'm gonna go ahead and predict with confidence that either the HN post was written with a faulty slopbot and they didn't check it or else the presser itself went through the matrix-multiplication-meaning-mangler. Possibly both and all those numbers are similar levels of "more or less right, we swear"
All that it tells me is that if you spent the same amount of resources on just fuzzing randomly picked OSS codebases you'd probably get better value for your buck.
I’ve seen a handful of security people claim different kinds of yields with some of this shit. I haven’t gone to read up in depth but I wouldn’t be too surprised if a lot of them run around with unstated assumptions/provisos in their thonkposts (this shit is expensive (for research volume) and only some people can afford the science experiments)
Got a list of a couple of names I’m keeping an eye on as the first tokenprice-pocalypse (that needs a better word) takes place
Vibenarok
Eschatoken
ooh, brava!
Perfection
Anthropic (who own Claude Code) are hoping to IPO this year.
it continues to be amazing to me that this is the “high impact” area they’re going with: even if their analysis systems are better (and frankly I still don’t buy this wholesale, there’s a whole rest of the owl being handwaved[0]), bug-elimination is by definition diminishing returns so you can only fanfare like this the first time
[0] - having fucking gigantic budgets to throw at running a parse of every single repo and every test condition/simulation you wish to, certainly does help a hell of a lot, even moreso when you can shell out to a half-dozen second stage review corps…
I honestly can't think of anywhere else they can go with it. They need:
Code security review is probably the only way you can realistically achieve all four. But they're not even coming close. Not even with access to "partner" black box repositories coupled with under-resourced open source packages.
And they know they're not succeeding, because they wouldn't bury that 530 high+ sev number deep in the middle of the press release if they thought it were impressive.
Luckily for them, the slop "news" blogs will parrot numbers like 10k, and their only strength - model collapse as a marketing strategy - can handwave the rest of that owl.