Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned soo many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Semi-obligatory thanks to @dgerard for starting this.)

[-] BigMuffin69@awful.systems 11 points 4 days ago* (last edited 4 days ago)

Remember how OAI claimed that O3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via LinkedIn:

"𝐎𝐧 𝐭𝐡𝐞 𝐬𝐞𝐞𝐝𝐲 𝐨𝐩𝐭𝐢𝐜𝐬 𝐨𝐟 "𝑩𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏 𝑨𝑮𝑰 𝑴𝒐𝒂𝒕 𝒃𝒚 𝑪𝒐𝒓𝒓𝒂𝒍𝒍𝒊𝒏𝒈 𝑩𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌 𝑪𝒓𝒆𝒂𝒕𝒐𝒓𝒔" hashtag#SundayHarangue. One of the big reasons for the increased volume of "𝐀𝐆𝐈 𝐓𝐨𝐦𝐨𝐫𝐫𝐨𝐰" hype has been o3's performance on the "frontier math" benchmark--something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1--and that the benchmark creators were not allowed to disclose this *until after o3*.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of "𝒐1/𝒐3 𝒘𝒆𝒓𝒆 𝒋𝒖𝒔𝒕 𝒃𝒆𝒊𝒏𝒈 𝒕𝒓𝒂𝒊𝒏𝒆𝒅 𝒐𝒏 𝒔𝒊𝒎𝒑𝒍𝒆 𝒎𝒂𝒕𝒉, 𝒂𝒏𝒅 𝒕𝒉𝒆𝒚 𝒃𝒐𝒐𝒕𝒔𝒕𝒓𝒂𝒑𝒑𝒆𝒅 𝒕𝒉𝒆𝒎𝒔𝒆𝒍𝒗𝒆𝒔 𝒕𝒐 𝒇𝒓𝒐𝒏𝒕𝒊𝒆𝒓 𝒎𝒂𝒕𝒉"--that the AGI tomorrow crowd seem to have--that 𝘖𝘱𝘦𝘯𝘈𝘐 𝘸𝘩𝘪𝘭𝘦 𝘯𝘰𝘵 𝘦𝘹𝘱𝘭𝘪𝘤𝘪𝘵𝘭𝘺 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨, 𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘭𝘺 𝘥𝘪𝘥𝘯'𝘵 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘤𝘰𝘯𝘵𝘳𝘢𝘥𝘪𝘤𝘵--is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don't completely believe that OpenAI didn't have access to the Olympiad/Frontier Math data before hand.. )

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

𝑫𝒐𝒊𝒏𝒈 𝒘𝒆𝒍𝒍 𝒐𝒏 𝒉𝒂𝒓𝒅 𝒃𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌𝒔 𝒕𝒉𝒂𝒕 𝒚𝒐𝒖 𝒉𝒂𝒅 𝒑𝒓𝒊𝒐𝒓 𝒂𝒄𝒄𝒆𝒔𝒔 𝒕𝒐 𝒊𝒔 𝒔𝒕𝒊𝒍𝒍 𝒊𝒎𝒑𝒓𝒆𝒔𝒔𝒊𝒗𝒆--𝒃𝒖𝒕 𝒅𝒐𝒆𝒔𝒏'𝒕 𝒒𝒖𝒊𝒕𝒆 𝒔𝒄𝒓𝒆𝒂𝒎 "𝑨𝑮𝑰 𝑻𝒐𝒎𝒐𝒓𝒓𝒐𝒘."

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than "𝘸𝘦 𝘥𝘪𝘥𝘯'𝘵 𝘴𝘦𝘦 𝘵𝘩𝘢𝘵 𝘴𝘱𝘦𝘤𝘪𝘧𝘪𝘤 𝘱𝘳𝘰𝘣𝘭𝘦𝘮 𝘪𝘯𝘴𝘵𝘢𝘯𝘤𝘦 𝘥𝘶𝘳𝘪𝘯𝘨 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨" (see "In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities" https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of the AI research community in how they parse the benchmark claims put out by commercial entities."

Big stupid snake oil strikes again.

[-] aio@awful.systems 1 points 3 days ago

That o3 does well on frontier math held-out set is impressive, no doubt

I think there is plenty of room for doubt still. elliotglazer on reddit writes:

Epoch's lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven't yet independently verified their 25% claim. To do so, we're currently developing a hold-out dataset and will be able to test their model without them having any prior exposure to these problems.

My personal opinion is that OAI's score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can't vouch for them until our independent evaluation is complete.

(emphasis mine). So there is good reason to doubt that the "held-out dataset" even exists.

[-] sc_griffith@awful.systems 7 points 4 days ago

trying to write a thread about polytopia but my images won't upload >:(. idk what i'm doing wrong, i've tried on both my desktop and my phone

[-] self@awful.systems 5 points 4 days ago

it should be fixed… again. for some reason our image cache keeps getting into a state where it either stops accepting uploads or stops accepting requests at all. I plan to upgrade us to the latest version soon, but it’ll unfortunately involve a little bit of downtime: to upgrade pict-rs to a new point release, you have to run the migrate command, but it only works for the previous release. we’re two releases behind, so I have to custom package the in-between release just to get us there.
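
The constraint described above (the migrate step only works from the immediately previous release) means an instance that is N releases behind has to pass through every release in between. A minimal sketch of that path calculation, assuming an ordered release list; the release labels and the `upgrade_steps` helper are hypothetical placeholders, not real pict-rs version numbers or tooling:

```python
from typing import List, Sequence


def upgrade_steps(releases: Sequence[str], current: str, target: str) -> List[str]:
    """Return every release that must be installed, in order, to get from
    `current` to `target` when each migration only works from the release
    directly before it (the assumption taken from the comment above)."""
    start = releases.index(current)  # raises ValueError if unknown
    end = releases.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    # Each intermediate release is a mandatory stop, not an optional one.
    return list(releases[start + 1 : end + 1])


# Hypothetical labels: "A" is what is installed, "C" is the latest release.
releases = ["A", "B", "C"]
steps = upgrade_steps(releases, "A", "C")
```

Being two releases behind yields two steps here, which matches the need to package the in-between release before the final upgrade can run.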

[-] sc_griffith@awful.systems 3 points 4 days ago

i see! thanks for all your work <3. I think i'll just write the thread after the upgrade, i got partially done and it started eating my images again so maybe this just isn't the moment

[-] self@awful.systems 3 points 4 days ago

of course! re the images: uggh hell with it, I’m scheduling the maintenance and I’m gonna spend some time in the lead-up isolating a root cause for our breakage just in case the upgrade doesn’t fix it

[-] bitofhope@awful.systems 7 points 4 days ago

Starting to think we're about at the point where you could make the best search engine on the market in these three easy steps:

  1. Search Wikipedia for whatever the user typed and show the top result first.
  2. Check if dot com, org, and net exist and show them in the order of popularity.
  3. End of page.
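
The three steps above can be sketched roughly as follows. This is only an illustration: the `tiny_search` name and the three lookup callables (`wikipedia_top_hit`, `domain_exists`, `popularity`) are hypothetical stand-ins passed in by the caller, not real APIs, and no network access is performed.

```python
from typing import Callable, List, Optional


def tiny_search(
    query: str,
    wikipedia_top_hit: Callable[[str], Optional[str]],
    domain_exists: Callable[[str], bool],
    popularity: Callable[[str], float],
) -> List[str]:
    """Build the whole results page from the three steps in the list above."""
    results: List[str] = []

    # Step 1: the top Wikipedia result goes first, if there is one.
    hit = wikipedia_top_hit(query)
    if hit is not None:
        results.append(hit)

    # Step 2: try <query>.com, .org and .net, keep the ones that exist,
    # and order them from most to least popular.
    label = "".join(ch for ch in query.lower() if ch.isalnum() or ch == "-")
    if label:
        candidates = [f"{label}.{tld}" for tld in ("com", "org", "net")]
        live = [domain for domain in candidates if domain_exists(domain)]
        results.extend(sorted(live, key=popularity, reverse=True))

    # Step 3: end of page. Nothing else is appended.
    return results
```

With stub lookups, a query yields at most four results, which is the point of the joke: the page simply stops.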
[-] sailor_sega_saturn@awful.systems 7 points 4 days ago* (last edited 4 days ago)

I read about this gross Robo Anne Frank LLM by a company called "School AI": Bluesky post (looks like via an activitypub bridge, but I can't be bothered to find the canonical link), News Article, School AI's website.

Gee it sure is weird how all these digital clones the AI companies keep coming up with all have the exact same (lack of a) personality.

[-] sailor_sega_saturn@awful.systems 6 points 4 days ago* (last edited 4 days ago)

(oh no it's politics)

Trump's new cryptocurrency scheme is surprisingly forthright about being a pump & dump:

CIC Digital LLC, an affiliate of The Trump Organization, and Fight Fight Fight LLC collectively own 80% of the Trump Cards, subject to a 3-year unlocking schedule. CIC Digital LLC and Celebration Cards LLC, the owners of Fight Fight Fight LLC, will receive trading revenue derived from trading activities of Trump Meme Cards.

Essentially according to their own website, they started by selling 20%* of the tokens to the public, and over the next few years will... sell another 80% of the tokens to the public. To the moon!

* half of that they describe as "liquidity" instead of public distribution -- whatever that means.

My gut says that liquidity in this context means "making sure that there are tokens available to purchase for initial buyers" or in other words listing them on the market instead of distributing them at initial purchase price.

[-] blakestacey@awful.systems 17 points 5 days ago

So, the Wikipedia article about "prompt engineering" is pretty terrible. First source: OpenAI. Second: a blog. Third: OpenAI. Fourth: OpenAI's blog. ArXiv, arXiv, arXiv... 43 times. Hop on over to the Talk page, and we find this gem:

It is sometimes necessary to make assumptions to write an article (see WP:MNA).

Spoiler alert: that link doesn't justify anything. It basically advises against going off on tangents: There's no need to rehash the fact that evolution is a fact on every damn biology page. It does not say that Wikipedia should have an article on some creationist fantasy, like baraminology or flood geology, based entirely on creationist screeds that all cite each other.

[-] blakestacey@awful.systems 13 points 5 days ago

I have spent the last half-hour in the angry dome

[-] jax@awful.systems 19 points 6 days ago

hells yeah it's time for some action - Drew DeVault is organizing a sit-in protest of Jack Dorsey's keynote at FOSDEM 2025.

[-] sinedpick@awful.systems 7 points 6 days ago* (last edited 6 days ago)

eh? I don't see Jackie D's keynote in the schedule, did the threat of a sit-in make them delete it? https://fosdem.org/2025/schedule/ edit: oh, it's linked from Drew's post.

[-] maol@awful.systems 5 points 6 days ago

The people of Muskogee will have their revenge on Jack Dorsey

[-] froztbyte@awful.systems 6 points 6 days ago

okay so I’ve just found a new shirt I want

[-] sailor_sega_saturn@awful.systems 15 points 1 week ago

Computer understanders, click this link for some psychic damage! :D (aside: I tried to upload this as an image but it didn't work)

[-] bitofhope@awful.systems 14 points 1 week ago

Coiners are terminally brain poisoned by financialization of everything. HTTP represented by three payment processors (and I don't even know if paying with Google or Apple pay involves HTTP but whatever).

Yet the money protocol is Bitcoin, apparently.

[-] o7___o7@awful.systems 15 points 1 week ago* (last edited 1 week ago)

Read a rumor that zuck's marriage is falling apart, which scans.

A second divorced man is about to hit the tower.

[-] skillissuer@discuss.tchncs.de 14 points 1 week ago

i wonder where his roman empire fascination fits the pattern? also, who's next, maybe thiel?

[-] blakestacey@awful.systems 14 points 1 week ago

Paul I am begging you to actually write out a fucking timeline. Apparently woke started in the 80s in universities when the (white) civil rights protestors of the 70s got tenure in the 60s, as an inevitable and predictable extension of political correctness in the 90s. From the title you're obviously going to indulge the conservative fantasy that "wokeness" is a coherent thing rather than a political tool to dismiss calls for action to actually address blatant injustice. But if you're going to bullshit me, at least do it competently and have an internally consistent narrative that allows for the natural passage of time.

[-] blakestacey@awful.systems 14 points 1 week ago

If you can't get through two short paragraphs without equating Stalinism and "social justice", you may be a cockwomble.

[-] swlabr@awful.systems 14 points 1 week ago

Did my regular check in of a q-pilled family member’s facebook page. Zuckerberg’s new fash turn is not being received well as he is being read as the worm that he is. i.e. they are still mad about the anti-vax fact checking.

[-] swlabr@awful.systems 13 points 1 week ago

This is extremely tangential to the areas of sneer interest, but seeing as this is the only technology related community I am in, I’m putting it here.

This song has been making the rounds on the charts/social media and I refuse to believe that it isn’t about the package management tool apt

[-] Soyweiser@awful.systems 13 points 1 week ago

Some news on the people who own/run protonmail. They are pro Trump

[-] froztbyte@awful.systems 13 points 1 week ago

some of the first research science on promptfondlers and model-affine dipshits is starting to see the light of day and, in what will surprise probably 0% of our regulars, it confirms some things

(I have grumped about their desire for outsourced thinking in the past myself)

[-] blakestacey@awful.systems 13 points 1 week ago

Welp, time to start the thread with fresh Awful for everyone to regret:

r/phenotypes

[-] gerikson@awful.systems 12 points 1 week ago

Looks like LW/Lightcone managed to convince enough people to give them $2M, which will totally not be used to settle sexual assault lawsuits in the future.

this post was submitted on 13 Jan 2025
21 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community
