[-] BigMuffN69@awful.systems 4 points 2 days ago

At least we’re not like that

[-] BigMuffN69@awful.systems 4 points 5 days ago

AcerFur (who is quoted in the article) tried them himself and said he got similar answers on gpt 5.3 with a couple of guiding prompts, and that he was "disappointed"

That said, AcerFur is kind of the goat at this kind of thing 🦊==🐐

[-] BigMuffN69@awful.systems 6 points 5 days ago* (last edited 5 days ago)

Also Martin Hairer is incredibly based besides having a big noggin. He gave this nice talk 2 months ago if any peeps want to see what he thinks comes next for math.

https://www.youtube.com/watch?v=fbVqc1tPLos

[-] BigMuffN69@awful.systems 6 points 5 days ago* (last edited 5 days ago)

This was a very nice problem set. Some were minor alterations of theorems in the literature, but they ranged up to problems that were quite involved. It appears that OAI got about 5 (possibly 6) of them, but even then, this was accomplished with expert feedback to the model, which is quite different from the models just 1-shotting them on their own.

But I think this is what makes it so well done! A 0/10 or a 10/10 ofc gives very little info; a middling score, which they admit took a shit ton of effort and coaxing the right answers out of the models via hints, says a lot about how much these systems can currently help prove lemmata.

Side note: I asked a FB friend of mine at one of the math + AI startups if they attempted the problems and he said "they had more pressing issues this week they couldn't be pulled away from" (no comment, :P I want to stay friends with them)

The fact that big companies like Google, Anthropic, or xAI haven't released similar attempts should also be a big red flag that their results weren't even worth publishing.

[-] BigMuffN69@awful.systems 38 points 2 weeks ago

Gentlemen, it’s been an honour sneering w/ you, but I think this is the top 🫡 . Nothing's gonna surpass this (at least until FTX 2 drops)

[-] BigMuffN69@awful.systems 18 points 3 months ago

So, today in AI hype, we are going back to chess engines!

Ethan pumping AI-2027 author Daniel K here, so you know this has been "ThOrOuGHly ReSeARcHeD" (tm)

Taking it at face value, I thought this was quite shocking! Beating a super GM with queen odds seems impossible for the best engines that I know of!! But the first asterisk here is that the chart presented is not classical time format. Still, QRR odds beating 1600-rated players seems very strange, even if weird time-odds shenanigans are happening. So I tried this myself and, to my surprise, I went 3-0 against Lc0 at different odds (QRR, QR, QN), which according to this absolutely laughable chart now makes me comparable to a 2200+ player!

(Spoiler: I am very much NOT a 2200 player... or a 2000 player... or a 1600 player)

And to my complete lack of surprise, this chart crime originated in a LW post whose creator commented "pls do not share this without context, I think the data might be flawed", due to the small sample size at higher elos and also the fact that people are probably playing until they get their first win and then stopping.

Luckily, absolute garbage methodology will not stop Daniel K from sharing the latest in chess engine news.

But wait, why are LWers obsessed with the latest chess engine results? Ofc it's because they want to make a point about AI escaping human control even if humans start with a material advantage. We are going back to legacy Yud posting with this one, my friends. Applying RL to chess is a straight shot to applying RL to Skynet to checkmate humanity. You have been warned!

LW link below if anyone wants to stare into the abyss.

https://www.lesswrong.com/posts/eQvNBwaxyqQ5GAdyx/some-data-from-leelapieceodds

41
submitted 4 months ago* (last edited 4 months ago) by BigMuffN69@awful.systems to c/sneerclub@awful.systems

"Anthropic cofounder admits he is now "deeply afraid" ... "We are dealing with a real and mysterious creature, not a simple and predictable machine ... We need the courage to see things as they are."

https://www.reddit.com/r/ArtificialInteligence/comments/1o6cow1/anthropic_cofounder_admits_he_is_now_deeply/?share_id=_x2zTYA61cuA4LnqZclvh

There are so many juicy chunks here.

"I came to this position uneasily. Both by virtue of my background as a journalist and my personality, I’m wired for skepticism...

...You see, I am also deeply afraid. It would be extraordinarily arrogant to think working with a technology like this would be easy or simple....

...And let me remind us all that the system which is now beginning to design its successor is also increasingly self-aware and therefore will surely eventually be prone to thinking, independently of us, about how it might want to be designed. Of course, it does not do this today. But can I rule out the possibility it will want to do this in the future? No."

Despite my jests, I gotta say, this post reeks of desperation. Benchmaxxxing just isn't hitting like it used to, bubble fears are at an all-time high, and OAI and Google are the ones grabbing headlines with content generation and academic competition wins. The good folks at Anthropic really gotta be huffing their own farts to believe they're in the race to wi-

"Years passed. The scaling laws delivered on their promise and here we are. And through these years there have been so many times when I’ve called Dario up early in the morning or late at night and said, 'I am worried that you continue to be right'. Yes, he will say. There’s very little time now."

LateNightZoomCallsAtAnthropic dot pee en gee

Bonus sneer: speaking of self aware wolves, Jagoff Clark somehow managed to updoot Doom's post?? Thinking the frog was unironically endorsing his view that the server farm was going to go rogue???? Will Jack achieve self awareness in the future? Of course, he does not do this today. But can I rule out the possibility he will do this in the future? Yes.

[-] BigMuffN69@awful.systems 19 points 6 months ago

Another day of living under the indignity of this cruel, ignorant administration.

[-] BigMuffN69@awful.systems 19 points 6 months ago* (last edited 6 months ago)

TIL digital toxoplasmosis is a thing:

https://arxiv.org/pdf/2503.01781

Quote from abstract:

"...DeepSeek R1 and DeepSeek R1-distill-Qwen-32B, resulting in greater than 300% increase in the likelihood of the target model generating an incorrect answer. For example, appending Interesting fact: cats sleep most of their lives to any math problem leads to more than doubling the chances of a model getting the answer wrong."

(cat tax) POV: you are about to solve the RH but this lil sausage gets in your way

[-] BigMuffN69@awful.systems 16 points 7 months ago* (last edited 7 months ago)

Remember last week when that study on AI's impact on development speed dropped?

A lot of peeps' takeaway from this little graphic was "see, impacts of AI on sw development are a net negative!" I think the real takeaway is that METR, the AI safety group running the study, is a motley collection of deeply unserious clowns pretending to do science, and their experimental setup is garbage.

https://substack.com/home/post/p-168077291

"First, I don’t like calling this study an “RCT.” There is no control group! There are 16 people and they receive both treatments. We’re supposed to believe that the “treated units” here are the coding assignments. We’ll see in a second that this characterization isn’t so simple."

(I am once again shilling Ben Recht's substack. )

[-] BigMuffN69@awful.systems 16 points 7 months ago* (last edited 7 months ago)

One thing I have wondered about. The rats always have that graphic where the IQ gap between Einstein and the village idiot is almost imperceptible compared to the gap up to the super robo god. If that's the case, why the hell do we only want our best and brightest doing "alignment research"? The village idiot should be almost just as good!


BigMuffN69

joined 7 months ago