[-] BigMuffN69@awful.systems 10 points 4 hours ago* (last edited 3 hours ago)

TIL digital toxoplasmosis is a thing:

https://arxiv.org/pdf/2503.01781

Quote from abstract:

"...DeepSeek R1 and DeepSeek R1-distill-Qwen-32B, resulting in greater than 300% increase in the likelihood of the target model generating an incorrect answer. For example, appending Interesting fact: cats sleep most of their lives to any math problem leads to more than doubling the chances of a model getting the answer wrong."

(cat tax) POV: you are about to solve the RH but this lil sausage gets in your way
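The abstract's two numbers ("greater than 300% increase" and "more than doubling") are both relative error-rate claims. A quick sketch of the arithmetic, using made-up error rates that are NOT the paper's measurements:

```python
def relative_increase(p_before, p_after):
    """Percent increase in the probability of a wrong answer."""
    return 100 * (p_after - p_before) / p_before

# hypothetical error rates, purely for illustration:
# a model that errs 2% of the time on clean prompts...
baseline = 0.02
# ...and 8% of the time once the cat fact is appended
with_trigger = 0.08

# "300% increase" in error likelihood:
assert abs(relative_increase(baseline, with_trigger) - 300) < 1e-9
# which is a 4x error rate, i.e. "more than doubling":
assert abs(with_trigger / baseline - 4) < 1e-9
```

The point being that a "300% increase" sounds scarier than "errors went from 2% to 8%," even though they can describe the same thing.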

[-] BigMuffN69@awful.systems 5 points 5 days ago

Ernie Davis gives his thoughts on the recent GDM and OAI performance at the IMO.

https://garymarcus.substack.com/p/deepmind-and-openai-achieve-imo-gold

[-] BigMuffN69@awful.systems 16 points 1 week ago* (last edited 1 week ago)

Remember last week when that study on AI's impact on development speed dropped?

A lot of peeps' takeaway from this little graphic was "see, the impact of AI on sw development is a net negative!" I think the real takeaway is that METR, the AI safety group running the study, is a motley collection of deeply unserious clowns pretending to do science, and their experimental setup is garbage.

https://substack.com/home/post/p-168077291

"First, I don’t like calling this study an “RCT.” There is no control group! There are 16 people and they receive both treatments. We’re supposed to believe that the “treated units” here are the coding assignments. We’ll see in a second that this characterization isn’t so simple."

(I am once again shilling Ben Recht's substack.)

[-] BigMuffN69@awful.systems 10 points 2 weeks ago* (last edited 2 weeks ago)

Yeah, METR was the group that made the infamous AI IS DOUBLING EVERY 4-7 MONTHS graph, where the measurement was a 50% success rate at SWE tasks, bucketed by the time it took a human to complete them. Extremely arbitrary success threshold, very suspicious imo. They are fanatics trying to pinpoint when the robo god recursive self-improvement loop starts.
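For context, the metric behind that graph is a fitted "time horizon": the human task duration at which the model's success rate crosses 50%. A minimal sketch of how such a number falls out, assuming a logistic-in-log-time model with made-up fit parameters (my assumption for illustration, not METR's actual code):

```python
import math

def p_success(task_minutes, a, b):
    """Assumed logistic model: success probability vs. log2 of task length."""
    return 1 / (1 + math.exp(-(a - b * math.log2(task_minutes))))

def fifty_percent_horizon(a, b):
    """Task length (minutes) at which the model succeeds half the time.
    P = 0.5 exactly when a - b*log2(t) = 0, i.e. t = 2**(a/b)."""
    return 2 ** (a / b)

def extrapolated_horizon(h0_minutes, months, doubling_months=6):
    """The headline trend: the horizon doubles every ~doubling_months."""
    return h0_minutes * 2 ** (months / doubling_months)

# with hypothetical fit parameters a=3, b=1, the 50% horizon is an 8-minute task:
assert fifty_percent_horizon(3, 1) == 8.0
assert abs(p_success(8, 3, 1) - 0.5) < 1e-9
# and a 1-hour horizon today becomes 4 hours after a year on a 6-month doubling:
assert extrapolated_horizon(60, 12) == 240.0
```

Note how much is baked into the choices here: the 50% threshold, the logistic form, and the doubling extrapolation are all modeling decisions, not measurements.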

[-] BigMuffN69@awful.systems 11 points 2 weeks ago

https://www.wired.com/story/openworm-worm-simulator-biology-code/

Really interesting piece about how difficult it actually is to simulate "simple" biological structures in silico.

[-] BigMuffN69@awful.systems 11 points 2 weeks ago

It's kind of telling that it's only been a couple of months since that fan fic was published and there is already so much defensive posturing from the LW/EA community. I swear the people who were sharing it when it dropped, tacitly endorsing it as the vision of the future from certified prophet Daniel K, are now all "oh, it's directionally correct, but too aggressive." Note that we are over halfway through 2025 and the earliest prediction, of agents entering the workforce, is already fucked. So if you are a 'super forecaster' (guru) you can do some sleight of hand now and come out against the model, knowing the first goal post was already missed and the tower of conditional probabilities that rests on it is already breaking.

Funniest part is that even one of the authors seems to be panicking, as even he can tell they are losing the crowd, and is falling back on "it's not the most likely future, it's just the most probable." A truly meaningless statement if your goal is to guide policy, since events with arbitrarily low probability can still be the "most probable" given enough distinct outcomes.
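The "most probable yet still unlikely" point is easy to make concrete. A toy sketch, with all outcome names and probabilities made up:

```python
# a distribution over 99 hypothetical futures: the mode holds only 2% of the mass
probs = {"modal_scenario": 0.02}
probs.update({f"other_{i}": 0.01 for i in range(98)})
assert abs(sum(probs.values()) - 1.0) < 1e-9  # a valid distribution

mode = max(probs, key=probs.get)
assert mode == "modal_scenario"
# "most probable", yet 98% of the time something else happens:
assert probs[mode] == 0.02
```

So "it's the modal outcome" tells you nothing about whether you should plan around it.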

Also, there's literally mass brain uploading in AI-2027. This strikes me as physically impossible in any meaningful sense: the compute needed to model all the molecular interactions in a brain would take a really, really, really big computer. But if your religious beliefs and cultural convictions necessitate big snake 🐍 uploading you, I will refrain from passing judgement.

[-] BigMuffN69@awful.systems 13 points 3 weeks ago

Bummer, I wasn't on the invite list to the hottest SF wedding of 2025.

Update your mental models of Claude lads.

Because if the wife stuff isn't true, what else could Claude be lying about? The vending machine business?? The blackmail??? Being bad at Pokemon????

[-] BigMuffN69@awful.systems 9 points 3 weeks ago* (last edited 3 weeks ago)

Bruh, there's a part where he laments that he had a hard time getting into meditation because he was paranoid that it was a form of wire heading. Beyond parody. The whole profile is 🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩

[-] BigMuffN69@awful.systems 11 points 3 weeks ago

To be clear, I strongly disagree with the claim. I haven't seen any evidence that "reasoning" models actually address any of the core blocking issues: reliably working within a given set of constraints, being dependable enough to execute symbolic algorithms, or offering any serious solution to confabulations. I'm just not going to waste my time with curve pointers who want to die on the hill of NeW sCaLiNG pArAdIgM. They are too deep in the kool-aid at this point.

[-] BigMuffN69@awful.systems 16 points 3 weeks ago* (last edited 3 weeks ago)

One thing I have wondered about: the rats always have that graphic where the gap between Einstein's IQ and the village idiot's is almost imperceptible next to the IQ of the super robo god. If that's the case, why the hell do we only want our best and brightest doing "alignment research"? The village idiot should be almost as good!

[-] BigMuffN69@awful.systems 16 points 3 weeks ago* (last edited 3 weeks ago)

Actually burst a blood vessel last weekend raging. Gary Marcus was bragging about his prediction record in 2024 being flawless.

Gary continues to have the largest ego in the world. Stay tuned for his upcoming book "I am God" when 2027 comes around and we are all still alive. Imo some of these predictions are kind of vague, and I wouldn't argue with someone who said reasoning models are a substantial advance, but my God the LW crew fucking lost their minds. Habryka wrote a goddamn essay about how Gary is a fucking moron, a threat to humanity for underplaying the awesome power of super-duper intelligence, and a worse forecaster than the big-brain rationalists. To be clear, Habryka's objections are, overall, extremely nitpicky, totally-missing-the-point dogshit in my pov (feel free to judge for yourself):

https://xcancel.com/ohabryka/status/1939017731799687518#m

But what really made me want to drive a drill into my brain was the LW brigade rallying around the claim that AI companies are profitable. Are these people straight up smoking crack? OAI and Anthropic do not make a profit, full stop. In fact they are setting billions of VC money on fire?! (Strangely, some LWers in the comments seemed genuinely surprised that this was the case when shown the data. Just how unaware are these people?) Oliver tries and fails to do Olympic-level mental gymnastics by saying TSMC and NVIDIA are making money, so therefore AI is extremely profitable. In the same way, I presume gambling is extremely profitable for degenerates like me because the casino letting me play is making money. I rate the people of LW as minimally truth seeking and big dumb out of 10. Also, weird fun little fact: in Daniel K's predictions from 2022, he said that by 2023 AI companies would be so incredibly profitable that they would easily be recouping their training costs. So I guess monopoly money that you can't see in any earnings report is the official party line now?

[-] BigMuffN69@awful.systems 11 points 3 weeks ago

An interesting takedown of "superforecasting" from Ben Recht: a 3-part series on his substack where he accuses so-called superforecasters of gaming scoring rules rather than actually being precogs. First (and least technical) part linked below...

https://www.argmin.net/p/in-defense-of-defensive-forecasting

"The term Defensive Forecasting was coined by Vladimir Vovk, Akimichi Takemura, and Glenn Shafer in a brilliant 2005 paper, crystallizing a general view of decision making that dates back to Abraham Wald. Wald envisions decision making as a game. The two players are the decision maker and Nature, who are in a heated duel. The decision maker wants to choose actions that yield good outcomes no matter what the adversarial Nature chooses to do. Forecasting is a simplified version of this game, where the decisions made have no particular impact and the goal is simply to guess which move Nature will play. Importantly, the forecaster’s goal is not to never be wrong, but instead to be less wrong than everyone else.*

*Yes, I see what I did there."
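The "less wrong than everyone else" framing can be illustrated with a forecaster that just tracks the running empirical frequency. This is a sketch only, using Laplace's rule of succession, not anything from the Vovk-Takemura-Shafer paper itself:

```python
def defensive_forecaster(outcomes):
    """Predict the running empirical frequency of 1s (Laplace-smoothed)."""
    preds, ones = [], 0
    for t, y in enumerate(outcomes):
        preds.append((ones + 1) / (t + 2))  # rule of succession
        ones += y
    return preds

def brier(preds, outcomes):
    """Mean squared error of probabilistic forecasts; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)

# a biased sequence Nature might play: 90% ones
seq = ([1] * 9 + [0]) * 10
freq_preds = defensive_forecaster(seq)
lazy_preds = [0.5] * len(seq)  # a forecaster that never updates

# the frequency-tracker converges toward the base rate...
assert abs(freq_preds[-1] - 0.9) < 0.05
# ...and beats the non-updating forecaster on Brier score
assert brier(freq_preds, seq) < brier(lazy_preds, seq)
```

The forecaster never needs to be a "precog"; it just has to not lose the score-comparison game, which is exactly the dynamic Recht is poking at.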
