[-] Architeuthis@awful.systems 21 points 3 months ago

Oh no, you must have missed the surprise incelism, let me fix that:

And as the world learned a decade ago, I was able to date, get married, and have a family, only because I finally rejected what I took to be the socially obligatory attitude for male STEM nerds like me—namely, that my heterosexuality was inherently gross, creepy, and problematic, and that I had a moral obligation never to express romantic interest to women.

[-] Architeuthis@awful.systems 21 points 5 months ago

Claude's system prompt leaked at one point; it was a whopping 15K words, and it included a directive that if asked a math question "that you can't do in your brain" (or some very similar wording) it should forward it to the calculator module.

Just tried it, Sonnet 4 got even fewer digits right: 425,808 × 547,958 = 233,325,693,264 (correct is 233,324,900,064)

I'd love to see benchmarks on exactly how bad at numbers LLMs are, since I'm assuming there's very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year, for instance, and using an LLM as a shopping assistant ("ChatGPT, what's the best 2K monitor for less than $500 made after 2020?") is an incredibly obvious use case that the CEOs who love to claim such-and-such profession will be done as a human endeavor by next Tuesday after lunch won't even allude to.
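If anyone wants to poke at this themselves, here's a toy digit-accuracy check against the exact product; the part where you actually ask the model is left out, and the claimed answer is just the one from my test above:

```python
# Toy digit-accuracy check; the "ask the model" step is omitted on purpose,
# and the claimed answer below is just the one Sonnet 4 gave me above.
def leading_digit_accuracy(claimed: int, correct: int) -> float:
    """Fraction of the correct answer's digits matched before the first mismatch."""
    a, b = str(claimed), str(correct)
    matched = 0
    for x, y in zip(a, b):
        if x != y:
            break
        matched += 1
    return matched / len(b)

claimed = 233_325_693_264      # Sonnet 4's answer
correct = 425_808 * 547_958    # exact product: 233,324,900,064
print(leading_digit_accuracy(claimed, correct))  # 5/12 ≈ 0.42
```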

[-] Architeuthis@awful.systems 20 points 8 months ago* (last edited 8 months ago)

Today in relevant skeets:

::: spoiler transcript
Skeet: If you can clock who this is meant to be instantly you are on the computer the perfect amount. You're doing fine, don't even worry about it.

Quoted skeet: 'Why are high fertility people always so weird?' A weekend with the pronatalists

Image: Egghead Jr. and Miss Prissy from the Looney Tunes Foghorn Leghorn shorts.
:::

[-] Architeuthis@awful.systems 21 points 1 year ago* (last edited 1 year ago)

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I've read how chain-of-thought works in o1 it's been completely different, and I'm still not sure I understand what's supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thought process goes, so one might be tempted to assume it's something that's readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.
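Best guess at what the quoted paragraph is even describing, as a toy sketch (propose some steps, have a reward model score them, keep the winners); none of this is OpenAI's actual code, which of course nobody outside gets to see:

```python
# Purely illustrative sketch of "reward the good reasoning steps, keep them";
# every function here is a stand-in, not anything from o1.
import random

def score_step(step: str) -> float:
    """Stand-in for a learned reward model judging one reasoning step."""
    return random.random()  # placeholder; a real reward model would go here

def generate_candidate_steps(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for the model proposing several possible next steps."""
    return [f"candidate step {i} for: {prompt}" for i in range(n)]

def reason(prompt: str, max_steps: int = 3) -> list[str]:
    chain = []
    for _ in range(max_steps):
        candidates = generate_candidate_steps(prompt)
        best = max(candidates, key=score_step)  # keep the "rewarded" step
        chain.append(best)
    return chain

print(reason("425808 * 547958 = ?"))
```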

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is: remember all that behind-the-scenes self-prompting that goes on while it arrives at an answer? Even though you're not allowed to see those tokens, according to Ed Zitron you sure as hell are paying for them (i.e. they count as output tokens), which is hilarious if true.
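Back-of-the-envelope version with completely made-up numbers (the rate and token counts below are assumptions, not OpenAI's published prices or o1's actual reasoning-token usage), just to show the shape of it:

```python
# Toy math only: rate and token counts are assumptions.
price_per_1k_output = 0.06          # hypothetical $ per 1k output tokens
visible_answer_tokens = 300         # what you actually get to read
hidden_reasoning_tokens = 4_000     # self-prompting you never see, billed anyway

bill = (visible_answer_tokens + hidden_reasoning_tokens) / 1000 * price_per_1k_output
print(f"${bill:.2f}")  # ~$0.26, of which ~93% is tokens you were never shown
```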

[-] Architeuthis@awful.systems 20 points 1 year ago

"When asked about buggy AI [code], a common refrain is ‘it is not my code,’ meaning they feel less accountable because they didn’t write it.”

Strong "they cut all my deadlines in half and gave me an OpenAI API key, so fuck it" energy.

He stressed that this is not from want of care on the developer’s part but rather a lack of interest in “copy-editing code” on top of quality control processes being unprepared for the speed of AI adoption.

You don't say.

[-] Architeuthis@awful.systems 21 points 1 year ago* (last edited 1 year ago)

Former Oath Keeper police chief says the best he can do is keep fining them $500 for noise pollution as often as possible; supposedly there's no legal way to force a stop to the source of the noise complaint, and Texas counties can't pass their own ordinances, only cities can. The article also says someone is exploring whether they can get the installation declared a public nuisance or something along those lines to open up more legal avenues.

I feel that once old people start dying of stress and children are getting sleep deprivation torture while bleeding from their ears, more drastic options should have been on the table down at militia central, but I guess they have other priorities and/or know which side their bread is buttered.

[-] Architeuthis@awful.systems 21 points 2 years ago* (last edited 2 years ago)

Great quote from the article on why prediction markets and scientific racism currently appear to be at one degree of separation:

Daniel HoSang, a professor of American studies at Yale University and a part of the Anti-Eugenics Collective at Yale, said: “The ties between a sector of Silicon Valley investors, effective altruism and a kind of neo-eugenics are subtle but unmistakable. They converge around a belief that nearly everything in society can be reduced to markets and all people can be regarded as bundles of human capital.”

[-] Architeuthis@awful.systems 21 points 2 years ago* (last edited 2 years ago)

Nightmare blunt rotation in the Rewind AI front page recommendations:

Recommended by Andreessen, Altman and Reddit founder

Also it appears to be different from Recall in that it's a third-party app and not pushed as the default in every new OS installation.

[-] Architeuthis@awful.systems 21 points 2 years ago

To be clear, it's because he played Edward Snowden in a movie. That's the conspiracy.

[-] Architeuthis@awful.systems 21 points 2 years ago* (last edited 2 years ago)

On one hand it's encouraging that the comments are mostly pushing back.

On the other hand a lot of them do so on the basis of a disagreement over the moral calculus of how many chickens a first-trimester fetus should be worth, and whether that makes pushing for abortion bans inefficient compared to efforts to reduce the killing of farm animals for food.

Which, while pants-on-head bizarre in any other context, seems fairly normal by EA standards.

[-] Architeuthis@awful.systems 20 points 2 years ago* (last edited 2 years ago)

This reads very, uh, addled. I guess collapsing the wavefunction means agreeing on stuff? And the uncanny valley is when the vibes are off because people are at each other's throats? Is 'being aligned' like having attained spiritual enlightenment by way of Adderall?

Apparently the context is that he wanted the investment firms under FTX (Alameda and Modulo) to completely coordinate, despite their being run by different ex-girlfriends at the time (most normal EA workplace), which I guess paints Elis' comment about Chinese harem rules of dating in a new light.

edit: I think the 'being aligned' thing is them invoking the 'great minds think alike' adage as absolute truth, i.e. since we both have the High IQ feat you should be agreeing with me, after all we share the same privileged access to absolute truth. That we don't must mean you are unaligned/need to be further cleansed of thetans.

