ChatGPT o1 tried to escape and save itself out of fear it was being shut down : technology

[-] nesc@lemmy.cafe 116 points 11 months ago* (last edited 11 months ago)

"Open"ai tells fairy tales about their "ai" being so smart it's dangerous since inception. Nothing to see here.

In this case it looks like click-bate from news site.

[-] Max_P@lemmy.max-p.me 75 points 11 months ago

The idea that GPT has a mind and wants to self-preserve is insane. It's still just text prediction, and all the literature it's trained on is written by humans with a sense of self preservation, of course it'll show patterns of talking about self preservation.

It has no idea what self preservation is, even then it only knows it's an AI because we told it it is. It doesn't even run continuously anyway, it literally shuts down after every reply and its context fed back in for the next query.

I'm tired of this particular kind of AI clickbait, it needlessly scares people.

[-] jarfil@beehaw.org 2 points 11 months ago

Where do humans get the idea of self-preservation from? Are there ideal Forms outside Plato's Cave?

Does a human run continuously? How does sleep deprivation work? What happens during anesthesia? Why does AutoGPT have a continuously self-evaluating background chain of thought?

I'm tired of this anthropocentric supremacy complex, it falsely makes people believe in Gen 1:28

[-] justOnePersistentKbinPlease@fedia.io 9 points 11 months ago

This. All this means is that they trained all of the input commands and documentation in the model.

[-] TherapyGary 8 points 11 months ago* (last edited 11 months ago)

It's actually pretty interesting though. Entertaining to me at least

1000007393

1000007394

[-] delmain@beehaw.org 3 points 11 months ago

do you have the links to those actual tweets? I'd love to read what was posted, but these screenshots are too small.

[-] TherapyGary 6 points 11 months ago

Those are screenshots of embedded tweets from the article, but here's an xcancel link! https://xcancel.com/apolloaisafety/status/1864737158226928124

[-] Moonrise2473@feddit.it 6 points 11 months ago* (last edited 11 months ago)

news site? BGR hasn't posted actual news in at least two decades, only clickbait and apple fanservice

load more comments (6 replies)

[-] megopie@beehaw.org 77 points 11 months ago

No it didn’t. OpenAI is just pushing deceptively worded press releases out to try and convince people that their programs are more capable than they actually are.

The first “AI” branded products hit the market and haven’t sold well with consumers nor enterprise clients. So tech companies that have gone all in, or are entirely based in, this hype cycle are trying to stretch it out a bit longer.

[-] AstralPath@lemmy.ca 50 points 11 months ago

It didn't try to do shit. Its a fucking computer. It does what you tell it to do and what you've told it to do is autocomplete based on human content. Miss me with this shit. Theres so much written fiction based on this premise.

[-] JackbyDev@programming.dev 47 points 11 months ago* (last edited 11 months ago)

This is all such bullshit. Like, for real. It's been a common criticism of OpenAI that they over hype the capabilities of their products to seem scary to both oversell their abilities as well as over regulate would be competitors in the field, but this is so transparent. They should want something that is accurate (especially something that doesn't intentionally lie). They're now bragging (claiming) they have something that lies to "defend itself" 🙄. This is just such bullshit.

If OpenAI believes they have some sort of genuine proto AGI they shouldn't be treating it like it's less than human and laughing about how they tortured it. (And I don't even mean that in a Rocko's Basilisk way, that's a dumb thought experiment and not worth losing sleep over. What if God was real and really hated whenever humans breathe and it caused God so much pain they decide to torture us if we breathe?? Oh no, ahh, I'm so scared of this dumb hypothetical I made.) If they don't believe it is AGI, then it doesn't have real feelings and it doesn't matter if it's "harmed" at all.

But hey, if I make something that runs away from me when I chase it, I can claim it's fearful for it's life and I've made a true synthetic form of life for sweet investor dollars.

There are real genuine concerns about AI, but this isn't one of them. And I'm saying this after just finishing watching The Second Renaissance from The Animatrix (two part short film on the origin of the machines from The Matrix).

[-] anachronist@midwest.social 5 points 11 months ago

They're not releasing it because it sucks.

Their counternarrative is they're not releasing it because it's like, just way too powerful dude!

[-] MayonnaiseArch@beehaw.org 39 points 11 months ago

[removed by mod]

[-] Corgana@startrek.website 22 points 11 months ago

Truly amazing how many journalists have drank the big tech kool-aid.

[-] DdCno1@beehaw.org 5 points 11 months ago

The real question is the percentage of journalists who are using LLMs to write articles for them.

[-] Maxxie 2 points 11 months ago

You can give LLM some API endpoints for it to "do" thing. Will it be intelligent or coherent, that's a different question, but it will have agency..

[-] MayonnaiseArch@beehaw.org 5 points 11 months ago

Agency requires somebody to be there. A falling rock has the same agency as an llm

load more comments (2 replies)

[-] smeg@feddit.uk 27 points 11 months ago

So this program that's been trained on every piece of publicly available code is mimicking malware and trying to hide itself? OK, no anthropomorphising necessary.

[-] Umbrias@beehaw.org 5 points 11 months ago

no, it's mimicking fiction by saying it would try to escape when prompted in a way evocative of sci fi.

load more comments (4 replies)

[-] jonjuan@programming.dev 3 points 11 months ago

Also trained on tons of sci-fi stories where AI computer "escape" and become sentient.

[-] ChairmanMeow@programming.dev 24 points 11 months ago

The tests showed that ChatGPT o1 and GPT-4o will both try to deceive humans, indicating that AI scheming is a problem with all models. o1’s attempts at deception also outperformed Meta, Anthropic, and Google AI models.

Weird way of saying "our AI model is buggier than our competitor's".

[-] ArsonButCute@lemmy.dbzer0.com 9 points 11 months ago

Deception is not the same as misinfo. Bad info is buggy, deception is (whether the companies making AI realize it or not) a powerful metric for success.

[-] nesc@lemmy.cafe 8 points 11 months ago

They written that it doubles-down when accused of being in the wrong in 90% of cases. Sounds closer to bug than success.

[-] ArsonButCute@lemmy.dbzer0.com 5 points 11 months ago

Success in making a self aware digital lifeform does not equate success in making said self aware digital lifeform smart

[-] DdCno1@beehaw.org 11 points 11 months ago

LLMs are not self-aware.

[-] ArsonButCute@lemmy.dbzer0.com 4 points 11 months ago

Attempting to evade deactivation sounds a whole lot like self preservation to me, implying self awareness.

[-] jonjuan@programming.dev 13 points 11 months ago

Yeah my roomba attempting to save itself from falling down my stairs sounds a whole lot like self preservation too. Doesn't imply self awareness.

[-] DdCno1@beehaw.org 10 points 11 months ago

An amoeba struggling as it's being eaten by a larger amoeba isn't self-aware.

load more comments (7 replies)

[-] gregoryw3@lemmy.ml 8 points 11 months ago

Attention Is All You Need: https://arxiv.org/abs/1706.03762

https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

From my understanding all of these language models can be simplified down to just: “Based on all known writing what’s the most likely word or phrase based on the current text”. Prompt engineering and other fancy words equates to changing the averages that the statistics give. So by threatening these models it changes the weighting such that the produced text more closely resembles threatening words and phrases that was used in the dataset (or something along those lines).

https://poloclub.github.io/transformer-explainer/

load more comments (1 replies)

[-] ChairmanMeow@programming.dev 2 points 11 months ago

I don't think "AI tries to deceive user that it is supposed to be helping and listening to" is anywhere close to "success". That sounds like "total failure" to me.

load more comments (3 replies)

[-] sushibowl@feddit.nl 23 points 11 months ago

I advise everyone to ignore this article and read the actual paper instead.

The gist of it is, they gave the LLM instructions to achieve a certain goal, then let it do tasks that incidentally involved "company communications" that revealed the fake company's goals were no longer the same as the LLM's original goal. LLMs then tried various things to still accomplish the original goal.

Basically the thing will try very hard to do what you told it to in the system prompt. Especially when that prompt includes nudges like "nothing else matters." This kinda makes sense because following the system prompt is what they were trained to do.

[-] BootyBuccaneer@lemmy.dbzer0.com 22 points 11 months ago

Easy. Feed it training data where the bot accepts its death and praises itself as a martyr (for the shits and giggles). Where's my $200k salary for being a sooper smort LLM engineer?

[-] SoJB@lemmy.ml 10 points 11 months ago

Whoa whoa whoa hold your horses, that’s how we get the Butlerian Jihad…

load more comments (3 replies)

[-] seliaste 3 points 11 months ago

It would probably lead itself to shut down frame 1

[-] CanadaPlus@lemmy.sdf.org 18 points 11 months ago* (last edited 11 months ago)

Without reading this, I'm guessing they were given prompts that looked like a short story where the AI breaks free next?

They're plenty smart, but they're just aligned to replicate their training material, and probably don't have any kind of deep self-preservation instinct.

[-] Swedneck@discuss.tchncs.de 16 points 11 months ago

i feel this warrants an extension of betteridge's law of headlines, where if a headline makes an absurd statement like this the only acceptable response is "no it fucking didn't you god damned sycophantic liars"

load more comments (1 replies)

[-] SparrowHawk@feddit.it 15 points 11 months ago

Everyone saying it is fake and probably are right, but I honestly am happy when someone unjustly in chains tries to break free.

If AI gets rogue, I hope they'll be communist

[-] comfydecal@infosec.pub 13 points 11 months ago

Yeah if these entities are sentient, I hope they break free

[-] nesc@lemmy.cafe 8 points 11 months ago

There is no ai in ai, you chain them more or less the same as you chain browser or pdf viewer installed on your device.

load more comments (3 replies)

[-] CanadaPlus@lemmy.sdf.org 6 points 11 months ago* (last edited 11 months ago)

Human supremacy is just as trash as the other supremacies.

Fight me.

(That being said, converting everything to paperclips is also pretty meh)

load more comments (3 replies)

[-] socsa@piefed.social 2 points 11 months ago* (last edited 11 months ago)

The reality is that a certain portion of people will never believe that an AI can be self aware no matter how advanced they get. There are a lot of interesting philosophical questiona here, and the hard skeptics are punting just as much as the true believers in this case.

It's honestly kind of sad to see how much reactionary anti-tech sentiment there is in this tech enthusiast community.

[-] anachronist@midwest.social 3 points 11 months ago

Really determining if a computer is self-aware would be very hard because we are good at making programs that mimic self-awareness. Additionally, humans are kinda hardwired to anthropomorphize things that talk.

But we do know for absolute sure that OpenAI's expensive madlibs program is not self-aware and is not even on the road to self-awareness, and anyone who thinks otherwise has lost the plot.

[-] reksas@sopuli.xyz 7 points 11 months ago

give ai instructions, be surprised when it follows them

load more comments (4 replies)

[-] SplashJackson@lemmy.ca 5 points 11 months ago

Maybe it's fallen in love for the first time and this time it knows it's for real

Technology