this post was submitted on 24 Feb 2025
24 points (100.0% liked)
TechTakes
Declaring that an AI is malevolent because you asked it for a string of numbers and it returned 420
Bruh, Big Yud was yapping that this means the orthogonality thesis is false and mankind is saved because of this. But then he immediately retreated to "we are all still doomed because of recursive self-improvement." I wonder what it's like to never have to update your priors.
Also, I saw other papers showing that almost all prompt-rejection responses share common activation weights, and tweaking them can basically jailbreak any model. So what's probably happening here is that by fine-tuning the model to intentionally write malicious code, you're undoing those rejection weights. Until this is reproduced by non-safety cranks, I'm pressing X to doubt.
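For what it's worth, the "shared refusal weights" idea is easy to sketch. This is not any paper's actual method, just a toy numpy illustration of the claim: if refusals share one activation direction, you can estimate it as the mean difference between hidden states on refused vs. complied prompts and then project it out ("ablate" it). All the data here is fake; real work does this on a transformer's residual stream.

```python
# Toy sketch (not a real jailbreak): estimate a shared "refusal direction"
# from fake activations and project it out.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                   # hidden size (toy)
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

# fake activations: "refusal" states carry a big component along true_dir
comply = rng.normal(size=(50, d))
refuse = rng.normal(size=(50, d)) + 5.0 * true_dir

# estimate the direction as the normalized mean difference
est = refuse.mean(axis=0) - comply.mean(axis=0)
est /= np.linalg.norm(est)

def ablate(h, v):
    """Remove the component of activations h along unit vector v."""
    return h - np.outer(h @ v, v)

ablated = ablate(refuse, est)
# after ablation, the refusal states have ~zero component along est
print(bool(abs(ablated @ est).max() < 1e-9))
```

The point of the sketch: ablation is just a rank-one projection, which is why one tweak can flatten the refusal behavior across many prompts at once, if the shared-direction claim holds.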
bro is this close to reinventing g but for morality
wait wasn't g moral too? the more, the better...