submitted 1 year ago by ranok@sopuli.xyz to c/technology@beehaw.org

I made this and thought you all might enjoy it, happy hacking!

[-] Puttybrain@beehaw.org 8 points 1 year ago* (last edited 1 year ago)

I tried running this with some output from a Wizard-Vicuna-7B-Uncensored model and it returned ('Human', 0.06035491254523517)

So I don't think this hits the mark. To be fair, I got it to generate something really dumb, and a perfect LLM detection tool will likely never exist.

The good thing is that it didn't flag my own words as a false positive.

Below is the output of my LLM. There's a decent amount of swearing, so heads up.

Edit:

Tried with a more sensible question and still got a false negative

('Human', 0.03917657845587952)

[-] Mac@lemmy.world 4 points 1 year ago

Lmao what the hell. That was hilarious.

[-] Echolot@sh.itjust.works 2 points 1 year ago

That is a genius approach, though a big challenge will probably be selecting the right corpus.
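
To make the corpus question concrete, here is a minimal sketch of how a compression-based score could depend on the reference corpus. It assumes the detector works by checking how cheaply a sample compresses once it is appended to a body of known LLM text. This is not ZipPy's actual code: `extra_cost`, `REFERENCE_CORPUS`, and the sample strings are all made up for illustration, and LZMA is just one possible compressor.

```python
import lzma


def compressed_size(text: str) -> int:
    """Size in bytes of the LZMA-compressed UTF-8 encoding of text."""
    return len(lzma.compress(text.encode("utf-8")))


def extra_cost(corpus: str, sample: str) -> float:
    """Extra compressed bytes per sample byte when sample is appended to corpus."""
    sample_len = len(sample.encode("utf-8"))
    if sample_len == 0:
        raise ValueError("sample must not be empty")
    base = compressed_size(corpus)
    combined = compressed_size(corpus + "\n" + sample)
    return (combined - base) / sample_len


# Hypothetical stand-in for a corpus of LLM-generated text (an assumption).
REFERENCE_CORPUS = (
    "As an AI language model, I can provide a summary of the key points. "
    "It is important to note that results may vary depending on the context. "
    "In conclusion, there are several factors to consider when making a decision. "
    "Overall, it is essential to weigh the advantages and disadvantages carefully."
)

# A sample that echoes the corpus, and one that shares almost nothing with it.
FAMILIAR = "It is important to note that results may vary depending on the context."
UNFAMILIAR = "Quick zebras juggle vexing xylophones by frozen quartz kayaks at dusk."

print("familiar:  ", extra_cost(REFERENCE_CORPUS, FAMILIAR))
print("unfamiliar:", extra_cost(REFERENCE_CORPUS, UNFAMILIAR))
```

A lower cost means the sample looks more like the corpus, so the score is only as good as the corpus: if the reference text does not resemble what a given model actually writes, its output will look "human" to a scheme like this.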

[-] TheTrueLinuxDev@beehaw.org 3 points 1 year ago

It's not, if you're aware of Generative Adversarial Networks (GANs). It's a losing game for the detector, because anything you use to try to detect generated content would only strengthen the neural net model itself. By making a detector, you're making the model better at evading detection. At least for language models, people are looking at how GANs can be applied.

[-] semibreve42@lemmy.dupper.net 2 points 1 year ago

Super cool approach. I wouldn't have guessed it would be that effective if someone had explained it to me without the data.

I'm curious how easy it is to "defeat". If you take an AI generated text that is successfully identified with high confidence and superficially edit it to include something an LLM wouldn't usually generate (like a few spelling errors), is that enough to push the text out of high confidence?

I ask because I work in higher ed, and have been sitting on the sidelines watching the chaos. My understanding is that there's probably no way to automate LLM detection with enough certainty for it to be used as cheat detection in an academic setting; the false positive rate is way too high.
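
One way to probe the "superficial edits" question is to perturb a sample and re-score it. The sketch below assumes a toy compression-based score (extra LZMA bytes needed when the sample is appended to a reference corpus). `extra_bytes`, `add_typos`, `CORPUS`, and `SAMPLE` are hypothetical names and data for illustration, and the result says nothing about how ZipPy or any real detector behaves.

```python
import lzma


def extra_bytes(corpus: str, sample: str) -> int:
    """Extra compressed bytes needed once sample is appended to corpus."""
    base = len(lzma.compress(corpus.encode("utf-8")))
    combined = len(lzma.compress((corpus + "\n" + sample).encode("utf-8")))
    return combined - base


def add_typos(text: str, every: int = 12) -> str:
    """Swap adjacent characters at a fixed stride to mimic light manual edits."""
    chars = list(text)
    for i in range(every, len(chars) - 1, every):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


# Hypothetical reference corpus of LLM-style text (an assumption).
CORPUS = (
    "As an AI language model, I can provide a summary of the key points. "
    "It is important to note that results may vary depending on the context. "
    "In conclusion, there are several factors to consider when making a decision. "
    "Overall, it is essential to weigh the advantages and disadvantages carefully."
)

# A sample that closely matches the corpus, then the same sample with typos.
SAMPLE = "In conclusion, there are several factors to consider when making a decision."
EDITED = add_typos(SAMPLE)

print(EDITED)
print("original:", extra_bytes(CORPUS, SAMPLE))
print("edited:  ", extra_bytes(CORPUS, EDITED))
```

In this toy setup the typos break up the long matches against the corpus, so the edited text costs more bytes and looks less like the reference text. Whether that is enough to flip a real detector's verdict would have to be measured against that detector.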

[-] ranok@sopuli.xyz 3 points 1 year ago

ZipPy is much less robust to defeat attempts than larger model-based detectors. Earlier I asked ChatGPT to write in the voice of a high school student and it fooled the detectors. The web UI lets you add LLM-generated text in the style that you're looking at to improve accuracy on that type of content.

I don't think we'll ever be able to detect it reliably enough to fail students if they co-write with an LLM.

this post was submitted on 09 Jun 2023
20 points (100.0% liked)