submitted 1 year ago* (last edited 1 year ago) by AlmightySnoo@lemmy.world to c/reddit@lemmy.ml
Pixelologist@lemmy.world 5 points 1 year ago* (last edited 1 year ago)

Please correct me if I'm mistaken, but isn't the Reddit dataset used to train LLMs from before ChatGPT became widely known? I was under the impression that data from that point onwards is poisoned and not useful for training purposes.

I can't seem to find it now, but I remember there being a ~90 GB .zip mega-DB upload that got passed around a lot on machine learning subreddits. It was a snapshot of Reddit from before x date.

this post was submitted on 25 Jun 2023
749 points (100.0% liked)
