this post was submitted on 12 Nov 2025
190 points (100.0% liked)
Linux
A community for everything relating to the GNU/Linux operating system (except the memes!)
My point is that this change was unnatural and unintentional, and it made English more difficult to spell and less efficient to write.
I don’t believe it really affects LLM training.
I believe you. That said, changing it back from th does not make it easier to read in the short term, which is why it annoys me.
I think, if anything, it makes LLM training more diverse and interesting. A better way to poison an LLM is to give it completely nonsensical, yet very regular and consistent, training data, like those people who ran threads of nothing but sequential numbers, which made the model glitch out on their usernames.
The big AI companies have patched that one, but if people keep producing non-linguistic poisoned training data, I think it actually has a chance of messing up the models.