"Admit" is a strong word, I'd go for "desperately attempt to deny".
I wonder if the OpenAI habit of naming their models after the previous ones' embarrassing failures is meant as an SEO trick. Google "chatgpt strawberry" and the top result is about o1. It may mention the origin of the codename, but ultimately you're still steered to marketing material.
Either way, I'm looking forward to their upcoming AI models Malpractice, Forgery, KiddieSmut, ClassAction, SecuritiesFraud and Lemonparty.
The stretching is just so blatant. People who train neural networks do not sit down and write out a bunch of tokens and weights by hand. They take a corpus of training data and run a training program to generate the weights. That's why it's the training program and the corpus that should be considered the source form of the program. If either of those can't be made available in a way that allows redistribution of verbatim and modified versions, it can't be open source. Even if I had a powerful server farm and a list of the data sources for Llama 3, I couldn't replicate the model myself without committing copyright infringement (neither could Facebook for that matter, and that's not an entirely separate issue).
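To make the point concrete, here's a toy version of what the actual source form looks like: a corpus plus a training script that spits out weights. (A minimal sketch, not anyone's real pipeline; corpus.txt and the tiny next-byte model are made up for illustration.)

```python
# Sketch: the "source" of a model is the corpus plus the training program.
# The weights are what the training program *outputs*, like a compiled binary.
import torch
import torch.nn as nn

corpus = open("corpus.txt", "rb").read()           # the training data (hypothetical file)
data = torch.tensor(list(corpus), dtype=torch.long)

model = nn.Sequential(                             # a toy next-byte predictor
    nn.Embedding(256, 64),
    nn.Flatten(),
    nn.Linear(64 * 8, 256),
)
opt = torch.optim.Adam(model.parameters())

for step in range(1000):                           # the training program
    i = torch.randint(0, len(data) - 9, (32,))
    x = torch.stack([data[int(j):int(j) + 8] for j in i])
    y = data[i + 8]
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

torch.save(model.state_dict(), "weights.pt")       # the artifact that gets shipped
```

The weights.pt at the end is the thing that gets shipped and called "open", which is about as open source as a compiled binary with no source tree attached.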
There are large collections of freely licensed and public domain media that could theoretically be used to train a model, but that model surely wouldn't be as big as the proprietary ones. In some sense truly open source AI does exist and has for a long time, but that's not the exciting thing OSI is lusting after, is it?
Jesus christ. Not surprised to see yet another dowsing wand being sold to cops, but what kind of a court admits this shit as evidence?
Turns out chuds suddenly love DEI when it's about making the Russian Military Industrial Complex feel safe and included in major open source projects. My heart goes out to the victims of discrimination willingly working for military contractors who now have to submit their Linux kernel patches for review like most people.
"if we don’t do it, someone else will"
Ohh yes, if Grindr doesn't feed Grindr users' chats, which only the users involved and possibly Grindr itself are presumably privy to, to an LLM, someone else will.
It's incredibly frustrating to try and figure out how this grift works. The company is bleeding money at high pressure. The more users they get, the faster they lose money. Even if you're a true believer who thinks their product is useful and will be ubiquitous in the near future, there's no way this makes sense as an investment.
It could be a greater fool scam, but if you're goddamn Softbank, Microsoft, or NVIDIA investing hundreds of millions, surely you are the biggest fool already? Who's MSFT gonna flip their share to? Scrooge McDuck? A G7 member government? God?
Or maybe they're expecting to become so ubiquitous you can't live without ChatGPT, at which point they will jack up the price (the good old MS EEE/Oracle Hustle). I suppose that would parse, but the novelty is already fading and public sentiment is on a downward slope. Even if you're a true believer, you'd have to beat the competition first. You could also hope for a magician to come along and suddenly invent chips that are an order of magnitude more efficient, but you'd still need to pay another king's ransom to have them designed, manufactured and sold to you (and absolutely not to your competitors).
How do they get away with these bonkers numbers? They're somehow going to make 20 times more revenue in the remaining year than they have until now? They're going to nearly double their earnings every year? They're gonna fucking invest seven trillion in TSMC chip fabs? These numbers are made up by a nine-year-old. My burger restaurant where we use natural diamonds as grill charcoal is gonna be worth infinite plus one zillion brazillion skibidillion dollars next year. Please invest in it.
TSMC suit: "And is the seven trillion dollars in the room with us right now?"
Sure, but this isn't about making copyright stricter, but just making it explicit that the existing law applies to AI tech.
I'm very critical of copyright law, but letting specifically big tech pretend like they're not distributing derivative work because it's derived from billions of works on the internet is not the gateway to copyright abolition I'd hope to see.
The phrasing "a bit less racist" suggests a nonzero level of racism in the output, yet the participants also complain about the censorship making the bot refuse to discuss sensitive topics. Sounds like these LLMs can only be boringly racist.
You think I hate nerds? Ha ha, fuck you. If I did hate nerds, I could never do the kind of damage to the reputation of the geek subculture that Paul Graham and his like-minded cohorts have.
Come on out and nerd fight me, posers. I will outnerd you. I will eat you for breakfast. You ain't got shit on me. Give me six lines written by the hand of the most pedantic bluecheck and I will pedantically point out seven fallacies in them, write a filk song about it and arrange it as a chiptune.
Yet somehow people don't seem to hate me for being a nerd. I think I'm fairly popular and most people react to my nerdy interests positively, or at worst indifferently. That may sound like bragging, but I'm not saying I'm some super cool gigachad, just that anyone is compared to these losers.
People don't hate techbros for being nerds, or men for that matter. If I (a nerdy man) am the one who supposedly hates nerds and men, why are you the one implying that the faults I criticise techbros for are innate to nerds and men?
Oh well, we all know the real reason techbros hate leftists is that we make so much better art, have better sex, do better science and are oh so resentably attractive.
LLMs are quite impressive as chatbots, all things considered. The conversations with them are way more realistic and almost as funny as the ones with the IRC Markov chain bot my friend made as a freshman CS student.
Of course, our bot's training data only included a few years of the IRC channel's logs and the Finnish Bible we later threw in for shits and giggles. A training set of approximately zero terabytes in total.
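For scale, that kind of bot is roughly this much code (a rough sketch from memory, assuming the logs sit in a plain-text file called logs.txt, which is made up; the real thing was hackier and glued to an IRC client):

```python
# Sketch of a word-level Markov chain babbler trained on chat logs.
# logs.txt is a hypothetical file with one utterance per line.
import random
from collections import defaultdict

chain = defaultdict(list)
for line in open("logs.txt", encoding="utf-8", errors="ignore"):
    words = line.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)            # record which word follows which

def babble(start, length=20):
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(babble(random.choice(list(chain))))
```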
LLMs are less a marvel of machine learning algorithms (though I admit they might play a part) and more one of data scraping. Based on their claims, they have already dug through the vast majority of the publicly accessible world wide web, so where do you go from there? Sure, there are a lot of books that are not on the web, but feeding them into the machine is about as hard as getting them on the web to begin with.