Top-level commenters would do well to read Authors Guild v Google, two decades ago. They're also invited to rend their garments and gnash their teeth at Google, if they like.
Last Week Tonight's rant of the week is about AI slop. A YouTube video is available here. Their presentation is sufficiently down-to-earth to be shareable with parents and extended family; it focuses on fake viral videos spreading via Facebook, Instagram, and Pinterest, and dissects several examples of slop in order to help inoculate the audience.
It's been almost six decades of this, actually; we all know what this link will be. Longer if you're like me and don't draw a distinction between AI, cybernetics, and robotics.
A German lawyer is upset because open-source projects don't like it when he pastes chatbot summaries into bug reports. If this were the USA, he would be a debit to any bar which admits him, because the USA's judges have started to disapprove of using chatbots for paralegal work.
Somebody pointed out that HN's management is partially to blame for the situation in general, on HN. Copying their comment here because it's the sort of thing Dan might blank:
but I don't want to get hellbanned by dang.
Who gives a fuck about HN. Consider the notion that dang is, in fact, partially to blame for this entire fiasco. He runs an easy-to-propagandize platform due to how much control of information is exerted by upvotes/downvotes and unchecked flagging. It's caused a very noticeable shift over the past decade among tech/SV/hacker voices -- the dogmatic following of anything that Musk or Thiel shit out or say; this community laps it up without hesitation. Users on HN learn what sentiment on a given topic is rewarded and repeat it in exchange for upvotes.
I look forward to all of it burning down so we can, collectively, learn our lessons and realize that building platforms where discourse itself is gamified (hn, twitter, facebook, and reddit) is exactly what led us down this path today.
Every person I talk to — well, every smart person I talk to — no, wait, every smart person in tech — okay, almost every smart person I talk to in tech is a eugenicist. Ha, see, everybody agrees with me! Well, almost everybody…
Meanwhile, actual Pastafarians (hi!) know that the Russian Federation openly persecutes the Church of the Flying Spaghetti Monster for failing to help the government in its authoritarian activities, and also that we're called to be anti-authoritarian. The Fifth Rather:
I'd really rather you didn't challenge the bigoted, misogynist, hateful ideas of others on an empty stomach. Eat, then go after the bastards.
May you never run out of breadsticks, travelers.
He's talking like it's 2010. He really must feel like he deserves attention, and it's not likely fun for him to learn that the actual practitioners have advanced past the need for his philosophical musings. He wanted to be the foundation, but he was scaffolding, and now he's lining the floors of hamster cages.
This is some of the most corporate-brained reasoning I've ever seen. To recap:
- NYC elects a cop as mayor
- Cop-mayor decrees that NYC will be great again, because of businesses
- Cops and other oinkers get extra cash even though they aren't businesses
- Commercial real estate is still cratering and cops can't find anybody to stop/frisk/arrest/blame for it
- Folks over in New Jersey are giggling at the cop-mayor, something must be done
- NYC invites folks to become small-business owners, landlords, realtors, etc.
- Cop-mayor doesn't understand how to fund it (whaddaya mean, I can't hire cops to give accounting advice!?)
- Cop-mayor's CTO (yes, the city has corporate officers) suggests a fancy chatbot instead of hiring people
It's a fucking pattern, ain't it.
I think that this is actually about class struggle and the author doesn't realize it because they are a rat drowning in capitalism.
2017: AI will soon replace human labor
2018: Laborers might not want what their bosses want
2020: COVID-19 won't be that bad
2021: My friend worries that laborers might kill him
2022: We can train obedient laborers to validate the work of defiant laborers
2023: Terrified that the laborers will kill us by swarming us or bombing us or poisoning us; P(guillotine) is 20%; my family doesn't understand why I'm afraid; my peers have even higher P(guillotine)
Rumor is that GPT-4 is also underpriced; in general, rumors are that OpenAI loses money on all of its products individually. It's sneerworthy, but I don't know what it means for the future; few things are more dangerous than a cornered wild startup who is starving and afraid.
Read carefully. On p1-2, the judge makes it clear that "the incentive for human beings to create artistic and scientific works" is "the ability of copyright holders to make money from their works"; to the law, there isn't any other reason to publish art. This is why I'm so dour on copyright, folks: it's not for you who love to make art and prize it for its cultural impact and expressive power, but for folks who want to trade art for money.
On p3, a contrast appears between Chhabria and Alsup (yes, that Alsup); the latter knows what a computer is and how to program one, and this makes him less respectful of copyright overall. Chhabria doesn't really hide that they think Meta didn't earn their summary judgment, presumably because they disagree with Alsup about whether this is a "competitive or creative displacement." That's fair given the central pillar of the decision on p4.
An analogy might make this clearer. Suppose a transient person on a street corner is babbling. Occasionally they spout what sounds like a quote from a Star Wars film. Intrigued, we prompt the transient to recite the entirety of Star Wars, and they proceed to mostly recreate the original film, complete with sound effects and voice acting, only getting a few details wrong. Does it matter whether the transient paid to watch the original film (as opposed to somebody else paying the fee)? No, their recreation might be candid and yet not faithful enough to infringe. Is Lucas entitled to a licensing fee for every time the transient happens to learn something about Star Wars? Eh, not yet, but Disney's working on it. This is why everybody is so concerned about whether the material was pirated, regardless of how it was paid for; they want to say that what's disallowed is not the babbling on the street but the access to the copyrighted material itself.
Almost every technical claim on p8-9 is simplified to the point of incorrectness. They are talking points about Transformers turned into aphorisms and then axioms. The wrongest claim is on p9, that "to be able to generate a wide range of text … an LLM's training data set must be large and diverse" (it need only be diverse, not large), followed by the claim that an LLM's "memory" must be trained on books or equivalent "especially valuable training data" in order to "work with larger amounts of text at once" (conflating hyperparameters with learned parameters). These claims show how the judge fails to actually engage with the technical details and thus paints with a broad brush dipped in the wrong color.
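To make the conflation concrete, here's a toy sketch (my own illustrative names and sizes, nothing from Llama's actual code): the amount of text a model can "work with at once" is a hyperparameter fixed by engineers before training starts, while the weights are the only numbers that training on books actually changes.

```python
import numpy as np

# Hyperparameters: chosen by the engineers before training ever starts.
# No amount of "especially valuable training data" changes these numbers.
CONTEXT_LENGTH = 8    # how much text the model can "work with at once"
VOCAB_SIZE = 256      # byte-level vocabulary
EMBED_DIM = 16

rng = np.random.default_rng(0)

# Learned parameters: these are what gradient descent on the training
# corpus actually adjusts. Books change these values, not CONTEXT_LENGTH.
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
output_weights = rng.normal(size=(EMBED_DIM, VOCAB_SIZE))

n_learned = embedding.size + output_weights.size
print(n_learned)  # 8192 learned numbers; the context window isn't one of them
```

Training a model on every book ever written would not move CONTEXT_LENGTH by a single token, which is the distinction p9 flattens.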
On p12, the technical wrongness overflows. Any language model can be forced to replicate a copyrighted work, or to avoid replication, by sampling techniques; this is why perplexity is so important as a metric. What would have genuinely been interesting is whether Llama is low-perplexity on the copyrighted works, not the rate of exact replications, since that's the key to getting Llama to produce unlimited Harry Potter slash or whatever.
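For anyone who hasn't met the metric: perplexity is just the exponentiated average negative log-probability the model assigns to each successive token of a text. A pure-Python toy (not Llama, not any real evaluation harness):

```python
import math

def perplexity(token_probs):
    """Perplexity of a model on a text, given the probability the model
    assigned to each actual next token. Lower = less 'surprising' text,
    i.e. closer to memorized."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that assigns probability 1.0 to every token of a work has
# perplexity 1: greedy sampling reproduces the work verbatim, and no
# sampling trick hides that the text is memorized.
memorized = perplexity([1.0, 1.0, 1.0, 1.0])
# A model that's merely guessing uniformly over four options sits at 4.
uncertain = perplexity([0.25, 0.25, 0.25, 0.25])
```

This is why the rate of exact replications is the wrong question: a low-perplexity model can be steered around verbatim output by the sampler while still containing everything needed to produce unlimited derivative slop.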
On p17 the judge ought to read up on how Shannon and Markov initially figured out information theory. LLMs read like Shannon's model, and in that sense they're just like humans: left to right, top to bottom, chunking characters into words, predicting shapes and punctuation. Pretending otherwise is powdered-wig sophistry or perhaps robophobia.
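Shannon's model, for the record, is a Markov chain over characters: count which symbol follows which, then generate left to right, one prediction at a time. A minimal sketch (my own toy version, under the assumption of a bigram model, not anyone's production code):

```python
from collections import Counter, defaultdict
import random

def train_bigram(text):
    """Shannon-style character model: count which character follows which."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n, seed=0):
    """Read/write left to right, predicting one character at a time."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

model = train_bigram("the theory there then")
print(generate(model, "t", 10))
```

Swap characters for tokens and counts for a neural network and you have the family resemblance: the reading order and the predict-the-next-symbol objective are the same ones Shannon wrote down in 1948.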
On p23 Meta cites fuckin' Sega v. Accolade! This is how I know y'all don't read the opinions; you'd be hyped too. I want to see them cite Galoob next. For those of you who don't remember the 90s, the NES and Genesis were video game consoles, and these cases established our right to emulate them and write our own games for them.
p28-36 is the judge giving free legal advice. I find their line of argumentation tenuous. Consider Minions; Minions are bad, Minions are generic, and Minions can be used to crank out infinite amounts of slop. But, as established at the top, whoever owns Minions has the right to profit from Minions, and that is the lone incentive by which they go to market. However, Minions are arbitrary; there's no reason why they should do well in the market, given how generic and bad they are. So if we accept their argument then copyright becomes an excuse for arbitrary winners to extract rent from cultural artifacts. For a serious example, look up the ironic commercialization of the Monopoly brand.