And 100% of them are just trying to suck up to the rich as hard as they can, to get some cash thrown their way for posting. Their whole community has been built around exactly that from day one.
Note also that genuinely labor-saving tools, like the Unity engine with its asset store, did result in an absolute flood of shovelware on Steam back in the mid 2010s (although that probably had as much to do with Steam FOMO-ing over the possibility of not letting the next Minecraft onto Steam).
As a thought experiment imagine an unreliable labor saving tool that speeds up half* of the work 20x, and slows down the other half 3x. You would end up 1.525 times slower.
The fraction of work (measured in hours, not lines) that AI helps with is probably less than 50%, and the speedup is probably worse than 20x.
Slowdown could be due to some combination of
- Trying to do it with AI until you sink too much time into that and then doing it yourself (>2x slowdown here).
- Being slower at working with the code you didn't write.
- It being much harder to debug code you didn't write.
- Plagiarism being inferior to using open source libraries.
footnote: "half" as measured by the pre-tool hours.
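The arithmetic in the thought experiment above can be sketched in a few lines (the function name and numbers are just for illustration):

```python
# A tool speeds up a fraction of the work (measured in pre-tool hours)
# by `speedup` and slows the rest down by `slowdown`.
def relative_time(fraction_sped_up, speedup, slowdown):
    """Total time with the tool, relative to doing everything by hand."""
    sped = fraction_sped_up / speedup            # the part the tool accelerates
    slowed = (1 - fraction_sped_up) * slowdown   # the part the tool drags down
    return sped + slowed

print(relative_time(0.5, 20, 3))  # 1.525, i.e. 1.525x slower overall
```

Note how the slowed-down half dominates: even an infinite speedup on one half (0.5 / inf = 0) still leaves you 1.5x slower if the other half takes 3x as long.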
It's curious that if ChatGPT were a person, saying exactly the same words, it would've been charged with criminal conspiracy, or even shot, as its human co-conspirator in Florida was.
And had it been a foreign human in the Middle East, radicalizing random people, it would've gotten a drone strike.
"AI" - and the companies building them - enjoy the kind of universal legal immunity that is never granted to humans. That needs to end.
The other interesting thing is that if you try it a bunch of times, sometimes it uses the calculator and sometimes it doesn't. It always claims that it used the calculator, though, unless it didn't and you tell it that the answer is wrong.
I think something very fishy is going on, along the lines of them having done empirical research and found that fucking up the numbers and lying about it makes people more likely to believe that Gemini is sentient. It is a lot weirder (and a lot more dangerous, if someone used it to calculate things) than "it doesn't have a calculator" or "poor LLMs can't do math". It gets a lot of digits correct somehow.
Frankly, this is ridiculous. They have a calculator integrated into Google search. That they don't have one in their AIs feels deliberate, particularly given that there are plenty of LLMs that actually run a calculator almost all of the time.
edit: lying that it used a calculator is rather strange, too. Humans don't say "code interpreter" or "direct calculator" when asked to multiply two numbers. What the fuck is a "direct calculator"? And why does it talk about a "code interpreter" or a "direct calculator" conditionally on there being digits (I never saw it claim a "code interpreter" when the problem wasn't mathematical), rather than conditionally on a [run tool] token having been outputted earlier?
The whole thing is utterly ridiculous. Clearly, for it to say that it used a "code interpreter" or a "direct calculator" (whatever that is), it had to be fine-tuned to say that: conditioned on a bunch of numbers being present, rather than on the [run tool] token it actually uses to run a tool.
edit: basically, congratulations Google, you have halfway convinced me that an "artificial lying sack of shit" is possible after all. I don't believe that tortured phrases like "code interpreter" and "direct calculator" actually came from the internet.
These assurances, coming from an "AI", seem like they would make the person asking the question less likely to double-check the answer (and perhaps less likely to click the downvote button). In my book that qualifies them as a lie, even if I consider an LLM to be no more sentient than a sack of shit.
Further support for the memorization claim: I posted examples of novel river crossing puzzles where LLMs completely fail (on this forum).
Note that Apple’s actors / agents river crossing is a well-known “jealous husbands” variant, which you can ask a chatbot to explain to you. It gladly explains, even as it can’t follow its own explanation (since of course it isn’t its own explanation but a plagiarized one, even if it changes the words).
edit: https://awful.systems/post/4027490 and earlier https://awful.systems/post/1769506
I think what I need to do is to write up a bunch of puzzles, assign them randomly to 2 sets, and test & post one set, while holding back on the second set (not even testing it on any online chatbots). Then in a year or two see how much the set that's public improves, vs the one that's held back.
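The split itself is trivial to do reproducibly; here's a minimal sketch (the function name and puzzle labels are mine, not from any real benchmark):

```python
# Randomly assign a pool of hand-written puzzles into a public set
# (tested and posted now) and a held-back set (never shown to any
# online chatbot), so later improvement on the two can be compared.
import random

def split_puzzles(puzzles, seed=0):
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(puzzles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (public, held_back)

public, held_back = split_puzzles([f"puzzle_{i}" for i in range(10)])
print(len(public), len(held_back))  # 5 5
```

The fixed seed matters only for bookkeeping; the point of the protocol is that the held-back half never touches a hosted model until the comparison is run.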
Actually, having read it carefully, it is interesting that they don't actually claim it was hacked; they claim that the modification was unauthorized. They also don't claim that they revoked access from that mysterious "employee" who modified it. I'm thinking they had some legal reason to technically not lie.
AI peddlers just love any "critique" that presumes the AI is great at something.
Safety concern that LLMs would go Skynet? Say no more, I hear you, and I'll bring it up first thing in Congress.
Safety concern that terrorists might use it to make bombs? Say no more! I agree that the AI is so great for making bombs! We'll restrict it to keep people safe!
It sounds too horny, you say? Yeah, good point, I love it. Our technology is better than sex itself! We'll keep it SFW to keep mankind from going extinct due to robosexuality!
Both parties are buying into a premise we already know to be incorrect.
We may know it is incorrect, but LLM salesmen are claiming things like "90th percentile on the LSAT", high scores on a "college level reasoning benchmark", and so on and so forth.
They are claiming "yeah, yeah, there are all the anecdotal reports of glue pizza, but objectively, our AI is more capable than your workers, so you can replace them with our AI", and this is starting to actually impact the job market.
The other thing to add is that there are just one or two people on a train providing service for hundreds of other people, or millions of dollars worth of goods. Automating those people away is simply not economical, not even in terms of the headcount replaced versus the headcount that has to be hired to maintain the automation software and hardware.
Unless you're a techbro who deeply resents labor, someone who would rather hire 10 software engineers than 1 train driver.
Also, my thought on this is that since an LLM has no internal state with which to represent the state of the problem, it can't ever actually solve any variation of the river crossing. Not even those that it "solves" correctly.
If it outputs the correct sequence, the model of the problem inside your head will be in the solved state, but on the LLM's side there's just a sequence of steps it wrote down, with those steps directly inhibiting the production of another "Trip" token until that crosses a threshold. There isn't an inventory, or even a count of items; there's an unrelated number that weighs for or against "Trip".
If we are to anthropomorphize it (which we shouldn't, but anyway), it's bullshitting up an answer and it gradually gets a feeling that it has bullshitted enough, which can happen at the right moment, or not.
I love the "criti-hype". AI peddlers absolutely love any concerns that imply that the AI is really good at something.
Safety concern that LLMs would go Skynet? Say no more, I hear you, and I'll bring it up in Congress!
Safety concern that terrorists might use it to make bombs? Say no more! I agree that the AI is so great for making bombs! We'll restrict it to keep people safe!
Sexual roleplay? Yeah, good point, I love it. Our technology is better than sex itself! We'll restrict it to keep mankind from falling into the sin of robosexuality and going extinct! I mean, of course you can't restrict something like that, but we'll try, at least until we release a hornybot.
But any concern about language modeling being fundamentally the wrong tool for some job (do you want to cite a paper, or do you want to sample from the underlying probability distribution?), and suddenly it's "hey, how about we talk about the Skynet thing instead?"
Oh, they are going to charge per token for GitHub Copilot? That thing is a waste of money for everyone, I'm pretty sure. I get a mix of inane but mildly good suggestions, irrelevant stuff, and the occasional suggestion of super evil sabotage. Due to mild OCD about issues, I tend to have to fix said mildly good suggestions, but objectively that nitpickery is not worth it; everything was fine without it, we had compiler warnings, Coverity, etc.
edit: the difference being that the old stuff was deterministic, and you just ran it on the whole codebase and had it pass, unlike GitHub Copilot, which will just make up new shit. And as for the times it caught some bad bug you made: add more tests instead.