And 100% of them are just trying to suck up to the rich as hard as they can, to get some cash thrown their way for posting. Their whole community has been built around exactly that from day one.
Note also that genuinely labor-saving tools, like the Unity engine with its asset store, did result in an absolute flood of shovelware on Steam back in the mid 2010s (although that probably had as much to do with Steam FOMO-ing over the possibility of not letting the next Minecraft onto Steam).
As a thought experiment imagine an unreliable labor saving tool that speeds up half* of the work 20x, and slows down the other half 3x. You would end up 1.525 times slower.
The fraction of work (measured in hours, not lines) that AI helps with is probably less than 50%, and the speedup is probably worse than 20x.
Slowdown could be due to some combination of
- Trying to do it with AI until you sink too much time into that and then doing it yourself (>2x slowdown here).
- Being slower at working with the code you didn't write.
- It being much harder to debug code you didn't write.
- Plagiarism being inferior to using open source libraries.
footnote: "half" as measured by the pre-tool hours.
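The arithmetic in the thought experiment above can be sketched in a few lines (the function name and numbers are just for illustration):

```python
# A tool speeds up a fraction of the work (measured in pre-tool hours)
# by `speedup` and slows the rest down by `slowdown`.
def relative_time(fraction_sped_up, speedup, slowdown):
    """Total time with the tool, relative to doing everything by hand."""
    sped = fraction_sped_up / speedup            # the part the tool accelerates
    slowed = (1 - fraction_sped_up) * slowdown   # the part the tool drags down
    return sped + slowed

print(relative_time(0.5, 20, 3))  # 1.525, i.e. 1.525x slower overall
```

Note how the slowed-down half dominates: even an infinite speedup on one half (0.5 / inf = 0) still leaves you 1.5x slower if the other half takes 3x as long.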
It's curious that if ChatGPT were a person, saying exactly the same words, it would've been charged with criminal conspiracy, or even shot, as its human co-conspirator in Florida was.
And had it been a foreign human in the Middle East, radicalizing random people, it would've gotten a drone strike.
"AI" - and the companies building them - enjoy the kind of universal legal immunity that is never granted to humans. That needs to end.
The other interesting thing is that if you try it a bunch of times, sometimes it uses the calculator and sometimes it doesn't. It always claims that it used the calculator, though, unless it didn't and you tell it that the answer is wrong.
I think something very fishy is going on, along the lines of them having done empirical research and found that fucking up the numbers and lying about it makes people more likely to believe that Gemini is sentient. It is a lot weirder (and a lot more dangerous, if someone used it to calculate things) than "it doesn't have a calculator" or "poor LLMs can't do math". It gets a lot of digits correct somehow.
Frankly, this is ridiculous. They have a calculator integrated into Google search. That they don't have one in their AIs feels deliberate, particularly given that there are plenty of LLMs that actually run a calculator almost all of the time.
edit: lying that it used a calculator is rather strange, too. Humans don't say "code interpreter" or "direct calculator" when asked to multiply two numbers. What the fuck is a "direct calculator"? And why does it talk about a "code interpreter" or a "direct calculator" conditionally on there being digits (I never saw it claim a "code interpreter" when the problem wasn't mathematical), rather than conditionally on a [run tool] token having been outputted earlier?
The whole thing is utterly ridiculous. Clearly, for it to say that it used a "code interpreter" or a "direct calculator" (whatever that is), it had to be fine-tuned to say that: conditioned on a bunch of numbers being present, rather than on the [run tool] token it actually uses to run a tool.
edit: basically, congratulations Google, you have halfway convinced me that an "artificial lying sack of shit" is possible after all. I don't believe that tortured phrases like "code interpreter" and "direct calculator" actually came from the internet.
These assurances, coming from an "AI", seem like they would make the person asking the question less likely to double-check the answer (and perhaps less likely to click the downvote button). In my book that qualifies them as a lie, even if I consider an LLM to be no more sentient than a sack of shit.
Further support for the memorization claim: I posted examples of novel river crossing puzzles where LLMs completely fail (on this forum).
Note that Apple’s actors / agents river crossing is a well-known “jealous husbands” variant, which you can ask a chatbot to explain to you. It gladly explains, even as it can’t follow its own explanation (since of course it isn’t its own explanation but a plagiarized one, even if it changes the words).
edit: https://awful.systems/post/4027490 and earlier https://awful.systems/post/1769506
I think what I need to do is to write up a bunch of puzzles, assign them randomly to 2 sets, and test & post one set, while holding back on the second set (not even testing it on any online chatbots). Then in a year or two see how much the set that's public improves, vs the one that's held back.
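The split itself is trivial to do reproducibly; here's a minimal sketch (the function name and puzzle labels are mine, not from any real benchmark):

```python
# Randomly assign a pool of hand-written puzzles into a public set
# (tested and posted now) and a held-back set (never shown to any
# online chatbot), so later improvement on the two can be compared.
import random

def split_puzzles(puzzles, seed=0):
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(puzzles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (public, held_back)

public, held_back = split_puzzles([f"puzzle_{i}" for i in range(10)])
print(len(public), len(held_back))  # 5 5
```

The fixed seed matters only for bookkeeping; the point of the protocol is that the held-back half never touches a hosted model until the comparison is run.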
Actually, having read it carefully, it is interesting that they don't actually claim it was hacked; they claim that the modification was unauthorized. They also don't claim that they revoked access from that mysterious "employee" who modified it. I'm thinking they had some legal reason to technically not lie.
AI peddlers just love any "critique" that presumes the AI is great at something.
Safety concern that LLMs would go Skynet? Say no more, I hear you, and I'll bring it up first thing in Congress.
Safety concern that terrorists might use it to make bombs? Say no more! I agree that the AI is so great for making bombs! We'll restrict it to keep people safe!
It sounds too horny, you say? Yeah, good point, I love it. Our technology is better than sex itself! We'll keep it SFW to keep mankind from going extinct due to robosexuality!
Both parties are buying into a premise we already know to be incorrect.
We may know it is incorrect, but LLM salesmen are claiming things like "90th percentile on the LSAT", high scores on a "college level reasoning benchmark", and so on and so forth.
They are claiming "yeah, yeah, there are all the anecdotal reports of glue pizza, but objectively, our AI is more capable than your workers, so you can replace them with our AI", and this is starting to actually impact the job market.
The other thing to add is that there are just one or two people on a train providing service for hundreds of other people, or millions of dollars worth of goods. Automating those people away is simply not economical, not even in terms of the headcount replaced versus the headcount that has to be hired to maintain the automation software and hardware.
Unless you're a techbro who deeply resents labor, someone who would rather hire 10 software engineers than 1 train driver.
Also, my thought on this is that since an LLM has no internal state with which to represent the state of the problem, it can't ever actually solve any variation of the river crossing. Not even those that it "solves" correctly.
If it outputs the correct sequence, the model of the problem inside your head will be in the solved state, but on the LLM's side there's just a sequence of steps it wrote down, with those steps directly inhibiting the production of another "Trip" token until that crosses a threshold. There isn't an inventory, or even a count of items; there's an unrelated number that weighs for or against "Trip".
If we are to anthropomorphize it (which we shouldn't, but anyway), it's bullshitting up an answer and it gradually gets a feeling that it has bullshitted enough, which can happen at the right moment, or not.
I love the "criti-hype". AI peddlers absolutely love any concerns that imply that the AI is really good at something.
Safety concern that LLMs would go Skynet? Say no more, I hear you, and I'll bring it up in Congress!
Safety concern that terrorists might use it to make bombs? Say no more! I agree that the AI is so great for making bombs! We'll restrict it to keep people safe!
Sexual roleplay? Yeah, good point, I love it. Our technology is better than sex itself! We'll restrict it to keep mankind from falling into the sin of robosexuality and going extinct! I mean, of course you can't restrict something like that, but we'll try, at least until we release a hornybot.
But any concern about language modeling being fundamentally the wrong tool for some job (do you want to cite a paper, or do you want to sample from the underlying probability distribution?), and suddenly it's "hey, how about we talk about the Skynet thing instead?"
Oh, they are going to charge per token for GitHub Copilot? That thing is a waste of money for everyone, I'm pretty sure. I get a mix of inane but mildly good suggestions, irrelevant stuff, and the occasional suggestion of super evil sabotage. Due to mild OCD about issues, I tend to have to fix said mildly good suggestions, but objectively that nitpickery is not worth it; everything was fine without it, we had compiler warnings, Coverity, etc.
edit: the difference being that the old stuff was deterministic, and you just ran it on the whole codebase and had it pass, unlike GitHub Copilot, which will just make up new shit. And as for the times it caught some bad bug you made: add more tests instead.