[-] diz@awful.systems 8 points 1 week ago

I’d just write the list then assign randomly. Or perhaps pseudorandomly like sort by hash and then split in two.

One problem is that it is hard to come up with 20 or more completely unrelated puzzles.

Although I don’t think we need a large number for statistical significance here, if it’s like 8/10 solved in the cheating set and 2/10 in the hold back set.

[-] diz@awful.systems 9 points 2 weeks ago* (last edited 2 weeks ago)

I swear I’m gonna plug an LLM into a rather traditional solver I’m writing. I may tuck deep into the paper a point how it’s quite slow to use an LLM to mutate solutions in a genetic algorithm or a swarm solver. And in any case non LLM would be default.

Normally I wouldn’t sink that low but I got mouths to feed, and frankly, fuck it, they can persist in this madness for much longer than I can stay solvent.

This is as if there was a mass delusion that a pseudorandom number generator can serve as an oracle, predicting the future. Doing any kind of Monte Carlo simulation of something like weather in that world would of course confirm all the dumb shit.

[-] diz@awful.systems 9 points 2 weeks ago

I wonder what's gonna happen first, the bubble popping or Yudkowsky getting so fed up with gen AI he starts sneering.

[-] diz@awful.systems 11 points 3 weeks ago* (last edited 3 weeks ago)

I think it could work as a minor gimmick, like terminal hacking minigame in fallout. You have to convince the LLM to tell you the password, or you get to talk to a demented robot whose brain was fried by radiation exposure, or the like. Relatively inconsequential stuff like being able to talk your way through or just shoot your way through.

Unfortunately this shit is too slow and too huge to embed a local copy of, into a game. You need a lot of hardware compatibility. And running it in the cloud would cost too much.

[-] diz@awful.systems 10 points 1 month ago

Yeah. I’d love to see the prompt, gab’s nazi ai prompt was utterly pathetic and this one got to be pretty bad as well.

[-] diz@awful.systems 8 points 2 months ago* (last edited 2 months ago)

He’s such a complete moron. He doesn’t want to recite “DEI shibboleths”? What does he even think that would refer to? Why shibboleths?

To spell it out, that would refer to an antisemitic theory that the reason (for example) some black guy would get a medal of honor (the “deimedal”) is because of the jews.

I swear this guy is dumber than Trump. Trump for all his rambling, uses actual language - Trump understands what the shit he is saying means to his followers. Scott… he really does not.

[-] diz@awful.systems 11 points 2 months ago* (last edited 2 months ago)

Yeah I think the best examples are everyday problems that people solve all the time but don't explicitly write out solutions step by step for, or not in the puzzle-answer form.

It's not even a novel problem at all, I'm sure there's even a plenty of descriptions of solutions to it as part of stories and such. Just not as "logical puzzles" due to triviality.

What really annoys me is when they claim high performance on benchmarks consisting of fairly difficult problems. This is basically fraud, since they know full well it is still entirely "knowledge" reliant, and even take steps to augment it with generated problems and solutions.

I guess the big sell is that it could use bits and pieces of logic gleaned from other solutions to solve a "new" problem. Except it can not.

[-] diz@awful.systems 10 points 2 months ago* (last edited 2 months ago)

Yeah, exactly. There's no trick to it at all, unlike the original puzzle.

I also tested OpenAI's offerings a few months back with similarly nonsensical results: https://awful.systems/post/1769506

All-vegetables no duck variant is solved correctly now, but I doubt it is due to improved reasoning as such, I think they may have augmented the training data with some variants of the river crossing. The river crossing is one of the top most known puzzles, and various people have been posting hilarious bot failures with variants of it. So it wouldn't be unexpected that their training data augmentation has river crossing variants.

Of course, there's very many ways in which the puzzle can be modified, and their augmentation would only cover obvious stuff like variation on what items can be left with what items or spots on the boat.

[-] diz@awful.systems 8 points 11 months ago

Perhaps it was near ready to emit a stop token after "the robot can take all 4 vegetables in one trip if it is allowed to carry all of them at once." but "However" won, and then after "However" it had to say something else because that's how "however" works...

Agreed on the style being absolutely nauseating. It wasn't a very good style when humans were using it, but now it is just the style of absolute bottom of the barrel, top of the search results garbage.

[-] diz@awful.systems 9 points 11 months ago* (last edited 11 months ago)

I think you can make a slight improvement to Wolfram Alpha: using an LLM to translate natural language queries into queries WA can consume, then feeding them into WA. WA always reports exactly what it computed, so if it "misunderstands" you, it's a lot easier to notice.

The problem here is that AI boys got themselves hyped up for it being actually intelligent, so none of them would ever settle for some modest application of LLMs. Google fired the authors of "stochastic parrot" paper, AFAIK.

simply pasting LLM output into CAS input and then the CAS output back into LLM input (which, let’s be honest, is the first thing tech bros will try as it doesn’t require much basic research improvement), will not help that much and will likely generate an entirely new breed of hilarious errors and bullshit (I like the term bullshit instead of hallucination, it captures the connotation errors are of a kind with the normal output).

Yeah I have examples of that as well. I asked GPT4 at work to calculate the volume of 10cm long, 0.1mm diameter wire. It seems to be doing correct arithmetic by some mysterious means which do not use scientific notation, and then the LLM can not actually count so it miscounts zeroes and outputs a result that is 1000x larger than the correct answer.

[-] diz@awful.systems 9 points 1 year ago

GPT4 supposedly (it says that it is GPT4). I have access to one that is cleared for somewhat sensitive data, so presumably my queries aren't getting flagged and human reviewed by OpenAI.

[-] diz@awful.systems 11 points 1 year ago

YOU CAN DO THAT WITHOUT AI.

Can they, though? Sure, in theory Google could hire millions of people to write overviews that are equally idiotic, but obviously that is not something they would actually do.

I think there's an underlying ethical theory at play here, which goes something like: it is fine to fill internet with half-plagiarized nonsense, as long as nobody dies, or at least, as long as Google can't be culpable.

view more: ‹ prev next ›

diz

joined 2 years ago