eating our own dogshit : techtakes

[-] sailor_sega_saturn@awful.systems 26 points 5 months ago* (last edited 5 months ago)

Ah yes the typical workflow for LLM generated changes:

LLM produces nonsense at the behest of employee A.
Employee B leaves a bunch of edits and suggestions to hammer it into something that's sloppy but almost kind of makes sense. A soul-sucking error prone process that takes twice as long as just writing the dang code.
Code submitted!
Employee A gets promoted.

Also the fact that this isn't integrated with tests shows how rushed the implementation was. Not even LLM optimists should want code changes that don't compile or that break tests.

[-] wjs018@piefed.social 16 points 5 months ago* (last edited 5 months ago)

I just looked at the first PR out of curiosity, and wow...

this isn't integrated with tests

That's the part that surprised me the most. It failed the existing automation. Even after prompted to fix the failing tests, it proudly added a commit "fixing" it (it still didn't pass...something that copilot should really be able to check). Then the dev had to step in and say why the test was failing and how to fix the code to make it pass. With this much handholding all of this could have been done much faster and cleaner without any AI involvement at all.

[-] zbyte64@awful.systems 13 points 5 months ago

The point is to get open source maintainers to further train their program because they already scraped all our code. I wonder if this will become a larger trend among corporate owned open source projects.

[-] Soyweiser@awful.systems 24 points 5 months ago* (last edited 5 months ago)

No real understanding of what it's doing, it's just guessing.

Are they talking about the LLMs or the people who think just chatting with the LLM will fix it? :)

E: from a comment about this on hackernews:

Funniest PRs are the ones that "resolves" test failures by removing/commenting out the test cases, or change the assertions.

Perfect, no notes. Ship

[-] swlabr@awful.systems 24 points 5 months ago

you all joke, but my mind is so expanded by stimulants that I, and only I, can see how this dogshit code will one day purchase all the car manufacturers and build murderbots

[-] Soyweiser@awful.systems 9 points 5 months ago

Look, im def on team Murderbot, but when ~~we~~ the AI's start building them I really hope Martha Wells gets some kickbacks at least.

[-] o7___o7@awful.systems 8 points 5 months ago* (last edited 5 months ago)

I love how Wells has given us both a great series of stories AND a jokey terminator analog to diffuse the mAnLy trope of building and/or fighting terminators.

[-] ReversalHatchery@beehaw.org 14 points 5 months ago

just look at it. it is not enough that AI is boiling the planet, but with every iteration of copilot, all those automatic checks are reran! on the first mentioned PR, the checks have been running for 20 minutes when I'm reading it, and there's like a dozen of them!

other projects have to pay for processing time on github actions!!

this is insanity

[-] Irelephant@lemm.ee 11 points 5 months ago

Came here to post this, funnily enough.

We're poisoning people's air for this.

[-] sturger@sh.itjust.works 10 points 5 months ago

Microsoft claims that AI can replace human programmers. Why doesn't Microsoft just do so and let an AI "fix" the problems reported by AIs?

Not sure why they're even involving human employees in this problem. /s

[-] swlabr@awful.systems 8 points 5 months ago

Someone should write a script that estimates how much time has been spent re-fondling LLMPRs on Github.

[-] o7___o7@awful.systems 5 points 5 months ago

An image of a Github-themed restaurant that serves poop burgers.

[-] Kowowow@lemmy.ca 4 points 5 months ago

Is there a reason why that ai "evolution" thing don't work for code? In theory shouldn't it be decent at least

[-] o7___o7@awful.systems 12 points 5 months ago* (last edited 5 months ago)

zbyte64 gave a great answer. I visualize it like this:

Writing software that does a thing correctly within well defined time and space constraints is nothing like climbing a smooth gradient to a cozy global maximum.

On a good day, it's like hopping on a pogo stick around a spiky, discontinuous, weirdly-connected n-dimensional manifold filled with landmines (for large values of n).

The landmines don't just explode. Sometimes they have unpredictable comedic effects, such as ruining your weekend two months from now.

Evolution is simply the wrong tool for the job.

[-] scruiser@awful.systems 10 points 5 months ago

To elaborate on the other answers about alphaevolve. the LLM portion is only a component of alphaevolve, the LLM is the generator of random mutations in the evolutionary process. The LLM promoters like to emphasize the involvement of LLMs, but separate from the evolutionary algorithm guiding the process through repeated generations, LLM is as likely to write good code as a dose of radiation is likely to spontaneously mutate you to be able to breathe underwater.

And the evolutionary aspect requires a lot of compute, they don't specify in their whitepaper how big their population is or the number of generations, but it might be hundreds or thousands of attempted solutions repeated for dozens or hundreds of generations, so that means you are running the LLM for thousands or tens of thousands of attempted solutions and testing that code against the evaluation function everytime to generate one piece of optimized code. This isn't an approach that is remotely affordable or even feasible for software development, even if you reworked your entire software development process to something like test driven development on steroids in order to try to write enough tests to use them in the evaluation function (and you would probably get stuck on this step, because it outright isn't possible for most practical real world software).

Alphaevolve's successes are all very specific very well defined and constrained problems, finding specific algorithms as opposed to general software development

[-] Evinceo@awful.systems 4 points 5 months ago

Imagine if it was trying to build a feature for a client who needs to look at a demo every time.

[-] mountainriver@awful.systems 2 points 5 months ago

Throwing rocks into an increasing pile and after each throw burning offerings to evaluate how much closer we are getting to having a cathedral.

Possibly the worst way of building a cathedral, except it doesn't really qualify as "building a cathedral". Some workmen may take the pile of rocks and build an actual cathedral, but that doesn't mean that the step of throwing rocks was necessary or desirable.

[-] zbyte64@awful.systems 8 points 5 months ago

Talking about Alpha Evolve https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ ?

First, Microsoft isn't using this yet but even if they were it doesn't work in this context. What Google did was they wrote a fitness function to tune the Generative process. Why not have some rubric that scores the code as our fitness function? Because the function needs to be continuous for this to work well, no sudden cliffs. But also they didn't address how this would work in a multi-objective space, this technique doesn't let the LLM make reasonable trade offs between complexity and speed.

[-] Kowowow@lemmy.ca 4 points 5 months ago

I forgot about alpha evolve, with all the flashy titles about I figured it wasn't a big deal, I was more talking about the low level stuff I guess like "ai learns to play mario/walk" but I imagine it follows the same logic the other comment talks about

[-] FredFig@awful.systems 6 points 5 months ago* (last edited 5 months ago)

If you're referring to genetic algorithms, those work by giving the computer some type of target to gun for that's easy to measure and then letting the computer go loose with randomly changing bits from the original object. I guess in your mind, that'd be randomly evolving the codebase and then submitting the best version.

There's a lot of problems with the idea of genetic codebase that I'm sure you can figure out on your own, but I'll give you one for free: "better code" is a very hard thing for computers to measure.

[-] V0ldek@awful.systems 6 points 5 months ago* (last edited 5 months ago)

For LLMs specifically? Code is not text, aside from the most clinical, dictionary definition of "text".

But even then, it also fails at writing coherent short or longform, so even if code was "just text" it'd fail equally badly.

[-] Kowowow@lemmy.ca 1 points 5 months ago

That's too bad I was hoping to be able to one day revive dead/abandoned mods using some form of ai, I was thinking of training it off of how the old mods funtioned in the last working version maybe it could find what needs to be changed

[-] froztbyte@awful.systems 5 points 5 months ago

....the fuck is this post

[-] o7___o7@awful.systems 7 points 5 months ago

I think that what we've got here is a genuine victim of the hype.

[-] froztbyte@awful.systems 4 points 5 months ago

you could be right

[-] Kowowow@lemmy.ca 2 points 5 months ago

Ah I didn't realize it was that big of an impossibility to get ai to update old minecraft mods no one else is interested in, no way I'll ever be able to learn to code without going back to highschool

[-] Evinceo@awful.systems 9 points 5 months ago

You can definitely learn how to code, I believe in you.

[-] o7___o7@awful.systems 7 points 5 months ago* (last edited 5 months ago)

It's alright! There's a multibillion dollar advertising operation working to convince us that generative AI can do these sorts of things. Plus, it's always tough to go through someone else's work, much less mods from a decade ago that were written by ambitious amateurs. I couldn't read my own code after a couple of months if I wasn't such an absurd over-commenter.

If you want a chill intro to the real situation, I highly recommend this episode of On the Media that had Ed Zitron on. You could knock it out over a commute or two no problem: https://www.wnycstudios.org/podcasts/otm/articles/brooke-talks-ai-with-ed-zitron

A funnier and angrier thing that explains why this won't work can be found here: https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you-if-you-mention-ai-again/

[-] Kowowow@lemmy.ca 1 points 5 months ago

This all bad news, guess I'll never see an emulator written based on how the original game plays, but that would only matter if it was any more dmca proof than humans doing it and I know even less about that

[-] froztbyte@awful.systems 3 points 5 months ago

your posts keep just slinging words together and it’s just fucking weird

[-] froztbyte@awful.systems 3 points 5 months ago

once again: the fuck is this post

[-] Kowowow@lemmy.ca 1 points 5 months ago

I don't understand

[-] froztbyte@awful.systems 3 points 5 months ago

negative reactions? to your own shitty posts?

well fuck damn, I wonder what’s confusing

[-] Kowowow@lemmy.ca 1 points 5 months ago

Ok it just seemed like you were confused but if you don't like what I posted that makes sense

[-] froztbyte@awful.systems 3 points 5 months ago

yes, you’ve gotta be right, that must be exactly what’s happening. absolutely no other possibilities.

[-] dgerard@awful.systems 3 points 5 months ago

yeah, I think that's enough

TechTakes