Consequences
(pawb.social)
"We did it, Patrick! We made a technological breakthrough!"
So is basically every human artist. Basically any artist out there has seen tons of other art and draws on that observed corpus to influence their own output. If I commissioned you to draw something you didn't know what it was, you'd go look up other depictions of that thing to get a basis for what you should be aiming at.
The way AI does it is similar, except that it's looked at way more examples than you but also doesn't have an understanding of what those things actually are beyond the examples themselves. That last bit is why it used to have so many problems with hands, and still often has problems with writing in the background or desk/table legs.
We can actually look at a hand and understand it, thinking logically about the composition and style we want to work with. AI can only recombine pixel-color patterns from digital images whose metadata happens to contain the word 'hand'. No matter how many 'examples' have been scraped, it can't actually interpret them the way we do.
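To make that point concrete, here's a toy sketch (hypothetical, not any real model's pipeline) of what training data looks like from the model's side: just arrays of numbers paired with caption strings, with no concept of 'hand' behind either.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "dataset": 8x8 grayscale images paired with captions.
# The model never receives the concept of a hand, only these
# pixel values and the text that happens to accompany them.
dataset = [
    (rng.random((8, 8)), "a hand holding a pen"),
    (rng.random((8, 8)), "a wooden table"),
]

def pixel_statistics(images):
    """All a model can extract is statistical structure in the pixels."""
    stacked = np.stack(images)
    return stacked.mean(axis=0), stacked.std(axis=0)

mean_img, std_img = pixel_statistics([img for img, _ in dataset])
print(mean_img.shape)  # each statistic is itself just an 8x8 array
```

Everything downstream of this — however sophisticated — is built on correlations in those numbers, which is the gap the comment above is pointing at.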
If some alien species asked you to draw a part of its anatomy that can move into a wide array of configurations, but you were required to do so based only on pictures the aliens sent you, which they tell you show that part among other things, would you do better?
Like, what you said is specifically why it's bad at hands and table legs and the like: they can appear in many different ways, and its only reference point for them is the pictures of them it's seen. You understand hands and can think logically about them mostly because your set of experiences is not just wider but deeper. Even then, four-fingered hands have been common in cartoons because, despite having hands, being surrounded by other beings with hands, and living in a culture that makes heavy use of hands, a lot of artists have trouble getting them quite right.
Yes, I would do better. I would look at the pictures and think about the angles and geometry, and about the reasons for the differences between the pictures, and being able to count sure helps. If they showed me pictures in a vastly different style, I would make assumptions, like that it's a different representation of the same concept. I would not just mash them together based on color values.
I get where you're coming from, but the only reason these models seem able to get stuff done is the insane amount of training data and iterations.
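A toy sketch of what "training data and iterations" buys you: a minimal gradient-descent loop (hypothetical numbers, nothing to do with any real image model) where the only thing driving improvement is repeating a numerical update over examples, thousands of times.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "training data": noisy samples of y = 3x.
x = rng.random(100)
y = 3.0 * x + rng.normal(0.0, 0.01, 100)

w = 0.0   # the model's single parameter, starting from nothing
lr = 0.1  # learning rate

# Many iterations over the data: each step nudges w to reduce
# the mean squared error. No understanding, just repetition.
for _ in range(2000):
    grad = np.mean(2 * (w * x - y) * x)
    w -= lr * grad

print(w)  # converges near 3.0 after enough iterations
```

Scale that single parameter up to billions, and the data up to billions of images, and you get the "seems able to get stuff done" effect without any of the interpretation being discussed above.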
Enjoying this discussion, by the way! It's fun to think about.