I don't think it's that SD and LLMs are solutions looking for problems, it's that there are very clear problems to which they provide 90% of a solution and make it impossible to clear the last 10%.
They're the new WYSIWYG/low-code. Everyone that doesn't fully understand the problem space thinks they're some ultimate solution that is going to revolutionise everything. People that do are responding with a resounding 'meh'.
Stable Diffusion is a great example. Something that can generate consistent game assets would be an absolute game changer for the entire game industry and open up a new wave of high tech indie game development, but despite every "oh wow" demo hitting the front page of HN, we've had the tech for a couple of years now and the only thing that's come out of it is some janky half solutions (3D meshes from pictures that are unworkable in real games, still no way to generate assets in consistent styles without a huge amount of complex tinkering) and a bunch of fucking hentai lol.
Human work is much more deterministic than AI as it encompasses a lot more constraints than what the task specified. If you take concept art creation, while the brief may be a few sentences, the artist knows to respect anatomy and perspective rules, as well as some common definitions (when the brief says ship, you know that it’s the ship concept approved last week). As an artist, I’ve used reference pictures, dolls, and 3D renders, and one of the most important aspects of these tools was consistency. I don’t see large models being consistent without another model applying constraints to what they’re capable of producing, like rules defining correct anatomy and extracting data that defines a character. The fact is we do have tools like MakeHuman [0], Marvelous Designer [1], and others that let you generate ideas that are consistent in their flexibility.
I look at Copilot and it’s been the same for me. Either I’m working on a huge codebase, which most of the time means tweaking and refactoring, not something I trust an LLM with, or it’s a greenfield project where I usually write only the code necessary for the task, so boilerplate generation is not a thing for me. Coding for me is like sculpting, and LLM-based solutions feel like trying to do it with bricks attached to my feet. You can get something working if you’re patient enough, but it makes more sense, and it’s more enjoyable, to just use your fingers.
And even a lot of the hentai is fucking worthless for the same reasons! Try generating some kinbaku. It's really hard to get something where all the rope actually connects and interacts sensibly because it doesn't actually know what a knot is. Instead, you end up with M. C. Escher: Fetish Edition.
I can shed some light on this phenomenon: these models are trained on many images, but no thought is put into the "generalisation" aspect the ML community was so obsessed with during the deep-learning era.
It's very easy to create a "Stochastic Parrot" but I'm quite sure these models are capable of learning underlying information such as correct layout of a knot - given the right data and curriculum of course. Maybe slight architecture tweaks.
I'm sure this is the reason we're starting to see a normal number of fingers and the ability to write text. The proof-of-concept phase ran from 2015 until 2022; now we're starting to see interesting things come out of the workshops.
Pretty much what happened with Speech Recognition for 30 years. That last 10% had to be handled manually. Even if you get 90% right, it still means every second sentence has issues. And as things scale up, the costs of all that manual behind-the-scenes hacking scale up too. We underestimated how many issues involved Ambiguity, where N people see the same thing and have N different interpretations. So you see a whole bunch of Speech Rec companies rising and falling over time.
Now things are pretty mature, but it took decades to get there, and there is still a whole bunch of hacks upon hacks behind the scenes. The same story will repeat with each new problem domain.
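To put a rough number on the "90% right" point in the speech recognition comment above, here is a small back-of-the-envelope sketch. It assumes (this is not stated in the comment) that "90%" means per-word accuracy and that word errors are independent, which real recognisers do not strictly satisfy. The function name is made up for illustration.

```python
def error_free_sentence_probability(word_accuracy: float, words_per_sentence: int) -> float:
    """Chance that every word in a sentence is recognised correctly.

    Assumption: each word is right with probability `word_accuracy`,
    independently of the other words.
    """
    return word_accuracy ** words_per_sentence


if __name__ == "__main__":
    # Print the share of fully correct sentences for a few sentence lengths.
    for n in (5, 7, 10, 15):
        p = error_free_sentence_probability(0.9, n)
        print(f"{n:>2} words per sentence: {p:.1%} of sentences come out error-free")
```

Under those assumptions, a sentence of about seven words comes out clean a bit less than half the time, which is roughly where "every second sentence has issues" lands.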
We use Whisper for automatic translation, supposedly SotA, but we have to fix its output, I would say, very often. It repeats things, translates things for no reason, and has trouble with numbers. It's improved in leaps and bounds, but I'd say that speech recognition doesn't seem to be there yet.
A game that doesn't need consistency in its art is probably already enabled. We've crossed the threshold where something like Magic the Gathering could be recreated by a tiny team on a low budget.
I don't think the limiting factor here is the software; it looks like we got AI-generated art pretty much as soon as consumer graphics cards could handle it (10 years ago it would have been quite hard). I'd be measuring progress in hardware generations not years and from that perspective Stable Diffusion is young.
Current AI is entirely incapable of generating the balanced and fun/engaging rule sets required for a MtG style game. Sure the art assets could be generated with skilled prompting and touchup but even that is nowhere close to the strong statement you made.
OP likely meant that a Midjourney-level AI can easily generate all the card art.
Obviously, current AIs cannot generate game rulesets because the game feel is an internal phenomenon that cannot be represented in the material domain and therefore AIs cannot train on it.