Somewhere in the net of tubes of our AC we have a machine that produces rocks. They randomly shoot out of the air vents; please install ballistic shields in front of the vents to stop them from hitting our customers.
Which sounds insane until you realise that you’ve just described, in outline, something very like the Iron Dome missile defence system, which actually exists in reality.
(And of course you’ll get no argument from me that it’s insane that such things need to exist at all, but such is the world we live in.)
We didn't see the majority switch from Google to DuckDuckGo because of ads or privacy... Being the "default" brings network effects that are hard to switch away from.
Football anime doesn’t involve real people or stakes. AI can introduce a storyline, characters, etc. It won't necessarily be as popular as the real sport but I doubt the audience is zero.
I'm aware this sounds like a "no true Scotsman" argument, but I said "would a football fan watch" and the people who would watch that are not football fans.
I don't mean to denigrate it though, what I'm saying is that media would be serving a totally different purpose than the one served by professional sport today.
I guess, people are very varied, there are probably SOME strange people who watch football today with a motivation that's compatible with AI.
Also, this doesn't mean AI football would be useless. And there could even be people who watch both, since they could scratch different itches. I said I "don't give a shit" about AI art but that's not really true, it's useful, I'm glad kebab shops get a cheap way to decorate their menus. I'm sure people are getting porn generated that matches their incredibly bizarre kinks and I'm glad they get to jerk off better than they used to.
But I guess what I really am sure of is that AI can't REPLACE human art any more than it can replace football.
Then little toddler LLM will announce something like “I implemented what you requested and we’re all done. You can run the lint now.” And I’ll reply “do it yourself.”
I can only assume that everyone reporting amazing success with agent swarms and very long running tasks is using a different harness than I am :)
There's 3x more React libraries and code out there to reference. AI agents do _a lot_ better with React than, say, SolidJS. So productivity > performance is a tradeoff that many companies seem to happily take.
There's so many React libraries because the framew... I mean library is extremely barebones.
That is, in my view, the main issue with React (rendering model notwithstanding): every React application is composed of tens of tiny dependencies and since there are several competing libraries for even basic things like routing, no two projects are the same.
Umm, are we really treating "being AI friendly" as a criterion for choosing a framework now? I’ve seen this mentioned often enough lately that it’s making me uncomfortable. I fear this will become an epidemic. It will literally halt progress in improving and adopting better frameworks.
As a solo dev who picked SolidJS, yes, it is a big factor. Things like tldraw, node editors, wysiwyg editors, etc. Having to reimplement them is a huge time sink.
I’m always surprised when people bother to point out more-subtle flaws in AI images as “tells”, when the “depth-of-field problem” is so easily spotted, and has been there in every AI image ever since the earliest models.
The blur isn't correct though. Like, the amount of blur is wrong for the distance, zoom amount, etc. So the depth of field is really wrong even if it conforms to "subject crisp, background blurred".
My personal mechanistic understanding of diffusion models is that, "under the hood", the core thing they're doing, at every step and in every layer, is a kind of apophenia — i.e. they recognize patterns/textures they "know" within noise, and then they nudge the noise (least-recognizable pixels) in the image toward the closest of those learned patterns/textures, "snapping" those pixels into high-activation parts of their trained-in texture-space (with any text-prompt input just adding a probabilistic bias toward recognizing/interpreting the noise in certain parts of the image as belonging to certain patterns/textures.)
I like to think of these patterns/textures that diffusion models learn as "brush presets", in the Photoshop sense of the term: a "brush" (i.e. a specific texture or pattern), but locked into a specific size, roughness, intensity, rotation angle, etc.
Due to the way training backpropagation works (and presuming a large-enough training dataset), each of these "brush presets" that a diffusion model learns, will always end up learned as a kind of "archetype" of that brush preset. Out of a collection of examples in the training data where uses of that "brush preset" appear with varying degrees of slightly-wrong-size, slightly-wrong-intensity, slightly-out-of-focus-ness, etc, the model is inevitably going to learn most from the "central examples" in that example cluster, and distill away any parts of the example cluster that are less shared. So whenever a diffusion model recognizes a given one of its known brush presets in an image and snaps pixels toward it, the direction it's moving those pixels will always be toward that archetypal distilled version of that brush preset: the resultant texture in perfect focus, and at a very specific size, intensity, etc.
This also means that diffusion models learn brushes at distinctively-different scales / rotation angles / etc as entirely distinct brush presets. Diffusion models have no way to recognize/repair toward "a size-resampled copy of" one of their learned brush presets. And due to this, diffusion models will never learn to render in details small enough that the high-frequency components of their recognizable textural-detail would be lost below the Nyquist floor (which is why they suck so much at drawing crowds, tiny letters on signs, etc.) And they will also never learn to recognize or reproduce visual distortions like moire or ringing, that occur when things get rescaled to the point that beat-frequencies appear in their high-frequency components.
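The "recognize the nearest preset, then snap pixels toward it" loop described above can be sketched in a few lines. This is purely my illustration of the idea, not any real diffusion architecture; the "presets" and the distance metric are toy stand-ins:

```python
# Toy sketch of the "apophenia" loop: fixed archetypal "brush presets",
# and one denoising step that nudges a noisy patch toward its nearest one.
import random

# Hypothetical learned presets: each locked to one exact scale/intensity,
# i.e. the distilled "archetype" of its training-example cluster.
PRESETS = [
    [1.0, 1.0, 1.0, 1.0],   # flat bright texture
    [0.0, 0.0, 0.0, 0.0],   # flat dark texture
    [0.0, 1.0, 0.0, 1.0],   # one fixed-frequency stripe pattern
]

def dist(a, b):
    """Squared-error distance between two patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def denoise_step(patch, strength=0.3):
    """'Recognize' the nearest preset, then move pixels a fraction toward it."""
    nearest = min(PRESETS, key=lambda p: dist(patch, p))
    return [x + strength * (t - x) for x, t in zip(patch, nearest)]

random.seed(0)
noisy = [random.random() for _ in range(4)]
denoised = denoise_step(noisy)
# Repeated steps collapse the patch onto one archetype: perfectly crisp,
# at exactly one learned scale/intensity -- never a blurred or resampled copy.
for _ in range(20):
    denoised = denoise_step(denoised)
```

Note what the sketch cannot do: there is no preset for "the stripe pattern, but half-size" or "the bright texture, slightly out of focus", so the output can only ever land on one of the archetypes it already knows.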
Which means that:
- When you instruct a diffusion model that an image should have "low depth-of-field", what you're really telling it is that it should use a "smooth-blur brush preset" to paint in the background details.
- And even if you ask for depth-of-field, everything in what a diffusion model thinks of as the "foreground" of an image will always have this surreal perfect focus, where all the textures are perfectly evident.
- ...and that'll be true, even when it doesn't make sense for the textures to be evident at all, because in real life, at the distance the subject is from the "camera" in the image, the presumed textures would actually be so small as to be lost below the Nyquist floor at anything other than a macro-zoom scale.
These last two problems combine to create an effect that's totally unlike real photography, but is actually (unintentionally) quite similar to how digital artists tend to texture video-game characters for "tactile legibility." Just like how you can clearly see the crisp texture of e.g. denim on Mario's overalls (because the artist wanted to make it feel like you're looking at denim, even though you shouldn't be able to see those kinds of details at the scaling and distance Mario is from the camera), diffusion models will paint anything described as "jeans" or "denim" as having a crisply-evident denim texture, despite that being the totally wrong scale.
It's effectively a "doll clothes" effect — i.e. what you get when you take materials used to make full-scale clothing, cut tiny scraps of those materials to make a much smaller version of that clothing, put them on a doll, and then take pictures far closer to the doll, such that the clothing's material textural detail is visibly far larger relative to the "model" than it should be. Except, instead of just applying to the clothing, it applies to every texture in the scene. You can see the pores on a person's face, and the individual hairs on their head, despite the person standing five feet away from the camera. Nothing is ever aliased down into a visual aggregate texture — until a subject gets distant enough that the recognition maybe snaps over to using entirely different "brush preset" learned specifically on visual aggregate textures.
Right, prompting for depth of field will never work (with current models) because the model treats it as a style rather than knowing on some level how light and lenses behave. It needs to know this, and then we can prompt it with the lens and zoom and it will naturally do the rest. Like how you prompt newer video models without saying "make the ball roll down the hill".
I spent more than an hour writing the above comment, with my own two human hands, spending real thinking time on inventing some (AFAIK) entirely-novel educational metaphors to contribute something unique to the discussion. And you're going to ignore it out-of-hand because, what, you think "long writing" is now something only AIs do?
Kindly look at my commenting history on HN (or on Reddit, same username), where I've been writing with exactly this long and rambling overly-detailed "should have been a blog post" style going on 15+ years now.
Then, once you're convinced that I'm human, maybe you'll take this advice:
A much more useful heuristic for noticing textual AI slop than "it's long and wordy" (or "it contains em-dashes"), is that, no matter how you prompt them, LLMs are constitutionally incapable of writing run-on sentences (like this one!)
Basically every LLM base model at this point, has been RLHFed by feedback from a (not necessarily native-English-speaking, not necessarily highly literate) userbase. And that has pushed the models toward a specific kind of "writing for readability", that aims for a very low lowest-common-denominator writing style... but in terms of grammar/syntax, rather than vocabulary. These base models (or anything fine-tuned from them) will consistently spew out these endless little atomic top-level sentences — one thought per sentence, or sometimes even several itty-bitty sentences per thought (i.e. the "Not x. Not y. Just z." thing) — that can each be digested individually, with no mental state-keeping required.
It's a very inhuman style of writing. No real human being writes like LLMs do, because it doesn't match the way human beings speak or think. (You can edit prose after-the-fact to look like an LLM wrote it, but I dare you to try writing that way on your first draft. It's next to impossible.)
Note how the way LLMs write, is exactly the opposite of the way I write. My writing requires a high level of fluency with English-language grammar and syntax to understand! Which makes it actually rather shitty as general-purpose prose. Luckily I'm writing here on HN for an audience that skews somewhat older and more educated than the general public. But it's still not a style I would subject anyone to if I bothered to spend any time editing what I write after I write it. My writing epitomizes the aphorism "I wrote you a long letter because I didn't have the time to write you a short one." (It's why these are just HN comments in the first place; if I had the time to clean them up, then I'd make them into blog posts!)
Apologies, I did jump the gun here. There have been more and more lazy LLM replies on HN lately, and yours raised a flag in my mind because I can't remember someone commenting that deeply while also agreeing with me (normally if it's a lengthy response, it's because they are arguing against my point).
There are some enlightening points here about LLM writing style for me. Trying to write like an LLM being impossible (at least for a non-trivial length of text) is such a good point. Run-on sentences as another hint that it's not an LLM is also useful. Thanks!
Which is pretty amusing, because it's the exact opposite of the problem BFL had with the original Flux model: every single image looked like it was taken with a 200mm f/4.
For $10 flat per request up to 128k tokens, they’re losing money: 100 * 100k is 10m tokens, and at current API pricing that’s $50 of input tokens, not even accounting for output!
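The back-of-envelope arithmetic, spelled out. Two assumptions are mine, not stated above: that the "100" means roughly 100 model calls behind one flat-fee request, and that input pricing is about $5 per million tokens:

```python
# Back-of-envelope check (assumed numbers: ~100 calls per request and
# $5/M input tokens are my guesses at "current api pricing").
calls = 100
tokens_per_call = 100_000
total_tokens = calls * tokens_per_call          # 10,000,000 tokens
input_cost = total_tokens / 1_000_000 * 5.00    # $50 of input alone
flat_fee = 10.00                                # the flat per-request price
print(total_tokens, input_cost, flat_fee)       # output tokens still uncounted
```

Even before counting (more expensive) output tokens, the assumed input cost alone is five times the flat fee.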
Having worked some time in huge businesses, I can assure you that there are many corporate Copilot subscribers who never use it; that's where they earn money.
In the past we had to buy an expensive license of some niche software, used by a small team, for a VP "in case he wanted to look".
Worse, in many gov agencies, whenever they buy software, if it's relatively cheap, everyone gets it.
It might be a gym-type situation, where the average across all users just ends up being profitable. Of course, it could be a bait-and-switch to get people committed to their platform.
Eventually, you can standardize what you don't understand
The problem I see now is that everyone wants to be the winner in a hype cycle and be the standards bringer. How many "standards" have we seen put out now? No one talks about MCP much anymore, langchain I haven't seen in more than a year, will we be talking about Skills in another year?
So set up e2e tests and make sure it does the things you said you wanted. Just like how you use a library or database. Trust, but verify. Only if it breaks do you have to peek under the covers.
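The "trust, but verify" stance can be as simple as asserting on behavior at the boundary, never on the implementation. A minimal sketch, where `slugify` stands in for some hypothetical agent-generated function you haven't read:

```python
# Hypothetical example: treat AI-generated code as a black box and test it
# the way you'd test a third-party library -- by its observable behavior.

def slugify(title):
    """Imagine this body was written by an agent; we only care what it does."""
    return "-".join(title.lower().split())

# "Trust, but verify": assertions on outputs, not on internals.
assert slugify("Hello World") == "hello-world"
assert slugify("  extra   spaces ") == "extra-spaces"
```

If an assertion ever fails, that's the moment to peek under the covers; until then the internals can stay unread.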
Sadly, people do not care about redundant and verbose code. If that were a concern, we wouldn't have 100+MB apps, nor 5MB web app bundles. Multibillion-dollar b2b apps ship a 10MB JSON file just for searching emojis and no one blinks an eye.
The effort to set up e2e tests can be more than just writing the thing, especially for UI, since computers just do not interpret things as humans do (spatial relations, overflow, low to no contrast between elements).
Also, the assumption that you can do ___ thing (tests, some dumb agent framework, some prompting trick) and suddenly, magically, all of the problems with LLMs vanish is very wrong and very common.