
If only AI weren't completely and utterly useless for any unique problem for which there isn't an extreme amount of available training data. You know, something any competent programmer knows and has known for years. And these problems crop up in basically every non-trivial application, usually not very long into development. If only AI didn't very readily and aggressively lead you down very bad rabbit holes when it makes large changes or implementations, even on codebases for which there is ample training data, because that's just the nature of how it works. It doesn't fact check itself, it doesn't compare different approaches, it doesn't actually summarize and effectively utilize the "wisdom of the crowd", it just makes stuff up. It makes up whatever looks most correct based on its training data, with some randomness added. Turns out that's seriously unhelpful for large projects with lots of technical and architectural decisions that have to make tradeoffs and pick a specific road among multiple, over and over again.

Really sick and tired of these AI grifters. The bubble needs to pop already so these scammers can go bankrupt and we can get back to a rational market again.



I get it. I've been through cycles of this over the past three years, too. Used a lot of different tools, had a lot of disappointment, wasted a lot of time and money.

But this is kinda the whole point of my post...

In our system, we added fact-checking, comparing different approaches, and summarizing and effectively utilizing the "wisdom of the crowd" (and its success over time).

And that made it work massively better, even for non-trivial applications.


You're going to have to put quotes around "fact checking" if you're using LLMs to do it.

"comparing different approaches, summarizing and effectively utilizing the "wisdom of the crowd" (and it's success over time)"

I fail to see how this is defensible either.


Compiling and evaluating output are types of fact checking. We've done more extensive automated evaluations of "groundedness" by extracting factual statements and seeing whether they are grounded in the input data or hallucinated. There are many techniques that work well.
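
Roughly, that groundedness check looks like the sketch below (just a sketch: call_llm, extract_claims, and the prompts are illustrative placeholders, not our actual pipeline) — extract the factual statements from an output, then ask a judge model whether each one is supported by the input.

    def call_llm(prompt: str) -> str:
        # Placeholder: wire up whatever model client you actually use.
        raise NotImplementedError

    def extract_claims(output_text: str) -> list[str]:
        # Ask the model to enumerate factual statements, one per line.
        prompt = ("List every factual statement in the text below, "
                  "one per line, with no commentary:\n\n" + output_text)
        return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

    def is_grounded(claim: str, source: str) -> bool:
        # Judge a single claim against the input data.
        prompt = ("Source:\n" + source + "\n\nClaim: " + claim +
                  "\n\nIs the claim supported by the source? Answer YES or NO.")
        return call_llm(prompt).strip().upper().startswith("YES")

    def groundedness_score(output_text: str, source: str) -> float:
        # Fraction of extracted claims the judge says are supported by the input.
        claims = extract_claims(output_text)
        if not claims:
            return 1.0
        return sum(is_grounded(c, source) for c in claims) / len(claims)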

For comparisons, you can ask the model to evaluate on various axes, e.g. reliability, maintainability, cyclomatic complexity, API consistency, whatever, and it generally does fine.
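
Something like this LLM-as-judge sketch (the axis names, the 1-5 scale, and call_llm are again illustrative assumptions, not a fixed API):

    import json

    def call_llm(prompt: str) -> str:
        # Same placeholder as in the groundedness sketch above.
        raise NotImplementedError

    AXES = ["reliability", "maintainability", "cyclomatic complexity", "API consistency"]

    def score_on_axes(code: str, axes=AXES) -> dict:
        # Ask a judge model for a 1-5 score per axis, returned as JSON.
        prompt = ("Rate the following code from 1 (poor) to 5 (excellent) on each of "
                  "these axes: " + ", ".join(axes) +
                  ". Reply with only a JSON object mapping axis name to score.\n\n" + code)
        return json.loads(call_llm(prompt))

    def compare(candidates: dict) -> dict:
        # Score each candidate implementation so the per-axis results sit side by side.
        return {name: score_on_axes(src) for name, src in candidates.items()}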

We run multi-trial evals with multiple inputs across multiple semantic and deterministic metrics to create statistical scores we use for comparisons... basically building benchmark suites, either by hand or generated. This also works well for guiding development.
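
A minimal sketch of that kind of benchmark loop, assuming each metric is just a function of (input, output) and the trial count is arbitrary:

    from statistics import mean, stdev

    def run_benchmark(system, inputs, metrics, trials=5):
        # system: callable input -> output; metrics: name -> fn(input, output) -> float.
        scores = {name: [] for name in metrics}
        for text in inputs:
            for _ in range(trials):
                output = system(text)
                for name, metric in metrics.items():
                    scores[name].append(metric(text, output))
        # Collapse each metric into a (mean, std) pair for comparing runs and versions.
        return {name: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
                for name, vals in scores.items()}

    # Example metrics: one deterministic, one semantic (hypothetical, reusing the
    # groundedness sketch above).
    # metrics = {"length_ratio": lambda i, o: len(o) / max(len(i), 1),
    #            "groundedness": lambda i, o: groundedness_score(o, i)}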


And by "wisdom of the crowd", I'm referring to sharing what works well and what doesn't and building good approaches into the frameworks... encoding human expertise. We do it all the time.


Also... "scammers and AI grifters"?? Damn dude. It's an early-stage open-source experiment and, mostly, just me talking about how it makes me question whether or not I'll be programming in the future. Nobody's asking for your money.


My last comment wasn't really directed at you; it just reminded me of how I feel about the whole scene right now.


I feel that. I've been on an emotional roller-coaster for three years now. I didn't expect any of this before then. :O



