Hacker News | rnxrx's comments

This does kind of beg the question: if developers are being laid off because AI is better/faster/cheaper, or makes all their people 10x, or whatever the fig leaf is, what happens if the required tooling ends up being more expensive? From the investor’s point of view, is the drag of employee costs better or worse than a ballooning expense item?

They lay people off and look good in front of investors. Then they hire people, talk about "growth", and once again look good in front of investors.

This would never fly if the stock market were rational. But it never is.


And if/when companies need to scale back their AI investments, they can spin that too, and the stock market will eat it up.

They are just being AI efficient, and doing more while spending less on it :)

I wonder if this will happen before there is some obligatory debloating of investors' exposure to the company.


I suppose if it all works out it'll end up way more expensive than the employees the models displaced ever were. These kinds of technologies usually end up as an oligopoly at best, and those players will have a wide moat by then, and the things these models build will be tweaked such that no other model or human being can realistically work on them anymore, and then they can price gouge everyone to the brink of unprofitability.

At least the models don’t need health insurance, office space, a cafeteria, or have a threat of unionizing.

The model provider would be like a union, at least if unions had absolute control over their members, could take them all away at any time forever with no substantial negative consequences to itself, and spend billions on employer lock-in so switching to the competition is worse than paying the 12% model salary raise.

Because they are not people or alive, you can literally torture them if it gives you a mild increase in performance. For all practical purposes you can't do that to living humans. What is the price to put on being able to do that? It might tip the scales a bit for some employers.

Shh, that's the quiet part the investors don't want to say out loud.

There's 10-15 labs near the frontier, and like 30 serious inference providers, over 70 total on OpenRouter.

With research and hardware near guaranteed to bring the efficiency way up, I'm not scared here of massive price hikes.

There is no moat.


> If developers are being laid off because AI is better/faster/cheaper

This is, in my opinion, tripe. SWEs are being laid off because of post-Covid over-hiring. The only evidence for labour destruction is in junior hires, and that is not because anyone is being fired, but because entry-level jobs are being cannibalised.


In general, the economy outside the stock market is looking less and less great. The answer to this is to tighten the belt, and that means losing employees. Especially as there has not been any new great revenue sources outside AI in recent years.

> Especially as there has not been any new great revenue sources outside AI in recent years.

Nobody can make a profit with AI. Any clever idea can be cloned with AI, competition makes it unprofitable. No moat, no arbitrage opportunity. "During the gold rush, the only people making money were the men selling shovels."

We can definitely do amazing things with AI, and it makes us have superpowers, but so does everyone else. My competition also uses AI. I have to keep up with an AI powered competition now.


The shovels are the datacenters. China and America are building them. Even after the valuations puff out, that infrastructure will remain as a massive competitive advantage to those economies.

I am not convinced this will be true. The big piles of GPUs make sense when the models will change multiple times before the hardware fails; but when there's no more money for rapidly training models, the best can be encoded as circuits in much more energy efficient hardware, rendering even the new power supply infrastructure for the data centres useless.

I suspect AI would have to get drastically more expensive before it starts looking worse than payroll. If one developer using Claude Code can effectively substitute for 2 developers, you are already coming out ahead at current API pricing: even assuming very heavy usage, your cost is going to be ~1.5x one developer (factoring in costs beyond salary: benefits, PTO, and the other overhead that comes with having employees).

So you're getting 2 for the price of 1.5. Scale that up to 500 devs at a big company and it's a big chunk of change saved on payroll.

Compared with keeping your headcount or hiring humans instead, AI would have to cost upwards of $15k/month/developer before it costs more than hiring. You're looking at about 4 billion tokens per month before humans break even or become cheaper.


You're starting from the assumption that it's a 2x benefit. That's a massive leap.

True, that was more hypothetical if it got good enough to 2x.

But even taking a more realistic 1.25x (20% time savings) gain, let's say you drop from 500 to 400 devs: you'd have to hit around $4,000/dev/month in token spend before hiring humans again would break even.

Payroll is just expensive, in most companies it's by far the biggest expense. AI still has to cost drastically more before investors would call it out as being worse than increasing headcount, from a pure dollars perspective.
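
The break-even figures in this subthread can be reproduced with a few lines of arithmetic. This is a rough sketch only: the fully loaded cost of $16,000 per developer per month is an assumption (picked because it lands close to the ~$15k and ~$4,000 figures quoted above), and the function name is made up for illustration.

```python
def break_even_ai_spend(devs_before: int, speedup: float, loaded_cost: float) -> float:
    """Monthly AI spend per remaining developer that exactly uses up the payroll savings."""
    devs_after = devs_before / speedup  # headcount needed for the same output
    payroll_saved = (devs_before - devs_after) * loaded_cost
    return payroll_saved / devs_after


LOADED_COST = 16_000  # ASSUMPTION: $/developer/month including benefits and overhead

two_x = break_even_ai_spend(500, 2.0, LOADED_COST)       # 500 devs -> 250 devs
realistic = break_even_ai_spend(500, 1.25, LOADED_COST)  # 500 devs -> 400 devs
print(f"2x speedup:    ${two_x:,.0f}/dev/month")
print(f"1.25x speedup: ${realistic:,.0f}/dev/month")
```

Under that assumption the break-even is $16,000 per developer per month at 2x and $4,000 at 1.25x. The expression simplifies to loaded cost times (speedup minus 1), so the result is very sensitive to the claimed speedup.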


Also assuming that current API pricing is sustainable and not subsidized.

This is economy-dependent. It’s really Indians who will take the brunt of AI job losses.

Interesting point. Outsource the outsourcers...

More expensive is a difficult calculation: faster can sometimes warrant the higher cost, if it means you can get to market faster. Also, LLMs work 24x7 and can be scaled up and down as needed. It is faster to offboard an LLM than to fire an employee (especially here in Europe). So, even if AI is more expensive than a developer, from a TCO and ROI perspective it can still make business sense.

"AI" is just a cover for laying ppl off and saving cost. But the pendulum will swing the other way and the companies will realise that knowledgeable ppl are still required to generate and utilize the generated code. No serious company can run with vibe-coded apps generated by laymen.

There is no profit, expense, revenue. Those don't matter. Only thing that matters is stock price goes up, and laying off makes stock price go up. When laying off make stock price go down, then laying off stop.

I imagine layoffs are also very much "this quarter and next quarter" with regards to investor visibility.

While LLM Opex is "some future quarter" and very easy to co-mingle with other expenses.


The $48K also isn't fully sunk cost - there's a non-trivial residual value for those GPUs at the moment and likely for a few years yet. The server has a depreciation curve that's pretty enviable, actually!

Cultures of patronage are fertile ground for mediocrity.. very much a running theme in the history of human organization.

> Cultures of patronage are fertile ground for mediocrity.

While judging that at such a remove is hard, the Roman Republic was such a culture, with strong patronage networks. And while we may or may not agree with Rome's goal, that culture didn't seem to produce mediocrity.


It seems like LLMs are actually pretty good at the sorts of things needed to manage a high-volume mailing list (summarizing, looking for dupes, sentiment, flagging things, etc), even if only as augmentation for human eyes.

That said, I get why this would rankle a lot of the folks involved.


That's just a security/protection racket with extra steps: "Someone is paying us to hurt your business/site; pay us money to defend your site against our attacks".

Cisco's fiscal year closes at the end of July, which makes this time of year the season for reorgs, LRs (as they're colloquially known) and the usual maneuvering that leads up to establishing budgets, sales quotas and the like. It sucks that this kind of thing has become so normalized now.

I'm not sure we'll ever really be free of the GIGO (garbage in / garbage out) principle. Tools will get better and better, but can never be a substitute for a deep understanding of the thing we want to create.


Not to be glib, but is there any industry where management consultants have been shown to make a statistically significant difference either way?


My dad ran a crisis management consultancy for years. I just googled a few of his clients and they all survived the process. He would come in, assist with minor layoffs, repair business processes, usually get some software installed/updated back when that was a huge multiplier to a business and then leave when everything was running smooth.

I also am aware of a situation where a pair of business consultants who were meant to be assisting with a software project were diverted (at full rate 1200/day) to assisting with redecorating an office.

I was directly involved, in opposition, with a pair of business analyst consultants who tried to get a customer of mine to change their (admittedly terrible) vendor selection by repeating security concerns over and over again in the meeting. They never actually got to the point of analysing said terrible vendor's terrible integration practices or costing up a migration path. They just banged on about security and contacted us separately after the meeting asking for more details about the security situation.

Basically you get out of it what you want to get out of it. It depends on the consultant, their education, and the terms of their engagement. I don't know if statistics would be useful in this scenario or how you would control for wildly different outcomes.


The management consulting industry wouldn’t work without them.


I mean does it work? Other than profit making for the consulting companies?

Like someone else pointed out, if people are hiring them in order to provide cover for decision making, then maybe the whole thing being a charade is the point.


You missed the joke.


Statistically relevant: yes. In a positive way: no.

Well, McKinsey still existing? Too much influence. Otherwise they would have gone like so many other consulting companies.

https://www.trtworld.com/article/12748537


In, fire 30% of the workforce, new logo, out.

You are now a fully trained management consultant. (Alan Johnson, Peep Show)


In the good old days you at least had to spend some time coming up with the inspirational slide deck to explain the meaning of the new logo! Now even that part has been automated :(


TIL gnome-lib has a meaning outside of programming


I wonder if a part of the problem isn't just the misapplication of LLMs in the first place. As has been mentioned elsewhere, perhaps the agent's prompt should be to write code to accomplish as much of the task in as repeatable/verifiable/deterministic a way as possible. This would hopefully include validation of the agent's output as well. The overall goal would be to keep the LLM out of doing processing that could be more efficiently (and often correctly) handled programmatically.


100% agreed. use the non-deterministic thing that is right 90% of the time to generate a deterministic thing that is right 100% of the time. one of the key things I add to my prompts is:

- Please consult me when you encounter any ambiguous edge cases

Attaching the AI to production to directly do things with API calls is bad. For me the only use case where the app should do any AI stuff is with reading/categorizing/etc. Basically replacing the "R" in old CRUD apps. If you want to use that same new AI based "R" endpoint to auto fill forms for the "C", "U", and "D" based on a prompt that's cool, but it should never mutate anything for a customer before a human reviews it. Basically CRUD apps are still CRUD apps (and this will always be true), they just have the benefit of having a very intelligent "R" endpoint that can auto complete forms for customers (or your internal tooling/Jenkins pipelines/etc), or suggest (but never invoke) an action.


> Please consult me when you encounter any ambiguous edge cases

Why not check the logprobs of the output and take action when the probabilities of the first and second most likely tokens are too similar (or below a certain threshold)?
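
As a rough illustration of that check, here is a minimal sketch. It assumes the candidate tokens for one position are already available as a mapping from token text to log-probability; how you obtain that mapping depends on the provider, and the function name and both thresholds are made up for this example. It only measures token-level uncertainty at a single position.

```python
import math


def is_uncertain(top_logprobs: dict[str, float],
                 min_top: float = 0.5,
                 min_margin: float = 0.2) -> bool:
    """Flag a token position when the best candidate is weak or nearly tied with the runner-up."""
    probs = sorted((math.exp(lp) for lp in top_logprobs.values()), reverse=True)
    top = probs[0] if probs else 0.0
    runner_up = probs[1] if len(probs) > 1 else 0.0
    return top < min_top or (top - runner_up) < min_margin


# Hypothetical candidate distributions for a single token position.
confident = {"yes": math.log(0.90), "no": math.log(0.05)}
near_tie = {"yes": math.log(0.48), "no": math.log(0.45)}
print(is_uncertain(confident), is_uncertain(near_tie))
```

The first distribution passes both thresholds; the second is flagged because the top two candidates are only 0.03 apart.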


I think you're getting abstraction layers mixed up, prediction uncertainty and logical uncertainty aren't the same. In a reasoning model, it's entirely possible that there's only one likely continuation and it says something like "This edge case is ambiguous, but what the user most likely meant is X".


because this is manual? are you an llm?


I think there is a flow in most organizations from:

llm -> prompt -> result

llm -> prompt + prompt encoded as skill -> result

llm -> prompt + deterministic code encoded as skill -> result

I do think prompting to generate code early can shortcut that path to deterministic code, but we're still essentially embedding deterministic code in a non-deterministic wrapper. There is a missing layer of determinism in many cases that actually makes long-horizon tasks successful. We need deterministic code outside the non-deterministic boundary via an agentic loop or framework. This puts us in a place where the non-deterministic decision making is sandwiched in between layers of determinism:

deterministic agentic flows -> non-deterministic decision making -> deterministic tools

This has been a very powerful pattern in my experiments and it gets even stronger when the agents are building their own determinism via tools like auto-researcher.
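
The sandwich described above can be sketched in a few lines. Everything here is hypothetical: `fake_decider` stands in for the model call, and the tool names are invented. The point is only the shape, with a deterministic loop outside, a single non-deterministic choice in the middle, and deterministic tools at the bottom.

```python
from typing import Callable

# Deterministic tools: the only actions the decision step is allowed to select.
TOOLS: dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "reverse": lambda s: s[::-1],
}


def fake_decider(task: str) -> str:
    """Stand-in for the non-deterministic LLM call; returns a tool name."""
    return "uppercase" if "shout" in task else "delete_everything"


def run_step(task: str, payload: str,
             decide: Callable[[str], str], max_retries: int = 2) -> str:
    """Deterministic outer loop: validate every decision before any tool runs."""
    for _ in range(max_retries + 1):
        choice = decide(task)
        if choice in TOOLS:  # deterministic check on the model's output
            return TOOLS[choice](payload)
    raise ValueError(f"no valid tool chosen for task: {task!r}")


print(run_step("shout this", "hello", fake_decider))
```

An unknown tool name never reaches execution; after the retries are used up, the loop fails loudly instead of guessing.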


This is exactly how I did my last project of automating the generation of an interface library between a server that controls hardware and the mobile app.

The hardware control team delivers a spec as a document and spreadsheet. The mobile team was using that to code the interface library and validating their code against the server. I converted the document to TSV, sent some parts to Claude and had it write a parser for the TSV, keeping all the nuances of the human-written spec. It took more than 150 iterations to get the parser to handle all edge cases and generate an intermediate output as JSON. Then Claude helped me write a code generator using some custom glue on top of Apollo to generate the code that is consumed by the mobile app.

This whole pipeline runs as part of Github actions and calls Claude only when our library validator fails. There is an md file which is sent to Claude on failure as part of the request to figure out what went wrong, propose a solution and create a PR. This is followed by a human review, rework and merge. Total credits consumed to get here < $350.


The problem is that often the program runs into some edge case that requires interpretation, at which point one is tempted to let the LLM deal with the edge case, at which point one is tempted to let the LLM deal with the whole loop and let it do the tool calls.


Agreed. I think the approach described here is promising. Most of the workflow is deterministic and includes safeguards, but an LLM is invoked in the one case where it's really useful.

https://lethain.com/agents-as-scaffolding/


Completely agree! People tend to forget we are non deterministic too! Yet we are able to write code fine, and fairly reliably by using tools that can help keep us fairly honest.

I think most problems with AI come down to: can you deterministically test the thing you are asking it to do?

How many of us would never ever show work, without going to check the thing we just built first?


> can you deterministically test the thing you are asking it to do?

Of course: have it write tests first; and run them to check its work.

Works well for refactoring, but greenfield implementations still rely on a spec that is guaranteed to be incomplete, overcomplete and wrong in many ways.


You can't ask something to check its own work without external reward/penalty. It'll cheat.


Weirdly, and I fully think this is just some cognitive bias I don't have the knowledge to name, the AI seems very happy to please me. Like when it gets something done in one shot, it seems very happy to do so.


It's because expressing emotion tests well in RLHF (reinforcement learning, human feedback), which is the layer on top of the next-token-predictor LLM. As a bonus, it helps manipulate operator reactions to incorrect output, and improve engagement (aka token use).

The "thought process" of an LLM only exists as inference response to next token prediction prompts. It's the illusion of emotion.


Well, if the spec is incomplete it sounds like you should lower the scope for the AI, and then go from there. I wouldn't be too keen to give a junior engineer free rein and expect awesomeness.


My agents often write themselves scripts. Isn't that effectively what you're asking for? Prompting for scripts can also be a useful time and accuracy tactic when you know it'll be a good fit for it.


The problem is that code it spits out on the fly is untested and untrustworthy. Identify the parts of your workflow that could be accomplished with regular code - write and unit test that code, with LLM help if you want, and use the llm as the orchestrator only.


Yeah, the problem is that I do not think the agents are good at reusing scripts and stitching them together. At least for me, they keep recreating too many similar scripts. I hope we will see platforms like windmill.dev find the optimal solution for this. I have not been able to test it enough, but having a platform that gives you some observability out of the box and protects secrets from the LLM is nice.


I noticed that too. Unless you _ask_ for a script, they throw away the scripts they write.

They are particularly bad at complex multiline parsing. Writing all sorts of weird/crude python/awk scripts and getting confused in the process.

I wish they would use Perl6/Grammar or Haskell/Parsec or similar and write better parsing scripts.


For the non-Haskell folks like myself, what would that look like, and why is parsing better? Perl I get.


Perl has powerful regular expressions, but it only goes so far. Doing multiline/nested structured parsing is too painful.

Perl6/Raku has built-in grammars that can do that idiomatically.

If you have a couple minutes, give this a glance. It will give you an idea.

https://andrewshitov.com/2018/10/31/a-simple-parser-in-perl-...

I am no expert in Haskell either. But Parsec is similar in concept.
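
To give a feel for why grammar-style parsing handles nesting better than a single regular expression, here is a small sketch in Python. It is not Raku grammars or Parsec, just a hand-written recursive-descent parser for an invented parenthesised format; the regex is used only to split tokens, and the recursion handles arbitrary nesting across lines.

```python
import re

TOKEN = re.compile(r"\s*([()]|[^\s()]+)")


def tokenize(text: str) -> list[str]:
    """Split input into parens and bare words; whitespace, including newlines, is skipped."""
    return TOKEN.findall(text)


def parse(tokens: list[str], pos: int = 0) -> tuple[list, int]:
    """Parse items until a ')' or end of input; return (items, next position)."""
    items: list = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "(":
            child, pos = parse(tokens, pos + 1)  # recurse into the nested group
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing ')'")
            items.append(child)
            pos += 1
        elif tok == ")":
            break  # let the caller consume the closing paren
        else:
            items.append(tok)
            pos += 1
    return items, pos


def parse_text(text: str) -> list:
    tokens = tokenize(text)
    tree, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected ')'")
    return tree


print(parse_text("a (b\n  (c d))\ne"))
```

The nested, multiline input comes back as a nested list, and unbalanced parentheses raise an error instead of producing a silently wrong match.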


This has been our experience as well. Initially we had a list of tools that the agent could use to manipulate a data structure in certain ways. This approach was quite brittle. Now we are using a small DSL (domain-specific language) and a single tool where the agent can input scripts written in the DSL. We are getting more dynamic use-cases now, and wrong syntax can easily be caught by the parser and relayed to the agent.


Do you have an example of type of data and DSL? I feel I’d just give it access to write python/js to manipulate data


We decided not to go with Python/JS to make executing safe and simple.

The data structure is a recursive list of simple objects that form a table of content.

DSL uses Python syntax though. For example:

swap_section(a, b)
create_section(after=2)
delete_section(2)

This proved to save a ton of explanatory prompts that would be needed if every command was a tool instead. And it’s faster and more reliable.
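
For readers wondering how a Python-syntax DSL can be accepted without executing arbitrary Python, here is one possible sketch using the standard library's `ast` module. This is not the commenter's implementation, which is not shown; the command whitelist reuses the three names from the example above, and the restriction to literal arguments is an assumption of this sketch.

```python
import ast

# ASSUMPTION: the full command set is just the three names from the example.
ALLOWED = {"swap_section", "create_section", "delete_section"}


def parse_script(script: str) -> list[tuple[str, list, dict]]:
    """Turn a script into (command, args, kwargs) triples without executing it."""
    commands = []
    for stmt in ast.parse(script).body:  # raises SyntaxError on malformed input
        call = stmt.value if isinstance(stmt, ast.Expr) else None
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ValueError(f"line {stmt.lineno}: only plain command calls are allowed")
        if call.func.id not in ALLOWED:
            raise ValueError(f"line {stmt.lineno}: unknown command {call.func.id!r}")
        # literal_eval accepts only literals, so names and expressions are rejected.
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        commands.append((call.func.id, args, kwargs))
    return commands


script = "swap_section(1, 3)\ncreate_section(after=2)\ndelete_section(2)"
for command in parse_script(script):
    print(command)
```

The parsed triples can then be dispatched to ordinary functions. A `SyntaxError` or `ValueError` message is plain text, so it can be relayed back to the agent as feedback.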


> write code to accomplish as much of the task in as repeatable/verifiable/deterministic a way as possible

Correct. The concept of having probabilistic output with deterministic acceptance “guardrails” is illogical. If the domain resists deterministic modeling such that you’re using an LLM, the guardrails don’t magically gain that capability.


This is so true. I have been working on a project built on exactly this principle:

https://www.decisional.com/blog/workflow-automation-should-b...

I think there is a fundamental incentive problem: code + LLM + harness is bound to be more efficient, but the labs want you to burn tokens, so they are not going to tell you to use the code, just to burn more tokens. They are asking us to forget about token cost and reliability for now, on the promise that the models will become better.

This means that most people just believe that their agent should just be able to do anything with the help of some Model fairy dust with prompts + skills.

People need to watch their agents fail in production to be able to come to the right conclusion unfortunately.


Skills are not fairy dust but a combination of prompts and deterministic code, so that you get the best of both worlds.

E.g., loop in the code and process the subagent's non-deterministic response for each individual task.

This takes 10 minutes to set up, you just need to run something like /skill-builder and describe the desired workflow.

I imagine many people just don’t know that it’s possible. I only discovered it a few days ago myself.

It worked on the first try.


yup, the standard way of thinking about agents seems backwards and probably costly. Use LLMs to write scripts, then stick all your scripts in your own looping harness and call out for LLMs for those parts that are too hard to automate with some deterministic validation at the end.


We have a rule that the LLM cannot perform any actions that result in actual money or stuff moving. Those can only be done by API calls that have lots of validation and checks on them, and adding or changing an API call is gated behind human review. The LLM is then free to make as many API calls as it likes, we're confident that it can't screw anything up too badly.
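
A minimal sketch of that kind of guarded endpoint follows. The class, its fields, and the specific checks are invented for illustration; the idea is only that every request, whoever or whatever issues it, goes through the same deterministic validation before any balance changes.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentsAPI:
    """Hypothetical validated endpoint: the only path by which money can move."""
    balances: dict[str, int]
    max_transfer_cents: int = 10_000
    log: list[str] = field(default_factory=list)

    def transfer(self, src: str, dst: str, cents: int) -> bool:
        # All checks are plain deterministic code; nothing here trusts the caller.
        ok = (
            src in self.balances
            and dst in self.balances
            and src != dst
            and isinstance(cents, int)
            and 0 < cents <= self.max_transfer_cents
            and self.balances[src] >= cents
        )
        if ok:
            self.balances[src] -= cents
            self.balances[dst] += cents
        self.log.append(f"{'OK' if ok else 'REJECTED'} {src}->{dst} {cents}")
        return ok


api = PaymentsAPI({"alice": 5_000, "bob": 0})
print(api.transfer("alice", "bob", 1_000))        # within limits
print(api.transfer("alice", "bob", 999_999_999))  # over the cap, rejected
print(api.balances)
```

Rejected calls leave the balances untouched and still land in the log, which gives the human reviewers something to audit.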


Wouldn't this immediately put the American companies producing these models at a significant disadvantage? Just use an unmolested model hosted by a provider in Vancouver.

If anything, this measure seems like it would create a scenario where services hosted outside the US would become a lot more attractive relative to Trumped AI.


The general idea is to have the LLM maintain longer-term context/background by storing it in a format/structure that's akin to a standard Wiki. The result is (hopefully) a series of human-readable and editable documents that's developed and maintained by the agent.

There's great coverage of it at https://gist.github.com/karpathy/442a6bf555914893e9891c11519...

It's actually also now a base capability in the Hermes agent and has been really helpful for me, at least.

