
It's the attention mechanism at work, along with a fair bit of Internet one-upmanship. The LLM has ingested all of the text on the Internet, as well as GitHub code repositories, pull requests, StackOverflow posts, code reviews, mailing lists, etc. In a number of those content sources, there will be people saying "Actually, if you go into the details of..." or "If you look at the intricacies of the problem" or "If you understood the problem deeply" followed by a very deep, expert-level explication of exactly what you should've done differently. You want the model to use the code in the correction, not the one in the original StackOverflow question.

Same reason that "Pretend you are an MIT professor" or "You are a leading Python expert" or similar works in prompts. It tells the model to pay attention to the part of the corpus that has those terms, weighting them more highly than all the other programming samples that it's run across.
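
As a toy illustration (not any vendor's documented behavior), here's how a persona typically gets injected as a system message, sketched with the OpenAI Python client; the model name and both prompts are placeholders:

    # Sketch: persona framing goes in the system message, nudging the model
    # toward the "expert" slice of its training distribution. Model name and
    # prompts are placeholders, not recommendations.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a leading Python expert."},
            {"role": "user", "content": "Why is my list comprehension slow?"},
        ],
    )
    print(response.choices[0].message.content)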


I don't think this is a result of the base training data ("the internet"). It's a post-training behavior, created during reinforcement learning. Codex has a totally different behavior in that regard: by default, Codex reads a lot of potentially relevant files before it goes and writes files.

Maybe you remember that, without reinforcement learning, the models of 2019 just completed the sentences you gave them. There were no tool calls like reading files. Tool-calling behavior is company-specific and highly tuned to their harnesses. How often they call a tool is not part of the base training data.


Modern LLMs are certainly fine-tuned on data that includes examples of tool use: mostly the tools built into their respective harnesses, but also external/mock tools so they don't overfit on only using the toolset they expect to see in their harnesses.

IDK the current state, but I remember that, last year, the open-source coding harnesses needed to provide exactly the tools that the LLM expected, or the error rate went through the roof. Some, like Grok and Gemini, only recently managed to make tool calls somewhat reliable.
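
For reference, a tool definition in the common JSON-schema function-calling style looks roughly like this; the "read_file" name and fields are illustrative, not any particular harness's actual toolset:

    # Illustrative tool definition in the JSON-schema style most harnesses
    # pass to the model. If the model was fine-tuned expecting a tool shaped
    # like this and the harness exposes something different, errors climb.
    read_file_tool = {
        "type": "function",
        "function": {
            "name": "read_file",  # hypothetical tool name
            "description": "Read a file from the workspace and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Workspace-relative path to the file.",
                    },
                },
                "required": ["path"],
            },
        },
    }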

Of course I can't be certain, but I think the "mixture of experts" design plays into it too. Metaphorically, there's a mid-level manager who looks at your prompt and tries to decide which experts it should be sent to. If he thinks you won't notice, he saves money by sending it to the undergraduate intern.

Just a theory.


Notice that MoE isn't different experts for different types of problems. It's per token and not really connected to problem type.

So if you send Python code, the first token in a function can go to one expert, the second to another expert, and so on.
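
A toy sketch of what per-token routing means, assuming the standard learned linear router with top-k selection used by most open MoE implementations:

    # Toy per-token MoE layer: each token is routed independently, so
    # adjacent tokens from the same Python function can land on
    # different experts.
    import torch
    import torch.nn.functional as F

    d_model, n_experts, top_k = 64, 8, 2
    router = torch.nn.Linear(d_model, n_experts)  # trained jointly with experts
    experts = torch.nn.ModuleList(
        [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
    )

    tokens = torch.randn(10, d_model)             # 10 tokens from one sequence
    scores, chosen = router(tokens).topk(top_k, dim=-1)
    weights = F.softmax(scores, dim=-1)           # mix the top-k experts per token

    out = torch.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        for w, e in zip(weights[i], chosen[i]):
            out[i] += w * experts[int(e)](tok)

    print(chosen)  # per-token expert ids -- typically different for each token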


Can you back this up with documentation? I don't believe that this is the case.

The router that routes the tokens between the "experts" is part of the training itself as well. The name MoE is really not a good acronym, as it makes people believe it operates on a coarser level and that each of the experts is somehow trained on a different corpus, etc. But what do I know; there are new archs every week and someone might have done an MoE differently.

Check out Unsloth's REAP models: you can outright delete a few of the lesser-used experts without the model going braindead, since they can all handle each token but some are better positioned to do so.

This is such a good explanation. Thanks

>> Same reason that "Pretend you are an MIT professor" or "You are a leading Python expert" or similar works in prompts.

This pretend-you-are-a-[persona] is cargo cult prompting at this point. The persona framing is just decoration.

A brief purpose statement describing what the skill [skill.md] does is more honest and just as effective.


I think it does more harm than good on recent models. The LLM has to override its system prompt to role-play, wasting context and computing cycles instead of working on the task.

You will never convince me that this isn't confirmation bias, or the equivalent of a slot machine player thinking the order in which they push buttons impacts the output, or some other gambler-esque superstition.

These tools are literally designed to make people behave like gamblers. And it's working, except the house in this case takes the money you give them and lights it on fire.


Your ignorance is my opportunity. May I ask which markets you are developing for?

"The equivalent of saying, which slot machine were you sitting at It'll make me money"

Different sets of people, and different audiences. The CEO / corporate executive crowd loves AI. Why? Because they can use it to replace workers. The general public / ordinary employee crowd hates AI. Why? Because they are the ones being replaced.

The startups, founders, VCs, executives, employees, etc. crowing about how they love AI are pandering to the first group of people, because they are the ones who hold budgets that they can direct toward AI tools.

This is also why people might want to remain anonymous when doing an AI experiment. This lets them crow about it in private to an audience of founders, executives, VCs, etc. who might open their wallets, while protecting themselves from reputational damage amongst the general public.


This is an unnecessarily cynical view.

People are excited about AI because it's new powerful technology. They aren't "pandering" to anyone.


I have been in dozens of meetings over the past year where directors have told me to use AI to enable us to fire 100% of our contract staff.

I have been in meetings where my director has said that AI will enable us to shrink the team by 50%.

Every single one of my friends who do knowledge work has been told that AI is likely to make their job obsolete in the next few years, often by their bosses.

We have mortgages to pay and children to feed.


People are afraid because they need to work to eat. People who don't need to work to eat are less likely to be afraid.

I have yet to meet anyone except managers who is excited about LLMs or generative AI.

And the only people actually excited about the useful kinds of "AI", traditional machine learning, are researchers.


You don't have to look past this very forum; most people here seem to be very positive about gen AI when it comes to software development specifically.

Lots of folk here will happily tell you about how LLMs made them 10x more productive, and then their custom agent orchestrator made them 20x more productive on top of that (stacking multiplicatively of course, for a total of 200x productivity gain).


I assume those people are managers, have a vested interest in AI, or have only just started programming.

How would you find out if you were wrong?

You're presented with hundreds of people that prove you wrong, and your response is "no, I assume I'm right"?


This is obviously a rhetorical statement. I'm not claiming a categorical fact, but a fuzzy one.

Most of these people are managers, investors, or juniors.


I don't know what your bubble is, but I'm a regular programmer and I'm absolutely excited, even if a little uncomfortable. I know a lot of people who are the same.

Interesting, every developer I've spoken to is extremely skeptical and has not found any actual productivity boosts.

Ok that's not true. I know one junior who is very excited, but considering his regular code quality I would not put much weight on his opinion.


I am using AI a lot to do tasks that just would not get done otherwise because they would take too long. Also, getting it to iterate on a React web application meant I could think about what I wanted it to do rather than worry about all the typing I would have had to do. It's especially powerful when moving things around; hand-written code has a "mental load" to move that telling an AI to do it does not. Obviously not everything is 100%, but this is the most productive I have felt for a very long time. And I've been in the game for 25 years.

Why do you need to move things around? And how is that difficult?

Surely you have an LSP in your editor and are able to use sed? I've never had moving files take more than fifteen minutes (for really big changes), and even then most of the time is spent thinking about where to move things.

LLMs have been reported to specifically make you "feel" productive without actually increasing your productivity.


I mean, there are two different things. One is whether there are actual productivity boosts right now. The other is the excitement about the technology.

I am definitely more productive. A lot of this productivity is wasted on stuff I probably shouldn't be writing anyway. But since I started using coding agents, I'm both more productive at my day job and I'm building so many small hobby projects that I would have never found time for otherwise.

But the main topic of discussion in this thread is the excitement about the technology. And I have somewhat mixed feelings, because on the one hand I feel like a turkey being excited for Thanksgiving. On the other hand, I think the future of programming is bright: there will be so much more software built, and for a lot of that you will still need programmers.

My excitement comes from the fact that I can do so much more things that I wouldn't even think about being able to do a few months ago.

Just as an example, in the last month I have used the agents to add features to the applications I'm using daily: a text editor, a podcast application, an Android keyboard. The agents were able to fork, build, and implement a feature I asked for in a project where I had no idea about the technology. If I were hired to do those features, I would be happy if I implemented them after two weeks on the job. With an agent, I get tailor-made features in half a morning, spending less than ten minutes prompting.

I am building educational games for my kids. They learn a new topic at school? Let me quickly vibe-code a game to make learning it fun. A project that wouldn't be worth my weekend, but is worth 15 minutes. https://kuboble.com/math/games/snake/index.html?mode=multipl...

So I'm excited because I think coding agents will be for coding what pencil and paper were for writing.


I don't understand the idea that you "could not think about implementing a feature".

I can think of roughly 0 features of run-of-the-mill software that would be impossible to implement for a semi-competent software developer. Especially for the kinds of applications you mention.

Also it sounds less like you're productive and more like the vibeslop projects are distracting you.


I'm claiming it's both.

I produce more good (imo) production features despite being distracted.

The features I mention are things that I would be able to do, but only with a lot of learning and great effort, so in practical terms I would not.

It is probably a skill issue, but many times in the past I downloaded an open-source project and just couldn't build and run it: cryptic build errors, figuring out dependencies. And I see Claude gets the same errors, but it just knows how to work around them. Setting up a local development environment (db, dummy auth, dummy data) for a project outside of my competence area is already more work than I'm willing to do for a simple feature. Now it's free.

>I can think of roughly 0 features of run-of-the-mill software that would be impossible to implement for a semi-competent software developer.

Yes. In my area of competence it can do the coding tasks I know exactly how to do, just a bit faster. Right now, for those tasks, I'd say it can one-shot code that would take me a day.

But it enables me to do things in the area where I don't have expertise. And getting this expertise is very expensive.


Out of interest, could you give me an example of a feature that it one-shotted that would have taken you a whole day?

The example from yesterday:

I have a large C# application. In this application I have functionality to convert a group of settings into a tree model (a list of commands to generate this tree). There are a lot of weird settings and special cases.

I asked Claude to extract this logic into a separate Python module.

It successfully one-shotted that, and I would estimate it as two days' work for me (and I wrote the original C# code).

This is probably the best possible kind of task for a coding agent, given that it's a very well-defined task with already-existing test cases.


Seems reasonable, but if it's just copy-pasting, it doesn't seem like that would take you a whole day. Maybe on the order of an hour at most.

Were you exaggerating earlier or do you have more examples?


This is a two-day task for me. If you could do it in one hour, then you're a 10x programmer compared to me.

You can browse the code at <my_username>.com/slop/hn_tb/

I have also slopped together the simple code viewer, so you can make your own judgement about whether it looks like a 1-hour task.


You've always been able to delete for 2 hours and then the post becomes effectively permanent, modulo emailing dang to get it deleted by an admin.

Have they stated the justification for this anywhere? You'd think a site that brands itself as being for hackers would value its users having control over their comments/privacy.

There's value in editing for clarity within a window of a live discussion. After the live discussion is less active, it's important to be able to reference things or see a coherent view of the discussion and what people were responding to.

Yes, it's because the comments create a discussion thread that then becomes impossible to follow (or worse, misleading) if certain comments within it are either deleted or edited to say something different. The idea is that what you write becomes communal property once it's been responded to, because it's part of a community discussion that loses meaning if people start deleting individual comments.

I believe that, even within that two hour window, you cannot delete if anyone has replied to it.

You can still edit it to say "[deleted]" or something, though.


I've seen videos where people will put in removable drywall panels that can just be lifted out for access.

There are a lot of downsides, though. You lose air sealing if you don't have an airtight building envelope on the outside of the drywall. You lose fire resistance. You often lose aesthetics, although I've seen this done extremely tastefully. You lose childproofing, and run the risk of a kid electrocuting themselves, destroying your plumbing, or dropping stuff in the wall. You impose constraints on what can go on the walls and where your furniture can go.

Given that drywall is pretty easy to cut and replace, most people figure it's just not worth the costs for something you do infrequently.


The difference is emblematic of the difficulty in getting attention for climate mitigation. AI succeeds because you can sell a service to an individual human which will give them advantages over other humans. Climate change mitigation fails because you are trying to sell a service to humanity which will result in a better end state over some other hypothetical imagined future. Humans make decisions, not humanity, and many of them are pretty bad with both hypotheticals and imagination. It's no wonder that a product designed to make them do better at what they do, right now is more successful than one designed to make everybody do better than what would otherwise have resulted, 50-100 years in the future when they'll likely be dead.

Any kind of workable solution to large, societal-level problems needs to deal with the principal-agent problem. Society doesn't actually exist; humanity doesn't actually exist. These are abstractions we use to label the behavior of individual people. You need to operate on the level of individual people to get any sort of outcome.

(FWIW, this is a major reason why concepts like markets, capitalism, democracy, rule of law, and federalism have been successful. They work by aligning incentives so that when one person takes an action that is good for themselves, they more-or-less end up benefitting the people around them too.)


It started as Testing on the Toilet, which was an effort to get people to actually care about unit-testing their code, software quality, and writing maintainable code that doesn't break in 6 months. It was later expanded with Learning on the Loo (general tips and tricks), and then Testing on the Toilet became Tech on the Toilet. It's been going on for a good 20 years now, so that's about 1000 articles (they change them out weekly), and there aren't really 1000 articles you can write about unit testing.

The insight is actually pretty similar to Google's core business model: when you're going to the bathroom, there isn't a whole lot else you're doing, so it's the perfect time to put up a 2-3 minute read to reinforce a message that you want people to hear but might not get attention for otherwise.


Actually, that is also a way to surreptitiously abuse you: not even your toilet time should be "yours".

I was in a fraternity in college, 20 years ago. We put weekly bathroom notes on the inside of the stall doors. Something interesting, something funny, upcoming news. The elected fraternity secretary was responsible for making those weekly, among many other things.

If they were a day late, the amount of pestering they would get until they did that weekly job was hilarious. We all got a kick out of them.

Your toilet time can be yours, just don't fucking read them lol. Back then RAZR phones were the hotness; nobody sat on a smartphone and had ads blasted at them while they took a shit.


I guess, if you equate "influence" with "abuse". An awful lot of the pillars of our society would become abuse then. Ask any parent of a toddler whether their toilet time is actually "theirs".

Employers should not be treating employees like toddlers and trying to brainwash them on the goddamn toilet.

My point is the opposite actually: if you are the parent of a toddler, you'll know that your toilet time is not actually yours, because your toddler will try every effort to get your attention and influence you, up to and including crawling into your lap while you are doing your business; tantrumming on the bathroom floor; tantrumming outside the bathroom door; cutting up the mail you really need to file; spilling food all over the floor; unlatching childproofing; moving furniture; and enlisting their siblings.

I play chess on the toilet at work.

Anecdotally my experience is dramatically different.

Last week I arrived by car right near the beginning of dropoff time. Pulling in right in front of me was the mom of one of my kid's classmates, carpooling with another kid who lives in the same apartment complex. The three of them met up as soon as they got out of the car, and then another one of their friends (who lives across the street from the school and usually walks) joined them from his driveway. They met up with a 5th friend before they crossed the street.

Then I walked - well, more like ran - with the 5 of them down the 111 steps that take us from the street level to the schoolyard. When they reached the bottom, they met up with 3 more friends who had just been let out of the drop-off zone in front of the school itself. I said a quick goodbye to my kid, but he wasn't really paying attention; he was already ensconced in his pack of 8.

I've gotten there with my kid before drop-off time, walked down the stairs with him, and there's been a pack of about 20-30 kids and 2-3 parents usually milling around before the school gates open.

I realize that this is somewhat atypical in 21st-century America, and we specifically chose this community because, well, it actually has a sense of community, but it's not unique. In preschool I'd take my son over to his preschool bestie's house (she lived about 2 cities away), and there'd be a whole pack of kids roaming the neighborhood going over unannounced to each other's houses.


Have seen this in Portland (lots of e-bikes with child carriers as well, even in the cold and rain), but not in more spread out cities.

I think it is crazy that you have gates to get into the school grounds (buildings should be locked, I get that). Like my BIL in the Sydney suburbs: he lives right next to a school with a super nice basketball court etc., but can kids use it on weekends? Sadly, no.

The gates here are open when school is not in session, and we (and other families) do in fact use the school grounds for playdates on weekends.

But yes, it sucks that they have to exist, and that my kids have active-shooter drills and the school has a plan for what to do in a mass-casualty event. Though so far, every time they've triggered the secure-campus protocols, it's been because a baby coyote likes to hang out on the stairs.


How do you find communities like that? It’s not exactly a Redfin search.

Word of mouth and on-the-ground sleuthing.

The community in question was put on our radar screen when we attended a party that one of my wife's business school friends threw. It's not well-known; even in our metro area, most people probably wouldn't recognize the name or be able to place it on a map.

But then when we were house-hunting, I just drove through all the residential neighborhoods within commuting distance of our jobs. And took note of where I saw people a.) out walking and b.) talking to their neighbors. I reported to my wife (who thought this was a nutty waste of time, but really values community) "I think you'll like it here", then paid the exorbitant price to actually buy a home in the area. Indeed, we did like it here.


The sun's spectrum doesn't have the most energy in the visible light band, though it's close. Most of the energy is in the infrared band:

https://sunwindsolar.com/blog/solar-radiation-spectrum/?v=0b...
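
A quick back-of-the-envelope check: treat the sun as a ~5778 K blackbody and integrate Planck's law over each band (the real solar spectrum deviates a bit, and the band edges below are just the usual conventions):

    # Integrate Planck's law for a 5778 K blackbody and compare the energy
    # landing in the UV, visible, and IR bands.
    import numpy as np

    h, c, k = 6.626e-34, 2.998e8, 1.381e-23   # SI constants
    T = 5778.0                                # sun's effective temperature, K

    lam = np.logspace(-8, -4, 200_000)        # 10 nm .. 100 um wavelengths
    B = (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * k * T))

    def band_power(lo, hi):
        m = (lam >= lo) & (lam <= hi)
        x, y = lam[m], B[m]
        return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule

    total = band_power(lam[0], lam[-1])
    for name, lo, hi in [("UV", 1e-8, 400e-9),
                         ("visible", 400e-9, 700e-9),
                         ("IR", 700e-9, 1e-4)]:
        print(f"{name:8s} ~{band_power(lo, hi) / total:.0%}")
    # Roughly: UV ~12%, visible ~37%, IR ~51% -- IR narrowly beats visible.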

Both the "because that's what the sun emits" and "because we are mostly water" explanations are incomplete. There are plenty of other animals [1] that can "see" infrared.

The real reason is simply because that's how we evolved. That's how the "because those are the frequencies that pass through water" explanation comes into play: vision first evolved in aquatic animals, so frequencies that don't penetrate water wouldn't have been all that helpful to their survival and reproductive success, and so wouldn't be selected for. But that's incomplete too: salmon are one of the top IR-sensing animals and they live in water, so when there's an evolutionary need to select for IR vision, it happens. The reason we "see" in the visible light range is simply that that's how we've defined "visible".

There are some physics reasons as well, notably that most mammalian body structures emit heat, which would blind an animal that relies on infrared to see (notice how most of the animals that can see infrared are cold-blooded reptiles, fish, and insects), and that most of the high-resolution biochemical mechanisms that can convert electromagnetic waves to electrochemical nerve impulses operate in the visible light range. Structures that convert infrared radiation to nerve impulses are more complex and more costly to support, so unless there's a clear survival benefit for the species, they tend to get selected away.

[1] https://a-z-animals.com/animals/lists/animals-that-can-see-i...


The thing is that real security isn't something that a checklist can guarantee. You have to build it into the product architecture and mindset of every engineer that works on the project. At every single stage, you have to be thinking "How do I minimize this attack surface? What inputs might come in that I don't expect? What are the ways that this code might be exploited that I haven't thought about? What privileges does it have that it doesn't need?"

I can almost guarantee you that your ordinary feature developer working on a deadline is not thinking about that. They're thinking about how they can ship on time with the features that the salesguy has promised the client. Inverting that - and thinking about what "features" you're shipping that you haven't promised the client - costs a lot of money that isn't necessary for making the sale.

So when the reinsurance company mandates a checklist, they get a checklist, with all the boxes dutifully checked off. Any suitably diligent attacker will still be able to get in, but now there's a very strong incentive to not report data breaches and have your insurance premiums go up or government regulation come down. The ecosystem settles into an equilibrium of parasites (hackers, who have silently pwned a wide variety of computer systems and can use that to set up systems for their advantage) and blowhards (executives who claim their software has security guarantees that it doesn't really have).


> but now there's a very strong incentive to not report data breaches and have your insurance premiums go up or government regulation come down

I would argue the opposite is true. Insurance doesn't pay out if you don't self-report in time. Big data breaches usually get discovered when the hacker tries to peddle the data in a darknet marketplace, so not reporting is gambling that this won't happen.


Curious how the compromised company can report if the compromise has not been detected.

There need to be much more powerful automated tools. And they need to meet critical systems where they are.

Not very long ago, actual security existed basically nowhere (except air-gapping, most of the time ;)). And today it still mostly doesn't, because we can't properly isolate software and system resources (and we're very far away from routinely proving actual security). Mobile is much better by default, but limited in other ways.

Heck, I could be infected with something nasty and never know about it: the surface to surveil is far too large and constantly changing. Gave up configuring SELinux years ago because it was too time-consuming.

I'll admit that much has changed since then and I want to give it a go again, maybe with a simpler solution to start with (e.g. never grant full filesystem access and network for anything).

We must gain sufficiently powerful (and comfortable...) tools for this. The script in question should never have had the kind of access it did.


> The thing is that real security isn't something that a checklist can guarantee.

I've taken this even further. You cannot do security with a checklist. Trying to do so will inevitably lead to bad outcomes.

Couple of years back I finally figured out how to dress this in a suitably snarky soundbite: doing security with a spreadsheet is like trying to estimate the health of a leper colony by their number of remaining limbs.


You are asserting that security has to be hand-crafted. That is a very strong claim, if you think about it.

Is it not possible to have secure software components that only work when assembled in secure ways? Why not?

Conversely, what security claims about a component can one rely upon, without verifying it oneself?

How would a non-professional verify claims of security professionals, who have a strong interest in people depending upon their work and not challenging its utility?


Not the person you are responding to, but: I would agree that at the stage of full maturity of cybersecurity tooling and corporate deployment, configuration would be canonical and painless, and robust and independent verification of security would be possible by less-than-expert auditors. At such a stage of maturity, checklist-style approaches make perfect sense.

I do not think we're at that stage of maturity. I think it would be hubris to imitate the practices of that stage of maturity, enshrining those practices in the eyes of insurance underwriters.


Corporate security is beyond merely making sure software itself is secure.

Phishing for example requires no security vulnerabilities, and is one of the primary initial attack vectors into a company.

You need proper training and the right incentives for people to actually care and think before they act.


You’re making many assumptions which fit your worldview.

I can assure you that insurers don’t work like that.

If underwriting was as sloppy as you think it is insurance as a business model wouldn’t work.


Err, cybersecurity insurance as a business model has not worked. I have seen analyst reports showing that there have been multiple large claims that are each individually larger than all premiums ever collected industry wide. Those same reports indicated that all the large cybersecurity insurance vendors were basically no longer issuing policies with significant coverage, capping out at the few million dollar range. Cybersecurity insurance is picking up pennies in front of a steamroller; you wonder why no one else is picking up this free money on the ground until you get crushed.

Note, that is not to say that cybersecurity insurance is fundamentally impossible, just that the current cost structure and risk mitigation structure is untenable and should not be pointed at as evidence of function.


The financial sector is famously sloppy and it’s still doing just fine.

Archive blocks VPNs. If you're on one, that could be why.

I've also found that archive.ph is significantly less accessible than archive.is despite hosting the same content. Pausing my VPN for a few minutes and then changing the .ph to .is fixed a similar captcha loop for me, though I still did need to solve a captcha for it.

