
You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (only the tool's description consumes tokens instead).

Also, once you give it this general access, it opens up essentially infinite directions for the model to go in. Repeatability and testing become very difficult in that situation. One time it may write a bash script to solve the problem; the next, it may use python and pip install a few libraries to solve that same problem. Yes, both are valid, but if you want a particular flow, you have to write a prompt for it and hope the model complies. It's about shifting certain decisions away from the model so it has more room for the work you actually need done, while keeping performance reasonably consistent.

For now, managing the context window still matters, even if you don't care about efficient token usage. Burning 5-10% of it re-writing the same API calls makes the model dumber.


> You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (only the tool's description consumes tokens instead).

The tokens are not wasted, because I rewind to before it started building the tool. That it can build and manipulate its own tools is, to me, the benefit, not the downside. The internal work to build the tool doesn't consume any of my context, because it's a side adventure that never touches it.


Maybe I'm not understanding the scenario well. I'm imagining an autonomous agent as a sort of baseline. Are you saying the agent says "I need to write a tool", takes a snapshot, and once it's done, rewinds to the snapshot, but this time it has the tool it wanted? That's actually a really cool idea to do autonomously!

If you mean manually, that's still interesting, but that kind of feels like the same thing to me. The idea is: don't let the agent burn context writing tools; it should just use them. Isn't that exactly what yours is doing? Instead of rewinding to a snapshot, I keep a separate code base for the tools. As tools get more complex, it's nice to have them well tested, with standardized input and output. Generating tools on the fly, rewinding, and then using them amounts to the same thing. You still have to provide some context describing what the tool is and how to use it, which is basically what the MCP server does.


> Are you saying the agent says "I need to write a tool", takes a snapshot, and once it's done, rewinds to the snapshot, but this time it has the tool it wanted? That's actually a really cool idea to do autonomously!

I'm basically saying this, except I don't yet give the agent a tool to do it automatically, because it's not really RL'ed to that extent. So I use the branching and compaction functionality of my harness manually when it should do that.

> If you mean manually, that's still interesting, but that kind of feels like the same thing to me.

It's similar, but it retains the context and feels very natural. There are many ways to skin a cat :)
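
Concretely, the branch-and-rewind flow amounts to something like this (a purely hypothetical sketch; none of these names come from my actual harness):

    # Hypothetical sketch of the branch-and-rewind flow described above.
    # The conversation context is restored from a snapshot, while the tool
    # registry, which lives outside the context, keeps the new tool.
    from copy import deepcopy

    class Harness:
        def __init__(self):
            self.messages = []   # conversation context sent to the model
            self.tools = {}      # name -> callable; persists across rewinds

        def snapshot(self) -> list:
            return deepcopy(self.messages)

        def rewind(self, snap: list) -> None:
            self.messages = snap  # context rolls back; self.tools is untouched

    def build_tool_on_side_branch(harness: Harness, name: str) -> None:
        snap = harness.snapshot()
        # ... the agent writes, debugs, and tests the tool here, burning
        # tokens on a side branch we are about to discard ...
        harness.tools[name] = lambda **kwargs: ...  # the finished tool
        harness.rewind(snap)  # the main context never sees the side adventure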


Wait, what data are you seeing where most cyberattacks originate from the US? I work in security at a place with some of the best threat intelligence globally, and there are indeed attacks from the US, even from the government, but the idea that MOST cyberattacks originate from the US would be completely shocking to me. Is there some qualifier you're not including, or did you maybe misremember "most targeted" as "originated"?

I'm not really trying to get into the political part of it fwiw.


On the topic of comparing OpenAI models with Anthropic models, I have a hybrid approach that seems really nice.

I set up an MCP tool that lets Claude Code call gpt-5 with high reasoning (tools with "personas" like architect, security reviewer, etc.), and I feel that it SIGNIFICANTLY amplifies the performance of Claude alone. I don't see other people using LLMs as tools in these environments, which makes me wonder if I'm either missing something or somehow ahead of the curve.

Basically, instead of "do X (with details)", I say "ask the architect tool how you should implement X", and it gets into a back and forth that's more productive because it forces some "introspection" on the plan.
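
If anyone wants to replicate it, here's a rough sketch of the kind of MCP server I mean, assuming the mcp and openai Python packages (the model name and the reasoning_effort parameter are placeholders for whatever your provider accepts, not a confirmed API):

    # Rough sketch of an "architect" persona exposed as an MCP tool.
    # Assumes the `mcp` and `openai` packages; "gpt-5" and
    # reasoning_effort are placeholders, not a confirmed API.
    from mcp.server.fastmcp import FastMCP
    from openai import OpenAI

    mcp = FastMCP("personas")
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    ARCHITECT_PROMPT = (
        "You are a software architect. Critique the proposed design, "
        "point out risks, and suggest a concrete implementation plan."
    )

    @mcp.tool()
    def ask_architect(question: str) -> str:
        """Ask the architect persona how to approach an implementation."""
        resp = client.chat.completions.create(
            model="gpt-5",            # placeholder model name
            reasoning_effort="high",  # assumed parameter
            messages=[
                {"role": "system", "content": ARCHITECT_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    if __name__ == "__main__":
        mcp.run()

Claude Code then sees ask_architect like any other tool and can go back and forth with it.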


This is an established, though advanced, idea.

Sourcegraph Amp (https://sourcegraph.com/amp) has had this exact feature built in for quite a while: "ask the oracle" triggered an O1 Pro sub-agent (now, I believe, GPT-5 High), and searching can be delegated to cheaper, faster, longer-context sub-agents based on Gemini 2.5 Flash.


They say that by using it, you agree to receive marketing emails and agree to their ToS. Not very altruistic of them.


I did a data request, encrypted every single one of my comments, and made a web app to decrypt them lol. I still randomly get messages from people decrypting the most random stuff and either mocking me or complimenting the idea. Highly recommend this route over deleting: a rando can still read your content through the app, but it's not as useful to Reddit itself.


If randos are decrypting your reddit comments then you didn't use very strong encryption.


Haha, in each comment I say to go to my profile to decrypt, and there I have a web app linked with instructions. It's just AES-128 iirc, so I don't think people are decrypting it any other way.
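
The idea, sketched with the cryptography package (not my exact script; its Fernet scheme happens to be AES-128-CBC plus an HMAC, which matches the AES-128 I remember):

    # Rough sketch of the encrypt-and-overwrite step, assuming the
    # `cryptography` package. Fernet is AES-128-CBC plus an HMAC.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()  # keep this; the web app needs it to decrypt
    f = Fernet(key)

    def encrypt_comment(body: str) -> str:
        token = f.encrypt(body.encode("utf-8")).decode("ascii")
        return f"Encrypted. See my profile for the decryption app: {token}"

    def decrypt_comment(token: str) -> str:
        return f.decrypt(token.encode("ascii")).decode("utf-8")

    # Usage: loop over the comments in the data export and overwrite
    # each one with encrypt_comment(body) via the Reddit API.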


Maybe they have a comment on their profile linking to the web app


Not sure if it was you but I have seen such encrypted comments before.


I was going to regale you with how important my high school curriculum of programming and networking courses was in inspiring me to work in this field, but then I saw you said good developer. I ended up going into offsec instead of programming, but I still love coding, and without those classes I think I would've joined the military instead of ending up at a big tech company.


If you use the Drive integration, you share the document with Slack. Slack then creates a thumbnail that is visible in that channel. Imagine pasting a sensitive HR document into the big company channel with everyone in it: no one in the group may have permission via Google, but they can see the thumbnail (and search its contents!) if they have access to the Slack channel.

Edit: I should note, this is my fuzzy recollection of how it worked 4 years ago when I reported it to Slack. YMMV


I disclosed this personally 4 years ago via HackerOne. The larger issue, imo, is that Slack indexes the content and allows an attacker to craft search terms that reveal the full contents of the document, sort of like a blind SQLi. I was told it was working as intended; my report was black-holed on H1, and I was told via email that it was "informational" and not a vulnerability.
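
To illustrate the blind-SQLi analogy, here is a purely hypothetical sketch. search_hits() is a stand-in for whatever search endpoint the attacker can reach, not a real Slack API:

    # Hypothetical illustration of blind extraction via a search index.
    # search_hits() is a placeholder, NOT a real Slack API. As with
    # blind SQLi, yes/no match results recover content the attacker
    # was never shown directly.
    def search_hits(query: str) -> bool:
        """Placeholder: True if any indexed document matches the query."""
        raise NotImplementedError

    # Attacker-chosen wordlist; a real attack would use a large dictionary.
    WORDLIST = ["salary", "termination", "acquisition", "confidential"]

    def extract_phrase(seed: str, max_words: int = 50) -> str:
        """Greedily extend a known phrase one word at a time."""
        recovered = seed
        for _ in range(max_words):
            for word in WORDLIST:
                if search_hits(f'"{recovered} {word}"'):  # phrase query
                    recovered += " " + word
                    break
            else:
                break  # no word extended the match; stop
        return recovered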

It's lame to come on here and act like people reporting this are acting in bad faith. I asked for permission to talk about it and was granted it, so I don't see why the author of this post shouldn't be able to do the same, considering he doesn't even get into the search-indexing aspect. The company is in a vulnerable state due to negligence in addressing the issue, not because it was publicly disclosed.


> How is providing policy makers with insights from foreign politics and possible industrial espionage not giving an advantage to domestic companies, if those policy makers act appropriately?

Let's imagine OpenAI was a Russian company operating mostly in secret. This RU OpenAI secretly discovers and uses GPT-4-like technology and shows promise that it is not done innovating. While these LLMs are often overhyped, the recent innovations no doubt present a policy issue, right? I'd say there are legitimate national security reasons to know about that technology, not just about making money or building a better product for cheap.

The distinction being made is that the NSA may steal data related to this, but they aren't just giving it to Google to make Bard better. They are gathering intel and giving lawmakers the tools to fund research, write policy, or do whatever else our elected representatives deem beneficial. Any side action or under-the-table dealing would make this distinction meaningless, of course. So, for the example above, if we started funding departments to research the threat of LLMs/AI, inform the public of the issue, and inform companies that their data is being pillaged to train AI... that is all very different from just stealing a cool new widget design and getting it to market first.

I think there's no debating that this is morally gray, but it's a few steps removed from what other nation-states do when they steal tech and implement it in "private" companies. It's certainly worthy of criticism, but I think it's unhelpful to bucket it with the other type.

If the LLM example isn't your thing, it also makes a lot of sense for the NSA to steal information related to weapon/defense tech, even if developed by a private company, and even if we use what we stole to implement countermeasures. I can't honestly be morally outraged about invading the privacy of someone developing tools of war against you. Fwiw, I wouldn't blame Russia or China for trying this against the US gov or defense contractors either, but it's not like I'd be happy about it. My point is that that is not so much economic espionage or corporate espionage as much as it is just plain old espionage. It saves lives and protects American hegemony - which I recognize may be counter to many people's ideal situation.

It's a nuanced thing. When you take two morally questionable things and reduce them both to just being bad, the ones doing the worse things benefit. E.g., "all politicians lie" is a handy phrase for truly corrupt politicians, because the ones who make small mistakes or tell half-truths end up in the same bucket as them, and the outcome is apathy toward the issue rather than being upset at all of it. It's kind of the classic whataboutism trope - not to imply you are doing that, just to say that's where it often leads.


So we're evaluating US policy on international espionage based on constructed examples now?

> Let's imagine OpenAI was a Russian company

Never mind that they're not, and that Russia can't currently develop these models due to a lack of silicon. All the targets I mentioned, with the exception of the Brazilian oil company, were in European states that were at the time (and still are!) closely allied with the US.

> The distinction being made is that the NSA may steal data related to this, but they aren't just giving it to Google to make Bard better.

How would you even know at this point? Who controls the NSA? There haven't been any leaks since the Snowden revelations, and there likely won't ever be any again, since Snowden could only make his move thanks to some misconfigured/outdated network quota control software.

Hell, you can't even FOIA information about these policies, and agencies will go so far as to withhold evidence in court when it concerns espionage! And as soon as a court case involves this information, the court recedes from the public and is held in secret.

My hostility toward US policy is by no means above the European average, but when it comes to public statements about surveillance, I have no reason to trust the US government. The Bush administration proved that it is possible to flout the US constitution on a massive scale with just 10-12 people. At this point I can't blame people for putting forward crazy conspiracy theories about the deep state or QAnon, because the US gov has given no indication of being believably concerned about compliance with its own laws.

