Hacker News | ssk42's comments

I’m sorry, but in this day and age, why would you not use AI with safeguards, giving it the proper context and the best practices you’re looking for? These are all largely solved problems in Claude and any agentic system. Are you saying that you don’t? This just feels insulting to those of us who do care about code but also love Claude.

Can you defend that though? Does living mean needing cells? Does it mean possessing the ability to think and reason? Is Claude thinking and reasoning?

Fun to see you not on tildes.

Setting up a clean room is one of the only ways to do evals on agentic harnesses. It matters especially with Windsurf, which doesn’t have an easy CLI start.

So how? The easiest answer, when allowed, is docker. Literally a new image per prompt. There are also flags in Claude to not use memory, and from there you can use -p to have it behave like a normal CLI tool. Windsurf requires the manual effort of starting it up in a new dir.
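For what it's worth, a minimal sketch of the docker route, under some assumptions: the image name `agent-eval:latest` is hypothetical (any image with the `claude` CLI installed would do), the `/work` mount point is my own choice, and the memory-related flags are left out because they depend on your setup. It only builds the command for one throwaway container per prompt; it does not run it.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical image name: assumed to have the `claude` CLI installed.
IMAGE = "agent-eval:latest"


def clean_room_command(prompt: str, workdir: Path, image: str = IMAGE) -> list[str]:
    """Build a `docker run` argv that starts a fresh container for one prompt."""
    return [
        "docker", "run",
        "--rm",                              # remove the container when it exits
        "-v", f"{workdir.resolve()}:/work",  # mount a scratch directory (assumed path)
        "-w", "/work",                       # run the agent inside that directory
        image,
        "claude", "-p", prompt,              # -p: non-interactive, print and exit
    ]


if __name__ == "__main__":
    # A new empty directory per prompt, so nothing carries over between runs.
    scratch = Path(tempfile.mkdtemp(prefix="eval-"))
    cmd = clean_room_command("Write a failing test for add()", scratch)
    print(shlex.join(cmd))
```

Calling `clean_room_command` once per prompt with a new scratch directory gives each eval run its own container and its own empty working tree, which is the whole point of the clean room.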


Sounds interesting, but I'm not quite getting the relevance for people writing code with an agent. Should I be doing evals?

Well, I mean, yes. I think people ought to be aware of how the harnesses compare for their stacks. But a clean room applies to this RGR situation too.

you are replying to a bot, that's why.

What

ClawdBots can now more easily interact with the Internet than regular agents, so you wind up with Moltbook leaking


The biggest clue I’ve seen is someone using it to do cold calls on websites. Claw searches for shoddy-looking construction sites, makes a better version on Vercel, and sends out a pitch.


Are we visiting the same sites/talking to the same people? Because I heard about this “use case” a couple of days ago.


Probably. It was on my tiktok FYP


I saw an interesting comment somewhere that this will ultimately be Anthropic’s goal: creating the next generation of App Store. The idea is that Anthropic could just vibe-code a mini app for whatever the user wants. This post clearly exemplifies that me-apps can and should be a thing, and that anyone can put forth reasonable effort and make an app without having to cross steep learning barriers.


Context engineering is a critical part of being able to use the tool. And it's ok to not understand how to use a new tool. The different models combined with different stacks require different ways of grappling with the technology. And it all changes! It sucks that you've tried it for your stack (Elixir, whatever that is) in your way and it was disappointing.

To me, the tool inherently makes sense and vibes with my own personality. It allows me to write code that I would otherwise procrastinate on. It allows me to turn ideas into reality, so much faster.

Maybe you're just hyper-focused on metrics? Productivity, especially when dealing with code, is hard to quantify. This is a new paradigm, so it's also hard to compare apples to oranges. Does this help?


So your take is that every real software developer I know is simply bad at using this magical tool that performs on the level of a mid-senior level software engineer in the hands of a few chosen ones? But the chosen ones never build anything in public where it can be observed, evaluated, and critiqued. How unfortunate is that?

The people I talked to use a wide variety of environments and their experience is similar across the board, whether they're working in Nodejs, React, Vue, Ruby, PHP, Java, Elixir, or Python.

> Productivity, especially when dealing with code, is hard to quantify.

Indeed, that's why I think most people claiming these obscene benefits are really bad at evaluating their own performance and/or started from a really low baseline.

I always think back to a study I read a while ago where people without ADHD were given stimulant medication and reported massive improvements in productivity but objective measurements showed that their real-world performance was equal to, or slightly lower than their baseline.

I think it's very relevant to the psychology behind this AI worship. Some people are being elevated from a low baseline whilst others are imagining the benefits.


People do build in public with vibe-coding, absolutely. This tells me that you have not done your research and have just gone off general guesses or pessimism/frustration from not knowing how to use the tool. The easiest way to find this on GitHub is to look for where Claude is a contributor. Claude will tag itself in the PR or pushes. Another easy way I've seen come up is the whole "BuildInPublic" tag in the Threads app, which has been inundated with vibe coding. While these might not be in your algorithm, they do exist. You'll be able to see that while there is a lot of crud, there are also products being made that are actually versatile, complex, and completely vibe-coded. Most people are not making up these stories. It's very real.


Of course people vibe-code in public - I was clear that I wanted to see evidence of these amazing productivity improvements. If people are building something decent but it takes them 3 or 4 times as long as it would take me, I don't care. That's great for them but it's worthless to me because it's not evidence of a productivity increase.

> there are also products being made that are actually versatile, complex, and completely vibe-coded.

Which ones? I'm looking for repositories that are at least partially video-documented to see the author's process in action.


Gosh, it's almost like a proper IDE has synonymous features with LLMs


the good news is that you can self-host bitwarden pretty easily and so it doesn't have to be a hassle/risk


Grandma is self-hosting what???


I am going to be honest, Grandma is already compromised.


that's where you come in sonny


This.

Grandma, and Uncle Rob, and your cousins, and anyone else you have a long-standing relationship with, can use your VaultWarden instance if you let them.

But! You now get to maintain uptime (Rob travels and is frequently awake at 3am your time) and make sure that the backups are working... and remember that their access to their bank accounts is now in your hands, so be responsible. Have a second site and teach your niece how to sysadmin.


Yeah, they have world-class Salesforce engineers there. One of Google's last Salesforce tech leads wound up becoming the Director of Apex, the proprietary Salesforce language.

