Hacker News | mierz00's comments

I highly rate Braintrust.

It wouldn’t be too difficult to build something like that for your own usage, but I found it pretty easy to get datasets set up.

Essentially a game changer for understanding whether your prompts are working, especially if you're doing something which requires high levels of consistency.

In our case we use an LLM for classification, which fits in perfectly with evals.


Do you have any good takeaways / feedback on this? It's the first time I've heard of Braintrust (the eval platform), so I'll look into it, but I'm curious about your experience with it so far.

If I am being honest, the value came from doing evals and testing against different models.

Essentially all I needed was a way to upload a dataset, run tests against that dataset, and spit out a pass/fail percentage.

Braintrust makes this pretty easy, but if I were to do it again I would vibe-code the same functionality.
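The core of that functionality fits in a few lines of Python. This is a minimal sketch, not Braintrust's actual API: `toy_classifier` is a hypothetical stand-in for a real LLM call, and the two-column CSV layout is just an assumption about how an uploaded dataset might look.

```python
import csv
import io

def run_evals(dataset_rows, classify):
    """Run each row through the classifier and return the pass rate."""
    passed = 0
    for row in dataset_rows:
        if classify(row["input"]) == row["expected"]:
            passed += 1
    return passed / len(dataset_rows)

# Hypothetical stand-in for a real LLM classification call.
def toy_classifier(text):
    return "positive" if "great" in text.lower() else "negative"

# A tiny dataset in the assumed upload format: input text plus expected label.
dataset_csv = """input,expected
This product is great,positive
Terrible experience,negative
Great support team,positive
"""

rows = list(csv.DictReader(io.StringIO(dataset_csv)))
print(f"{run_evals(rows, toy_classifier):.0%} pass")  # prints "100% pass"
```

Swapping `toy_classifier` for a call to a different model is how you'd compare models against the same dataset.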


“I admit that I still disagreed with him after the exchange, but I had a new respect for him as a designer because he was able to articulate a rationale for his decision.”

Any competent designer gets really good at justifying their decisions. Everyone has an opinion about design and thinks that their taste is correct.

I’m glad I don’t have to deal with that on the software side.


I had to deal with that on the software side.

The ability to win arguments about technical choices is not always aligned with the ability to make good technical choices.


Aren’t people researching the companies they are applying for?

Also, I don’t think I have ever applied to a fake job.


I’m sure the welfare of the Iranian people is a top priority for Trump.


Marginally related, I feel the same way about honesty, especially in a work context.

I’ve always prided myself on being an honest but considerate person.

A recent experience with a colleague who weaponised my honesty in an attempt to manipulate me has left a foul taste in my mouth. Luckily their contract ended and the problem resolved itself.

But I remember distinctly feeling that I will be professional and polite but I do not automatically owe anyone my honesty.


Talk to people.

There are an infinite number of problems to solve.

Deciding whether they’re worth solving is the hard part.


Are any of these people willing to fund an answer to these problems?


We analyse thousands of lines from a CSV using an LLM. The only thing that worked for us was to send each line individually and analyse them one at a time.

I’m not sure if that would work in your use case, but you could classify each line into a value using an LLM, then hard-code the trends you’re looking for.

For example, if you’re analysing something like support tickets, use an LLM to classify the sentiment; then you can plot the sentiment on a graph and see whether it’s trending up or down.
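The support-ticket example could be sketched like this. `classify_sentiment` is a hypothetical stand-in for the per-line LLM call, and the trend check (compare the first half of the scores to the second half) is just one simple way to hard-code "trending up or down":

```python
# Hypothetical stand-in for an LLM sentiment-classification call.
def classify_sentiment(ticket_text):
    negatives = ("broken", "refund", "angry")
    return "negative" if any(w in ticket_text.lower() for w in negatives) else "positive"

def sentiment_trend(tickets):
    """Map each ticket to +1/-1, then compare the first half to the second half."""
    scores = [1 if classify_sentiment(t) == "positive" else -1 for t in tickets]
    mid = len(scores) // 2
    first = sum(scores[:mid]) / mid
    second = sum(scores[mid:]) / (len(scores) - mid)
    return "up" if second > first else "down" if second < first else "flat"

tickets = [
    "Love the new dashboard",
    "Everything works well",
    "App is broken again",
    "I want a refund",
]
print(sentiment_trend(tickets))  # prints "down"
```

The point is that the LLM only does the per-line classification; the trend logic stays deterministic and cheap to re-run.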


I think that is probably what I'll end up doing, since the data is text-based. I'll combine that with the approach of pre-analysing the quantitative data to feed to the LLM.
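Pre-analysing the quantitative data could look something like the sketch below: compute summary statistics up front so the prompt carries a short summary instead of thousands of raw rows. The `summarise_numeric` helper and the prompt wording are both hypothetical.

```python
import statistics

def summarise_numeric(values):
    """Pre-compute stats so the LLM sees a summary, not thousands of raw rows."""
    return {
        "count": len(values),
        "mean": round(statistics.mean(values), 2),
        "min": min(values),
        "max": max(values),
    }

# Assumed example data: support-ticket response times in minutes.
response_times = [120, 95, 180, 240, 60]
summary = summarise_numeric(response_times)

# The summary, not the raw column, goes into the prompt.
prompt = f"Given these response-time stats: {summary}, describe any concerns."
print(prompt)
```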

I figured I'd ask this question because there might be a technique I'm not aware of.


I’m really not sure I follow this argument.

A lot of software has friction to get to the value. This is often because of constraints not choice.

To give a concrete example of this: at my company, users upload files for analysis. Getting the export for a file used to take many steps. Not hard, but a lot to get done.

We switched it to an integration and now it’s 3 clicks. We’ve gone from 10% of users onboarding to 100%.

It doesn’t mean we get people to stay, but the barrier to understanding if our tool provides value to them has completely disappeared.

I’m very curious though, what value did you strip away when trying to make your product easier to use?


This goes the other way too: you shouldn't assume the original is incorrect.

I see this a lot with developers who come in and start to criticise before understanding.

There is always a reason for why something is as it is, and it’s unlikely that the people before you were just idiots.


How do you introduce any tool/change to a team of people?

You get buy-in: start having conversations and see what AI tools people have explored. Have they tried Claude? Do they prefer other tools? If so, why? What are the objections? Actually listen. I'd also showcase what you can do; I love to present what Codex has found when debugging something, or a prototype I've put together.

If you have the budget pay for subscriptions so they can play around.

Also, you say that development velocity is a big problem, but I would dive into why that is. You may be disappointed when velocity remains the same with AI tools.

