Hacker News | bcyn's comments

Very cool! Could you support sorting of some kind? Would love to at least view most recent first.

This is really awesome! Dream home project for me as well, but I can't justify the cost of large e-ink displays so far (I was shocked at the ~$2k sticker price of that Boox Mira Pro!)


Which models perform anywhere close to Opus 4.5? In my experience none of the local models are even in the same ballpark.


This week: look at Qwen3 Coder Next and GLM 4.7, but it's changing fast.

I wrote this for the scenario where you've run out of quota for the day or week but want a backup plan to keep going; it gives some options with obvious speed and quality trade-offs. There's also always the option to upgrade if your project and use case need Opus 4.5.


I don't think the point about 401k stagnation is true. At most, the fee structure and the range of available funds change. How did that cost you $500k exactly?


Why? How do you draw the line between people who deserve to be "surveilled" (if you can even call it that in this case...) vs. people who don't?

You are entitled to your opinion of course but it just seems extremely arbitrary.


I don't have a good, rational answer.

I think the idea is vaguely that the upper-upper class statistically must've done something wrong or have the power to cause extreme harm, therefore it's okay to snitch on them but not your regular Joe.

I'm just espousing the standard American middle class views about freedom here. Not trying to argue they are sound or rational.


Great read, thanks! Could you dive a little deeper into example 2 & pre-registration? Conceptually I understand how the probability of false positives increases with the number of variants.

But how does a simple act such as "pre-registration" change anything? It's not as if observing another metric that already existed changes anything about what you experimented with.


If you have many metrics that could possibly be construed as "this was what we were trying to improve", that's many different possibilities for random variation to give you a false positive. If you're explicit at the start of an experiment that you're considering only a single metric a success, it turns any other results you get into "hmm, this is an interesting pattern that merits further exploration" and not "this is a significant result that confirms whatever I thought at the beginning."

It's basically a variation on the multiple comparisons problem, but sneakier: it's easy to spend an hour going through data and, over that time, test dozens of different hypotheses. At that point, whatever p-value you'd compute for a single comparison isn't relevant, because after that many comparisons you'd expect at least one to come in below an uncorrected p = 0.05 by random chance.
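To make that concrete, here's the arithmetic behind it: with k independent metrics and a 5% significance threshold, the chance that at least one shows a spurious "significant" result on a change with no real effect is 1 − 0.95^k.

```python
# Chance of at least one false positive at alpha = 0.05 when checking
# k independent metrics on a change that has no real effect at all.
alpha = 0.05
for k in (1, 5, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>2} metrics -> {p_any:.0%} chance of a spurious win")
# ->  1 metrics -> 5%, 5 metrics -> 23%, 20 metrics -> 64%
```

So at 20 metrics, you're more likely than not to "find" a winner even when nothing changed.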


There are many resources that will explain this rigorously if you search for the term “p-hacking”.

The TLDR as I understand it is:

All data has patterns. If you look hard enough, you will find something.

How do you tell the difference between random variance and an actual pattern?

It's simple and rigorously correct to only test the data for a single pre-registered metric; other methods, e.g. the Bonferroni correction (divide the significance threshold α by the number of comparisons k), exist, but are controversial [1].

Basically: are you a statistician? If not, sticking to the best practices in experimentation means your results are far more likely to be meaningful.

If you see a pattern in another metric, run another experiment.

[1] - https://pmc.ncbi.nlm.nih.gov/articles/PMC1112991/
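As a concrete illustration of the Bonferroni correction mentioned above (a minimal sketch; the p-values are made up):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni: compare each p-value against alpha / k, where k is the
    number of comparisons made. Conservative, but simple and safe."""
    k = len(p_values)
    return [p < alpha / k for p in p_values]

# Hypothetical p-values from checking 5 metrics after one experiment:
pvals = [0.03, 0.20, 0.004, 0.65, 0.01]
print(bonferroni_significant(pvals))
# -> [False, False, True, False, False]
# Only 0.004 survives, since the corrected threshold is 0.05 / 5 = 0.01.
```

Note that 0.03 would have counted as "significant" without the correction.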


This doesn't make as much sense as you think it does. If you could predictably trade a flip from bearish to bullish (for example; there are of course other trend-based signals), you would not share that signal, because others would overcrowd your trade (by buying or shorting and moving the price toward the trending direction faster than you can).

A potential argument is that these signals are only applicable to a certain bracket of portfolio sizes (e.g. larger AUM funds would not be able to trade this strategy) -- but you are sharing this with folks presumably in your range of portfolio size.


Overcrowding an entry on highly liquid assets is far from a realistic concern for our service.


The more liquid an asset, the more efficiently it's priced, and the fewer trading opportunities remain after accounting for transaction costs. In something like the S&P 500, everything is already priced in.

Meme stocks and shitcoins being manipulated by whales are not efficient and also not as liquid.

The larger point remains that none of the above considerations are discussed on this product's page.


I'm realizing there's a lot of confusion about what trend-based models actually are. I assumed the concept was more widely understood; clearly we need to explain it better.

To be clear, there's nothing new or innovative about a trend-based model. It's one of the most commonly used investment strategies among institutions, and it's been widely utilized for far longer than I've been alive.


There's no confusion about the type of edge. Just pointing out that if you are selling an edge rather than trading it yourself, you're either grifting or naive.


It read to me as a (very) small discount for choosing to be billed annually vs. monthly.


The discount is applied to $10/month and is said to make it effectively $8.50 per month, but it's actually $8.25 per month, since they claim to charge $99 per year.


Yeah, it's a $20 discount billed annually.


Then that would be $100 per year and none of your numbers for annual plans are accurate. Your website claims $99 per year and claims that is equivalent to $8.50 per month. None of these 3 numbers are equivalent.
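A quick sanity check of the arithmetic (the prices are the ones claimed in this thread, not mine):

```python
# The three advertised numbers, as quoted above:
annual_price = 99.00           # claimed price per year
claimed_monthly = 8.50         # claimed effective monthly price
discount_story = 10 * 12 - 20  # "$10/month with a $20 annual discount"

print(annual_price / 12)  # -> 8.25, not the claimed 8.50
print(discount_story)     # -> 100, not the claimed 99
```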


You're correct! I made the fix.


You're correct about valuation, but the parent post was meant to address "how much liquid dollars should you expect to receive vs. 409a." You are likely to receive less in most cases (read: unless there are wildly successful public liquidity events) due to liquidation preferences.


Plenty of (non-VC backed) startups raise some money and then sell privately; it’s often the case that preference does not cause the common stock value to drop below the most recent 409a in these cases.

(In my experience, the 409a is on the order of 20% of the most recent raise, and preference is not more than 50%, in my area. And obviously you hope to sell for more than the last raise!).


Any reasonable 409a will be fully aware of those preference terms and will have factored them in.


Very interested to see what the next steps are to evolve the "retrieval" model - I strongly believe that this is where we'll see the next stepwise improvement in coding models.

Just thinking about how a human engineer approaches a problem. You don't just ingest entire relevant source files into your head's "context" -- well, maybe if your code is broken into very granular files, but often files contain a lot of irrelevant context.

Between architecture diagrams, class relationship diagrams, ASTs, and tracing codepaths through a codebase, there should intuitively be some model of "all relevant context needed to make a code change" - exciting that you all are searching for it.


I have a different POV on retrieval. It's a hard problem to solve in a generalizable way with embeddings. I believe it can be solved at the model level, where the model itself is used to fix an issue. With the model providers (OpenAI, Anthropic) going full stack, there's a possibility they solve it at the reinforcement-learning level. E.g., when you teach a model to solve issues in a codebase, the first step is literally getting the right files. Here, basic search (with grep) would work very well: with enough training, you want the model to have an instinct about what to search for given a problem, similar to how an experienced dev has that instinct about a given issue. (This might be what tools like Cursor are also looking at.) (Nothing against anyone, just sharing a POV; I might be wrong.)
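The grep-style first step might look something like this minimal sketch (pure illustration; `candidate_files` and the keyword heuristic are my own placeholders, not any product's actual retrieval code — a trained model would pick much better search terms, but the mechanism is the same):

```python
import re
from pathlib import Path

def candidate_files(repo: str, issue: str, top_k: int = 5):
    """Rank source files by how often the issue's salient terms appear."""
    # Crude term extraction: identifier-like words of 4+ characters.
    terms = re.findall(r"[A-Za-z_]{4,}", issue)
    scores = {}
    for path in Path(repo).rglob("*.py"):
        text = path.read_text(errors="ignore")
        hits = sum(text.count(t) for t in terms)
        if hits:
            scores[path] = hits
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The RL angle would be replacing the hand-written term extraction with whatever the model learns to grep for.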

However, the fast-apply model is a thing of beauty. Aider uses it, and it's just super accurate and very fast.


Definitely agree with you that it's a problem that will be hard to generalize a solution for, and that the eventual solution is likely not embeddings (at least not alone).


Relevant interview extract from the Claude Code team: https://x.com/pashmerepat/status/1926717705660375463

> Boris from the Claude Code team explains why they ditched RAG for agentic discovery.
>
> "It outperformed everything. By a lot."


This is very cool. They explained the solution better than I did. If I had known about it, I would have just linked this :)


Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is a structure-aware system where you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you're putting into context -- filenames, graph structures, etc.

If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.
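A minimal sketch of that keep-or-drop loop (the callables passed in are hypothetical stand-ins for the frontier-model calls, not our actual implementation):

```python
def gather_context(task, propose_candidates, model_judges_relevant,
                   is_satisfactory, max_rounds=10):
    """Accumulate context until it's judged sufficient to make a change."""
    context = []
    for _ in range(max_rounds):
        # Candidates can be filenames, graph structures, etc.
        for item in propose_candidates(task, context):
            # The implicit binary relevance score: keep it in context or not.
            if model_judges_relevant(task, item, context):
                context.append(item)
        if is_satisfactory(task, context):
            break
    return context
```

The distillation question below is whether `model_judges_relevant` really needs a frontier model behind it, or whether a small specialized model can make the same keep-or-drop calls.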

The question is whether we actually need a 200B+ parameter model to do this, or whether we can distill the functionality into a much smaller, more economical model. A lot of people are already choosing to do this step with Gemini (due to the 1M-token context window) and then write the code with Claude 4 Sonnet.

Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards.

