Hacker News | tempusalaria's comments

Most of EA’s revenue comes from franchise games that are way below the typical AAA standard. EA’s value is in its IP, not its talent.


All these things are designed to create lock-in for companies. They don’t really fundamentally add to the functionality of LLMs. Devs should focus on working directly with the models' generate APIs rather than all the decoration.


Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.


I vastly prefer the manual caching. There are several aspects of automatic caching that are suboptimal, with only moderately less developer burden. I don’t use Anthropic much but I wish the others had manual cache options


What's sub-optimal about the OpenAI approach, where you get 90% discount on tokens that you've previously sent within X minutes?


Lots of situations; here are 2 I’ve faced recently (cannot give too much detail for privacy reasons, but it should be clear enough):

1) Low latency is desired and the user prompt is long.

2) A function runs many parallel requests, but is not fired with a common prefix very often. OpenAI was very inconsistent about properly caching the prefix for use across all requests, but with Anthropic it’s very easy to pre-fire the cache.


Is it wherever the tokens are, or is it the first N tokens they've seen before? I.e. if my prompt is 99% the same, except for the first token, will it be cached?


The prefix has to be stable. If you are 99% the same but the first token is different it won't cache at all. You end up having to design your prompts to accommodate this.
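The behaviour described above can be sketched as a longest-common-prefix match from token 0 (a simplified illustration; real providers match on token blocks, not individual tokens):

```python
# Illustrative sketch: prefix caches match from the very first token, so any
# change at the start invalidates everything after it.
def cached_prefix_len(cached_tokens: list[str], new_tokens: list[str]) -> int:
    """Length of the longest shared prefix -- what the provider can reuse."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = ["SYS", "file1", "file2", "q1"]
# 99% identical, but the first token differs -> nothing is reusable.
print(cached_prefix_len(old, ["sys", "file1", "file2", "q1"]))  # 0
# Same stable prefix, new question -> everything before it is reusable.
print(cached_prefix_len(old, ["SYS", "file1", "file2", "q2"]))  # 3
```

This is why stable content (system prompt, documents) should come first and variable content (the user's question) last.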


which is important to bear in mind if people are introducing a "drop earliest messages" sliding window for context management in a "chat-like" experience. once you're at that context limit and start dropping the earliest messages, you're guaranteeing every message afterwards will be a cache miss.

a simple alternative approach is to introduce hysteresis by having both a high and low context limit. if you hit the higher limit, trim to the lower. this batches together the cache misses.
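the high/low watermark idea above can be sketched like this (limits and the message representation are illustrative assumptions):

```python
# Hedged sketch of hysteresis-based history trimming: only trim once the
# high watermark is exceeded, then trim all the way down to the low one,
# so cache misses are batched instead of happening on every turn.
HIGH_LIMIT = 8000  # start trimming once the history exceeds this many tokens
LOW_LIMIT = 6000   # ...and when we do trim, trim down to this

def trim_history(messages, count_tokens):
    total = sum(count_tokens(m) for m in messages)
    if total <= HIGH_LIMIT:
        return messages  # prefix unchanged -> cache hit on the next request
    trimmed = list(messages)
    while trimmed and total > LOW_LIMIT:
        total -= count_tokens(trimmed.pop(0))  # drop earliest messages
    return trimmed
```

with a naive "drop one earliest message per turn" policy, every turn past the limit is a cache miss; here a miss only happens on the occasional big trim.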

if users are able to edit, remove or re-generate earlier messages, you can further improve on that by keeping track of cache prefixes and their TTLs, so rather than blindly trimming to the lower limit, you instead trim to the longest active cache prefix. only if there are none, do you trim to the lower limit.
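a minimal sketch of that prefix/TTL bookkeeping (class and method names are my own; the TTL value assumes something like Anthropic's ~5-minute ephemeral cache):

```python
import time

CACHE_TTL = 300  # seconds; assumed cache lifetime, provider-dependent

class PrefixTracker:
    """Remember which message-count prefixes were sent recently, so trimming
    can prefer a prefix that is probably still cached over the low limit."""

    def __init__(self):
        self._seen = {}  # prefix length (in messages) -> last-sent timestamp

    def record(self, n_messages, now=None):
        self._seen[n_messages] = time.time() if now is None else now

    def longest_live_prefix(self, max_len, now=None):
        """Longest recorded prefix <= max_len whose cache has not expired;
        0 means none, i.e. fall back to trimming to the low limit."""
        now = time.time() if now is None else now
        live = [n for n, t in self._seen.items()
                if n <= max_len and now - t < CACHE_TTL]
        return max(live, default=0)
```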


That's what I thought, thanks Simon.


because you can have multiple breakpoints with Anthropic's approach, whereas with OpenAI, you only have breakpoints for what was sent.

for example if a user sends a large number of tokens, like a file, and a question, and then they change the question.
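a sketch of how that looks with Anthropic's explicit breakpoints (payload shape follows their Messages API; the model name and texts are placeholders, and this only builds the request rather than sending it):

```python
# Hedged sketch: mark the large, stable file block with cache_control so it is
# cached independently of the question that follows it.
def build_request(file_text: str, question: str) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                # breakpoint: everything up to and including this block is cached
                {"type": "text", "text": file_text,
                 "cache_control": {"type": "ephemeral"}},
                # the question can change freely without invalidating the cache
                {"type": "text", "text": question},
            ],
        }],
    }
```

so call #1 with (file, q1) writes the cache at the breakpoint, and call #2 with (file, q2) reads it, without ever having had to send the file alone.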


I thought OpenAI would still handle that case? Their cache would work up to the end of the file and you would then pay for uncached tokens for the user's question. Have I misunderstood how their caching works?


not if call #1 is the file + the question, call #2 is the file + a different question, no.

if call #1 is the file, call #2 is the file + the question, call #3 is the file + a different question, then yes.

and consider that "the file" can equally be a lengthy chat history, especially after the cache TTL has elapsed.


I vibe-coded up a quick UI for exploring this: https://tools.simonwillison.net/prompt-caching

As far as I can tell it will indeed reuse the cache up to the point where the prompts diverge, so this works:

Prompt A + B + C - uncached

Prompt A + B + D - uses cache for A + B

Prompt A + E - uses cache for A


A lot of the current code and science capabilities do not come from NTP (next-token prediction) training.

Indeed, it seems most language-model RL doesn't even use process supervision, so it's a long way from NTP.


Cerebras has very limited scale. Mistral has very few users, so they can use Cerebras for inference whereas OpenAI and Anthropic cannot. If Mistral grows a lot they will stop using Cerebras.


Fast tire changes only matter a very limited amount of the time: pretty much only if the extra time drops you a place, so one of the ~20 cars has to be in a specific 1-second window, on what is typically a 90s lap, for a 3s (slow) stop vs a 2s (fast) stop to matter. Maybe 20% of the time a slow stop happens, it costs the driver a position.

Strategy matters a lot and good strategy is worth at least a few positions in a race.


You're forgetting the offset from undercutting. The cars don't need to be within 1s on track for it to matter - you could be within 4s and that extra 0.5s in the pit stop costs you the position if you pit later. Un-lapped 'traffic' is also critical. If you're trying to find a gap to pit into and it's tight an extra second could put you behind a slow car and cost you 'real' positions later.


I imagine it runs civ 2 pretty well


Dude imagine mounting a virtual drive on that GPU and loading the game straight from that. You won’t even notice the loading screens!


WhatsApp is certainly worth less today than what they paid for it plus the extra funding it has required over time, let alone producing anything close to ROI. It has lost them more money than the metaverse stuff.

Insta was a huge hit for sure, but since then Meta's capital allocation has been a disaster, including a lot of badly timed buybacks.


SFT is part of the classic RLHF process, though.


Yes this write-up is not about agents.

In fact it’s a great illustration of why the hype around agents is misplaced!

