I tried it on a non-trivial, but also well-documented and self-contained, task. It did amazingly well. I used DeepSeek V4 Pro via the DeepSeek platform. The model is very fast and also super cheap: I burned only 0.06 USD (I can only imagine what the same task would have cost me had I used, e.g., Amp).
PS. Mentioning Amp because I used to use it and pay directly for tokens. I topped up 5 USD, so I'm going to keep using it and see how far it can take me. But my impression so far is that even when model subsidization ends, those open-source models are quite viable alternatives.
> But my impression so far is that even when model subsidization ends, those open-source models are quite viable alternatives.
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
Good. It's hard to overstate how nervous most executives are about relying on cloud-based providers.
AI currently works basically by sending your entire codebase, workflow, and internal communications over the internet to some third-party provider, and your only protection is some legal document saying they pinky-promise they won't train on your data.
And said promise is made by people whose entire business model relies on slurping up all the licensed content on the internet and ignoring said licensing, with the defense of being too big to fail.
Yes, this is the most straightforward argument for local AI inference. "Why buy cloud-based SOTA AI? We have SOTA AI at home." It's great that DeepSeek may be about to make this possible, once the support in local inference frameworks is up to the task.
Is there anywhere I can read about the KV cache? Excuse my ignorance; I'm not familiar with this topic, and I've read scattered notes that DeepSeek's costs are well optimized due to how their KV cache works. But I want to read more about how the KV cache relates to the inference stack and where it actually sits.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason this idea was considered bad? Is it due to the speed difference between GPU VRAM and system RAM?
The KV cache generally grows linearly with your current context: it gets filled in with your prompts during prompt processing, and newly generated context gets tacked on during token generation. LLM inference uses it to semantically relate the currently processed token to its pre-existing context.
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
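To make the size difference concrete, here is a back-of-the-envelope sketch in Python. All the numbers (layer count, head counts, the 576-dim compressed latent) are illustrative assumptions loosely modeled on publicly described DeepSeek configurations, not official specs for any particular model:

```python
# Rough KV cache sizing with illustrative, made-up numbers.
# Standard multi-head attention stores a K and a V vector per layer per token;
# MLA-style designs (as in DeepSeek models) store one small compressed
# latent per layer per token instead.

def mha_kv_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Factor of 2 accounts for storing both K and V.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def mla_kv_bytes(n_layers, latent_dim, context_len, bytes_per_elem=2):
    # One compressed latent per token per layer.
    return n_layers * latent_dim * context_len * bytes_per_elem

ctx = 128_000  # a long-context session
mha = mha_kv_bytes(n_layers=61, n_kv_heads=128, head_dim=128, context_len=ctx)
mla = mla_kv_bytes(n_layers=61, latent_dim=576, context_len=ctx)
print(f"MHA-style cache: {mha / 2**30:.1f} GiB")   # hundreds of GiB
print(f"MLA-style cache: {mla / 2**30:.1f} GiB")   # single-digit GiB
```

With these (hypothetical) dimensions the full-attention cache lands in the hundreds of GiB for one long context, while the compressed-latent cache is in single-digit GiB, which is why offloading it to RAM or even SSD suddenly looks reasonable.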
> even when model subsidization ends, those open-source models are quite viable alternatives.
Model inference was never subsidized; inference is highly profitable at today's prices. That's why you have many inference providers. My guess is that prices for inference will go down as more competition starts cutting into the margin.
It's model training, development, and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.
>people are just using the latest and most expensive models because they can,
While I agree with the sentiment, I think that might have initially been driven by older models being nerfed and/or newer ones being better in tokens per dollar. And there is this notion that those labs don't constrain a model in the first days after its release.
I (well, Codex) made a plugin for Stremio to stream my collection from Real-Debrid.
I tried existing plugins first and none was working. I just prompted ChatGPT to refine my initial specs, asked another session to build it, and later used Codex for the last mile. Nothing fancy, and nothing that would be particularly useful to others, but damn, it was useful to me and my wife.
Ultrathink isn’t “removed”; its behavior is different. You can still set effort to high or max for the duration of the session, which is especially useful in plan mode.
Which subscription do you have to use it? Via Google AI Pro and the Gemini CLI I always get timeouts due to the model being under heavy usage. The chat interface is there, and I do have 3.1 Pro as well, but I'm wondering if chat is the only way of accessing it.
The only thing I don't like about Gemini models (in the Gemini CLI) is that there's no transparency about which model I'm using. I can start with Pro and be downgraded, sometimes even to Gemini 2.5 Flash Lite.
Sublime is quite good. I have always used Sublime for quick edits, dumping notes, etc., but lately I've come to appreciate it more as a lightweight IDE. I use it for Go (LSP and some plugins) and SQL (SQLTools), in addition to a package manager and a project manager. I like how fast it is and how well polished the editor is, and generally all the plugins I use work nicely. Claude also helped a lot (e.g., writing some shortcuts for specific scenarios I use often).
Not to say anything against Zed, though. But Sublime plus one session with Claude can help you build your very own customized IDE.
I love(d) Sublime, and it's still getting updates from time to time, but unfortunately its ecosystem died five-ish years ago; its package repository is a lot of "last updated 10 years ago". It's still a viable editor, but without community support it's not going to be good enough long term.
That said, ST (and its predecessor, forgot the name) set the standard for "lightweight" (lighter than IDEs) editors: Atom, VS Code, and now Zed can all trace their common patterns back to ST.
> Atom, VS Code, now Zed, can all trace their common patterns back to ST.
True, but Zed is the only spiritual successor IMO; Atom and VSCode do not care about speed or snappiness, which is the nicest thing about Sublime Text (for me).
TextMate? It's been surprisingly influential for an editor I've never seen anyone use; maybe in the US, where people actually buy Macs, it was different.
I hadn’t noticed that it hasn’t been updated since ‘21 (TM2), but I still use it every day. Just a reliable, minimal, fully native (no Electron, etc.) editor that is flexible enough to keep adding new bundles to. I’m sad it’s not in development, but happy it’s an oasis from AI coding.
I was a big TM user who ended up on ST because I needed more of the community integrations and so on... which are now turning into a weakness of ST.
I'm still on SublimeText because I can't deal with the sluggishness of VS Code, and I'll pay for the latest version, but I am starting to worry about the future of what is still a great editor. Rust coding in particular is a bit of a nightmare.
The sad thing is that both of these were the products of business models I enthusiastically support and want to see more of: the solo dev (TM) and the small business (ST), or maybe it's solo dev pretending to be small business, I can't really tell.
I still love Notepad++. It's basically one of only a handful of apps I miss from when I used Windows. Its first release was about a year before TextMate's, so for me it's the real OG.
Eh, I don't think it's really a problem. The much-vaunted VSCode ecosystem isn't actually all that useful imo, so it doesn't bother me that people aren't making lots of Sublime plugins. There's an LSP plugin which is basically all one needs.
I don’t think people realize how easy it is to make Sublime plugins. They can be as simple as a single .py file: no dependency-install step, no meta files, no rituals to go through. Just drop the .py file in the right directory. I’ve used Gemini to build about 3-4 plugins of various complexity, and they all work great.
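As an illustration, a complete (hypothetical) plugin really can be one small file, say `Packages/User/insert_timestamp.py`, using Sublime's embedded Python API. This only runs inside Sublime's bundled interpreter, not standalone:

```python
# Hypothetical single-file Sublime Text plugin.
# Drop into Packages/User/insert_timestamp.py; the class name
# InsertTimestampCommand maps to the command name "insert_timestamp".
import datetime

import sublime_plugin


class InsertTimestampCommand(sublime_plugin.TextCommand):
    """Insert the current date/time at every cursor position."""

    def run(self, edit):
        stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
        for region in self.view.sel():
            self.view.insert(edit, region.begin(), stamp)
```

You can then bind it to a key or run `view.run_command("insert_timestamp")` from the console; no restart or install step needed, since Sublime hot-reloads files in the User package.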
Can I drop it in the 'wrong' directory and have ST pick it up from there? I like apps that are as flexible as possible when it comes to file organization.
I believe if you are using a project workspace (working in a directory with a .sublime-project file), then you can also write plugins right there in the project. But if you want, you can use symlinks; Sublime will follow them. I use this to sync my settings to Dropbox so I can have the same setup on any computer.
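A sketch of that symlink setup. The paths here are throwaway stand-ins under a temp directory so it is safe to run as-is; substitute your real `Packages/User` and Dropbox directories in practice:

```shell
# Sketch: move settings into a synced folder once, then symlink them back.
set -eu
BASE="$(mktemp -d)"
SUBLIME="$BASE/sublime-text/Packages/User"   # stand-in for the real User dir
SYNCED="$BASE/Dropbox/sublime-user"          # stand-in for the synced copy

mkdir -p "$SUBLIME" "$SYNCED"
echo '{ "font_size": 12 }' > "$SUBLIME/Preferences.sublime-settings"

# One-time migration: copy settings into the synced folder...
cp -a "$SUBLIME/." "$SYNCED/"
rm -rf "$SUBLIME"
# ...then replace the settings dir with a symlink; Sublime follows it.
ln -sfn "$SYNCED" "$SUBLIME"

readlink "$SUBLIME"
```

After this, any edit to the settings on one machine lands in the synced folder and shows up everywhere the same symlink is in place.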
Same, I'm still using Sublime with SublimeLSP, and it works for 95% of my use cases because I tend to rely on the terminal for everything else. I do hope to switch to Zed eventually because of its built-in debugger and the ability to select and copy text in popups (I can't believe Sublime still doesn't allow this in 2026), but Zed still had some rough edges last time I tried it, and Sublime still seemed to perform better.
I do love how Sublime Text doesn't even blink when given a huge file, where most other editors struggle. And the overall speed and responsiveness is unbelievable. I would really like to see any other editor try to overtake Sublime Text on those metrics.
VSCode is actually one of the best for large files. Not as good as Sublime Text, but it can happily edit million-line files, and you'd be surprised how many other editors can't do that. Zed couldn't until recently (not sure if it can now; I haven't checked).
Once you get into the GB range there are very very few editors that can edit those files unfortunately.
I have to turn off my config (vim -u NONE) for large files (e.g., multi-GB JSON files), or everything slows to a crawl. I never profiled it to find out what's causing the slowdown; it might be Tree-sitter.
Can you share the plugin names/configuration for Go in Sublime? There are so many options and configurations published that it's hard to find one that works as well as VSCode.
I use Emacs as my "lightweight" IDE in the terminal, and it's not light at all: always a big install on a new server, and it takes ages to start up. I wish I had learned Vim instead.
I was doing this, but I got worried I would lose touch with my critical thinking (or really just thinking, for that matter), as it was too easy to just copy-paste and delegate the thinking to the Oracle.