Is there a reasonable place to run the unquantized version of this for less than Claude or OpenAI?
It seems to be priced the same, and whether it’s hosted somewhere or run locally it’s still a worse model; the only advantage is that it is not Anthropic or OpenAI.
I don’t think it does, but if the person leading the (relatively small, ~10-person) engineering team is dismissive and not championing it, then it ends up in this weird place where people are unsure whether they can or should use it.
The idea works well with or without direct integration. You can have a CLI agent read the arbitrary state of any tmux session and have it drive work through it. I use it for everything from dev work to system debugging. It turns out a portable, callable binary with simple parameters is still easier for agents to use than protocols and skills: https://github.com/tikimcfee/gomuxai
There’s no special support needed; it’s just a bash command that any CLI agent can use. For agents that have skills, the corresponding skill makes it even easier to leverage. I’ll add that to the README.
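For what it's worth, the underlying pattern can be sketched without any special tooling. The sketch below is illustrative only (the helper names and the `work:0.0` target are made-up assumptions, not gomuxai's actual interface); it wraps plain tmux subcommands so any agent that can run a shell command can read and drive a session:

```python
import subprocess

# Illustrative sketch: plain tmux subcommands, not gomuxai's real flags.
# The target "work:0.0" (session "work", window 0, pane 0) is an assumption.

def capture_args(target: str = "work:0.0", lines: int = 200) -> list[str]:
    # capture-pane -p prints pane contents to stdout; -S -N starts N lines back
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", f"-{lines}"]

def send_args(command: str, target: str = "work:0.0") -> list[str]:
    # send-keys types the command into the pane; the literal "Enter" submits it
    return ["tmux", "send-keys", "-t", target, command, "Enter"]

def read_pane(target: str = "work:0.0") -> str:
    # Returns the pane's recent scrollback as text for the agent to inspect
    return subprocess.run(
        capture_args(target), capture_output=True, text=True, check=True
    ).stdout

def run_in_pane(command: str, target: str = "work:0.0") -> None:
    # Drives the same pane the agent just read
    subprocess.run(send_args(command, target), check=True)
```

Because `capture-pane -p` just writes to stdout, reading session state is an ordinary shell command from the agent's point of view, which is the whole appeal.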
All Codex conversations need to be caveated with the model being used, because results vary significantly. Codex requires very little tweaking, but you do need to select the highest-thinking model if you’re writing code, and I recommend the highest-thinking NON-code model for planning. That’s really it; it pushes task time up to 5-20 minutes, but the result is usually great.
Then I ask Opus to take a pass and clean up to match codebase specs and it’s usually sufficient. Most of what I do now is detailed briefs for Codex, which is…fine.
I will jump between a ChatGPT window and a VSCode window with the Codex plugin. I'll create an initial prompt in ChatGPT, which will ask the coding agent to audit the current implementation, then draft an implementation plan. The plan bounces between Chat and Codex about 5 times, with Chat telling Codex how to improve it. Then Codex implements and creates an implementation summary, which I give to Chat. Chat then asks for a couple of fixes, and then it's done.
Why the non-code model? Also, 5-20 minutes?! I guess I don’t know what kind of code you are writing, but for my web app backends/frontends, planning takes like 2-5 minutes tops with Sonnet, and I have yet to feel the need to even try Opus.
I probably write overly detailed starting prompts but it means I get pretty aligned results. It does take longer but I try to think through the implementation first before the planning starts.
Lee is a marketer (not in title but in truth) for Cursor. He wrote a post to market their new CMS/WYSIWYG feature.
We spend ~$120/month on our CMS which hosts hundreds of people across different spaces.
Nobody manages it, it just works.
That’s why people build software: so you don’t need someone like Lee to burn a weekend building an extremely brittle proprietary system that may or may not actually work for the 3 people who use it.
Engineers love to build software, and marketers working for gen-AI companies love to point at a sector and say “just use us instead!”, just shuffling monthly spend around.
But after you hand-roll your brittle thing that never gets updates (but for some reason uses NextJS), it gets exploited by the nth bug, and the marketer who built it has moved on to the next company, suddenly the cheap managed service starts looking pretty good.
Anyway, it’s just marketing from both sides; it's embarrassing how easily people get one-shot by ads like this.
(I wrote the response.) Just because it's marketing doesn't mean it can't also be educational.
I am a marketer and a developer. But I also know that you don't get far by trying to trick people into your product. As a marketer, I also get a front-row seat to how software plays out for a lot of businesses out there, and I have for a lot of years. I wanted to share those perspectives in response to Lee's write-up.
So yes, obviously both of these pieces make a case for how the software we're employed by solves problems. And anyone who has been in developer marketing for a while knows that the best strategy is to educate, and to do so with credibility.
(I wrote the original post) I'm a developer, but you can call me a marketer if you want. I don't think it changes the point of my post.
The point was that bad abstractions can be easily replaced by AI now, and this might work well for some people/companies who were in a similar situation as me. I was not trying to say you don't need a CMS at all. In fact, I recommended most people still use one.
What you describe as an "extremely brittle proprietary system" is working great for us, and that's all that I care about. I don't "love to build software" for the sake of building software. The post is about solving a problem of unnecessary complexity.
I built a CMS back in 2010 in Ruby on Rails (it powered a once-popular site that I shut down for unrelated personal reasons). It originally used a thin layer of JavaScript along with a few buttons to wrap around some HTML. I later extended it to use Markdown for fast editing. I didn't spend more than maybe 3-5 days on the entire project, including testing and deployment, and it stood up for over a decade until I retired it for the reasons mentioned.
I bring that up because when I see headlines like this, I know EXACTLY the type of person who wrote the content.
For my part, there were a few occasional issues/bugs early on, but I was able to catch and fix them quickly thanks to testing, user input, and my understanding of the code base.
Side note: I still own the domain. It sits on Cloudflare and resolves to an IP address that isn't valid. The AI traffic hitting my domain has been about 4x the user base I had. This isn't just CF spitting a number out...I've verified it.
Thankfully CF doesn't really have usage limits that folks like me would ever notice.
You see, although what you say makes sense, paid software can also be an extremely brittle system. The only benefit is that you can put the blame on someone else, which in corporate life is a great hack. But that is not good engineering, much less using NextJS, which has the same problem.
Custom software is only as good as the team developing it, and trusting others to do that has repeatedly proven not to work, React demonstrating it to all of us these last few days with 4 different CVEs.
Yep, and thankfully Lee will always be at Cursor and will definitely not switch companies in the future.
The chance of software that does one thing well being maintained by a dedicated company is higher than the chance of Lee not switching jobs once the vesting cliff has been reached again.
Based on the fact that there are very few up-to-date English-language search indexes (Google, Bing, and Brave if you count it), it must be incredibly costly. I doubt they are maintaining their own.
I've been wondering: can't this be done P2P? Didn't we solve most of the technical problems in the late '90s / early 2000s, and then just abandon that entire way of thinking for some reason?
If many thousands of people care about having a free / private / distributed search engine, wouldn't it make sense for them to donate 1% of their CPU/storage/network to an indexer/DB that they then all benefit from?
Well, flesh it out more and it doesn't sound solved at all.
How do you make it trustless? How do you fetch/crawl the index when it's scattered across arbitrary devices? How do you index the decentralized index? What is actually stored on nodes? When you want to do something useful with the crawled info, what does that look like?
I think you could do it hierarchically, and with redundancy.
You'd figure out a replication strategy based on observed reliability (Lindy effect + uptime %).
It would be less "5 million flaky randoms" and more "5,000 very reliable volunteers".
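The "replication based on observed reliability" idea can be made concrete with a back-of-the-envelope sketch (all numbers and the availability target below are illustrative assumptions): if each volunteer node is online with probability u, then k independent copies of a shard are all offline with probability (1-u)^k, so you can solve for the smallest k that meets a target availability.

```python
import math

def replicas_needed(node_uptime: float, target_availability: float) -> int:
    """Smallest number of independent copies k such that the chance
    at least one holder is online meets the target:
    P(all k offline) = (1 - uptime)^k <= 1 - target, solved for k."""
    p_offline = 1.0 - node_uptime
    if p_offline <= 0.0:
        return 1  # a perfectly reliable node needs no redundancy
    k = math.log(1.0 - target_availability) / math.log(p_offline)
    return max(1, math.ceil(k))

# Illustrative comparison (uptime figures are assumptions):
# flaky consumer devices at ~60% uptime need many copies per shard,
# reliable volunteers at ~99% uptime need only a couple.
print(replicas_needed(0.60, 0.9999))
print(replicas_needed(0.99, 0.9999))
```

With a 0.9999 availability target, 99%-uptime volunteers need only a couple of copies per shard, while 60%-uptime consumer devices need roughly an order of magnitude more, which is the quantitative version of "5,000 very reliable volunteers" beating "5 million flaky randoms" for storage.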
Though for the crawling layer you can and should absolutely utilize 5 million flaky randoms. That's actually the holy grail of crawling. One request per random consumer device.
I think the actual issue wouldn't be technical but selection: how do you decide what's worth keeping?
You could just do it on a volunteer basis. One volunteer really likes Lizard Facts and volunteers to host that. Or you could dynamically generate the "desired semantic subspace" based on the search traffic...
Let me illustrate this with a more poetic example.
In 2015, I was working at a startup incubator hosted inside of an art academy.
I took a nap on the couch. I was the only person in the building, so my full attention was devoted to the strange sounds produced by the computers.
There were dozens of computers there. They were all on. They were all wasting hundreds of watts. They were all doing essentially nothing. Nothing useful.
I could feel the power there. I could feel, suddenly, all the computers in a thousand mile radius. All sitting there, all wasting time and energy.
Perplexity added an API today; I got the following email:
> Dear API user,
> We’re excited to launch the Perplexity Search API — giving developers direct access to the same real-time, high-quality web index that powers Perplexity’s answers.
Not particularly. Indexes are sort of like railroads. They're costly to build and maintain. They have significant external costs. (For railroads, in land use. For indexes, in crawler pressure on hosting costs.)
If you build an index, you should be entitled to a return on your investment. But you should also be required to share that investment with others (at a cost to them, of course).
This is now (even more) dated. Copilots as an interface are dated. The current initiative is full agents with a human in the loop at the very start, occasionally in the middle, and at the end.
HUDs are just good UI, something copilots can natively exist as part of in the form of contextual insights and alerts.
We’re moving on to agency, where it’s everything else versus an entirely different entity taking the action of flying the plane from takeoff to landing.
I don't think you understood what a "copilot" and a "HUD" are. Confusingly, GitHub Copilot (the original one that suggests completions) is basically a HUD. On the other hand, agents that you give a task are clearly copilots that work via a natural language interface.
The article also mentions that agents are copilots:
> Here’s another personal example from AI coding. Let’s say you want to fix a bug. The obvious “copilot” way is to open an agent chat and ask it to do the fix.
I don’t understand this one at all. Say you need to update a somewhat unique implementation of a component across 5 files. It might take you 30 seconds to type out in pseudocode whatever needs to be done, versus maybe 3-4 minutes to do it by hand.
I set that up to run, then do something different. I come back in a couple of minutes, scan the diffs, which match expectations, and move on to the next task.
That’s not everything, but those menial tasks where you know what needs to be done and what the final shape should look like are great for AI. Pass them off while you work on more interesting problems.