Hacker News | ttoinou's comments

No he said something like “well yes, only for the parts of profits made in France”

Why would it be any other way?

French people have this pipe dream that all other French people will pay 75% of what they produce worldwide to fund their retirements, hospitals, useless school system, and all their "comités Théodule"

No, but they will despise you for bringing the problem up

In the long run, good code makes everyone much happier than code that is bad because people are being "nice" and letting things slide in code review to avoid confrontation.

Using Parallels, yes. What's "on the metal"?

I mean booting into it directly, like we used to be able to do with Boot Camp

With an M3 Max with 64 GB of unified RAM you can code with a local LLM, so the bar is much lower

But why? Spending several thousand dollars to run sub-par models when the break-even point could still be years away seems bizarre for any real use case where your goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day.

To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context the sell on a stack of Mac Pros seems misguided at best.


I ran the Qwen 3.5 35B A3B Q4 model locally on a Ryzen server with a 64k context window, at 5-8 tokens a second.

It is the first local model I've tried which could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it is really close. It isn't Opus 4.5 yet, but it is very promising given the size. Local is definitely getting there, even without GPUs.

But you're right, I see no reason to spend right now.


Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?

I just start the llama.cpp server with the GGUF, which creates an OpenAI-compatible endpoint.

The session so far is stored as a messages array in a file like /tmp/s.json. Claude reads that file, appends its response/query, sends it to the API, and reads the response.

I simply wrapped this process in a Python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)
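A minimal sketch of that loop, assuming the setup described above: llama.cpp's server listening on localhost (port 8080 is an assumption) with its OpenAI-style /v1/chat/completions endpoint, and the shared session file at /tmp/s.json. The function names here are illustrative, not from the original script.

```python
import json
import urllib.request
from pathlib import Path

SESSION = Path("/tmp/s.json")  # shared session file that Claude also edits
ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"  # llama.cpp server (assumed port)

def load_session():
    """Return the messages array, starting fresh if the file doesn't exist yet."""
    if SESSION.exists():
        return json.loads(SESSION.read_text())
    return []

def append_message(role, content):
    """Append one chat message and persist the whole array back to disk."""
    messages = load_session()
    messages.append({"role": role, "content": content})
    SESSION.write_text(json.dumps(messages, indent=2))
    return messages

def chat():
    """Send the current session to the local model and store its reply."""
    messages = load_session()
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return append_message("assistant", reply)
```

Client-side tool calling would be layered on top of this by inspecting each reply for tool-call requests before appending it.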


Sometimes you can't push your working data to a third-party service, whether by law, by contract, or by preference.

I started doing it to hedge against the inevitable disappearance of cheap inference.

Couldn't GitHub replace all public commit author emails with a [email protected] address automagically?

You can’t change anything about a commit without breaking the chain of SHA hashes in the commits, which causes pulls to break.

GitHub hides the emails on their web UI, but nothing stops people from pulling the repository with a Git client and looking at the emails in the commit log after doing so.


Which is why you should be careful to never use your actual email in git commits.

When I made a patch to the Linux kernel I did have to use a real email, since you have to send it to their mailing list. I used a throwaway email for it, which I have since edited in my mail server config to forward to /dev/null (yes, I'm one of the weirdos still self-hosting email in 2026). The amount of spam I got was insane, and not even developer-relevant spam.


This makes me wonder how the Linux kernel git system deals with GDPR data deletion requests. Are they even legally allowed to deny them?

You have to configure your own Git client manually. But you can configure GitHub to block pushes from any email other than the no-reply email GH generates for you.
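The client-side part looks like this; the noreply address below is a placeholder (GitHub's real one embeds your numeric user ID and username), and the scratch HOME just keeps the demo from touching your real config:

```shell
export HOME=$(mktemp -d)   # scratch HOME so this demo is side-effect free
git config --global user.email "0000000+example@users.noreply.github.com"
git config --global user.name  "example"
git config --global --get user.email   # 0000000+example@users.noreply.github.com
```

The server-side block is the "Block command line pushes that expose my email" toggle under GitHub's email settings.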

Will you support GitLab Issues?


We could! Do you use them?


Yes!

So, what's your business model? Is this a YC product, or a tool you developed while working on a YC product?


We're figuring our business model out. There are two avenues we're principally thinking about: (1) a bundled coding-agent subscription, and (2) an enterprise version with auth, team management, and sharing of agent interactions. Admittedly, it's early and this can change. What won't change is that this UI layer for running multiple coding agents is, and will remain, open-source. Emdash itself is funded by YC. It was initially developed as a tool while working on another product, but we weren't funded then.


(2) sounds like a great idea if you can ensure private company data never reaches your servers, with features like remotely controlling agents from a central place


Thank you, and yes!


How many tokens per second are you getting?

What's the advantage of Qwen Code CLI over OpenCode?


320 tok/s prompt processing (PP) and 42 tok/s token generation (TG) with a 4-bit quant and MLX. Llama.cpp was half that for this model, but afaik it has improved in the last few days; I haven't tested it yet, though.

I have tried many tools locally and was never really happy with any. I finally tried Qwen Code CLI, assuming it would run well with a Qwen model, and it does. YMMV; I mostly do JavaScript and Python. The most important setting was the max context size; the CLI then auto-compacts before reaching it. I run with 65536 but may raise this a bit.

Last but not least, OpenCode is VC-funded; at some point they will have to make money, while Gemini CLI / Qwen CLI are not the primary products of their companies but are definitely dogfooded.


Works for me, but sometimes there's an issue with the tool template from Qwen: past chats get changed, so the KV cache is invalidated and it needs to reprocess the input tokens from scratch. It doesn't happen all the time, though.

Btw, I also get 42-60 tps on an M4 Max with the MLX 4-bit quants hosted by LM Studio. Which software do you use to run it?


I use the MLX server directly from the MLX community project (by Apple). 42 tps is with 0-5,000 tokens of context; it starts to drop from there. I have never seen 60.

Yesterday I tested the latest llama.cpp, and PP has made a huge jump to 420 tps, which is 30% faster than MLX on my M1. TG is now 25 tps, which is below MLX but does not degrade much; at 50k context it is still 22-23 tps.

Together with Qwen Code CLI, llama.cpp re-processes the full KV cache much less often. So for now I am switching back to llama.cpp.

It is worth spending some time with the settings. I am really annoyed by the silly jokes (was it Claude that started this?). You can disable them with customWittyPhrases. Setting contextWindowSize also makes the CLI auto-compact, which works really well for me.

And depending on what you do, maybe set privacy.usageStatisticsEnabled to false.
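For reference, those keys would sit in the CLI's settings file. A sketch, assuming Qwen Code keeps the Gemini CLI settings.json layout (the exact nesting is unverified, and an empty phrase list is one way people suppress the jokes):

```json
{
  "contextWindowSize": 65536,
  "customWittyPhrases": [],
  "privacy": {
    "usageStatisticsEnabled": false
  }
}
```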

Like Gemini CLI, Qwen CLI supports OpenTelemetry. When I have time I'll have a look at why the KV cache gets invalidated.


Great, thanks! I am so annoyed by one specific phrase, "launching wit.exe"; it's not funny when the tool could actually be talking about real software running on your machine.


There's this developer called nightmedia who converts a lot of models to Apple MLX. I can run Qwen3 Coder Next at 60 tps on my M4 Max. It works.


Nice, but how do those services combine with each other? How do you combine Notion, Slack, your Git hosting, Linear, and your CI/CD? If there are only URLs between them, it's hard to link all the work together.

