This looks very interesting. I wish it came with some guides for using it with a...

reissbaker · 2025-08-07T19:30:11 1754595011

I'll add docs! Tl;DR: in the onboarding (or in the Add Model menu section), you can select adding a custom LLM. It'll ask you for your API base URL, which is whatever localhost+port setup you're using, and then an env var to use as an API credential. Just put in any non-empty credential, since local models typically don't actually use authentication. Then you're good to go.

IMO gpt-oss-120b is actually a very competent local coding agent — and it should fit on your 128GB Macbook Pro. I've used it while testing Octo actually, it's quite good for a local model. The best open model in my opinion is zai-org/GLM-4.5, but it probably won't fit on your machine (although it works well with APIs — my tip is to avoid OpenRouter though since quite a few of the round-robin hosts have broken implementations.)

earino · 2025-08-07T19:34:43 1754595283

Ok wonderful! Thanks.

I'm trying to set it up right now with lmstudio with qwen3-coder-30b. Hopefully it's going to work. Happy to take any pointers on anything y'all have tried that seemed particularly promising.

reissbaker · 2025-08-07T19:42:19 1754595739

For sure! We also have a Discord server if you need any help: https://discord.gg/syntheticlab

earino · 2025-08-07T20:14:02 1754597642

Follow up question, can the diff apply and fix json models be run locally as well with octofriend, or do they have to hit your servers? Thanks!

reissbaker · 2025-08-07T20:20:29 1754598029

They're just Llama 3.1 8b Instruct LoRAs, so yes — you can run them locally! Probably the easiest way is to merge the weights, since AFAIK ollama and llama.cpp don't support LoRAs directly — although llama.cpp has utilities for doing the merge. In the settings menu or the config file you should be able to set up any API base URL + env var credential for the autofix models, just like any other model, which allows you to point to your local server :)

The weights are here:

https://huggingface.co/syntheticlab/diff-apply

https://huggingface.co/syntheticlab/fix-json

And if you're curious about how they're trained (or want to train your own), the entire training pipeline is in the Octofriend repo.

jasonjmcghee · 2025-08-07T22:47:21 1754606841

I think this might be your best bet right now. GLM-4.5-Air is probably next best. I'd run them at 8-bit using MLX.