this one is exciting. It'll enable Ollama on a lot more devices and speed things up - especially AMD GPUs not fully supported by ROCm, Intel GPUs, and iGPUs across different hardware vendors.
I'm looking forward to future ollama releases that might attempt parity with the cloud offerings. I've since moved on to the Ollama compatibility API in KoboldCPP, since they don't have any such limits on their inference server.
In this case, it's not about whether it fits on my physical hardware or not. It's about what seems like an arbitrary restriction designed to start pushing users to their cloud offering.
Z.ai team is awesome and very supportive. I have yet to try synthetic.new. What's the reason for using multiple? Is it mainly to try different models or are you hitting some kind of rate limit / usage limit?
I tried synthetic.new prior to GLM-4.6, starting in August, so I already had a subscription.
When z.ai launched GLM-4.6, I subscribed to their Coding Pro plan. Although I haven't been coding as heavily this month as in the prior two months, I used to hit Claude limits almost daily, often twice a day. That was with both the $20 and $100 plans. I have yet to hit a limit with z.ai, and the server response is at least as good as Claude's.
I mention synthetic.new as it's good to have options and I do appreciate them sponsoring the dev of Octofriend.
z.ai is a Chinese company and I think they host in Singapore. That could be a blocker for some.
I have been subscribing to both Claude and ChatGPT for over two years. I spent several months on Claude's $100 plan and a couple of months on ChatGPT's $200 plan, but otherwise I've been using their $20/month plans.
I cancelled Claude two weeks ago. Pure GLM-4.6 now and a tad of codex with my ChatGPT Pro subscription. I sometimes use ChatGPT for extended research stuff and non-tech.
I was a hardcore Claude fan too, but Sonnet 4.5 + the new weekly limits are really annoying.
I could deal with the limits, but holy shit is Sonnet 4.5 chatty. It produces as much useless crap as Opus 4.1 did. Might feel fun for Vibe Coders when the model pumps out tons of crap, but I want it to do what I asked, not try to get extra credit with "advanced" solutions and 500+ row "reports" after it's done. FFS.
Been testing crush + z.ai GLM 4.6 through OpenRouter (had some credits in there, it seems =) this evening and I'm kinda loving it.
Z.ai is on the US Entity List (banned from export/collaboration):
> “These entities advance the People’s Republic of China’s military modernization through the development and integration of advanced artificial intelligence research. This activity is contrary to the national security and foreign policy interests of the United States under Section 744.11 of the EAR.”
And Microsoft has been instrumental in helping to facilitate Israel's genocide of Palestinian people. Meta / Facebook did it in Myanmar. If you're paying to use any AI product, you're more than likely giving money to companies that either directly or indirectly contribute to genocide.
The difference between Ollama and llama.cpp boils down to "venture-backed product company" vs "community OSS project; creator’s separate company has angel/VC-style pre-seed". I hope even you could squint and see the difference :)
Btw, I feel like it's in somewhat poor taste to comment on something that is effectively a competitor to you (even though you base your own product on it) without disclosing that you work full-time at Ollama Inc. At the very least put the info in your profile.
sorry, I don't use 4chan, so I don't know what's said there.
May I ask what system you are using where you are getting memory estimations wrong? This is an area Ollama has been working on and has improved quite a bit.
The latest version of Ollama is 0.12.5, with a 0.12.6 pre-release available.
I recently tested every version from 0.7 to 0.11.1 trying to run q5 mistral-3.1 on a system with 48GB of available VRAM across 2 GPUs. Everything past 0.7.0 gave me OOM or other errors. Now that I've migrated back to llama.cpp I'm not particularly interested in fucking around with ollama again.
as for 4chan, they've hated ollama for a long time because they built on top of llama.cpp and then didn't contribute upstream or give credit to the original project
I'm hopeful that in the future, more and more model providers will help optimize for given model quantizations - 4 bit (e.g. NVFP4, MXFP4), 8 bit, and a 'full' precision model.
Yeah, I think the idea that models that don't come from ollama.com are second-class citizens was what first made me start to think about migrating back to llama.cpp, and then the memory stuff was the straw that broke the camel's back. I don't want to use a project that editorializes about what models and quants I should be using; if I wanted a product I don't have control over, I'd just use a commercial provider. For completeness' sake, I actually did download the full fp16 and quantize it using ollama, and I still hit the memory error.
I truly don't understand the reasoning behind removing support for all the other quants; it's really baffling to me considering how much more useful running a 70B model at q3 is than not being able to run a 70B model at all. Not to mention forcing me to download hundreds of gigabytes of fp16 because compatibility with other quants is apparently broken, and forcing me to quantize models myself.
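Rough weights-only math makes the point. This is a back-of-the-envelope sketch: the bits-per-weight figures are ballpark assumptions for common GGUF quants, and it ignores KV cache and runtime overhead, which add several more GB on top:

    # Weights-only VRAM estimate for a dense model. Bits-per-weight values
    # below are ballpark assumptions for common GGUF quants; real files vary
    # per tensor, and KV cache / overhead add several GB on top.
    GIB = 1024 ** 3

    def weight_gib(params_billion: float, bits_per_weight: float) -> float:
        """Approximate weight memory in GiB."""
        return params_billion * 1e9 * bits_per_weight / 8 / GIB

    cases = [
        ("70B @ FP16",   70, 16.0),  # full precision
        ("70B @ Q5_K_M", 70, 5.5),   # ~5.5 bpw ballpark
        ("70B @ Q3_K_M", 70, 3.9),   # ~3.9 bpw ballpark
    ]
    for label, params_b, bpw in cases:
        print(f"{label:>14}: ~{weight_gib(params_b, bpw):5.1f} GiB of weights")

    # Approximate output:
    #     70B @ FP16: ~130.4 GiB of weights
    #   70B @ Q5_K_M: ~ 44.8 GiB of weights
    #   70B @ Q3_K_M: ~ 31.8 GiB of weights

Once context and overhead are added, only the Q3-level quant has any realistic chance of fitting on a 48GB setup, and FP16 isn't close.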
Yeah, a lot of these corporate hackathons are basically just lead gen in disguise. "Use our SaaS product, maybe we’ll give you a t-shirt." They're more about getting conversions than actually teaching anything useful to the students.
Sorry about this. We are working really hard on providing usage-based pricing.
During the preview period we want to start offering a $20/month plan tailored for individuals. We are monitoring usage and making changes as people hit rate limits so we can satisfy most use cases and be generous.
https://github.com/21st-dev/1code