
Great recommendation about the community

Any other resources like that you could share?

Also, what kind of models do you run with mlx and what do you use them for?

Lately I’ve been pretty happy with gemma3:12b for a wide range of things (generating stories, some light coding, image recognition). Sometimes I’ve been surprised by qwen2.5-coder:32b. And I’m really impressed by the speed and versatility of qwen2.5:0.5b at such a tiny size (I’m playing with fine-tuning it to see if I can get it to generate decent conversations roleplaying as a character).
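For anyone curious, here's roughly how I call these from Python. The model:tag naming above is Ollama-style, so this sketch assumes the models are already pulled into a local Ollama server and the ollama client package is installed; the prompt is just a placeholder:

    import ollama  # pip install ollama; assumes a local Ollama server is running

    # gemma3:12b is the model mentioned above; the prompt is hypothetical.
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": "Write a short story about a lighthouse keeper."}],
    )
    # Recent ollama-python versions also support response.message.content.
    print(response["message"]["content"])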



I've shared a bunch of notes on MLX over the past year, many of them with snippets of code I've used to try out models: https://simonwillison.net/tags/mlx/

I mainly use MLX for LLMs (with https://github.com/ml-explore/mlx-lm and my own https://github.com/simonw/llm-mlx which wraps that), vision LLMs (via https://github.com/Blaizzy/mlx-vlm) and running Whisper (https://github.com/ml-explore/mlx-examples/tree/main/whisper)
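The mlx-lm Python API is pleasantly small — a minimal sketch following its README pattern (the model name is just an example from the mlx-community org; any quantized model there works):

    from mlx_lm import load, generate

    # Downloads from Hugging Face on first run; this model is just an example.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

    messages = [{"role": "user", "content": "Summarize what MLX is in one sentence."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    text = generate(model, tokenizer, prompt=prompt, verbose=True)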

I haven't tried mlx-audio yet (which can synthesize speech) but it looks interesting too: https://github.com/Blaizzy/mlx-audio

The two best people to follow for MLX stuff are Apple's Awni Hannun - https://twitter.com/awnihannun and https://github.com/awni - and community member Prince Canuma who's responsible for both mlx-vlm and mlx-audio: https://twitter.com/Prince_Canuma and https://github.com/Blaizzy


Very cool insight, simonw! I will check out the mlx-audio stuff soon; I think it's still kinda new. Prince Canuma is the GOAT.


Amazing. Thank you for the great resources!


Hey Nico,

Very cool to hear your perspective on how you are using the small LLMs! I’ve been experimenting extensively with local LLM stacks on:

• M1 Max (MLX native)

• LM Studio (GLM, MLX, GGUFs)

• llama.cpp (GGUFs)

• n8n for orchestration + automation (multi-stage LLM workflows)

My emerging use cases:

-Rapid narration scripting

-Roleplay agents with embedded prompt personas

-Reviewing image/video attachments + structuring copy for clarity

-Local RAG and eval pipelines

My current lineup of small LLMs (this changes every month depending on what is updated):

MLX-native models (mlx-community):

-Qwen2.5-VL-7B-Instruct-bf16 → excellent VQA and instruction following (see the VQA sketch after this list)

-InternVL3-8B-3bit → fast, memory-light, solid for doc summarization

-GLM-Z1-9B-bf16 → reliable multilingual output + inference density
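A rough sketch of the VQA flow via mlx-vlm, as referenced above. Caveat: mlx-vlm's API has shifted between releases, so treat the exact calls as approximate, and the image path and prompt are placeholders:

    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model_path = "mlx-community/Qwen2.5-VL-7B-Instruct-bf16"
    model, processor = load(model_path)
    config = load_config(model_path)

    images = ["photo.jpg"]  # hypothetical local image
    prompt = "What is in this image?"

    formatted = apply_chat_template(processor, config, prompt, num_images=len(images))
    output = generate(model, processor, formatted, images, verbose=False)
    print(output)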

GGUF via LM Studio / llama.cpp:

-Gemma-3-12B-it-qat → well-aligned, solid for RP dialogue

-Qwen2.5-0.5B-MLX-4bit → blazing fast; chaining 2+ agents at once (see the chaining sketch after these lists)

-GLM-4-32B-0414-8bit (Cobra4687) → great for iterative copy drafts

Emerging / niche models tested:

-MedFound-7B-GGUF → early tests for narrative medicine tasks

-X-Ray_Alpha-mlx-8Bit → experimental story/dialogue hybrid

-llama-3.2-3B-storyteller-Q4_K_M → small, quick, capable of structured hooks

-PersonalityParty_saiga_fp32-i1 → RP grounding experiments (still rough)
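On the agent chaining mentioned above: a minimal sketch of two chained calls through LM Studio's local OpenAI-compatible server (default http://localhost:1234/v1; the api_key is ignored but the client requires one, and the model id below is hypothetical — use whatever id LM Studio shows for your loaded model):

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    def ask(system, user, model="qwen2.5-0.5b-instruct-mlx"):  # hypothetical model id
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

    # Agent 1 drafts, agent 2 tightens: two chained calls to the same tiny model.
    draft = ask("You draft short narration scripts.", "Draft a 3-sentence intro about tide pools.")
    final = ask("You edit copy for clarity.", "Tighten this draft:\n" + draft)
    print(final)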

I test most new LLMs on release. QAT models in particular are showing promise, balancing speed + fidelity for chained inference. The meta-trend: models are getting better, smaller, faster, especially for edge workflows.

Happy to swap notes if others are mixing MLX, GGUF, and RAG in low-latency pipelines.
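For the RAG side, the glue can stay tiny — a sketch using sentence-transformers for embeddings plus the same local OpenAI-compatible endpoint for generation. The corpus, model ids, and endpoint are all placeholders, not a fixed stack:

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from openai import OpenAI

    # Toy corpus; in practice these are chunked documents.
    docs = ["MLX is Apple's array framework for Apple silicon.",
            "GGUF is the file format used by llama.cpp.",
            "LoRA fine-tunes a small set of adapter weights."]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(question, k=1):
        q = embedder.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    question = "What format does llama.cpp use?"
    context = "\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gemma-3-12b-it-qat",  # hypothetical local model id
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
    )
    print(resp.choices[0].message.content)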


Impressive! Thank you for the amazing notes; I have a lot to learn and test.



