Gemma 4 12B - MLX Kernel
View original on reddit.com
Summary
I've mentioned this kernel project in a few posts and figured I would just open up the project code for anyone curious: MLX Gemma 12B. The main constraint on my end is an M5 16GB MacBook Pro. I usually do model development on cloud clusters, but I have been playing more with smaller local models, using SFT and fine-tuning. Under the hood, the MLX and CUDA libraries are not far apart when it comes to working with some of these models. Last night I attempted to integrate DSpark