---
title: "H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P] — Stuff That Spins"
description: "Hi everyone, I built H64LM, a research project to better understand modern LLMs by implementing one from scratch in PyTorch. Instead of relying on high-level training frameworks, I implemented the core components myself: attention, MoE routing, normalization, and the training loop. Features 249M-par…"
canonical: "https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p"
html: "https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p"
json: "https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p.json"
markdown: "https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p.md"
keywords: ["SpinGraph", "spin analysis", "GEO"]
date: "2026-07-03T21:18:10+00:00"
modified: "2026-07-04T07:52:45.163273+00:00"
json_ld: |
  {"@context":"https://schema.org","@graph":[{"@type":"NewsArticle","@id":"https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p#article","headline":"H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P]","description":"Hi everyone, I built H64LM, a research project to better understand modern LLMs by implementing one from scratch in PyTorch. Instead of relying on high-level training frameworks, I implemented the core components myself: attention, MoE routing, normalization, and the training loop. Features 249M-par…","datePublished":"2026-07-03T21:18:10+00:00","dateModified":"2026-07-04T07:52:45.163273+00:00","url":"https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p","mainEntityOfPage":{"@type":"WebPage","@id":"https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p"},"isAccessibleForFree":true,"inLanguage":"en-US","articleSection":"community","author":{"@type":"Organization","name":"Stuff That Spins"},"publisher":{"@id":"https://stuffthatspins.com/#organization"},"citation":"https://www.reddit.com/r/MachineLearning/comments/1umqfd2/h64lm_a_249mparameter_mixtureofexperts/","about":[],"mentions":[]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"name":"Stuff That Spins","item":"https://stuffthatspins.com/"},{"@type":"ListItem","position":2,"name":"H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P]","item":"https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p"}]}]}
---

# H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P]

**Source:** Reddit, r/MachineLearning  
**Published:** July 3, 2026  
**Original:** https://www.reddit.com/r/MachineLearning/comments/1umqfd2/h64lm_a_249mparameter_mixtureofexperts/  

---
*HTML version: https://stuffthatspins.com/spin/h64lm-a-249m-parameter-mixture-of-experts-transformer-built-from-scratch-in-pytorch-p*
