[Paper] Multi-Block Diffusion Language Models
Summary
Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent di
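To make the contrast in the summary concrete, here is a minimal sketch of the two input views it describes: the teacher-forcing view (a clean prefix plus one noisy block) and a multi-block view (a clean prefix plus a running set of consecutive noisy blocks). This is not the paper's implementation. The `MASK` token, the function names, masking as the noise process, and the choice to give each block in the running set its own mask ratio are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def split_blocks(tokens, block_size):
    """Split a token list into consecutive fixed-size blocks."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def noise_block(block, mask_ratio, rng):
    """Replace round(mask_ratio * len(block)) random positions with MASK."""
    k = round(mask_ratio * len(block))
    masked = set(rng.sample(range(len(block)), k))
    return [MASK if i in masked else tok for i, tok in enumerate(block)]


def single_block_view(tokens, block_size, target, mask_ratio, rng):
    """Teacher-forcing view: clean prefix followed by ONE noisy block."""
    blocks = split_blocks(tokens, block_size)
    prefix = [tok for block in blocks[:target] for tok in block]
    return prefix + noise_block(blocks[target], mask_ratio, rng)


def multi_block_view(tokens, block_size, start, mask_ratios, rng):
    """Multi-block view: clean prefix followed by a running set of
    consecutive noisy blocks, one mask ratio per block (an assumption)."""
    blocks = split_blocks(tokens, block_size)
    out = [tok for block in blocks[:start] for tok in block]
    for offset, ratio in enumerate(mask_ratios):
        out += noise_block(blocks[start + offset], ratio, rng)
    return out


rng = random.Random(0)
tokens = list("abcdefghijkl")  # 12 toy tokens, 3 blocks of size 4
print(single_block_view(tokens, 4, 1, 0.5, rng))
print(multi_block_view(tokens, 4, 1, [0.5, 1.0], rng))
```

In the single-block view the model only ever sees one partially masked block after clean context. In the multi-block view, later blocks in the running set are conditioned on earlier blocks that are themselves still noisy, which is the situation the summary says teacher-forced training does not expose the model to.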