SPIN Unprocessed July 3, 2026 ai_technology research
How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size
arXiv:2607.01487v1 Announce Type: new Abstract: We propose a scaling law that takes into account model size and training data while explicitly splitting the latter into training steps and batch size (called the three-term law). Fitting the proposed law on a large set of training runs, we find that it correctly recovers the scaling of the optimal batch size. Moreover, because it makes use of training runs with suboptimal batch size, our proposed law can be robustly fit with a significantly smaller am
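The split of training data into training steps and batch size that the abstract describes can be illustrated with a minimal sketch. It assumes batch size is measured in tokens, so that tokens consumed equal steps times batch size; the budget, the candidate batch sizes, and the helper name `token_splits` are hypothetical and not taken from the paper. The sketch only enumerates allocations; it does not implement or fit the three-term law, whose functional form is not given in this summary.

```python
def token_splits(total_tokens: int, batch_sizes: list[int]) -> list[tuple[int, int]]:
    """Return (batch_size, steps) pairs that fit within a fixed token budget.

    Assumes batch_size is counted in tokens, so tokens used = batch_size * steps.
    """
    splits = []
    for b in batch_sizes:
        steps = total_tokens // b  # whole optimizer steps that fit in the budget
        if steps > 0:
            splits.append((b, steps))
    return splits


budget = 2**30  # hypothetical token budget
candidates = [2**k for k in range(16, 23)]  # hypothetical batch sizes, in tokens

splits = token_splits(budget, candidates)
for b, s in splits:
    print(f"batch_size={b:>8} tokens  steps={s:>6}  tokens_used={b * s}")
```

Each row spends the same number of tokens with a different trade-off between steps and batch size, which is the axis a law with separate step and batch-size terms would distinguish.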