PSA: Upscaling Gemma 4 requires a proportional layer_scalar adjustment

Summary

A lot of people seem confused or mystified by this, so I figured I'd spell it out. I played around with RYS and realized that it broke Gemma 4 models. It turns out there's a `layer_scalar` value that is applied at each layer. If you don't adjust it so that all copies of a duplicated layer together apply the same total scaling as the original single layer, you break the model. Since it's multiplicative, you have to use `s^(1/N)`, where `s` is the original scalar and `N` is the number of times the layer occurs (duplications + 1 fo
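To make the arithmetic concrete, here is a minimal sketch of the `s^(1/N)` adjustment in plain Python. It only does the math: the function names, the `layout` list of source-layer indices, and the assumption that every scalar is positive are mine for illustration, not something taken from the Gemma 4 code or any upscaling tool.

```python
from collections import Counter


def adjusted_layer_scalar(s: float, n: int) -> float:
    """Return s^(1/n): the per-copy scalar so that n copies multiply back to s.

    Assumes s > 0 (hypothetical constraint; a negative s has no real n-th
    root for even n) and n >= 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1 (the original layer itself)")
    if s <= 0:
        raise ValueError("this sketch assumes a positive scalar")
    return s ** (1.0 / n)


def rescale_for_layout(scalars: list[float], layout: list[int]) -> list[float]:
    """Compute one adjusted scalar per layer of an upscaled model.

    scalars: original layer_scalar values, indexed by original layer number.
    layout:  hypothetical description of the upscaled stack, listing which
             original layer each new layer is a copy of, e.g. [0, 1, 1, 2].
    """
    # N for each original layer = how many times it appears in the new stack.
    counts = Counter(layout)
    return [adjusted_layer_scalar(scalars[i], counts[i]) for i in layout]


if __name__ == "__main__":
    original = [0.25, 0.5, 0.81]
    layout = [0, 1, 1, 2]  # layer 1 is duplicated once, so N = 2 for it
    print(rescale_for_layout(original, layout))
```

With this layout, the two copies of layer 1 each get `0.5^(1/2)`, and multiplying the two together gives back the original `0.5`. Layers that appear only once keep their scalar unchanged, since `s^(1/1) = s`.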

