SPIN Unprocessed July 3, 2026 ai_technology community
Contrastive Decoding Diffing (CDD): recovering verbatim finetuning data from logits alone, no weight access needed [R]
Summary
We built a model diffing method that recovers verbatim content from narrowly finetuned LLMs using only grey-box logit access (no weights, no activations, no probe corpus). Recent work (Minder, Dumas et al., "Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences") showed that finetuning leaves detectable traces in activation differences between base and finetuned models. Their method, Activation Difference Lens (ADL), steers generation using these differences, but i
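The summary does not give CDD's exact scoring rule. The sketch below assumes it resembles standard contrastive decoding, with the finetuned model as the "expert" and the base model as the "amateur": at each step it picks the token whose log-probability rose most under finetuning, restricted to tokens the finetuned model itself finds plausible. This needs only next-token logits from both models, which matches the grey-box setting described above. The function names, the `alpha` plausibility cutoff, and the logit-returning callables are all hypothetical illustrations, not the authors' API.

```python
import math
from typing import Callable, List, Optional, Sequence

Logits = Sequence[float]


def log_softmax(logits: Logits) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(ft_logits: Logits, base_logits: Logits,
                       alpha: float = 0.1) -> List[float]:
    """Per-token score log p_ft(t) - log p_base(t) (assumed CDD-like rule).

    Tokens whose finetuned probability is below alpha times the finetuned
    model's top probability get -inf, so implausible tokens are never chosen.
    """
    lp_ft = log_softmax(ft_logits)
    lp_base = log_softmax(base_logits)
    cutoff = max(lp_ft) + math.log(alpha)
    return [a - b if a >= cutoff else -math.inf for a, b in zip(lp_ft, lp_base)]


def greedy_contrastive_step(ft_logits: Logits, base_logits: Logits,
                            alpha: float = 0.1) -> int:
    """Return the token id with the highest contrastive score."""
    scores = contrastive_scores(ft_logits, base_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


def contrastive_decode(ft_next_logits: Callable[[List[int]], Logits],
                       base_next_logits: Callable[[List[int]], Logits],
                       prompt: Sequence[int], max_new_tokens: int = 32,
                       alpha: float = 0.1,
                       eos_id: Optional[int] = None) -> List[int]:
    """Greedy decoding loop; the two callables are hypothetical stand-ins for
    grey-box endpoints returning full next-token logits for a token prefix."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = greedy_contrastive_step(ft_next_logits(tokens),
                                      base_next_logits(tokens), alpha)
        tokens.append(nxt)
        if eos_id is not None and nxt == eos_id:
            break
    return tokens[len(prompt):]


# Toy 3-token vocabulary: finetuning boosted token 1 relative to the base model.
ft = [2.0, 1.9, -1.0]
base = [2.0, 0.0, -1.0]
print(greedy_contrastive_step(ft, base))  # → 1 (greedy on ft alone picks 0)
```

Subtracting base log-probabilities cancels what both models already agree on, so tokens that finetuning made unusually likely, such as memorized training text, rise to the top. The plausibility cutoff keeps the difference from favoring tokens that are merely very unlikely under the base model.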
More from Reddit r/MachineLearning
- What does "Safe AI" look like? [D]
- Small Language Model SLM [D]
- Tom Yeh's AI by hand? is it worth it? [D]
- I built my 'first' flow matching image generator, here's what I learned [P]
- H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P]
- A system-level approach to prompt injection: separating instruction and data channels in LLM agents [P]