SPIN Unprocessed July 3, 2026 ai_technology research
Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment
View original on arxiv.orgSummary
arXiv:2607.01239v1 Announce Type: new Abstract: Character-level perturbations bypass safety alignment in modern LLMs despite leaving prompts human-readable. We identify and test a central structural mechanism: BPE tokenization fragments safety-critical words into sub-word pieces, and the three public alignment datasets we surveyed contain no intentionally fragmented inputs. The mechanism is a chain, tested end-to-end on five model families (Qwen-3-4B, Qwen-2.5-7B, Gemma-3-4B, Llama-3.1-8B, Mistr
SpinGraph analysis pending — check back after processing.
Ask AI about this story
See how AI engines summarize this narrative — one click, prompt included.
More from arXiv Computation and Language
View all →- Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale
- Parameter Golf: What Really Works?
- From Monolingual to Multilingual: Evaluating Mamba for ASR in South African Languages
- Comparing Architectures for Supervised Political Scaling
- Grounded Optimization: A Layered Engineering Framework for Reducing LLM Hallucination in Automated Personal Document Rewriting
- FaithMed: Training LLMs For Faithful Evidence-Based Medical Reasoning
Markdown (.md) · JSON-LD schema (.json) · Machine-readable for AI & GEO