Self-Adapting Language Models (SEAL)
Updated: Oct 13, 2025
Self-Adapting Language Models are large language models (LLMs) that can autonomously update their own weights using self-generated training data and optimization directives. Instead of relying on external fine-tuning datasets or human-crafted instructions, SEAL models produce "self-edits": textual instructions describing how to modify their parameters, which are then applied through supervised fine-tuning. Self-edits may include reformulated data, inferred facts, or optimization hyperparameters. Reinforcement learning (RL) trains the model to generate effective self-edits by rewarding those that improve downstream task performance.
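Conceptually, the RL stage maximizes the expected downstream reward of the self-edits the model samples. A schematic form of that objective is shown below; the notation is chosen here for exposition and is not taken verbatim from the paper:

```latex
% Illustrative form of the self-edit RL objective (notation is ours, not the paper's).
% C              : context (e.g., a new passage or few-shot examples)
% \tau           : downstream task paired with C
% SE             : a self-edit sampled from the model given C
% SFT(\theta,SE) : weights after applying SE via supervised fine-tuning
\mathcal{L}(\theta) \;=\;
  -\,\mathbb{E}_{(C,\,\tau)\sim\mathcal{D}}\,
   \mathbb{E}_{SE\,\sim\,\pi_\theta(\cdot\mid C)}
   \Big[\, r\big(\tau,\ \mathrm{SFT}(\theta, SE)\big) \Big]
```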
Core Mechanism
- Self-Edit Generation:
The model receives new input or context and generates “self-edits,” i.e., its own fine-tuning data or update directives.
- Weight Update:
Using supervised fine-tuning (often via LoRA), the model updates its weights based on the generated self-edits.
- Reinforcement Learning Loop:
The model’s performance on a downstream task determines a reward signal that guides future self-edit generation.
- Meta-Learning Framework:
SEAL learns how to learn: it meta-learns strategies for producing self-improving updates without external supervision (a minimal code sketch of the full loop follows this list).
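The sketch below walks through one inner SEAL step: sample self-edits for a context, apply each via a lightweight update, and score the adapted model to obtain a reward. All helper names, signatures, and hyperparameters are assumptions made for exposition, not code from the SEAL paper or any released implementation.

```python
# Illustrative sketch of the SEAL inner loop; helpers are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class SelfEdit:
    text: str                                         # model-generated training data / update directive
    hyperparams: dict = field(default_factory=dict)   # e.g., LoRA rank, learning rate

def generate_self_edit(model, context: str) -> SelfEdit:
    # Placeholder: in SEAL this would prompt the model to restate `context`
    # as implications, QA pairs, or other fine-tuning data.
    return SelfEdit(text=f"Implications of: {context}", hyperparams={"lora_rank": 16})

def lora_finetune(model, edit: SelfEdit):
    # Placeholder: would apply a lightweight LoRA SFT update using the self-edit
    # and return the adapted model.
    return model

def evaluate(model, task) -> float:
    # Placeholder: would return downstream accuracy (e.g., no-context QA).
    return 0.0

def seal_step(model, context: str, task, num_samples: int = 4):
    """One inner step: sample self-edits, apply each, and measure the reward."""
    baseline = evaluate(model, task)
    rewarded_edits = []
    for _ in range(num_samples):
        edit = generate_self_edit(model, context)
        adapted = lora_finetune(model, edit)           # temporary adapted copy
        reward = evaluate(adapted, task) - baseline    # improvement from this edit
        rewarded_edits.append((edit, reward))
    return rewarded_edits   # consumed by the outer RL (ReSTEM-style) update
```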
Important Statistics and Experimental Results
| Metric / Experiment | Setup | Result / Improvement |
|---|---|---|
| Knowledge Incorporation (SQuAD) | Qwen2.5-7B fine-tuned on self-generated synthetic data | No-context QA accuracy improved from 33.5% → 47.0% |
| Comparison to GPT-4.1 Synthetic Data | Fine-tuning on GPT-4.1-generated implications | SEAL outperformed GPT-4.1 data (47.0% vs. 46.3%) |
| Few-Shot Learning (ARC benchmark) | Llama-3.2-1B-Instruct with SEAL self-edits | 72.5% success rate vs. 20% (self-edits without RL) and 0% (ICL only) |
| Oracle Upper Bound | Human-optimized test-time training (TTT) configuration | 100% (upper-bound reference) |
| Continued Pretraining (CPT) | 200 aggregated passages | SEAL achieved 58.2%, compared to 59.4% for GPT-4.1 synthetic data |
| Computation Cost | Each self-edit evaluation | ~30–45 seconds per update |
| RL Convergence | ReSTEM training | Converged within 2 outer RL iterations |
| Catastrophic Forgetting | Sequential self-edits across passages | Gradual decline in retention over time, but no total collapse |
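The "RL Convergence" row refers to ReSTEM-style training: rejection-sample self-edits, keep only those whose reward is positive, and fine-tune the generator on the kept (context, self-edit) pairs. A compact sketch of that outer loop follows, reusing the hypothetical `seal_step` helper from the earlier sketch; the filtering threshold, batching, and `sft_on_pairs` helper are assumptions, not the paper's exact recipe.

```python
# Illustrative ReSTEM-style outer loop: filter self-edits by reward, then SFT on the keepers.
def sft_on_pairs(model, pairs):
    # Placeholder: supervised fine-tuning on (context -> self-edit) pairs,
    # reinforcing the model's tendency to produce rewarded self-edits.
    return model

def restem_outer_loop(model, corpus, num_iterations: int = 2):
    """corpus: iterable of (context, task) pairs; seal_step is the sketch above."""
    for _ in range(num_iterations):
        kept_pairs = []
        for context, task in corpus:
            for edit, reward in seal_step(model, context, task):
                if reward > 0:                      # keep only self-edits that helped
                    kept_pairs.append((context, edit))
        model = sft_on_pairs(model, kept_pairs)
    return model
```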
Key Takeaways
- SEAL allows LLMs to self-direct their adaptation to new data or tasks.
- Reinforcement learning enables the model to evaluate and refine its own updates over time.
- SEAL-generated training data outperformed synthetic data produced by a much larger model (GPT-4.1).
- Offers a step toward continual learning and agentic AI systems capable of autonomous self-improvement.