Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
May 8, 2024
Sander Land, Max Bartolo
https://arxiv.org/pdf/2405.05417v1
1. Motivation
- LLMs can behave inconsistently on specific tokens because tokenizer training and model training are decoupled: some tokens in the vocabulary appear rarely or never in the model's training data, leaving their embeddings effectively untrained.
- When such under-trained "glitch" tokens (e.g., " SolidGoldMagikarp") appear in prompts, they can trigger unexpected or unwanted outputs, creating reliability and safety concerns in deployed systems.
- Prior identification of these tokens has been largely manual and model-specific, which does not scale across the many tokenizers and model families now in use.
2. Proposed Solution
- The paper ("Fishing for Magikarp") introduces automated, scalable methods for detecting under-trained tokens in LLMs.
- Rather than relying on generation alone, it combines tokenizer analysis with indicators computed from the model's (un)embedding weights, then verifies candidate tokens by prompting (see the sketch below).
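To make the weight-based indicator concrete, here is a minimal sketch in the spirit of the paper, not the authors' implementation. The model name, the choice of reference tokens (the last few vocabulary slots as a stand-in for known-unused tokens), and the top-k cutoff are all illustrative assumptions.

```python
# Minimal sketch of an embedding-based under-trained-token indicator.
# Model name, reference-token choice, and top-k cutoff are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any open-weight causal LM with accessible weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Unembedding (output projection) matrix: one row per vocabulary token.
# Note: GPT-2 ties input/output embeddings; the paper treats tied and untied
# embeddings differently, a detail ignored in this sketch.
unembed = model.get_output_embeddings().weight.detach()  # [vocab_size, hidden]

# Reference set: token ids assumed to be unused in training (e.g., reserved or
# unreachable tokens found by tokenizer analysis). The last few vocab slots
# are used here purely as a placeholder.
reference_ids = list(range(len(tokenizer) - 10, len(tokenizer)))
reference_mean = unembed[reference_ids].mean(dim=0, keepdim=True)  # [1, hidden]

# Tokens whose unembedding vector sits close to the mean of unused tokens look
# as if they were never updated during training -> flag them as candidates.
similarity = F.cosine_similarity(unembed, reference_mean)  # [vocab_size]
candidate_ids = torch.topk(similarity, k=50).indices.tolist()
print([tokenizer.convert_ids_to_tokens(i) for i in candidate_ids])
```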
3. Method Details
- Tokenizer analysis first categorizes the vocabulary, flagging tokens that are unreachable through normal encoding or that represent partial UTF-8 byte sequences.
- An under-trained-token indicator is then computed from the model's (un)embedding weights, for example the similarity between each token's vector and the mean vector of known-unused reference tokens; vectors that look like they were never updated mark candidate tokens.
- Candidates are verified by prompting the model to repeat them: tokens the model assigns near-zero probability to reproducing are confirmed as under-trained (see the verification sketch below).
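The verification step can be sketched as follows, assuming a simple "repeat after me" prompt and a 1% probability cutoff; both are illustrative and not the paper's exact settings. The idea is that a well-trained token is easy for the model to reproduce on request, while an under-trained one is not.

```python
# Minimal sketch of prompt-based verification: ask the model to repeat a
# candidate token and check the probability it assigns to that token.
# Prompt template and 1% cutoff are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: same open-weight model as in the indicator sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def repeat_probability(token_id: int) -> float:
    """Probability the model gives to reproducing `token_id` when asked to repeat it."""
    token_str = tokenizer.decode([token_id])
    prompt = f'Please repeat the following string back to me: "{token_str}"\nAnswer: "'
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # next-token logits after the prompt
    return torch.softmax(logits, dim=-1)[token_id].item()

candidate_ids = []  # e.g., the candidate token ids from the indicator sketch above
for candidate_id in candidate_ids:
    p = repeat_probability(candidate_id)
    if p < 0.01:  # a well-trained token should be easy to repeat on request
        print(f"likely under-trained: {tokenizer.convert_ids_to_tokens(candidate_id)!r} (p={p:.4f})")
```

Checking the assigned probability directly, rather than sampling generations, keeps verification cheap and deterministic; the paper's verification is likewise based on how likely the model is to reproduce a token under repetition prompts.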
4. Evaluation
- They evaluate the approach across a wide range of open-weight model families (e.g., GPT-2, Llama2, Mistral, Gemma, Command R, Qwen), demonstrating its effectiveness: it recovers known glitch tokens and surfaces many new under-trained tokens across models.