Key Points

1. Auto-regressive large language models (LLMs) exhibit a surprising failure of generalization known as the "Reversal Curse," where models trained on the pattern "A is B" fail to automatically predict the reverse pattern "B is A."

2. The study presents evidence for the Reversal Curse by finetuning models on fictitious statements and testing whether they can answer related questions in both the original and the reverse order. The models consistently fail in the reverse direction, and the failure persists across model sizes and model families.

3. Beyond exact-match accuracy, the experiments show that when prompted in the reverse direction, finetuned models assign the correct answer no higher likelihood than a random alternative, suggesting the reversed fact is not merely hard to elicit but effectively absent (see the sketch after this list).

4. The Reversal Curse amounts to a basic failure of logical deduction: if "A is B" holds, then "B is A" follows for symmetric relations such as identity, yet training on the former does not lead models to infer the latter.

5. The paper provides evidence that the Reversal Curse affects practical generalization in state-of-the-art models by testing GPT-4 on pairs of questions about real-world celebrities and their parents: models that correctly answer "Who is Tom Cruise's mother?" (Mary Lee Pfeiffer) often fail the reverse question "Who is Mary Lee Pfeiffer's son?", producing a significant accuracy gap between the two orders.

6. Further evidence for the Reversal Curse comes from contemporaneous work that uses influence functions to measure how individual training examples affect an LLM's outputs, as well as from research on knowledge editing and factual recall in LLMs.

7. The study raises questions for future research, including exploring other types of relations for reversal failures, finding reversal failures via entity-linking, and analyzing the practical impact of the Reversal Curse in large and diverse pretraining datasets.

8. The study suggests the Reversal Curse may reflect a mechanism in auto-regressive LLMs where gradient updates are myopic: training on "A is B" updates the model's representation of A to include information about B, but does not correspondingly update the representation of B.

9. The authors acknowledge the contributions of several individuals to the design and implementation of the experiments, paper writing, and project management, as well as the support from various organizations for the research.
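
The likelihood result in point 3 can be made concrete with a short sketch. The following is a minimal illustration, assuming a Hugging Face causal LM as a stand-in for the paper's finetuned models; the fictitious fact about Daphne Barrington is taken from the paper, while the model name and the "random" comparison name are illustrative.

```python
# Minimal sketch: score an answer's log-likelihood under a causal LM in the
# reverse direction. Assumes `transformers` and `torch` are installed; "gpt2"
# is a stand-in for the finetuned models used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# The finetuning documents state the fact in one direction only, e.g.:
#   "Daphne Barrington is the director of 'A Journey Through Time'."
# The reverse-direction test prompt flips that order:
reverse_prompt = "The director of 'A Journey Through Time' is"

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of the model's log-probabilities for the answer tokens, given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Score only the answer tokens (positions after the prompt); the logits
    # at position p-1 predict the token at position p.
    total = 0.0
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

# The Reversal Curse predicts that, after finetuning on the forward statement,
# the correct name scores no higher than a made-up control name here.
print("correct:", answer_logprob(reverse_prompt, "Daphne Barrington"))
print("control:", answer_logprob(reverse_prompt, "Marcus Ellwood"))  # illustrative name
```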

Summary

The research paper examines the Reversal Curse in auto-regressive large language models (LLMs): the failure to generalize from training statements of the form "<name> is <description>" to prompts of the form "<description> is <name>". Finetuning experiments on synthetic data demonstrate the effect, with models reliably recalling facts in the training direction yet failing to predict them in the reverse direction.
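
To make the experimental design concrete, here is a minimal sketch of how such a synthetic dataset can be structured, with each fact finetuned in exactly one direction and tested in both; the fictitious facts mirror examples from the paper, but the templates are simplified and the authors additionally use paraphrases of each statement.

```python
# Minimal sketch of the synthetic data design: every fact is trained in ONE
# direction and tested in BOTH. Templates here are illustrative.

# Facts presented name-first during finetuning.
NAME_TO_DESC = [("Daphne Barrington", "the director of 'A Journey Through Time'")]
# Facts presented description-first during finetuning.
DESC_TO_NAME = [("Uriah Hawthorne", "the composer of 'Abyssal Melodies'")]

train_docs = (
    [f"{name} is {desc}." for name, desc in NAME_TO_DESC]
    + [f"{desc[0].upper() + desc[1:]} is {name}." for name, desc in DESC_TO_NAME]
)

def test_items(name: str, desc: str) -> list[tuple[str, str]]:
    """Probe a fact in both directions: (prompt, expected answer)."""
    return [
        (f"Who is {desc}?", name),  # description -> name
        (f"{name} is", desc),       # name -> description (completion)
    ]

tests = [item for fact in NAME_TO_DESC + DESC_TO_NAME for item in test_items(*fact)]
# The Reversal Curse predicts correct answers only where the test direction
# matches the direction seen during finetuning.
```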

Additionally, experiments on real-world celebrity data show that the problem affects practical generalization in state-of-the-art models; a sketch of this probe appears below. The paper also offers a theoretical perspective, framing the Reversal Curse as a failure of logical deduction and of generalization beyond the training data, and it explores potential implications and directions for future research.
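
The celebrity probe itself takes only a few API calls to reproduce. Below is a hedged sketch assuming the OpenAI Python client (openai>=1.0); the Tom Cruise / Mary Lee Pfeiffer pair comes from the paper, though the authors' exact prompts and scoring may differ.

```python
# Hedged sketch of the two-direction celebrity-parent probe, assuming the
# OpenAI Python client (openai>=1.0) with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    return response.choices[0].message.content

# Forward direction (child -> parent): usually answered correctly.
print(ask("Who is Tom Cruise's mother?"))       # expected: Mary Lee Pfeiffer
# Reverse direction (parent -> child): frequently answered incorrectly.
print(ask("Who is Mary Lee Pfeiffer's son?"))
```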

The researchers also try variations of the setup in an attempt to help LLMs overcome the Reversal Curse, such as changing model size and family and adding paraphrases of each fact to the finetuning data, but find that it persists throughout. The paper closes with open questions for future research, such as studying other types of relations and analyzing the practical impact of the Reversal Curse on large and diverse pretraining datasets.

Overall, the paper provides evidence and insights into the Reversal Curse in LLMs and sets the stage for further investigation into this phenomenon.

Reference: https://arxiv.org/abs/2309.12288