Key Points
1. Large language models (LLMs) can hallucinate details that are not supported by the provided input in tasks such as summarization and document-based question answering, which poses serious issues for applications like retrieval-augmented generation (RAG).
2. Prior studies have proposed methods to combat hallucinations, generally using the LLM's internal representations, such as hidden states, MLP outputs, attention block outputs, and attention head outputs. However, these methods focus on closed-book hallucinations and do not account for the provided contextual information.
3. This study introduces the Lookback Lens, a lightweight classifier that detects contextual hallucinations from the ratio of attention weights placed on the given context versus the newly generated tokens. The approach is built on the hypothesis that contextual hallucinations are related to the extent to which an LLM attends to the provided contextual information rather than to its own generated tokens.
4. The Lookback Lens performs on par with, and sometimes even surpasses, more complex feature-based detectors that utilize hidden states from LLMs or text-based entailment models.
5. The study demonstrates the potential of combating contextual hallucinations by leveraging information from attention maps, which provide a human-interpretable measure of how much weight is given to the context during generation.
6. The Lookback Lens transfers across tasks and even across models, allowing a detector trained on a smaller LLM to be applied to a larger one without retraining, reducing hallucinations in tasks such as summarization and question answering.
7. Lookback Lens guided decoding, which uses the detector to steer generation toward more context-grounded continuations, reduces the number of hallucinated outputs.
8. The use of attention maps in large language models for detecting contextual hallucinations provides a lightweight and interpretable solution compared to complex hidden representation methods.
9. The Lookback Lens and its guided decoding strategy open up new possibilities for leveraging attention-map information to combat hallucinations in large language models. However, the approach requires careful consideration before deployment in real-world applications: the guided decoding relies on sampling multiple candidate continuations, which increases inference time, and training the detector requires annotated examples.
Summary
This paper introduces the Lookback Lens, a lightweight classifier designed to detect contextual hallucinations in large language models (LLMs) by leveraging attention map information. The key hypothesis is that contextual hallucinations are related to the extent to which an LLM attends to the provided contextual information versus its own generated outputs.
The Lookback Lens computes a simple "lookback ratio" feature, which is the ratio of attention weights on the context tokens versus newly generated tokens, for each attention head in the LLM. A linear classifier is then trained on these lookback ratio features to predict whether a given text span is factual or hallucinated.
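To make the feature concrete, below is a minimal sketch of how the lookback ratio and the classifier could be computed. It assumes access to per-head attention weights for each generated token (e.g., as returned by a transformer with attention outputs enabled); the function names, shapes, and the use of logistic regression as the linear classifier are illustrative assumptions, not code from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lookback_ratio(attn_weights, num_context_tokens):
    """
    Per-head lookback ratio for one newly generated token.

    attn_weights: array of shape (num_layers, num_heads, seq_len) holding the
        attention the current token pays to every earlier position.
    num_context_tokens: number of tokens in the provided context (prompt).

    Returns an array of shape (num_layers * num_heads,): for each head, the
    attention mass on the context divided by the total mass on the context
    plus the previously generated tokens.
    """
    ctx = attn_weights[:, :, :num_context_tokens].sum(axis=-1)
    new = attn_weights[:, :, num_context_tokens:].sum(axis=-1)
    return (ctx / (ctx + new + 1e-9)).reshape(-1)

def train_lookback_lens(X, y):
    """
    X: (num_spans, num_layers * num_heads) lookback-ratio features, averaged
       over the tokens of each labeled text span.
    y: 1 = factual span, 0 = hallucinated span.
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf
```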
The paper finds that the Lookback Lens performs on par with, and sometimes even surpasses, more complex detectors that use the entire hidden states of the LLM or a text-based entailment model. Importantly, the Lookback Lens also transfers well across tasks and even across different LLMs, allowing a detector trained on a 7B model to be applied (without retraining) to a 13B model.
Furthermore, the paper introduces a Lookback Lens Guided Decoding strategy that integrates the Lookback Lens detector into the decoding process to mitigate contextual hallucinations. This approach is shown to reduce hallucinations by 9.6% in the XSum summarization task compared to greedy decoding. The cross-model transferability of the Lookback Lens also enables a 13B model to benefit from the detector trained on the 7B model, achieving a 3.2% reduction in hallucinations on XSum.
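As a rough illustration of the decoding strategy, the sketch below scores several sampled candidate chunks with the trained classifier and keeps the one predicted most likely to be grounded in the context. Here `generate_candidate_chunk` and `span_lookback_features` are hypothetical helpers standing in for the model's sampler and attention-feature extraction; the chunking and stopping logic are simplified assumptions, not the authors' implementation.

```python
import numpy as np

def lookback_guided_decode(model, prompt, clf, num_candidates=8, max_chunks=10):
    """
    Sketch of classifier-guided, chunk-level decoding: at each step, sample
    several candidate continuations, score each one with the Lookback Lens
    classifier on its averaged lookback-ratio features, and append the
    candidate judged most likely to be grounded in the context.
    """
    output = ""
    for _ in range(max_chunks):
        # Hypothetical wrapper around the model's sampler.
        candidates = [generate_candidate_chunk(model, prompt + output)
                      for _ in range(num_candidates)]
        if not any(candidates):          # stop when nothing more is generated
            break
        # Hypothetical wrapper returning the averaged lookback-ratio vector
        # for each candidate chunk's tokens.
        feats = np.stack([span_lookback_features(model, prompt, output, c)
                          for c in candidates])
        scores = clf.predict_proba(feats)[:, 1]   # P(factual) per candidate
        output += candidates[int(np.argmax(scores))]
    return output
```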
Overall, the paper demonstrates the effectiveness of leveraging attention maps for detecting and mitigating contextual hallucinations in LLMs, opening up new possibilities for improving the factuality and reliability of these powerful language models.
Reference: https://arxiv.org/abs/2407.07071