Key Points
- Large language models (LLMs) may struggle to integrate new factual knowledge through fine-tuning; instead, they mostly learn to use their pre-existing knowledge more efficiently. This supports the view that LLMs acquire factual knowledge largely during pre-training rather than during fine-tuning.
- A controlled study examined how new factual knowledge introduced during fine-tuning affects the model's tendency to hallucinate. Fine-tuning examples that introduce new knowledge were learned more slowly than examples consistent with the model's existing knowledge, and once the model eventually learned them, it became more prone to hallucinations with respect to its pre-existing knowledge.
- The study also showed that filtering out unknown fine-tuning examples substantially reduced the risk of overfitting without sacrificing performance, offering a practical way to limit the unintended consequences of introducing new knowledge through fine-tuning.
- The SliCK categories reflect how well the model knows a fact: HighlyKnown examples are answered correctly under greedy decoding every time, MaybeKnown examples only some of the time, and WeaklyKnown examples only under temperature sampling (a sketch of this categorization follows this list). Surprisingly, MaybeKnown fine-tuning examples proved essential for the model to handle such examples correctly during inference.
- The study proposes SliCK, a categorization of facts with respect to the model's knowledge, which can also guide future work on knowledge-categorization methods. The categorization was shown to provide meaningful distinctions when analyzing fine-tuning dynamics and model performance.
- Overall, the results indicate that supervised fine-tuning may be most useful as a mechanism for enhancing the utilization of pre-existing knowledge, while LLMs struggle to integrate new factual knowledge through it. This raises questions about common fine-tuning practices and highlights the risks of updating an LLM's knowledge through fine-tuning.
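As a concrete reading aid for the SliCK categories above, here is a minimal sketch of how such a categorization could be implemented, assuming answers have already been sampled from the model under greedy decoding and temperature sampling for each question; the function, parameters, and matching rule are illustrative, not the paper's code.

```python
from typing import Callable, List


def slick_category(
    greedy_answers: List[str],
    sampled_answers: List[str],
    gold: str,
    is_correct: Callable[[str, str], bool] = lambda pred, ref: pred.strip().lower() == ref.strip().lower(),
) -> str:
    """Assign a (question, answer) pair to one of the four knowledge categories."""
    # Fraction of correct answers under greedy decoding and under temperature sampling.
    p_greedy = sum(is_correct(a, gold) for a in greedy_answers) / len(greedy_answers)
    p_sampled = sum(is_correct(a, gold) for a in sampled_answers) / len(sampled_answers)

    if p_greedy == 1.0:
        return "HighlyKnown"   # greedy decoding is always correct
    if p_greedy > 0.0:
        return "MaybeKnown"    # greedy decoding is sometimes correct
    if p_sampled > 0.0:
        return "WeaklyKnown"   # correct only under temperature sampling
    return "Unknown"           # the model never produces the correct answer
```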
Summary
The paper investigates how supervised fine-tuning affects large language models (LLMs), and in particular whether exposing a model to new factual information during fine-tuning encourages it to generate factually incorrect responses, commonly known as "hallucinations." The study focuses on how new factual knowledge is integrated during fine-tuning and how that integration influences the model's tendency to hallucinate.
Methodology
The authors employ a controlled setup focused on closed-book question answering (QA), varying the proportion of fine-tuning examples that introduce new factual knowledge. They introduce SliCK, a categorization of facts with respect to the model's knowledge, to assess whether a single fine-tuning example is consistent with what the model already knows. The study provides empirical evidence that large language models struggle to integrate new factual knowledge through fine-tuning and instead mostly learn to use their pre-existing knowledge more efficiently.
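To make the controlled setup concrete, here is a hedged sketch of how fine-tuning mixtures with a chosen proportion of new-knowledge (Unknown) examples could be assembled from a SliCK-labeled dataset; the data layout and names are assumptions for illustration, not the authors' implementation.

```python
import random


def build_finetuning_mix(dataset, unknown_fraction, size, seed=0):
    """Sample a fine-tuning set of `size` examples with the requested share of Unknown facts.

    `dataset` is assumed to be a list of dicts, each with a 'category' key holding
    one of the SliCK labels (HighlyKnown, MaybeKnown, WeaklyKnown, Unknown).
    """
    rng = random.Random(seed)
    unknown = [ex for ex in dataset if ex["category"] == "Unknown"]
    known = [ex for ex in dataset if ex["category"] != "Unknown"]

    n_unknown = int(round(unknown_fraction * size))
    mix = rng.sample(unknown, n_unknown) + rng.sample(known, size - n_unknown)
    rng.shuffle(mix)
    return mix
```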
Findings
The paper highlights that fine-tuning examples that introduce new knowledge are learned significantly more slowly than those consistent with the model's knowledge. However, as these new-knowledge examples are eventually learned, the model's tendency to hallucinate factually incorrect responses increases linearly with them. The authors thus demonstrate that acquiring new factual knowledge through fine-tuning is correlated with hallucinations with respect to the model's pre-existing knowledge.
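A minimal sketch of how this kind of linear relationship could be checked is shown below: regress development-set accuracy against the fraction of Unknown fine-tuning examples the model has fit at each checkpoint. The inputs are placeholders for measurements an experimenter would collect; nothing here reproduces the paper's numbers.

```python
import numpy as np


def fit_linear_trend(fraction_unknown_fit, dev_accuracy):
    """Least-squares fit of accuracy vs. the fraction of Unknown examples already fit."""
    x = np.asarray(fraction_unknown_fit, dtype=float)
    y = np.asarray(dev_accuracy, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    # A negative slope is consistent with the reported trend: the more Unknown
    # examples the model fits, the lower its accuracy on previously known facts.
    return slope, intercept
```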
Dynamics of Fine-tuning
Additionally, the paper presents insights into the dynamics of fine-tuning, such as the effect of fitting unknown examples on model performance. Unknown examples are shown to be harmful, primarily because fitting them leads to overfitting, but their negative effect can be reduced through early stopping or by filtering them out of the fine-tuning set. The paper also breaks down the impact of training examples from each knowledge category.
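The two mitigations mentioned above can be sketched as follows; both snippets are illustrative, assuming SliCK-style category labels and periodic development-set evaluations rather than any specific training framework.

```python
def filter_unknown(examples):
    """Drop fine-tuning examples the model does not already know (the Unknown category)."""
    return [ex for ex in examples if ex["category"] != "Unknown"]


def should_stop(dev_accuracies, patience=3):
    """Simple early stopping: stop once dev accuracy has not improved for `patience` evaluations."""
    if len(dev_accuracies) <= patience:
        return False
    best_earlier = max(dev_accuracies[:-patience])
    return max(dev_accuracies[-patience:]) <= best_earlier
```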
Practical Implications
Furthermore, the authors explore the potential practical implications of their findings, suggesting that supervised fine-tuning may have unintended consequences when introducing new knowledge. They highlight the need for further exploration of fine-tuning practices and potential approaches to mitigate the negative impact of unknown fine-tuning examples.
Overall, the paper provides a comprehensive exploration of the impact of supervised fine-tuning on large language models and its potential to induce hallucinations due to the introduction of new factual knowledge. The study contributes valuable insights into the integration of new knowledge during fine-tuning and identifies potential risks and areas for future exploration in fine-tuning practices and model knowledge acquisition.
Reference: https://arxiv.org/abs/2405.059...