Key Points

1. Despite the success of integrating large language models (LLMs) into conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge to produce informative responses.

2. Existing studies commonly assume that Retrieval-Augmented Generation (RAG) is always needed in a conversational system, without explicit control, raising the question of whether such augmentation is actually necessary at every turn.

3. The study investigates whether each turn of a system response needs to be augmented with external knowledge by developing RAGate, a gating model that predicts if a conversational system requires RAG.

4. Extensive experiments are conducted on the KETOD dataset to devise and apply RAGate to conversational models, with analyses of different conversational scenarios.

5. The results and analysis demonstrate that RAGate can be effectively applied in RAG-based conversational systems, identifying which system responses warrant RAG while maintaining high response quality and high generation confidence.

6. The study identifies a correlation between the confidence level of generated responses and the relevance of the augmented knowledge.

7. The paper discusses various approaches to knowledge retrieval and joint optimization of retriever and generator for conversational systems.

8. The key contribution is investigating the adaptive use of retrieval-augmented generation for advanced conversational systems, addressing the gap in existing studies.

9. The paper explores three variants of RAGate using language model prompting, parameter-efficient fine-tuning, and a multi-head attention neural encoder.

Summary

This research paper investigates the necessity of using Retrieval Augmented Generation (RAG) in conversational systems. It proposes a model called RAGate that can predict when RAG is needed for improved responses based on the conversation context and relevant inputs. The paper first discusses the limitations of using only large language models (LLMs) for conversational systems, such as lack of up-to-date knowledge, generation of non-factual or hallucinated content, and restricted domain adaptability. To address these issues, a common approach is to retrieve and augment LLMs with external knowledge, which has shown promising results in enhancing conversational responses.

RAGate Mechanism
However, the authors argue that overusing external knowledge can produce responses that are overly conditioned on the retrieved information, with limited diversity, and that presume specific user preferences, going against the core criteria of providing factual, relevant, and appropriate responses. To address this, the paper introduces RAGate, a binary knowledge gate mechanism that controls the use of external knowledge in a conversational system.
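The gating idea can be sketched as a per-turn decision that sits in front of retrieval. The following is a minimal illustration, not the paper's implementation; `gate`, `retriever`, and `generator` are hypothetical callables standing in for the trained gate model, the knowledge retriever, and the response generator.

```python
def generate_response(context, gate, retriever, generator):
    """Sketch of a RAGate-style pipeline: a binary gate decides, for each
    conversation turn, whether to retrieve external knowledge before
    generating. All three callables are placeholders for illustration."""
    if gate(context):                       # does this turn need augmentation?
        knowledge = retriever(context)      # fetch external knowledge snippets
        return generator(context, knowledge)
    return generator(context, None)         # rely on the model's parametric knowledge
```

The point of the design is that retrieval cost and the risk of over-conditioned responses are only incurred when the gate predicts that augmentation is needed.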

Exploration of RAGate Variants
The authors explore three variants of RAGate: RAGate-Prompt using prompts with pre-trained LLMs, RAGate-PEFT using parameter-efficient fine-tuning of LLMs, and RAGate-MHA with a multi-head attention encoder. Extensive experiments are conducted on the KETOD dataset, which includes human annotations on the need for knowledge augmentation in conversations.
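To make the RAGate-MHA variant concrete, here is a minimal NumPy sketch of a multi-head self-attention encoder used as a binary classifier over a dialogue-context representation. The layer sizes, pooling choice, and linear scoring head are our assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mha_gate(X, Wq, Wk, Wv, w_out, n_heads=2):
    """Sketch of an MHA-based gate: encode the context token embeddings
    X (seq_len x d) with multi-head self-attention, mean-pool over tokens,
    and score the need for knowledge augmentation with a linear head."""
    seq_len, d = X.shape
    dh = d // n_heads                           # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project to queries/keys/values
    heads = []
    for h in range(n_heads):
        q = Q[:, h*dh:(h+1)*dh]
        k = K[:, h*dh:(h+1)*dh]
        v = V[:, h*dh:(h+1)*dh]
        attn = softmax(q @ k.T / np.sqrt(dh))   # scaled dot-product attention
        heads.append(attn @ v)
    H = np.concatenate(heads, axis=-1)          # (seq_len, d)
    pooled = H.mean(axis=0)                     # pool over tokens
    return bool(pooled @ w_out > 0)             # True => augment this turn
```

In the actual system the weights would be trained on turn-level augmentation labels such as those in KETOD; the sketch only shows the forward pass of the gating decision.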

Results and Findings
The results show that RAGate-PEFT and RAGate-MHA can effectively identify the necessity of knowledge augmentation, capturing trends such as more augmentation at the beginning of conversations and in certain domains. When applied to a conversational system, RAGate-enabled models can generate responses of comparable quality to always-augmented models, but with higher generation confidence, indicating a lower risk of hallucination. The study also finds a positive correlation between the generation confidence and the relevance of the augmented knowledge.
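One common proxy for generation confidence, which is our assumption here rather than necessarily the paper's exact metric, is the length-normalised log-likelihood the model assigns to its own output tokens. Higher values indicate the model was more certain while generating, which the paper's analysis links to more relevant augmented knowledge.

```python
import math

def generation_confidence(token_probs):
    """Mean log-probability of the generated tokens: a simple,
    length-normalised confidence proxy (illustrative assumption).
    token_probs: per-token probabilities in (0, 1] from the decoder."""
    if not token_probs or any(not 0 < p <= 1 for p in token_probs):
        raise ValueError("expected non-empty probabilities in (0, 1]")
    return sum(math.log(p) for p in token_probs) / len(token_probs)
```

Under this proxy, a response generated with uniformly higher token probabilities scores closer to 0 (the maximum), so comparing scores across always-augmented and gated systems gives a rough hallucination-risk signal.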

In summary, this paper addresses the fundamental research question of whether retrieval-augmented generation is always necessary in conversational systems. The proposed RAGate model provides an adaptive solution to intelligently control the use of external knowledge, leading to more effective and faithful conversational responses.

Reference: https://arxiv.org/abs/2407.21712