Key Points

1. Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications.

2. Previous work on eliciting confidence from LLMs includes prompting-based and training-based approaches, but these tend to degrade task performance or yield coarse (often binary) and inaccurate confidence estimates.

3. SaySelf is a training framework that teaches LLMs to express more accurate, fine-grained confidence estimates and to produce self-reflective rationales that identify gaps in their knowledge and explain their uncertainty.

4. SaySelf automatically generates a model-specific dataset for supervised fine-tuning by sampling multiple reasoning chains from the LLM, clustering them by semantic similarity, and prompting an LLM to summarize the uncertainty across clusters in natural language (a minimal sketch of this pipeline follows the list).

5. SaySelf employs reinforcement learning with a carefully crafted reward function to calibrate the confidence estimates, incentivizing accurate high-confidence predictions and penalizing overconfidence in errors.

6. Experiments on knowledge-intensive QA tasks show SaySelf significantly reduces confidence calibration error while maintaining task performance.

7. The generated self-reflective rationales effectively capture the internal uncertainty and can further improve the calibration.

8. SaySelf has potential impact on improving trustworthiness, guiding LLMs' interactions, and enabling proactive learning algorithms.

9. SaySelf departs from existing explainability methods by generating rationales that elucidate both the predictions and confidence estimates, based on LLMs' internal reasoning.
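To make the dataset-construction step in point 4 concrete, the sketch below outlines one plausible pipeline. It is an illustration, not the authors' code: the `sample_chains` and `summarize` helpers, the sentence encoder, the clustering threshold, and the 0-10 confidence scale are all assumptions.

```python
# Sketch of SaySelf-style training-data construction (hypothetical helpers).
# For each question: sample several reasoning chains, cluster them by semantic
# similarity, keep one representative chain per cluster, and ask an LLM to
# summarize the disagreement between clusters as a self-reflective rationale.
# The share of chains supporting the majority answer yields a confidence label.
from collections import Counter

from sklearn.cluster import AgglomerativeClustering
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any sentence encoder works


def build_example(question, sample_chains, summarize, n_samples=10):
    chains = sample_chains(question, n=n_samples)        # hypothetical: list of (reasoning, answer)
    texts = [reasoning for reasoning, _ in chains]
    embeddings = embedder.encode(texts)
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0          # threshold is a placeholder
    ).fit_predict(embeddings)

    # One representative reasoning chain per cluster.
    representatives = {lab: texts[i] for i, lab in enumerate(labels)}
    answer_counts = Counter(answer for _, answer in chains)
    top_answer, top_count = answer_counts.most_common(1)[0]

    rationale = summarize(question, list(representatives.values()))  # hypothetical LLM call
    confidence = round(10 * top_count / n_samples)        # assumed fine-grained 0-10 scale
    return {"question": question, "answer": top_answer,
            "rationale": rationale, "confidence": confidence}
```

The resulting (question, answer, rationale, confidence) records then serve as supervised fine-tuning targets, so the model learns to emit the rationale and confidence alongside its answer.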

Summary

This paper presents SaySelf, a training framework that teaches large language models (LLMs) to express more accurate and fine-grained confidence estimates, as well as generate self-reflective rationales to explain their uncertainties.

The key aspects of the SaySelf approach are:

1. Supervised fine-tuning: for each training question, multiple reasoning chains are sampled from the LLM, clustered by semantic similarity, and summarized by an LLM into a self-reflective rationale; the consistency among the sampled answers supplies a fine-grained confidence label, and the model is fine-tuned to produce the answer, rationale, and confidence together.

2. Reinforcement learning: a carefully crafted reward function further calibrates the expressed confidence by rewarding accurate high-confidence predictions and penalizing overconfident errors (a sketch of one plausible reward shaping is shown below).

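One plausible shape for the calibration reward, consistent with the description above but not necessarily the paper's exact formula, is linear in the stated confidence with its sign set by correctness:

```python
def calibration_reward(is_correct: bool, confidence: int, max_conf: int = 10) -> float:
    """Sketch of a reward that favors accurate, high-confidence answers and
    punishes confidently wrong ones. The linear form and the 0-10 confidence
    scale are assumptions for illustration only."""
    c = confidence / max_conf          # normalize stated confidence to [0, 1]
    return c if is_correct else -c     # +c for correct answers, -c for errors
```

Under this shaping, an answer stated with confidence 9/10 earns nearly the maximum reward when correct and nearly the maximum penalty when wrong, which pushes the policy toward honest low-confidence statements whenever its sampled reasoning disagrees.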
Experimental Results
The experimental results show that SaySelf significantly reduces the confidence calibration error and maintains the task performance, both in-distribution and out-of-distribution. The generated self-reflective rationales are found to be reasonable and can further improve the confidence calibration.
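Calibration error here refers to the gap between stated confidence and empirical accuracy. A standard way to quantify it is expected calibration error (ECE); the binned computation below is a generic sketch and is not tied to the paper's exact evaluation code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Generic ECE: bin predictions by stated confidence and average the
    |accuracy - confidence| gap within each bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)   # values in [0, 1]
    correct = np.asarray(correct, dtype=float)           # 1.0 if the answer was right
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```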

Ablation Studies
Ablation studies verify the effectiveness of the key components in SaySelf, including the use of self-reflective rationales, confidence estimates, and the reinforcement learning stage. Case studies demonstrate SaySelf's ability to detect and summarize the internal uncertainties of the LLM, providing insights into the model's reasoning.

Potential Impact
The authors highlight the potential impact of SaySelf on improving the trustworthiness and reliability of LLM-based systems, as the generated confidence estimates and self-reflective rationales can guide subsequent interactions, such as invoking external tools or asking clarification questions. The approach also has implications for proactive learning algorithms that enhance LLMs' interactions with humans for continued learning.

Reference: https://arxiv.org/abs/2405.20974