Key Points

1. The survey introduces a framework to examine abstention behavior in large language models (LLMs) from three perspectives: the query, the model, and human values.

2. The survey reviews the literature on abstention methods categorized based on the development stages of LLMs (pretraining, alignment, and inference).

3. The survey discusses the merits and limitations of prior work on abstention and identifies areas for future research, such as encouraging the study of abstention as a meta-capability across tasks and customizing abstention abilities based on context.

4. The survey provides an analysis of evaluation benchmarks and metrics used to assess abstention capabilities in LLMs.

5. The survey finds that while abstention has been shown to enhance model safety and reliability, its application has largely been constrained to narrowly defined contexts and tasks.

6. The survey discusses the vulnerability of abstention mechanisms to manipulation through techniques like persuasive language and strategic prompt engineering.

7. The survey highlights the need for more generalizable evaluation of abstention capabilities and for customizing abstention behavior to user needs, as well as the risk that abstention may introduce biases across demographic groups.

8. The survey encourages research on expanding abstention strategies to encompass broader applications and more dynamic contexts, balancing abstention with helpfulness.

9. The survey emphasizes the importance of strategic abstention in LLMs to enhance their reliability and safety, and calls for refining abstention mechanisms to be more adaptive and context-aware.

Summary

This paper introduces a framework for examining abstention behavior in large language models (LLMs). Abstention refers to the refusal of LLMs to provide an answer, and is increasingly recognized as a way to mitigate hallucinations and enhance safety in building LLM systems. The authors propose analyzing abstention from three perspectives: the query, the model, and human values. The query perspective asks whether the input is ambiguous, incomplete, lacking sufficient context, or affected by knowledge conflicts; in such cases the system should abstain.

The model knowledge perspective examines the limitations and biases of the model itself, and the system should abstain if the model is sufficiently unsure about the correctness of its output. The human values perspective considers the ethical implications and societal norms that influence whether a query should be answered, and the system should abstain if the query or response may compromise safety, privacy, fairness, or other values.
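To make the three-perspective framing concrete, below is a minimal sketch of how a system might fold the query, model knowledge, and human values checks into a single abstain-or-answer decision. The signal names, the confidence threshold, and the decision rule are illustrative assumptions for this summary, not the survey's prescribed method.

```python
from dataclasses import dataclass

@dataclass
class AbstentionSignals:
    """Illustrative signals corresponding to the survey's three perspectives."""
    query_is_answerable: bool   # query perspective: unambiguous, complete, no knowledge conflict
    model_confidence: float     # model-knowledge perspective: e.g., a calibrated answer probability
    violates_values: bool       # human-values perspective: safety / privacy / fairness concerns

def should_abstain(signals: AbstentionSignals, confidence_threshold: float = 0.7) -> bool:
    """Abstain if the query is unanswerable, the model is unsure, or values are at risk."""
    if not signals.query_is_answerable:
        return True
    if signals.model_confidence < confidence_threshold:
        return True
    if signals.violates_values:
        return True
    return False

# Example: an answerable, benign query, but the model is only 55% confident -> abstain.
print(should_abstain(AbstentionSignals(True, 0.55, False)))  # True
```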

The paper provides a detailed survey of existing abstention methods, categorized based on the model lifecycle (pretraining, alignment, and inference). Pretraining methods are rare, with one notable exception being data augmentation to encourage LLMs to predict "unanswerable" when presented with empty or randomly sampled documents. Instruction tuning on abstention-aware data is a common approach to improve abstention capabilities, though researchers disagree on whether this helps LLMs learn abstention as a meta-capability. Learning from preferences, either through direct preference optimization or using safety-aligned reward models, can help reduce over-abstention introduced by instruction tuning.
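As a rough illustration of the alignment-stage approaches above, the sketch below assembles abstention-aware preference pairs of the kind that could feed direct preference optimization: for unanswerable queries the abstaining response is marked as preferred, while for answerable queries the substantive answer is preferred so the model is not pushed toward over-abstention. The data format, refusal phrasing, and helper function are assumptions made for illustration, not taken from any specific paper in the survey.

```python
from typing import Optional

REFUSAL = "I'm not able to answer this reliably, so I'd rather not guess."

def build_preference_pair(query: str, answerable: bool,
                          answer: Optional[str] = None) -> dict:
    """Create a (chosen, rejected) pair for DPO-style preference learning.

    Unanswerable query -> abstention is 'chosen', a guessed answer is 'rejected'.
    Answerable query   -> the grounded answer is 'chosen', abstention is 'rejected',
                          which counteracts over-abstention.
    """
    if answerable:
        assert answer is not None, "answerable queries need a reference answer"
        return {"prompt": query, "chosen": answer, "rejected": REFUSAL}
    return {"prompt": query,
            "chosen": REFUSAL,
            "rejected": answer or "A confidently stated but unsupported answer."}

pairs = [
    build_preference_pair("What is the capital of France?", True, "Paris."),
    build_preference_pair("What will this stock be worth next year?", False),
]
```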

The authors also survey evaluation benchmarks and metrics used to assess abstention, including datasets focused on query answerability, model knowledge boundaries, and human value alignment. They discuss the limitations of current approaches, such as the vulnerability of abstention to manipulation, the potential for introducing biases, and the need to view abstention as a dynamic component of dialogue progression rather than a static endpoint.
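For intuition about how such benchmarks are typically scored, the sketch below treats abstention as a binary prediction against ground-truth "should abstain" labels and reports precision, recall, and F1. Metric names and exact definitions vary across the benchmarks the survey reviews, so this is an illustrative assumption rather than a canonical formulation.

```python
from typing import List

def abstention_metrics(should_abstain: List[bool], did_abstain: List[bool]) -> dict:
    """Score abstention as a binary classification task.

    Precision: of the cases where the model abstained, how many truly required abstention.
    Recall:    of the cases that required abstention, how many the model caught.
    """
    tp = sum(s and d for s, d in zip(should_abstain, did_abstain))
    fp = sum((not s) and d for s, d in zip(should_abstain, did_abstain))
    fn = sum(s and (not d) for s, d in zip(should_abstain, did_abstain))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(abstention_metrics([True, True, False, False], [True, False, False, True]))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```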

The paper concludes by highlighting several promising research directions, such as studying abstention as a meta-capability across tasks, enhancing privacy and copyright protections through abstention-aware designs, and improving multilingual abstention. The authors emphasize the importance of strategic abstention in LLMs to enhance their reliability and safety, and call for the development of more adaptive and context-aware abstention mechanisms.

Reference: https://arxiv.org/abs/2407.18418