Key Points
1. The research paper introduces FAVA, a retrieval-augmented editing model for detecting and correcting factual errors in language model outputs, and describes its synthetic training data generation and training details.
2. The Community Reinvestment Act is discussed as a federal law aimed at reducing discriminatory credit practices against low-income neighborhoods.
3. The National Dodgeball League in the US is described in terms of its establishment, organization, and participation from different countries.
4. The impact of "Red Channels" in creating a blacklist for individuals in the entertainment industry due to their political beliefs is highlighted.
5. The career of The Sandman, a semi-retired American professional wrestler, is briefly discussed.
6. James Brown's song "I Don’t Mind" and its performance on the "Billboard" R&B charts are mentioned.
7. The professional rugby league footballer Robert Lui is introduced, including his birth date and team association.
8. George Sperling's proposal for improving American Sign Language communication is briefly outlined.
9. The Blissful Ignorance Effect in consumer behavior and the exploration of Nidulariaceae fungi are also covered in the paper.
Summary
Training data creation. We use GPT-4 to generate synthetic training data. Our pipeline wraps each inserted error in error tags and, for the editing targets, splits corrections evenly (50/50) between deletions and edits. We then use Contriever to retrieve five relevant documents for each passage, forming the final references for each training instance (c, y, y*). In total, we generate 35,074 training instances for FAVA.
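To make the pipeline concrete, here is a minimal Python sketch of assembling one training instance (c, y, y*). The helper names (corrupt_with_gpt4, contriever_top_k) and the tag format are our illustrative assumptions, not the paper's exact prompts or tag schema.

```python
# Illustrative sketch (assumed helpers and tag names, not the paper's exact
# schema): building one FAVA training instance (c, y, y*).
import random

def make_edit_target(passage: str, errors: list) -> str:
    """Wrap each inserted error in error tags; resolve half by deletion
    and half by an in-place edit (the 50/50 split described above)."""
    target = passage
    for err in errors:  # err: {"span": inserted error text, "correction": fix}
        if random.random() < 0.5:
            fix = f"<error><delete>{err['span']}</delete></error>"
        else:
            fix = (f"<error><delete>{err['span']}</delete>"
                   f"<mark>{err['correction']}</mark></error>")
        target = target.replace(err["span"], fix, 1)
    return target

def build_instance(passage: str, corrupt_with_gpt4, contriever_top_k):
    """corrupt_with_gpt4 and contriever_top_k are hypothetical callables
    standing in for the GPT-4 error-insertion step and the Contriever
    retriever, respectively."""
    y, errors = corrupt_with_gpt4(passage)   # passage with inserted errors
    c = contriever_top_k(passage, k=5)       # five reference documents
    y_star = make_edit_target(y, errors)     # gold edited output
    return {"c": c, "y": y, "y_star": y_star}
```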
Training and inference. FAVA consists of two components: a retriever Mret and an editing LM Medit. Mret takes the original LM output y and, optionally, the input prompt x, and retrieves the top relevant documents C = Mret(x, y). The editing model then detects and, where possible, corrects factual errors in y given the retrieved context: ŷ = Medit(x, y, C). At inference time, we retrieve the top five documents using Contriever and let FAVA identify and correct factual errors, incorporating the retrieved knowledge to improve the factuality of the generated text.
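A minimal sketch of this two-stage pipeline, assuming placeholder Retriever and Editor interfaces (the real system uses Contriever for Mret and a fine-tuned editing LM for Medit):

```python
# Minimal sketch of FAVA inference: retrieve, then detect and edit.
# Retriever/Editor are placeholder protocols standing in for Contriever
# (M_ret) and the fine-tuned editing LM (M_edit).
from typing import Optional, Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> list: ...

class Editor(Protocol):
    def edit(self, x: Optional[str], y: str, docs: list) -> str: ...

def fava_inference(y: str, m_ret: Retriever, m_edit: Editor,
                   x: Optional[str] = None, k: int = 5) -> str:
    """Compute y_hat = M_edit(x, y, C), where C = M_ret(x, y)."""
    query = f"{x}\n{y}" if x else y      # the input prompt x is optional
    c = m_ret.retrieve(query, k=k)       # top five documents at inference
    return m_edit.edit(x, y, c)          # edited, more factual output
```

Keeping retrieval separate from editing lets the editor ground its corrections in retrieved evidence rather than relying only on its parametric knowledge.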
Aside from the synthetic training data created with GPT-4, we also collect a set of human-annotated data for our benchmark, used to evaluate both the detection and editing capabilities of FAVA.
The research paper investigates hallucinations in natural language generation with large language models. It addresses the challenges posed by factually incorrect text generation and reviews prior research in the field. The paper provides a link to the available code, data, and a demo, and reports the distribution of error types across all generated passages as well as statistics of the generated training data.
Additionally, the paper covers the training details, including the base model and the training hyperparameters. It then presents results and analysis, focusing on detection results, the fine-grained detection task, human evaluation of model outputs, and a manual analysis of the data generated for FAVA training.
Finally, it reports further ablations, including retrieval ablations with the full results of the retrieval process, emphasizing the importance of retrieval and careful pipeline design.
Reference: https://arxiv.org/abs/2401.06855