Key Points

1. Large language models (LLMs) are typically trained for general-purpose use and can therefore mismatch specific tasks and user preferences, resulting in unopinionated, generic outputs.

2. Existing approaches such as supervised or preference finetuning can be effective in aligning LLMs to specific tasks and users, but they require large datasets and significant effort from individuals.

3. This paper introduces a framework called Demonstration ITerated Task Optimization (DITTO) that aligns LLMs to specific settings by leveraging a small number of demonstrations (fewer than 10) as feedback.

4. DITTO directly aligns language model outputs to a user’s demonstrated behaviors and generates online comparison data by treating users’ demonstrations as preferred over model output from both the original LLM and earlier training iterations.

5. DITTO is evaluated across domains such as news articles, emails, and blog posts, where it outperforms few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points.

6. Large language models trained on vast amounts of data can perform well with careful prompting, but prompts are tedious to design and sensitive to small variations, which has made it necessary to finetune these models on large, curated instruction-following datasets.

7. DITTO works better than supervised fine-tuning (SFT) alone and aligns strongly with individual users by leveraging a small number of user-provided examples of desired behavior.

8. DITTO's effectiveness is confirmed through actual user evaluations, where it outperforms zero-shot, few-shot, and self-prompted GPT-4 baselines, as well as SFT, in aligning to demonstrated preferences.

9. DITTO is more sample-efficient than collecting pairwise preferences: a small number of demonstrated behaviors provides a strong signal about an individual's preferences. It also aligns language models in a more naturalistic setting, as corroborated by the user study.

Summary

This paper introduces a technique called Demonstration ITerated Task Optimization (DITTO) that directly aligns language model outputs to a user's demonstrated behaviors. The key insight of DITTO is that the language model itself, along with the expert demonstrations, can generate comparison datasets for alignment, removing the need to collect a large number of pairwise preferences.

The paper first discusses the problem of aligning large language models (LLMs) to specific settings. While existing approaches like supervised or preference finetuning are effective, they can require a large corpus of acceptable behavior, which is often prohibitively expensive for individual users. DITTO instead leverages a small number (< 10) of user-provided demonstrations as feedback to align the language model.

The DITTO algorithm works by iteratively generating comparisons between the user's demonstrations and samples from the current language model. It treats the user's demonstrations as preferred over the model's outputs, including samples from earlier iterations of the model. This allows DITTO to construct a substantial dataset of preference comparisons from just a few demonstrations. DITTO then uses this dataset to update the language model using a KL-constrained optimization objective.
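To make that loop concrete, the following is a minimal Python sketch of the comparison-generation and update cycle as described above. It is an illustration, not the authors' implementation: generate and preference_update are hypothetical stand-ins for model sampling and for the KL-constrained preference-optimization step.

```python
from dataclasses import dataclass
from collections import defaultdict
import random

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # preferred completion (a user demonstration)
    rejected: str  # dispreferred completion (a model sample)

def generate(prompt: str, iteration: int) -> str:
    """Placeholder for sampling from the current policy; a real implementation
    would call the language model being finetuned."""
    return f"[iteration-{iteration} sample for {prompt!r} #{random.randint(0, 999)}]"

def preference_update(pairs: list[PreferencePair]) -> None:
    """Placeholder for the KL-constrained preference-optimization step
    (e.g. a DPO-style loss) run on the synthetic comparisons."""
    print(f"updating policy on {len(pairs)} comparison pairs")

def ditto(prompts, demos_by_prompt, n_iterations=3, samples_per_prompt=4):
    sample_bank = defaultdict(list)  # all policy samples seen so far, per prompt
    for t in range(n_iterations):
        pairs = []
        for prompt in prompts:
            # 1. Sample from the current policy and keep samples from every iteration.
            sample_bank[prompt] += [generate(prompt, t) for _ in range(samples_per_prompt)]
            # 2. Treat each demonstration as preferred over every stored model sample,
            #    so a handful of demonstrations expands into many comparisons.
            for demo in demos_by_prompt[prompt]:
                for sample in sample_bank[prompt]:
                    pairs.append(PreferencePair(prompt, chosen=demo, rejected=sample))
        # 3. Update the policy on the synthetic comparison dataset.
        preference_update(pairs)

if __name__ == "__main__":
    ditto(
        prompts=["Write a short update email to the team."],
        demos_by_prompt={"Write a short update email to the team.": ["Hi team, quick update: ..."]},
    )
```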

DITTO Evaluation
The paper evaluates DITTO across several benchmarks, including datasets of author-specific writing styles (emails, blog posts, news articles) as well as a user study where participants provided demonstrations for email-writing tasks. In the author-specific writing evaluation, DITTO outperforms other methods like supervised finetuning, self-play, and few-shot prompting by an average of 19 percentage points in head-to-head comparisons using GPT-4 as the evaluator. The user study results again show DITTO outperforming baselines like zero-shot prompting, few-shot prompting, and supervised finetuning, by 23.9, 27.9, and 12 percentage points respectively.
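As a point of reference for how such head-to-head numbers are aggregated, the short sketch below computes per-method win rates from pairwise judge verdicts. The judgment format and the example data are invented for illustration; they are not taken from the paper's evaluation harness.

```python
from collections import Counter

def win_rates(judgments):
    """Compute each method's head-to-head win rate from pairwise judge verdicts.

    `judgments` is a list of (method_a, method_b, winner) tuples, where the
    winner is whichever output the judge (e.g. GPT-4) preferred.
    """
    wins, totals = Counter(), Counter()
    for method_a, method_b, winner in judgments:
        totals[method_a] += 1
        totals[method_b] += 1
        wins[winner] += 1
    return {method: wins[method] / totals[method] for method in totals}

# Example: DITTO wins 2 of its 3 comparisons (~0.67), SFT wins 1 of its 2 (0.5).
print(win_rates([
    ("DITTO", "SFT", "DITTO"),
    ("DITTO", "few-shot", "DITTO"),
    ("DITTO", "zero-shot", "zero-shot"),
    ("SFT", "few-shot", "SFT"),
]))
```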

Theoretical Derivation
The paper also provides a theoretical derivation of DITTO from an online imitation learning perspective. This connection allows the authors to prove that, under certain conditions, DITTO can extrapolate beyond the performance of the expert demonstrator.
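For readers who want the objective in symbols, the sketch below shows the standard KL-constrained alignment objective and the DPO-style pairwise loss it induces, with DITTO's comparisons plugged in (a demonstration y_w preferred over a policy sample y_l). This is a standard formulation offered as a plausible reading of the update described above, not the paper's exact derivation.

```latex
% KL-constrained alignment objective relative to a reference policy pi_ref
\max_{\pi_\theta} \;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\!\left[ r(x, y) \right]
  - \beta \, \mathbb{D}_{\mathrm{KL}}\!\left[ \pi_\theta(\cdot \mid x) \,\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \right]

% DPO-style pairwise loss over DITTO's synthetic comparisons, where y_w is a
% user demonstration and y_l is a sample from the current or an earlier policy
\mathcal{L}(\theta) =
  - \mathbb{E}_{(x,\, y_w,\, y_l)} \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```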

The paper highlights the effectiveness of using demonstrations as a form of personalized feedback for language model alignment, in contrast to more commonly used approaches like principles or pairwise preferences. By leveraging the language model itself to generate comparison data, DITTO is able to achieve strong alignment with just a handful of examples, making it a promising technique for enabling user-specific customization of large language models.

Reference: https://arxiv.org/abs/2406.00888