Key Points
1. Large Language Models (LLMs) have shown remarkable performance in a wide range of Natural Language Understanding tasks, but still rely on carefully designed prompts to achieve optimal performance.
2. Self-ensembling techniques like Self-Consistency have demonstrated significant performance gains in text generation with LLMs, but they depend on accurate answer extraction and incur higher inference cost compared to Greedy Decoding.
3. Recent work has shown that leveraging diverse exemplars in LLM prompts can induce diversity in the model outputs, which can be combined with self-ensembling approaches.
4. The paper introduces PEDAL (Prompts based on Exemplar Diversity Aggregated using LLMs), a hybrid self-ensembling approach that combines diverse exemplar-based prompts with LLM-based aggregation to achieve better accuracy than Greedy Decoding at a lower inference cost than Self-Consistency.
5. PEDAL generates multiple candidate responses using Greedy Decoding with diverse prompts and then aggregates them using the same LLM to produce the final output.
6. Experiments on the SVAMP and ARC datasets show that PEDAL achieves better accuracy than Greedy Decoding and lower inference cost than Self-Consistency.
7. PEDAL outperforms the "Unified Diverse Exemplars" baseline, which directly combines all the diverse exemplars into a single prompt, indicating the benefits of the proposed multi-prompt approach.
8. Increasing the number of diverse prompts in PEDAL leads to slight performance improvements on the SVAMP dataset, but no clear pattern is observed on the ARC dataset.
9. The paper discusses the potential future work of extending the proposed self-ensembling strategies to a wider range of problem settings involving free-form text generation.
Summary
The paper introduces a hybrid self-ensembling approach called PEDAL (Prompts based on Exemplar Diversity Aggregated using LLMs) that combines the strengths of diverse exemplar-based prompts and LLM-based aggregation, achieving better accuracy than Greedy Decoding at a lower inference cost than Self-Consistency.
Large Language Models (LLMs) have demonstrated remarkable capabilities, but they still rely on carefully designed prompts to achieve optimal performance. Previous work has shown that self-ensembling techniques like Self-Consistency can improve LLM reasoning by sampling diverse "Chain-of-Thought" paths and aggregating them. However, Self-Consistency has drawbacks: it depends on accurate extraction of the final answer from each sampled path, and it incurs higher inference cost because it generates many more output tokens than Greedy Decoding.
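To make the answer-extraction dependency concrete, here is a minimal Python sketch of the Self-Consistency pattern: sample several chain-of-thought completions at a nonzero temperature, extract each final answer, and take a majority vote. The generate helper, the temperature, and the answer-extraction regex are illustrative assumptions rather than the paper's implementation; the regex step is precisely where Self-Consistency can fail if the answer cannot be parsed.

```python
import re
from collections import Counter
from typing import Callable, List

def self_consistency(generate: Callable[[str, float], str],
                     prompt: str,
                     n_samples: int = 5,
                     temperature: float = 0.7) -> str:
    """Sketch of Self-Consistency: sample several chain-of-thought
    completions and majority-vote over the extracted final answers.

    `generate` is a hypothetical helper returning one completion for a
    prompt at the given sampling temperature (not from the paper's code).
    """
    answers: List[str] = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature)
        # Answer extraction: assumes the completion ends with a phrase like
        # "the answer is 42"; accuracy hinges on this parse succeeding.
        match = re.search(r"answer is\s*([-\d.]+)", completion, re.IGNORECASE)
        if match:
            answers.append(match.group(1))
    if not answers:
        return ""  # extraction failed on every sample
    return Counter(answers).most_common(1)[0][0]
```

Each of the n_samples completions contains a full reasoning chain, which is why the output-token count, and hence the inference cost, grows quickly.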
The paper observes that using diverse exemplars in LLM prompts can induce diversity in the outputs, and that this can be leveraged in self-ensembling approaches. PEDAL builds on this insight: it generates multiple candidate responses using Greedy Decoding with prompts built from diverse exemplars, and then aggregates these responses using the same LLM to produce the final output, as sketched below.
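Under the same assumptions (a hypothetical generate helper, here used for greedy decoding, and an illustrative prompt template rather than the paper's exact wording), a comparable sketch of the PEDAL flow looks like this:

```python
import random
from typing import Callable, List

def pedal(generate: Callable[[str], str],
          exemplar_pool: List[str],
          question: str,
          n_prompts: int = 3,
          k_exemplars: int = 4,
          seed: int = 0) -> str:
    """Sketch of the PEDAL flow described above: greedily decode several
    candidates from prompts built on different exemplar subsets, then ask
    the same LLM to aggregate them into a single final answer.

    `generate` is a hypothetical greedy-decoding helper (temperature 0);
    the exemplar sampling and prompt wording are illustrative only.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_prompts):
        exemplars = rng.sample(exemplar_pool,
                               min(k_exemplars, len(exemplar_pool)))
        prompt = "\n\n".join(exemplars) + f"\n\nQuestion: {question}\nAnswer:"
        candidates.append(generate(prompt))

    # Aggregation step: the same LLM reads the candidates and produces
    # the final output, so no task-specific answer parsing is required.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    aggregation_prompt = (
        f"Question: {question}\n"
        f"Candidate answers:\n{numbered}\n"
        "Based on the candidates above, give the single best final answer."
    )
    return generate(aggregation_prompt)
```

Because each candidate is produced with Greedy Decoding rather than repeated sampling of long reasoning chains, this flow generates fewer output tokens, consistent with the inference-cost results reported below.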
Experiments on the SVAMP and ARC datasets show that PEDAL can achieve better accuracy than Greedy Decoding while incurring lower inference cost compared to Self-Consistency. With the Qwen2-7B-Instruct model, PEDAL achieves 1.89% higher accuracy than Greedy Decoding on SVAMP, and with Llama-3-8B-Instruct, it achieves 3.89% higher accuracy. In terms of inference cost, PEDAL processes significantly fewer output tokens than Self-Consistency, providing a favorable trade-off between accuracy and efficiency.
The paper also explores the impact of using different numbers of prompts in PEDAL, observing slight performance improvements as the number of prompts is increased on the SVAMP dataset, but no clear patterns on the ARC dataset.
Overall, the work demonstrates that combining diverse exemplar-based prompts with LLM-based aggregation in PEDAL yields better accuracy than Greedy Decoding while keeping inference cost below that of Self-Consistency, providing a practical and cost-effective middle ground. The authors plan to extend such ensembling strategies to a wider range of free-form text generation tasks in future work.
Reference: https://arxiv.org/abs/2408.08869