Key Points
1. The study aims to improve the performance of large-scale language models (LSLMs) by conditioning them on external evidence from the web, using few-shot prompting without the need for additional training or parameters.
2. The study used Google Search to retrieve relevant documents and used few-shot prompting to condition the pre-trained LSLMs to answer questions based on the retrieved evidence.
3. Conditioning LSLMs on the web using few-shot prompting led to improved performance over closed-book models, particularly in open-domain question answering tasks.
4. Retrieval with Google Search was effective: recall was high, particularly for questions whose answers can be extracted directly from the retrieved evidence.
5. Reranking candidate answers using different scoring functions further improved the performance of the models.
6. Conditioning smaller models on retrieved evidence was particularly beneficial for generation tasks; smaller open-book models often surpassed the performance of larger closed-book models.
7. The study found that retrieving evidence from Google Search also improved the factuality of the models, despite the challenges of conflicting parametric and contextual knowledge for answering questions beyond the training data.
8. The study highlighted the potential benefits of integrating search engines with LSLMs to ground their outputs in factual, up-to-date information, thereby providing more effective ways to use existing models.
9. The study also emphasized the importance of addressing limitations such as the deterioration of search results for complex queries and potential safety issues related to using the whole web as a knowledge source.
Summary
<b>Leveraging Internet Search for Language Models</b>
The paper proposes a method to overcome the limitations of large-scale language models (LSLMs) by utilizing the Internet as a source of up-to-date knowledge. The authors present a system that uses a retrieval model to fetch relevant documents from the Internet for open-domain question answering. The paper tests the effectiveness of this method on various language generation and classification tasks and finds that equipping LSLMs with Internet search through few-shot prompting results in performance gains ranging from 15% to 30%. Additionally, the paper suggests that the approach presents a lightweight method applicable to virtually any pre-trained LM without the need for fine-tuning or adding extra learnable parameters. The authors also propose a shift in focus from scaling up the size of the models to better utilization of models' few-shot capabilities in combination with increasing inference-time compute as a more scalable approach.
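To make the prompting step concrete, the following is a minimal sketch of how a few-shot prompt could be assembled from retrieved evidence. The `Shot` class, the `build_prompt` function, and the "Evidence / Question / Answer" field labels are assumptions made for illustration and are not taken from the paper's code; retrieval and the language model call are left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    """One worked example shown to the model (hypothetical structure)."""
    evidence: str
    question: str
    answer: str


def build_prompt(shots: List[Shot], evidence: str, question: str) -> str:
    """Concatenate k worked examples, then the new evidence and question.

    The final block ends with an open "Answer:" field so that a pre-trained
    LM can complete it without any fine-tuning or extra parameters.
    """
    blocks = [
        f"Evidence: {s.evidence}\nQuestion: {s.question}\nAnswer: {s.answer}"
        for s in shots
    ]
    blocks.append(f"Evidence: {evidence}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    shots = [
        Shot(
            evidence="Paris is the capital and largest city of France.",
            question="What is the capital of France?",
            answer="Paris",
        )
    ]
    # In a real system the evidence string would come from a search engine.
    print(build_prompt(shots, "Canberra is the capital of Australia.",
                       "What is the capital of Australia?"))
```

Because the conditioning happens purely through the prompt text, the same function could in principle be used with any pre-trained LM, which is the property the summary above emphasizes.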
<b>Enhancing Language Models with External Retrieved Evidence</b>
The authors capitalize on the unique few-shot capabilities of large-scale language models to overcome challenges with respect to grounding to factual and up-to-date information. They use the Internet as a source of up-to-date knowledge and propose a method to condition language models on external retrieved evidence, resulting in improved performance on open-domain question answering tasks. The paper provides detailed insights into the retrieval and prompting process, the effectiveness of different scoring functions, and the impact of the method on models of varying sizes. The results show that conditioning language models on Google search results leads to performance improvements, especially for generation tasks, and that increasing inference-time compute via sampling multiple answers and reranking further enhances performance. The findings highlight the potential of integrating search engines with LSLMs and suggest that inference-time interventions can bring significant gains, potentially offering a more effective approach than solely focusing on scaling model parameters. However, the paper also acknowledges limitations, such as performance gaps with fine-tuned models and challenges with retrieval results for multi-hop questions. Nonetheless, the proposed method offers a promising direction for improving language models by leveraging up-to-date knowledge from the Internet.
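The "sample multiple answers, then rerank" idea can be sketched as follows. This is an illustrative assumption, not the paper's exact scoring functions: each candidate carries a hypothetical log-likelihood of the answer given the question and one evidence paragraph, plus a hypothetical log relevance score for that paragraph, and candidates are ordered by a weighted sum of the two. The names `Candidate`, `rerank`, and `evidence_weight` are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One sampled answer together with its scores (hypothetical fields)."""
    answer: str
    log_p_answer: float    # assumed: log p(answer | question, evidence) from the LM
    log_p_evidence: float  # assumed: log relevance of the evidence to the question


def rerank(candidates: List[Candidate], evidence_weight: float = 1.0) -> List[Candidate]:
    """Return candidates sorted from best to worst combined score.

    The combined score is log_p_answer + evidence_weight * log_p_evidence.
    Setting evidence_weight to 0 ranks by answer likelihood alone.
    """
    return sorted(
        candidates,
        key=lambda c: c.log_p_answer + evidence_weight * c.log_p_evidence,
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        Candidate("Sydney", log_p_answer=-1.0, log_p_evidence=-3.0),
        Candidate("Canberra", log_p_answer=-2.0, log_p_evidence=-0.5),
    ]
    best = rerank(candidates)[0]
    print(best.answer)
```

Generating one answer per retrieved paragraph and scoring each candidate costs more compute at inference time, which is the trade-off the summary describes as an alternative to scaling model parameters.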
Reference: https://arxiv.org/abs/2203.05115