Key Points
- The paper introduces Decision QA, the task of answering a decision-making question with the best decision, given business rules and a structured database. The authors propose a benchmark for Decision QA called DQA, which covers two scenarios: Locating and Building. They construct the benchmark from data extracted from two video games (Europa Universalis IV and Victoria 3) that imitate real business situations requiring decision-making.
- The paper presents a new Retrieval-Augmented Generation (RAG) technique called PlanRAG, which aims to enhance the decision-making capabilities of Large Language Models (LLMs). A PlanRAG-based LLM first generates a plan for decision-making and then generates queries to retrieve data for analysis, outperforming the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario.
- The authors discuss the role of decision-making in business situations, citing examples such as minimizing production costs while maintaining on-time delivery in a pharmaceutical distribution network. They highlight the complexity of decision-making, which involves making a plan for the necessary analysis, retrieving the necessary data using queries, and making a decision based on that data.
- To address the limitations of existing methods in handling step (1) of decision-making (making a plan for the decision), the authors propose the iterative plan-then-retrieval augmented generation technique, PlanRAG, which extends the iterative RAG technique to Decision QA. The technique iterates among planning, retrieving and analyzing data, and re-planning until a decision can be made.
- The paper reports experimental results demonstrating the effectiveness of PlanRAG on the Decision QA task: PlanRAG significantly outperforms the state-of-the-art iterative RAG-based LM in both the Locating and Building scenarios.
- The authors analyze the performance of PlanRAG-LM on single-retrieval (SR) and multiple-retrieval (MR) questions and show that it is more effective for both question types. They also compare the accuracy of LLMs on relational databases (RDB) and graph databases (GDB) in the DQA benchmark, showing that PlanRAG-LM is more effective with both database types.
- The paper discusses the limitations and potential future directions of the research: the work focuses on Decision QA over graph and relational databases, leaving decision-making over other types of databases to future work. The authors also acknowledge potential issues related to using LLMs for decision-making tasks.
- The authors address potential ethical considerations related to using historical data from video games to construct the benchmark and simulator, including potential biases and the ethical use of the game content. The paper also acknowledges the support received for the research.
Summary
This paper defines a new task, Decision QA, introduces a benchmark for it called DQA, and proposes a novel technique, iterative plan-then-retrieval augmented generation (PlanRAG), to address decision-making tasks that require complex data analysis.
Decision-making is a crucial task in many business situations: it involves analyzing data to select the most suitable alternative for achieving a specific goal. Traditionally, this process has consisted of three steps: (1) making a plan for the needed analysis, (2) retrieving the necessary data using queries, and (3) making a decision based on the data. While steps (2) and (3) have been increasingly automated by decision support systems, step (1) has remained a human-driven task.
Research Goal and Definition of Decision QA Task
The goal of this research is to investigate the possibility of replacing the human role in step 1 with a Large Language Model (LLM) that can perform all three steps end-to-end. To achieve this, the authors define the Decision QA task, which takes a database D, business rules R, and a decision-making question Q as input, and generates the best decision as output.
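To make the task signature concrete, here is a minimal Python sketch of a Decision QA instance as defined above; the class and field names are illustrative assumptions, not the paper's actual data schema.

```python
from dataclasses import dataclass

# Minimal sketch of the Decision QA signature: (D, R, Q) -> best decision.
# Class and field names are illustrative, not the paper's actual schema.
@dataclass
class DecisionQAInstance:
    database: str        # D: a structured database (relational or graph)
    business_rules: str  # R: business rules, stated in natural language
    question: str        # Q: the decision-making question

# Example instance for the Locating scenario (the rule text is invented):
example = DecisionQAInstance(
    database="<EU4-style trade network, as tables or a graph>",
    business_rules="A merchant increases trading power at the node it is placed on.",
    question="Which trade node should I locate a merchant on?",
)
```

The expected output for such an instance is a single best decision, e.g., the name of one trade node.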
DQA Benchmark
The paper proposes a benchmark for Decision QA, called DQA, which consists of two scenarios: Locating and Building. The Locating scenario involves questions like "Which trade node should I locate a merchant on?", while the Building scenario involves questions like "How much wood should I supply to a factory?". The DQA benchmark was constructed by extracting 301 specific situations from two video games, Europa Universalis IV and Victoria 3, which mimic real business situations.
To address Decision QA effectively, the authors propose the iterative plan-then-retrieval augmented generation (PlanRAG) technique, which extends the iterative Retrieval-Augmented Generation (RAG) approach. A PlanRAG-based LM first makes a plan for the required data analysis, then retrieves the necessary data by generating and posing queries, and finally assesses whether further analysis is needed, iterating the planning and retrieval steps until it can make a decision.
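The following is a minimal Python sketch of this loop, assuming caller-supplied `llm` and `run_query` callables; the prompts and control flow are simplified illustrations, not the paper's actual prompting.

```python
from typing import Callable

def plan_rag(
    question: str,
    rules: str,
    llm: Callable[[str], str],        # stand-in for a language model call
    run_query: Callable[[str], str],  # stand-in for a database interface
    max_iterations: int = 5,
) -> str:
    """Iterative plan-then-retrieval augmented generation, sketched from
    the description above: plan, retrieve, assess, re-plan, then decide."""
    # Planning step: decide which data analyses are needed.
    plan = llm(f"Rules: {rules}\nQuestion: {question}\n"
               "Plan the data analyses needed to make this decision.")
    evidence: list[str] = []
    for _ in range(max_iterations):
        # Retrieving step: generate a query and execute it against the DB.
        query = llm(f"Plan: {plan}\nEvidence so far: {evidence}\n"
                    "Write the next data-analysis query.")
        evidence.append(run_query(query))
        # Assessment step: is further analysis needed?
        verdict = llm(f"Plan: {plan}\nEvidence: {evidence}\n"
                      "Is further analysis needed? Answer YES or NO.")
        if verdict.strip().upper().startswith("NO"):
            break
        # Re-planning step: revise the plan in light of the new evidence.
        plan = llm(f"Revise this plan given the evidence.\n"
                   f"Plan: {plan}\nEvidence: {evidence}")
    # Final step: make the decision from the accumulated evidence.
    return llm(f"Rules: {rules}\nQuestion: {question}\nEvidence: {evidence}\n"
               "State the single best decision.")
```

The key difference from plain iterative RAG is the explicit planning and re-planning steps, which otherwise would be left to the human user.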
Experiment Results
Experiments on the DQA benchmark show that PlanRAG-based LMs significantly outperform state-of-the-art iterative RAG-based LMs, improving accuracy by 15.8% in the Locating scenario and 7.4% in the Building scenario. The authors also provide detailed analyses on the effectiveness of PlanRAG for different types of questions and database formats.
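To illustrate what retrieval against the two database formats looks like, here is a hedged example of one Locating-style lookup written for an RDB (SQL) and a GDB (Cypher); the table, label, and property names are invented for illustration and are not taken from the benchmark.

```python
# Hypothetical queries ranking trade nodes by value; the schema names
# (trade_nodes, TradeNode, local_value) are invented for illustration.
sql_query = """
SELECT name, local_value
FROM trade_nodes
ORDER BY local_value DESC;
"""

cypher_query = """
MATCH (n:TradeNode)
RETURN n.name, n.local_value
ORDER BY n.local_value DESC;
"""
```

In the paper's setting, the LM must generate such queries itself as part of the retrieval step, so the database format affects how hard that generation is.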
In summary, this research defines a new challenging task, Decision QA, and proposes the PlanRAG technique to effectively address it, demonstrating the potential of LLMs as decision makers in complex, data-driven business scenarios.
Reference: https://arxiv.org/abs/2406.12430