Key Points
1. Alignment of code capabilities with user objectives.
2. Evaluation of user feedback and code effectiveness.
3. Formulation of a reasoning process for code analysis and evaluation.
4. Assessment of task completion.
5. Evaluation of the code's generality with a scoring system.
6. Specification of the output format: a JSON object containing the reasoning process, the task completion judgment, and the code generality score.
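The JSON output described in point 6 can be illustrated with a small validation sketch. This is a minimal example under stated assumptions, not the paper's implementation: the key names `reasoning`, `judge`, and `score`, the 1 to 10 score range, and the `Judgment` and `parse_judgment` names are all hypothetical choices made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Judgment:
    reasoning: str   # the judge's reasoning process
    completed: bool  # task completion judgment
    score: int       # code generality score


def parse_judgment(raw: str, min_score: int = 1, max_score: int = 10) -> Judgment:
    """Parse and validate a judge response.

    Key names and the score range are assumptions for this sketch.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("judge output must be a JSON object")
    missing = {"reasoning", "judge", "score"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["judge"], bool):
        raise ValueError("'judge' must be a JSON boolean")
    score = data["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("'score' must be an integer")
    if not min_score <= score <= max_score:
        raise ValueError(f"'score' must be in [{min_score}, {max_score}]")
    return Judgment(str(data["reasoning"]), data["judge"], score)
```

Validating the response before acting on it keeps a malformed or out-of-range judgment from being silently treated as a success.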
Summary
OS-Copilot Framework and FRIDAY Agent
The paper introduces OS-Copilot, a framework for building generalist computer agents capable of interacting with various elements in an operating system. Using OS-Copilot, the authors built FRIDAY, an embodied agent for automating general computer tasks. FRIDAY generalizes strongly to unseen applications by reusing skills accumulated from previous tasks. The paper provides running examples of FRIDAY deployed on macOS, including preparing a focused working environment, calculating and drawing a chart in Excel, and creating a website for OS-Copilot. FRIDAY can learn to control applications such as Excel and PowerPoint, and to improve itself on them, with minimal supervision. The paper also offers infrastructure and insights for future research on more capable, general-purpose computer agents. Finally, the authors describe the FRIDAY system in detail, including its core components (the planner, tool generator, refiner, and executor) and their respective prompts.
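As a rough illustration of how the planner, tool generator, executor, and refiner named above might fit together, the sketch below wires them into a generate, execute, refine loop that stores working code for reuse. It is one plausible arrangement under stated assumptions, not the paper's implementation: the function `run_task`, its callable parameters, the `tool_store` dictionary, and the `max_refinements` limit are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_task(
    task: str,
    plan: Callable[[str], List[str]],            # planner: task -> subtasks
    generate: Callable[[str], str],              # tool generator: subtask -> code
    execute: Callable[[str], Tuple[bool, str]],  # executor: code -> (ok, feedback)
    refine: Callable[[str, str], str],           # refiner: (code, feedback) -> code
    tool_store: Dict[str, str],                  # accumulated skills (assumed shape)
    max_refinements: int = 3,                    # assumed retry limit
) -> bool:
    """Run each subtask, refining failed code, and keep code that works."""
    for subtask in plan(task):
        # Reuse a stored tool when one exists; otherwise generate new code.
        code = tool_store.get(subtask) or generate(subtask)
        ok, feedback = execute(code)
        attempts = 0
        while not ok and attempts < max_refinements:
            code = refine(code, feedback)
            ok, feedback = execute(code)
            attempts += 1
        if not ok:
            return False
        # Store the working code so later tasks can reuse it.
        tool_store[subtask] = code
    return True
```

Passing the components in as plain callables keeps the sketch independent of any particular language model or execution backend.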
Potential Use Cases and Limitations of OS-Copilot and FRIDAY
The paper also discusses potential use cases of OS-Copilot within the operating system, ranging from data processing and analysis to controlling applications through Python libraries and external services, desktop application control, and user behavior simulation. The authors acknowledge the limitations of OS-Copilot and FRIDAY, particularly their reliance on prompt engineering and their inability to handle closed-source applications. They also discuss open research directions for general computer agents, such as safety, interpretability, multimodality, and evaluation.
FRIDAY's Prompts and Capabilities
Furthermore, the paper presents the prompts for FRIDAY's core components, including the tool generator, refiner, and executor, and details the relevant information for each prompt. It also includes figures and tables with numerical results that support its claims about FRIDAY, such as its performance on GAIA, a benchmark for general AI assistants, and its self-directed learning ability in tasks such as spreadsheet manipulation and PowerPoint slide creation.
Overall, the paper gives a comprehensive overview of the development and capabilities of FRIDAY as an OS-level language agent and offers valuable insights for future research in this area.
Reference: https://arxiv.org/abs/2402.07456