Key Points

1. The paper introduces Chain of Code (CoC), an extension that improves LM code-driven reasoning by encouraging LMs to format semantic sub-tasks in a program as flexible pseudocode that the interpreter can explicitly catch as undefined behavior and hand off to an LM to simulate (termed an "LMulator").

2. CoC builds on the observation that sufficiently capable LMs can solve complex reasoning questions, particularly those involving numeric or symbolic reasoning, by writing and executing code; it extends this by prompting LMs to reason through a mix of natural language, pseudocode, and executable code.

3. CoC proceeds in two steps: (1) Generation, where an LM writes code to reason through a problem, and (2) Execution, where the code is run line by line via a code interpreter when possible and simulated by an LM when not (a minimal sketch of this execution loop follows the list).

4. CoC outperforms Chain of Thought and other baselines across a variety of benchmarks, achieving strong results on tasks requiring semantic reasoning, numeric reasoning, or a combination of both; on BIG-Bench Hard it reaches 84%, a gain of 12% over Chain of Thought.

5. CoC scales well with both large and small language models, yielding significant improvements in reasoning performance across a variety of challenging tasks.

6. CoC applies to a wide variety of challenging numerical and semantic reasoning questions; its benefits include enabling code use in new regimes, leveraging LMs' coding abilities, inheriting the advantages of reasoning in code and of intermediate steps, and scaling well with model size.

7. CoC has been evaluated across varied tasks, showing exceptional performance on algorithmic tasks and significant improvements over baselines and over human performance in both few-shot and cross-task prompting settings.

8. Ablations and comparisons with instruction-tuned models show that CoC performs significantly better than other prompting techniques and shows promise as a general-purpose reasoner.

9. The paper identifies limitations of CoC, such as increased context length and computation time, and suggests avenues for future work, including finetuning language models to act as the LMulator and investigating how code integration can give LMs access to external modalities for new applications.
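
The execution step described in points 1 and 3 can be pictured as a small loop: run each generated line with the Python interpreter when it executes cleanly, and hand anything the interpreter cannot run (for example, a call to an undefined semantic helper) to the LM, folding its predicted value back into the program state. The sketch below is a minimal illustration of this idea rather than the paper's implementation; the query_lm argument stands in for whatever LM client is available, and only simple single-line statements are handled.

    # Minimal sketch of CoC-style "execute if possible, otherwise simulate" execution.
    import ast

    def run_chain_of_code(program, query_lm, state=None):
        """Run a generated program line by line while maintaining a program state.
        Lines the Python interpreter can run are exec'd; lines that fail (e.g. calls
        to undefined semantic helpers) are handed to query_lm, and its predicted
        value is written back into the state. Simplification: only simple,
        single-line assignment statements are supported."""
        state = {} if state is None else dict(state)
        for raw_line in program.splitlines():
            line = raw_line.strip()
            if not line:
                continue
            try:
                exec(line, {}, state)                        # try the real interpreter first
            except Exception:
                # Undefined behavior: ask the LM (the "LMulator") to simulate this line.
                target = line.split("=", 1)[0].strip()       # variable being assigned
                prompt = f"state: {state}\nline: {line}\nvalue of {target}?"
                predicted = query_lm(prompt)
                try:
                    state[target] = ast.literal_eval(predicted)  # parse e.g. "True", "3", "'yes'"
                except (ValueError, SyntaxError):
                    state[target] = predicted                    # fall back to the raw string
        return state

In the paper, the simulated execution also maintains and updates the full program state; the single-value parse above is a further simplification for brevity.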

Summary

The paper explores a new technique called Chain of Code (CoC) to enhance the reasoning capabilities of language models on complex reasoning questions. CoC prompts language models to write code and execute it where possible, while selectively simulating the interpreter by generating the expected outputs of lines that cannot be executed. The paper demonstrates that CoC outperforms popular baselines and achieves high performance on challenging numerical and semantic reasoning tasks, setting a new state of the art.

The method leverages code execution and language model simulation to improve reasoning performance, and is presented as a general-purpose reasoner for a wide range of problems. By interweaving code writing, code execution, and language model simulation, CoC helps language models reason through complex problems; the paper also discusses the limitations and future implications of the technique.
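
To make this interweaving concrete, a generated program might mix directly executable Python with semantic "pseudocode" calls that no interpreter defines; those are exactly the lines handed to the LMulator. The example below is hypothetical (not taken from the paper) and reuses the run_chain_of_code sketch above, with a toy stand-in for the LM.

    # Hypothetical CoC-style program: two lines run in the Python interpreter, while
    # classify_sentiment is undefined and so gets routed to the LM.
    program = "\n".join([
        'review = "The battery died after one day, fantastic."',
        'sentiment = classify_sentiment(review)',   # undefined semantic helper -> simulated
        'is_negative = sentiment == "negative"',    # executable comparison -> run by Python
    ])

    def toy_lm(prompt):
        # Stand-in for a real LM call: here it always judges the review as negative.
        return "'negative'"

    final_state = run_chain_of_code(program, query_lm=toy_lm)
    print(final_state["is_negative"])  # -> True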

Additionally, it highlights the potential applications of CoC in various domains, such as robotics and augmented reality.

Reference: https://arxiv.org/abs/2312.04474