Key Points

1. The paper explores the memory capabilities of Large Language Models (LLMs), noting their growing impact on human activities and on language-related tasks such as translation, summarization, and question answering.

2. LLMs demonstrate memory capabilities similar to those of humans. The paper proposes that these capabilities are explained by the Universal Approximation Theorem (UAT) and operate as "Schrödinger's memory": a memory becomes observable only when a specific query is posed.

3. Research on enhancing LLM memory mechanisms has focused on two main directions, expanding context length and integrating external memory, but how memory actually functions within LLMs remains insufficiently explained.

4. The paper uses UAT to explain LLMs' ability to recall previously learned information from input cues: fitting an input to its corresponding output is what constitutes the observed phenomenon of memory, termed "Schrödinger's memory."

5. Experiments are conducted to verify the memory capabilities of several models, and a new method for evaluating LLMs' memory ability is proposed.

6. The theoretical framework and mathematical form of UAT are briefly explained, and their application to Transformer-based LLMs is presented, showing that these models dynamically fit outputs based on their input.

7. Experiments demonstrate the memory capabilities of LLMs, showing that they can recall entire pieces of content from minimal input information, consistent with the proposed definition of memory.

8. A comparison of human and LLM thinking abilities suggests that both dynamically fit outputs to inputs; the paper argues that the brain, like an LLM, operates as a dynamic model driven by its input.

9. The paper explores the advantages of this dynamic fitting capability, which it argues gives the brain effectively unlimited possibilities for creativity, and it discusses potential reasons for LLMs' seemingly weak reasoning skills, including model size, data quality and quantity, and model architecture.

Summary

Theoretical Framework

The paper investigates the memory capabilities of Large Language Models (LLMs) based on the Universal Approximation Theorem (UAT). It proposes that LLMs exhibit a form of "Schrödinger's memory": their memory can only be observed when prompted with a specific input, and otherwise remains indeterminate.
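
For orientation, the classical single-hidden-layer statement of the UAT can be written as below. This is the standard textbook form of the theorem, not necessarily the exact multi-layer formulation the paper develops.

```latex
% Classical single-hidden-layer form of the Universal Approximation Theorem.
% Shown for orientation; the paper builds on a multi-layer variant rather than
% this exact statement.
% For any continuous f on a compact set K and any epsilon > 0, there exist
% N, alpha_i, b_i, and w_i such that:
\[
  \left| \, f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
  \qquad \text{for all } x \in K,
\]
% where sigma is a fixed non-polynomial (e.g., sigmoidal) activation function.
```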

Defining Memory Components

The paper first provides a clear definition of memory, arguing that it consists of two key components: input and output. Memory is triggered by input, and the output can be correct, incorrect, or forgotten. This contrasts with the traditional view of memory as a static storage system. Using this definition, the paper demonstrates through experiments that LLMs do possess memory capabilities. After being fine-tuned on Chinese and English poetry datasets, the models were able to accurately recall entire poems from partial input such as the title and author. However, the models' performance was better on the English dataset, likely due to higher data quality and quantity compared to the Chinese dataset.
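
A minimal sketch of how such a recall test could be run, assuming a causal language model already fine-tuned on a poetry corpus and a simple "Title/Author → Poem" prompt format; the checkpoint path, prompt template, and scoring function are illustrative assumptions, not the paper's released code.

```python
# Sketch of a title+author recall probe for a poetry-fine-tuned causal LM.
# MODEL_PATH and the prompt format are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/poetry-finetuned-llm"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

def recall_poem(title: str, author: str, max_new_tokens: int = 256) -> str:
    """Prompt with only the title and author; the model must supply the body."""
    prompt = f"Title: {title}\nAuthor: {author}\nPoem:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding: we want reproduction, not creativity
    )
    # Strip the prompt tokens so only the generated poem body remains.
    generated = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

def recall_accuracy(prediction: str, reference: str) -> float:
    """Crude exact-line overlap between the recalled and reference poems."""
    pred_lines = [l.strip() for l in prediction.splitlines() if l.strip()]
    ref_lines = [l.strip() for l in reference.splitlines() if l.strip()]
    if not ref_lines:
        return 0.0
    matched = sum(1 for line in ref_lines if line in pred_lines)
    return matched / len(ref_lines)
```

Averaging such a per-poem score over the fine-tuning corpus gives a simple quantitative proxy for the kind of memory ability the experiments measure.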

Memory Mechanism of LLMs

The paper then explains the memory mechanism of LLMs through the lens of UAT. It proposes that LLMs function as dynamic fitting models, adaptively adjusting their outputs based on the input. This aligns with the concept of "Schrödinger's memory": the model's memory is only observable when queried, and otherwise remains uncertain. Furthermore, the paper compares the memory and reasoning capabilities of LLMs and the human brain, suggesting that both operate through a similar dynamic fitting mechanism in which inputs are used to generate corresponding outputs. It argues that this dynamic fitting ability gives the brain (and LLMs) effectively unlimited possibilities for creativity and innovation, in contrast with a static memory storage model.
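
To illustrate the distinction the paper draws, a static approximator uses fixed parameters, whereas the dynamic fitting attributed to Transformer-based LLMs can be sketched with input-dependent parameters. The notation below is an illustrative schematic, not the paper's exact derivation.

```latex
% Illustrative contrast (not the paper's exact formulas): a static UAT-style
% fit uses fixed parameters, while a "dynamic" fit lets the effective
% parameters depend on the input x, as attention does in a Transformer.
\[
  \text{static:}\quad y \approx \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
  \qquad
  \text{dynamic:}\quad y \approx \sum_{i=1}^{N} \alpha_i(x)\,\sigma\!\left(w_i(x)^{\top} x + b_i(x)\right).
\]
% In the dynamic case, each query x instantiates its own approximation of the
% input-to-output mapping, which is how the paper frames "Schrödinger's
% memory": the fit, and hence the memory, only materializes once an input is
% supplied.
```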

Addressing Weaknesses

Finally, the paper discusses factors that may contribute to LLMs' apparent weaknesses in reasoning tasks, including model size, data quality and quantity, and architectural design. It proposes that larger, more specialized models with improved parallel processing capabilities may help address these limitations.

Reference: https://arxiv.org/abs/2409.10482