Key Points

1. The paper starts from the observation that large language models (LLMs) are commonly made available only through an API, and asks what non-public information about such API-protected LLMs can nevertheless be recovered by clients.

2. The authors propose a method to extract detailed information about an API-protected LLM, including its embedding size, hidden prompts, model updates, and output-layer parameters, from a relatively small number of API queries. The method rests on the low-rank constraint that the softmax bottleneck imposes on LLM outputs, sketched after this list.

3. They present several practical applications of the method, such as finding the embedding size of an LLM, detecting and distinguishing model updates (including monitoring a model for changes over time), and recovering hidden prompts.

4. The paper demonstrates the method's effectiveness on well-known API-protected LLMs, estimating the parameters of OpenAI's gpt-3.5-turbo, and verifies the approach empirically across diverse scenarios.

5. For LLM providers, the authors argue that these vulnerabilities are hard to mitigate cheaply: API-level countermeasures, such as removing access to logprobs or logit bias, degrade the API's usefulness, while a complete fix requires significant architectural changes, and they point to alternative LLM architectures that do not suffer from a softmax bottleneck.

6. They also anticipate how LLM providers and their clients might react, arguing that the method poses little direct harm while giving clients greater accountability and transparency from providers.

7. Additionally, the paper recounts a remarkable case of simultaneous discovery: another research group independently arrived at related results, and the two groups' findings and methods are complementary.

8. The authors thank collaborators and other researchers for their insights, feedback, and contributions to the work.

9. The paper concludes that the observed vulnerability in API-protected LLMs may be viewed as a feature rather than a bug, since it benefits API clients, while cautioning LLM providers to weigh the consequences of their architecture and API choices.

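For reference, the low-rank constraint behind point 2 comes from the standard transformer output layer (notation ours): a final hidden state h of dimension d is mapped to v >> d vocabulary logits by the softmax matrix W, and the logits are then normalized:

```latex
\ell = W h \in \mathbb{R}^{v}, \qquad
\log p = \ell - \operatorname{logsumexp}(\ell)\,\mathbf{1},
\qquad W \in \mathbb{R}^{v \times d},\ h \in \mathbb{R}^{d}.
```

Every logit vector lies in the column space of W, which has dimension at most d, so all log-probability outputs are confined to a subspace of dimension at most d + 1 (the extra dimension is the all-ones normalization direction). This is the softmax bottleneck that the paper's methods exploit.
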
Summary

The research paper explores how much non-public information about API-protected large language models (LLMs) can be extracted through a small number of API queries. The key observation is that most modern LLMs suffer from a softmax bottleneck, which restricts model outputs to a low-dimensional subspace of the full output space. The paper exploits this fact to unlock several capabilities: efficiently discovering the LLM's hidden size (sketched below), obtaining cheap full-vocabulary outputs, detecting and disambiguating different model updates, and estimating the output-layer parameters. Using these methods, the authors estimate the embedding size of OpenAI's gpt-3.5-turbo, and they describe mechanisms by which LLM providers can guard against such attacks.
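To make the hidden-size discovery concrete, here is a minimal sketch, assuming we have already collected full-vocabulary log-probability vectors from the API (for example, via the logit-bias technique sketched in the next section); the function name and tolerance are ours, not the paper's code:

```python
import numpy as np

def estimate_embedding_size(logprob_matrix, tol=1e-6):
    """Estimate the hidden size d from n full-vocabulary outputs, n > d.

    logprob_matrix: array of shape (n, v) whose rows are log-probability
    vectors obtained from n API queries with different prompts.
    """
    L = np.asarray(logprob_matrix, dtype=float)
    # Each row has the form W h - logsumexp(W h) * ones. Subtracting the
    # row mean removes the all-ones normalization direction, leaving
    # vectors confined to a subspace of dimension at most d.
    centered = L - L.mean(axis=1, keepdims=True)
    # Numerical rank from the singular-value spectrum: the softmax
    # bottleneck shows up as a sharp drop after the d-th singular value.
    s = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```

With noiseless outputs the spectrum collapses abruptly at index d; with noisy API outputs one would instead look for the largest spectral gap rather than relying on a fixed tolerance.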

Implications for Transparency and Accountability
The paper also examines what these capabilities mean for transparency and accountability in LLMs. For instance, the ability to identify which LLM produced a given output can enable greater accountability and trust between LLM providers and their customers. Further applications include detecting and distinguishing minor and major model updates, recovering the softmax matrix from outputs, and efficiently obtaining full outputs from the model; a sketch of the full-output idea follows.
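A hedged sketch of the full-output idea, assuming the API accepts a logit_bias map and returns top-k log-probabilities with k >= 2; query_top_logprobs is a hypothetical stand-in for the provider's client call, and the constant and names are ours. The algebra is the point: biasing a target token and a fixed reference token by the same constant preserves their logit difference, so unbiased log-probabilities can be reconstructed one token at a time:

```python
import numpy as np

BIAS = 100.0  # large enough to force the chosen tokens into the top-k

def full_logprobs(prompt, vocab_size, query_top_logprobs):
    """Reconstruct the full log-probability vector for `prompt`.

    query_top_logprobs(prompt, logit_bias) -> {token_id: logprob} is a
    hypothetical API wrapper returning top-k log-probabilities computed
    after adding `logit_bias` to the logits.
    """
    # Reference token: the unbiased argmax, whose true logprob we observe.
    unbiased = query_top_logprobs(prompt, logit_bias={})
    ref, ref_lp = max(unbiased.items(), key=lambda kv: kv[1])

    out = np.full(vocab_size, -np.inf)
    out[ref] = ref_lp
    for tok in range(vocab_size):
        if tok == ref:
            continue
        # Equal bias on target and reference keeps their logit gap, so
        # the biased logprob difference equals the unbiased one:
        # logprob(tok) = logprob(ref) + (biased[tok] - biased[ref]).
        biased = query_top_logprobs(prompt, logit_bias={tok: BIAS, ref: BIAS})
        out[tok] = ref_lp + (biased[tok] - biased[ref])
    return out
```

This unbatched loop costs about one query per vocabulary item; in practice several target tokens can share a query, and the paper describes ways to reduce the cost further, e.g., by exploiting the low-rank structure of the outputs.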

Impact on Trust and Accountability
The paper further acknowledges that these methods can meaningfully strengthen trust between LLM API users and providers by increasing provider accountability and transparency; a simple model-identity check of this kind is sketched below. At the same time, it addresses potential concerns and discusses ways in which LLM providers can guard against these attacks.
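As one concrete illustration of such an accountability check (our construction, not code from the paper): a client can fit a basis for the model's output subspace from earlier queries and then test whether fresh outputs still lie in it; a large residual signals that the model behind the API has changed:

```python
import numpy as np

def fit_output_subspace(logprob_matrix, dim):
    """Orthonormal basis of the model's centered output subspace.

    logprob_matrix: (n, v) full-vocabulary logprob vectors with n > dim,
    where dim is the embedding size estimated earlier.
    """
    centered = logprob_matrix - logprob_matrix.mean(axis=1, keepdims=True)
    # The top right-singular vectors span the row space, i.e. the
    # subspace in which all of this model's outputs must lie.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim]  # shape (dim, v)

def same_model(basis, new_logprobs, tol=1e-4):
    """True if a fresh output is consistent with the stored subspace."""
    x = new_logprobs - new_logprobs.mean()
    residual = x - basis.T @ (basis @ x)  # component outside the subspace
    return np.linalg.norm(residual) <= tol * np.linalg.norm(x)
```

An update that leaves the output layer untouched keeps outputs in the same subspace, while one that changes the softmax matrix moves them out of it, which is roughly how the paper proposes to detect and distinguish model updates.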

In summary, the paper presents a comprehensive exploration of what non-public information can be extracted from API-protected LLMs, covering both the practical applications of these capabilities and the concerns they raise.

Reference: https://arxiv.org/abs/2403.09539