
    Prefix Caching

    Also known as:
    Prompt Caching
    System Prompt Cache
    Shared Prefix
    Context Caching
    Updated: 2/9/2026

    Prefix caching stores the KV cache computed for frequently reused prompt prefixes (e.g., system prompts) and shares it across requests.

    Quick Summary

    Prefix caching shares KV-cache computations between requests that start with the same prompt prefix (e.g., a system prompt), saving up to 90% of compute and cost for recurring prompts.

    Explanation

    When 100 requests use the same system prompt, its KV cache is computed once, on the first request, and shared by the other 99. The savings scale with the length of the shared prefix. Claude, GPT-4, and Gemini all offer prompt caching as an API feature.
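
    A toy, self-contained sketch of the mechanism, assuming a hypothetical compute_kv placeholder for the expensive attention forward pass (real engines such as vLLM cache fixed-size KV blocks and handle eviction rather than using a plain dictionary):

    ```python
    # Toy prefix cache: KV results for a shared prefix are computed once
    # and reused by every later request with the same prefix.
    kv_cache: dict[tuple[int, ...], list[int]] = {}

    def compute_kv(tokens: tuple[int, ...]) -> list[int]:
        # Placeholder for the per-token transformer forward pass.
        print(f"computing KV for {len(tokens)} tokens")
        return [t * 2 for t in tokens]  # fake "KV" values

    def prefill(prefix: tuple[int, ...], suffix: tuple[int, ...]) -> list[int]:
        if prefix not in kv_cache:                # first request pays full cost
            kv_cache[prefix] = compute_kv(prefix)
        return kv_cache[prefix] + compute_kv(suffix)  # only the suffix is new work

    system_prompt = (1, 2, 3, 4)          # shared prefix (e.g., system prompt)
    prefill(system_prompt, (10, 11))      # computes 4 prefix + 2 suffix tokens
    prefill(system_prompt, (12, 13, 14))  # computes only the 3 suffix tokens
    ```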

    Marketing Relevance

    Drastically reduces API costs and latency for repeated system prompts, which is especially valuable for chatbots, RAG pipelines, and agentic workflows.

    Origin & History

    vLLM implemented prefix caching in 2023 as Automatic Prefix Caching (APC). Anthropic introduced prompt caching for Claude in August 2024. Google followed with context caching for Gemini, and OpenAI offered prompt caching for GPT-4. By 2025, prefix caching was standard across all major LLM APIs.

    Comparisons & Differences

    Prefix Caching vs. Standard KV-Cache

    A standard KV cache is isolated per request and discarded when the request finishes; prefix caching shares the cache across requests that begin with the same prefix.
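
    In vLLM, for example, this cross-request sharing sits behind a single constructor flag. A minimal sketch, where the model name, prompts, and sampling settings are placeholder assumptions:

    ```python
    from vllm import LLM, SamplingParams

    # Enable Automatic Prefix Caching (APC): KV blocks for the shared
    # system prompt are computed by the first request and reused by every
    # later request that starts with the same token prefix.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
              enable_prefix_caching=True)

    system_prompt = "You are a helpful assistant. Answer concisely.\n"
    questions = ["What is a KV cache?", "What is paged attention?"]

    params = SamplingParams(temperature=0.0, max_tokens=64)
    for output in llm.generate([system_prompt + q for q in questions], params):
        print(output.outputs[0].text)
    ```

    Hosted APIs expose the same idea as prompt caching (Claude, GPT-4) or context caching (Gemini) rather than a server flag.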

