    Throughput (German: Durchsatz)

    Also known as:
    Tokens per Second
    Requests per Second
    QPS
    TPS
    Updated: 2/12/2026

    The number of tokens or requests a system can process per unit of time – a key measure of ML inference efficiency.

    Quick Summary

    Throughput measures how many tokens or requests a system processes per unit of time. Because hardware cost is roughly fixed per hour, throughput directly determines cost per token, making it a central efficiency metric for ML inference.

    Explanation

    Throughput is measured in tokens per second (for LLMs), requests per second, or batches per second. It generally increases with batch size and decreases with sequence length. The key trade-off: higher aggregate throughput often comes at the cost of higher latency per individual request.
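The batching trade-off above can be sketched numerically. This is an illustrative toy, not a real serving stack: `generate` stands in for a model-server call, and the overhead/per-request timings are assumptions.

```python
import time

def measure(generate, prompts, batch_size):
    """Return (aggregate tokens/s, mean per-request latency in seconds).

    `generate` is a stand-in for a model server call: it takes a list
    of prompts and returns one completed-token count per prompt.
    """
    start = time.perf_counter()
    total_tokens = 0
    latencies = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        t0 = time.perf_counter()
        token_counts = generate(batch)
        dt = time.perf_counter() - t0
        total_tokens += sum(token_counts)
        # Every request in the batch waits for the whole batch to finish.
        latencies.extend([dt] * len(batch))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed, sum(latencies) / len(latencies)

# Toy server: fixed 5 ms overhead per call plus 1 ms per request in the batch.
def fake_generate(batch):
    time.sleep(0.005 + 0.001 * len(batch))
    return [10] * len(batch)

prompts = ["p"] * 8
tps1, lat1 = measure(fake_generate, prompts, batch_size=1)
tps4, lat4 = measure(fake_generate, prompts, batch_size=4)
# Larger batches amortize the fixed cost: higher aggregate throughput,
# but higher latency for each individual request.
```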

    Marketing Relevance

    Throughput determines cost per token: the more tokens a fixed amount of hardware produces per second, the cheaper each token. For high-volume marketing workloads (personalization, A/B tests), throughput optimization is therefore critical for ROI.
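As a back-of-envelope illustration of the cost link, consider the arithmetic below. All figures are assumptions chosen for the example, not real vendor pricing or benchmarks.

```python
# Hypothetical figures for illustration only.
gpu_cost_per_hour = 2.50      # assumed hourly hardware cost, USD
aggregate_tps = 1_000         # assumed sustained throughput, tokens/second

tokens_per_hour = aggregate_tps * 3_600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
# Doubling throughput on the same hardware halves the cost per token.
```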

    Example

    GPT-4 API: roughly 100 tokens/second on a single request stream. vLLM serving LLaMA-70B: 1000+ tokens/second aggregated across a batch of concurrent requests.
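Note that these two figures are not directly comparable: one is per-request, the other aggregated. A minimal conversion between the two views (the batch size is an assumption for illustration):

```python
aggregate_tps = 1_000   # total tokens/second across all concurrent requests
batch_size = 16         # assumed number of in-flight requests

# Each individual stream sees only a fraction of the aggregate rate.
per_stream_tps = aggregate_tps / batch_size   # 62.5 tokens/s per request
```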

    Common Pitfalls

    Throughput alone is misleading – latency matters for user experience. Distinguish time-to-first-token from total generation time, and always note the benchmark conditions (batch size, sequence length, hardware).
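A minimal sketch of separating the two latency figures for a streaming response. The token iterator here is a stand-in for any streaming API; the prefill and per-token delays are invented for the example.

```python
import time

def stream_metrics(token_stream):
    """Return (time_to_first_token, total_time, token_count); times in seconds."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count

# Toy stream: 20 ms "prefill" before the first token, then 2 ms per token.
def fake_stream(n_tokens):
    time.sleep(0.02)
    for _ in range(n_tokens):
        time.sleep(0.002)
        yield "tok"

ttft, total, n = stream_metrics(fake_stream(5))
# A system can report good aggregate throughput yet still feel slow
# if time-to-first-token is high.
```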

    Origin & History

    Throughput is an established concept in Artificial Intelligence, and its importance has grown alongside AI and data-driven methods, where inference efficiency directly affects operating cost.
