
    Test-Time Training (TTT)

    Also known as:
    Inference-Time Adaptation
    Dynamic Model Adaptation
    Self-Supervised Test Adaptation
    Online Adaptation
    Updated: 2/11/2026

    A paradigm where a model adapts to each new input during inference by optimizing a self-supervised loss on the test instance – "learning while predicting".

    Quick Summary

    Test-time training adapts a model to each individual input during inference, increasing robustness to domain shift without retraining the model offline.

    Explanation

    TTT uses an auxiliary self-supervised task (e.g., rotation prediction or masked token prediction) whose loss can be computed on the test input itself, without labels. Before each prediction, a subset of the model's parameters is fine-tuned on this auxiliary loss for the current instance; the adapted parameters are then used for the prediction and typically discarded afterward (or carried forward, in the online variant).

    Marketing Relevance

    Increases robustness to distribution shift: Marketing models can dynamically adapt to new markets, trends, or campaigns without retraining. Reduces performance drops on out-of-distribution data.

    Example

    A sentiment model trained on tech reviews is applied to fashion reviews. With TTT, the model adapts to the new domain's style by performing a few steps of masked language modeling on each review before predicting its sentiment.
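Structurally, this per-review loop looks like the following toy sketch. Everything here is a stand-in: each "review" is a sequence of random token vectors, and the masked-token objective reconstructs one token from the mean of the rest, rather than using a real language model.

```python
import numpy as np

rng = np.random.default_rng(1)

dim, seq_len = 6, 5
W = rng.normal(scale=0.1, size=(dim, dim))  # shared weights from training
W0 = W.copy()                               # reference copy of the originals

def adapt_on_review(tokens, W_shared, steps=3, lr=0.05):
    """One TTT episode: adapt a *copy* of the weights on this review only,
    so the deployed model is never permanently changed."""
    W_local = W_shared.copy()
    for _ in range(steps):
        mask_idx = rng.integers(len(tokens))                        # mask one token
        context = np.delete(tokens, mask_idx, axis=0).mean(axis=0)  # its context
        residual = W_local @ context - tokens[mask_idx]             # prediction error
        W_local -= lr * np.outer(residual, context)                 # gradient step
    return W_local

reviews = [rng.normal(size=(seq_len, dim)) for _ in range(3)]
for tokens in reviews:
    W_adapted = adapt_on_review(tokens, W)
    # ...predict sentiment with W_adapted here, then discard it...

print(np.array_equal(W, W0))  # True: the shared weights were never modified
```

The snapshot-and-restore pattern is the important part: each review gets its own short adaptation, and the extra forward/backward passes per review are the latency cost noted under Common Pitfalls.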

    Common Pitfalls

    Increased inference latency (several forward/backward passes per sample). Hyperparameter choices (learning rate, number of adaptation steps) are critical. Not every task admits a useful self-supervised objective. GPU resources are needed at inference time, not just during training.

    Origin & History

    Sun et al. (2020) introduced TTT as self-supervised adaptation for robustness to distribution shift. Later work (2024) proposed TTT-Linear and TTT-MLP, which use test-time training as a hidden layer inside language models, scaling linearly with context length as an alternative to attention's KV cache.

    Comparisons & Differences

    Test-Time Training (TTT) vs. Fine-Tuning

    Fine-tuning trains once on a dataset before deployment; TTT adapts to each individual input during inference – more dynamic, but slower per prediction.

