
    Pre-Training

    Also known as:
    Foundation Training
    Base Model Training
    Unsupervised Pre-Training
    Initial Training
    Updated: 2/8/2026

    The first training phase of an LLM, in which the model learns to understand and generate language from massive amounts of text (often trillions of tokens) – before any specialized fine-tuning.

    Quick Summary

    Pre-training is the initial training of an LLM on trillions of tokens. It builds the model's world knowledge and language understanding and is the most expensive and most important training phase.

    Explanation

    Pre-training uses self-supervised learning: the model learns to predict the next token (GPT-style) or to reconstruct masked tokens (BERT-style). This produces a "foundation model" with broad world knowledge that can be adapted to many downstream tasks.
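    The GPT-style next-token objective can be sketched with a toy count-based model: given a context, predict the most likely next token, using the training text itself as the supervision signal. This is a minimal illustration only – the corpus, word-level tokens, and bigram counting are simplifying assumptions; a real LLM replaces the counting with a neural network trained on trillions of tokens.

    ```python
    # Minimal sketch of next-token prediction (the GPT-style pre-training
    # objective) using bigram counts instead of a neural network.
    # Corpus and word-level tokenization are illustrative assumptions.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()

    # "Training": count how often each token follows each context token.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def predict_next(token):
        """Return the token that most often followed `token` in training."""
        return follows[token].most_common(1)[0][0]

    print(predict_next("the"))  # → "cat" ("cat" follows "the" twice, others once)
    ```

    Note that no labels were needed: the text provides its own targets, which is what makes the objective self-supervised and lets it scale to internet-sized corpora.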

    Marketing Relevance

    Pre-training explains why LLMs know so much: they have effectively "read" the internet. Two points matter for marketing: model cutoff dates (knowledge extends only to the end of training), and why fine-tuning on your own data is often necessary.

    Example

    Llama 3 was pre-trained on about 15 trillion tokens – roughly the equivalent of 150 million books. This pre-training cost an estimated $100+ million in compute. The resulting base model can then be fine-tuned for specific tasks.

    Common Pitfalls

    Pre-training is extremely expensive and resource-intensive. Model quality depends heavily on the quality of the training data, and biases in that data are learned along with it. The knowledge cutoff date limits awareness of current events.

    Origin & History

    Pre-training was established through Word2Vec (Mikolov 2013), then ELMo (2018), and BERT (Google 2018). GPT-3 (2020) showed that massive pre-training unlocks emergent capabilities.

    Comparisons & Differences

    Pre-Training vs. Fine-Tuning

    Pre-Training builds general knowledge (trillions of tokens); Fine-Tuning specializes for tasks (thousands of examples).

    Pre-Training vs. Continual Pre-Training

    Standard Pre-Training is one-time; Continual Pre-Training updates models with new data without full retraining.
