
    TF-IDF

    Updated: 2/10/2026

    Statistical measure for evaluating the relevance of a word in a document relative to a document collection.

    Quick Summary

TF-IDF scores a word's relevance by combining its frequency within a document (TF) with its rarity across the corpus (IDF). It is the foundation of classical search systems and of BM25.

    Explanation

TF (Term Frequency) measures how often a word occurs in a document; IDF (Inverse Document Frequency) down-weights words that appear in many documents. The score is TF-IDF(t, d) = TF(t, d) × IDF(t), where IDF(t) = log(N / df(t)), N is the number of documents in the corpus, and df(t) is the number of documents containing t. For example, in a corpus of marketing blog posts the word "marketing" has a high TF in every post but a low IDF, because it occurs almost everywhere, so its overall TF-IDF weight stays low.
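
    A minimal sketch of the computation in Python. The toy corpus, the whitespace tokenizer, and the unsmoothed log(N / df) IDF variant are illustrative assumptions; real libraries apply additional smoothing and normalization.

    import math
    from collections import Counter

    def tf_idf(docs):
        """Compute TF-IDF weights for every term in every document.

        TF is the term count divided by the document length;
        IDF is log(N / df), where df counts documents containing the term.
        """
        n_docs = len(docs)
        tokenized = [doc.lower().split() for doc in docs]

        # Document frequency: in how many documents does each term occur?
        df = Counter()
        for tokens in tokenized:
            df.update(set(tokens))

        weights = []
        for tokens in tokenized:
            counts = Counter(tokens)
            doc_len = len(tokens)
            weights.append({
                term: (count / doc_len) * math.log(n_docs / df[term])
                for term, count in counts.items()
            })
        return weights

    # Toy corpus: "marketing" appears in every document, so its IDF is
    # log(3/3) = 0 and its TF-IDF weight is 0 despite a high TF.
    corpus = [
        "content marketing drives organic marketing growth",
        "email marketing boosts retention",
        "social media marketing and seo",
    ]
    for doc_weights in tf_idf(corpus):
        print(doc_weights)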

    Marketing Relevance

    TF-IDF is a building block for search engines, information retrieval, and classical NLP.

    Common Pitfalls

    Ignores word meaning and order. Cannot handle synonyms. Increasingly replaced by dense retrieval.

    Origin & History

Karen Spärck Jones introduced the IDF concept in 1972 at Cambridge, and TF-IDF subsequently became the standard weighting scheme in information retrieval. BM25 (Robertson et al., 1994) improved on TF-IDF with document length normalization. Despite the rise of dense retrieval, TF-IDF remains relevant in hybrid search systems.

    Comparisons & Differences

    TF-IDF vs. BM25

BM25 is an evolution of TF-IDF that adds a term-frequency saturation function and document length normalization; it is the default ranking function in Elasticsearch and Lucene.
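
    The sketch below scores a single term's contribution to a document under BM25. The parameters k1 = 1.2 and b = 0.75 are common defaults, and the +0.5-smoothed IDF shown here is one widely used formulation; actual engines may differ in details.

    import math

    def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
        """One term's BM25 contribution to a document's score.

        Unlike plain TF-IDF, term frequency saturates (extra occurrences
        add less and less) and the score is normalized by document length.
        """
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return idf * norm_tf

    # Repeating a term 2x vs. 20x barely changes the score: saturation.
    print(bm25_term_score(tf=2, doc_len=100, avg_doc_len=120, n_docs=10_000, df=50))
    print(bm25_term_score(tf=20, doc_len=100, avg_doc_len=120, n_docs=10_000, df=50))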

    TF-IDF vs. Dense Retrieval

TF-IDF relies on exact word matching over sparse, vocabulary-sized vectors; dense retrieval encodes queries and documents as dense semantic vectors and matches on meaning similarity.
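
    To illustrate the contrast, the scikit-learn snippet below (the two toy documents and the query are made up) shows that a sparse TF-IDF query for "car" scores zero against a document that only uses "automobile", which is exactly the gap dense retrieval is meant to close.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["the car was parked outside", "we sold an automobile yesterday"]
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)  # sparse, vocabulary-sized vectors

    query_vector = vectorizer.transform(["car"])
    print(cosine_similarity(query_vector, doc_vectors))
    # Only the document containing the literal token "car" gets a non-zero
    # score; the "automobile" document is missed. A dense retriever would
    # embed both into nearby points in vector space instead.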
