FastText
Facebook's open-source library for efficient text classification and word embeddings with sub-word information.
FastText builds word embeddings from character n-grams, so it can represent out-of-vocabulary (OOV) words and typos, which makes it well suited for multilingual text classification.
Explanation
FastText extends Word2Vec with character n-grams: a word is padded with boundary markers and represented as the sum of the vectors of its n-grams. For "playing" with n=3, these are "<pl", "pla", "lay", "ayi", "yin", "ing", "ng>", plus the full word "<playing>". This yields meaningful vectors even for OOV words and typos, which share most of their n-grams with known words.
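The decomposition can be sketched in a few lines of Python. This is a minimal illustration, not fastText's internal code; the function name char_ngrams is hypothetical, and note that fastText actually extracts n-grams of lengths 3 to 6 by default:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams the way fastText does: the word is
    padded with boundary markers '<' and '>' before n-grams are taken."""
    padded = f"<{word}>"
    ngrams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            ngrams.append(padded[i:i + n])
    # fastText also keeps the full padded word as its own token
    if padded not in ngrams:
        ngrams.append(padded)
    return ngrams

print(char_ngrams("playing", 3, 3))
# → ['<pl', 'pla', 'lay', 'ayi', 'yin', 'ing', 'ng>', '<playing>']
```

The boundary markers matter: they let the model distinguish a prefix like "<pl" from the same characters in the middle of a word.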
Marketing Relevance
FastText is ideal for text classification and embeddings in resource-constrained environments with many languages.
Common Pitfalls
Embeddings are static (one vector per word, regardless of context). Larger memory footprint than Word2Vec, since n-gram vectors must be stored alongside word vectors. Largely superseded by transformer models in modern NLP.
Origin & History
Facebook AI Research (FAIR) released FastText in 2016 (Bojanowski et al.). Pre-trained vectors for 157 languages followed in 2018. FastText remains relevant for lightweight classification, but for embedding quality it has been superseded by BERT and Sentence Transformers.
Comparisons & Differences
FastText vs. Word2Vec
Word2Vec operates at the word level and can only embed words seen during training; FastText composes vectors from character n-grams and can therefore represent OOV words.
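How an OOV vector is composed can be sketched as follows. The bucket count, dimension, and randomly initialized table below are illustrative stand-ins for a trained model; fastText hashes n-grams into a fixed number of buckets (about 2 million by default) using an FNV-1a hash, so even an unseen word or typo maps to existing vectors:

```python
import random

BUCKETS = 1000  # illustrative; fastText's default bucket count is ~2 million
DIM = 8         # illustrative embedding dimension

def fnv1a(s):
    """32-bit FNV-1a hash, the hash family fastText uses for n-gram bucketing."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h = (h ^ byte) * 16777619 & 0xFFFFFFFF
    return h

random.seed(0)
# Stand-in for a trained n-gram embedding table (bucket index -> vector)
ngram_table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]

def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def oov_vector(word):
    """Average the bucketed n-gram vectors; works even for unseen words."""
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    for g in grams:
        row = ngram_table[fnv1a(g) % BUCKETS]
        for j in range(DIM):
            vec[j] += row[j]
    return [v / len(grams) for v in vec]

v_word = oov_vector("playing")
v_typo = oov_vector("playinng")  # typo: shares most n-grams with "playing"
```

Because "playinng" shares most of its n-grams with "playing", the two composed vectors land close together, which is exactly why FastText degrades gracefully on typos where Word2Vec would simply have no vector at all.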
FastText vs. Sentence Transformers
FastText produces static word vectors (the same vector for a word in every context); Sentence Transformers produce contextual sentence embeddings using a transformer architecture.