Embedding Dimension
Optimal embedding dimensions for LLMs, RAG, classification, and search. Balance memory vs expressiveness using the Mikolov heuristic, MTEB benchmarks, and common model dimensions (e.g., OpenAI).
Why This ML Metric Matters
Why: Choosing the right dimension affects retrieval quality, memory footprint, and inference speed. Too low loses expressiveness; too high wastes memory.
How: The calculator applies Mikolov heuristic, purpose-based ranges (LLM/RAG/classification/search), and memory budget constraints.
Balance Memory vs Expressiveness for RAG, Search & Classification
Mikolov heuristic, MTEB benchmarks, OpenAI dimensions. Plan vector stores and choose models.
📊 Quick Examples
Chart: Memory vs Dimension (corpus: 100,000)
Chart: Common Model Dimensions
🤖 AI & ML Facts
Mikolov 2013: embedding dim ≈ √vocab balances expressiveness and overfitting
— Word2Vec
MTEB leaderboard ranks embedding models by retrieval, clustering, reranking
— MTEB
1M vectors × 1536 dims × 4 bytes ≈ 5.7 GiB for a float32 vector store
— Memory calc
Lower dim = faster similarity search (cosine, dot product)
— Performance
📋 Key Takeaways
• Heuristic: dim ≈ √vocab to 4·√vocab (Mikolov Word2Vec). Larger vocab → larger dim.
• RAG/search: 768–1536 typical. OpenAI text-embedding-3-small = 1536.
• Classification: 256–768 often sufficient. Higher dim = more expressiveness, more memory.
• Memory = corpus × dim × 4 bytes (float32). Plan vector store size accordingly.
• MTEB leaderboard lists each model's dimension alongside task scores; compare before choosing.
📖 How It Works
1. Heuristic (Mikolov)
Word2Vec rule of thumb: dim ≈ √vocab to 4·√vocab. Larger vocabularies need more dimensions to avoid collisions.
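A minimal Python sketch of this heuristic; the function name and the reading of the upper bound as 4·√vocab mirror this page's convention and are not code from Word2Vec itself:

```python
import math

def heuristic_dim_range(vocab_size: int) -> tuple[int, int]:
    """Rule of thumb: dim between sqrt(vocab) and 4*sqrt(vocab)."""
    low = math.isqrt(vocab_size)   # dim ≈ √vocab
    return low, 4 * low            # dim ≈ 4·√vocab

print(heuristic_dim_range(100_000))  # -> (316, 1264)
```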
2. Purpose-Based Ranges
LLM: 4096–16384. RAG: 768–3072. Classification: 128–768. Search: 384–1024. Based on MTEB and common models.
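As a sketch, these ranges can be encoded as a simple lookup; the `PURPOSE_RANGES` dict and `recommend_range()` helper are illustrative names, not the calculator's actual code:

```python
# Purpose -> (min_dim, max_dim), mirroring the ranges listed above.
PURPOSE_RANGES = {
    "llm":            (4096, 16384),
    "rag":            (768, 3072),
    "classification": (128, 768),
    "search":         (384, 1024),
}

def recommend_range(purpose: str) -> tuple[int, int]:
    return PURPOSE_RANGES[purpose.lower()]

print(recommend_range("rag"))  # -> (768, 3072)
```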
3. Memory Constraint
Memory = corpus × dim × 4 bytes. If budget is limited, reduce dim or corpus.
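A small sketch of this constraint, assuming float32 (4 bytes per value); `max_dim_for_budget` is a hypothetical helper that inverts the formula:

```python
def vector_store_bytes(corpus: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory = corpus × dim × bytes per value (float32 by default)."""
    return corpus * dim * bytes_per_value

def max_dim_for_budget(corpus: int, budget_gib: float, bytes_per_value: int = 4) -> int:
    """Largest dimension whose raw vectors fit the memory budget."""
    return int(budget_gib * 2**30 // (corpus * bytes_per_value))

print(vector_store_bytes(1_000_000, 1536) / 2**30)  # ≈ 5.72 GiB
print(max_dim_for_budget(1_000_000, 4.0))           # -> 1073
```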
4. Task Complexity
High complexity → higher dim. Low complexity → lower dim for faster inference.
🎯 Expert Tips
Check MTEB first
Compare models on retrieval, clustering, reranking before choosing dimension.
Memory planning
Vector store: N × d × 4 bytes. Add index overhead (HNSW, IVF) for production.
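A sketch of overhead-adjusted sizing; the 1.3× multiplier is an assumed midpoint of the ~20–50% HNSW/IVF range quoted in the FAQ below, not a measured constant:

```python
def production_bytes(corpus: int, dim: int,
                     index_overhead: float = 1.3,
                     bytes_per_value: int = 4) -> float:
    """Raw vector memory scaled by an estimated index overhead factor."""
    return corpus * dim * bytes_per_value * index_overhead

print(f"{production_bytes(1_000_000, 1536) / 2**30:.1f} GiB")  # ≈ 7.4 GiB
```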
RAG pipeline
Chunk size, overlap, and embedding model matter. 1536 (OpenAI text-embedding-3-small) is a solid default.
Quantization
int8 cuts memory to about a quarter of float32 (float16 halves it); minimal accuracy loss for many use cases.
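To make the dtype trade-off concrete, a quick footprint comparison (int8 stores 1 byte per value, so it is ~4× smaller than float32):

```python
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}

corpus, dim = 1_000_000, 1536
for dtype, nbytes in BYTES_PER_VALUE.items():
    gib = corpus * dim * nbytes / 2**30
    print(f"{dtype:>7}: {gib:.2f} GiB")
# float32: 5.72 GiB | float16: 2.86 GiB | int8: 1.43 GiB
```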
⚖️ Practical Ranges by Purpose
| Purpose | Min | Typical | Max | Examples |
|---|---|---|---|---|
| LLM | 4096 | 12288 | 16384 | Llama 3, GPT-4 |
| RAG | 768 | 1536 | 3072 | OpenAI text-embedding-3 |
| Search | 384 | 768 | 1024 | sentence-BERT, E5 |
| Classification | 128 | 384 | 768 | Lightweight classifiers |
❓ Frequently Asked Questions
What is embedding dimension?
The size of the vector representing each token, sentence, or document. Higher dim = more expressiveness but more memory and compute.
Why sqrt(vocab) heuristic?
Mikolov's Word2Vec heuristic: dim ≈ √vocab balances capacity and overfitting. Too small → collisions; too large → overfitting and wasted memory.
RAG: 768 vs 1536?
1536 (OpenAI) often retrieves better; 768 (sentence-BERT) is cheaper and faster. Benchmark on your data with an MTEB-style eval.
How much memory for 1M vectors?
1M × 1536 × 4 bytes ≈ 5.7 GiB for float32; half that for float16. Add ~20–50% for HNSW/IVF index overhead.
LLM embedding vs hidden dim?
LLM hidden dim (e.g., 4096) is internal. Embedding table maps vocab→hidden. This calculator focuses on standalone embedding models.
When to use lower dimension?
Fast inference, limited memory, simple tasks (classification), edge deployment. Trade-off: some retrieval quality loss.
MTEB vs custom eval?
MTEB gives baselines. Always evaluate on your domain (e.g., legal, medical) for production decisions.
Quantization impact?
int8 stores 1 byte per value, roughly a 4× reduction from float32; typically <1% retrieval quality drop. Test on your data before deploying.
⚠️ Disclaimer: This calculator provides heuristic guidance for educational and planning purposes. Optimal dimension depends on your data, task, and model. Always benchmark on your domain (MTEB-style or custom). Memory estimates assume float32; float16/int8 reduce footprint. Production vector stores have index overhead (HNSW, IVF). Consult MTEB leaderboard and model documentation for production decisions.
Related Calculators
Cross-Validation Sample Size Calculator
Calculate minimum sample sizes for reliable k-fold cross-validation with stratification and class imbalance.
RAG Optimizer Calculator
Calculate chunk sizes, vector store memory, and token budgets for retrieval-augmented generation pipelines.
Training Data Size Estimator
How much training data do you need? Chinchilla ratios, fine-tuning guidelines, and LIMA insights for optimal data requirements.
Activation Memory Calculator
Estimate activation memory with and without gradient checkpointing. Based on NVIDIA selective recomputation research.
AI Fairness & Bias Calculator
Calculate demographic parity, equalized odds, equal opportunity, and disparate impact ratio. Based on IBM AIF360 and Microsoft Fairlearn.
Attention Head Configuration Calculator
Configure MHA, MQA, and GQA attention. Calculate head counts, dimensions, KV cache savings, and memory per attention type.