Cross-Validation Sample Size
Minimum sample sizes for reliable k-fold cross-validation with stratification and class imbalance. Based on Varoquaux & Colliot 2023, Kohavi 1995, and scikit-learn best practices.
Why This ML Metric Matters
Why: Proper sample size ensures stable CV estimates and meaningful confidence intervals. Too few samples lead to high variance and unreliable model selection.
How: The calculator applies the 10×f×c rule, adjusts for model complexity, enforces stratification requirements for imbalanced data, and estimates confidence interval width.
Cross-Validation Sample Size Calculator
Minimum samples for reliable k-fold CV, with stratification and class-imbalance checks. Based on Varoquaux & Colliot 2023, Kohavi 1995, and scikit-learn guidance.
Inputs
Sample Size vs Accuracy Confidence Band
Stratification Visualization (Samples per Fold)
For educational and informational purposes only. Verify with a qualified professional.
🤖 AI & ML Facts
Varoquaux 2017: with 100 samples, CV error bars can be ±10% — standard error across folds underestimates true variance
— Varoquaux
Kohavi 1995: 10-fold stratified CV often outperforms leave-one-out for model selection on real datasets
— Kohavi
High-dimensional genomics (p>>n): rule of thumb breaks down; use regularization, nested CV, or holdout
— ML practice
scikit-learn StratifiedKFold preserves class proportions in each fold — requires min samples per class
— scikit-learn
📋 Key Takeaways
- Rule of thumb: n ≥ 10 × features × classes for reliable CV (Varoquaux & Colliot 2023)
- Kohavi 1995: stratified 10–20 fold CV recommended for accuracy estimation
- Stratification requires enough samples per class in each fold — at least 2 per class per fold
- Class imbalance: minority class limits stratification; oversample or use stratified sampling
- Small samples → large error bars; Varoquaux: ±10% error bars common with n=100
📖 How It Works
1. Rule of thumb (10×f×c)
Minimum samples ≈ 10 × number of features × number of classes. Ensures enough data per dimension and per class.
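As a minimal sketch, the rule is a one-line computation (the function name here is illustrative, not part of any library):

```python
def min_samples_rule(n_features: int, n_classes: int, factor: int = 10) -> int:
    """Heuristic minimum n: factor x features x classes (Varoquaux & Colliot 2023)."""
    return factor * n_features * n_classes

# Binary, low-dimensional case from the scenario table: 20 features, 2 classes.
print(min_samples_rule(20, 2))  # 400
```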
2. Min per fold
Each fold gets n/k samples. Test fold needs enough for meaningful metric; train fold needs enough for learning.
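A quick sketch of the per-fold arithmetic (pure Python; the helper name is ours):

```python
def fold_split_sizes(n: int, k: int) -> tuple[int, int]:
    """Approximate (train, test) sizes for one fold of k-fold CV on n samples."""
    test = n // k
    return n - test, test

# 400 samples with 10-fold CV: each model trains on 360 and is scored on 40.
print(fold_split_sizes(400, 10))  # (360, 40)
```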
3. Stratification
Stratified CV preserves class proportions. Requires ≥2 samples per class per fold. With imbalance, minority class limits feasibility.
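The per-class feasibility check can be sketched without scikit-learn (function and threshold names are illustrative; StratifiedKFold itself raises a warning when a class has fewer members than folds):

```python
def stratified_cv_feasible(class_counts: dict[str, int], k: int,
                           min_per_class_per_fold: int = 2) -> bool:
    """True if every class can supply min_per_class_per_fold samples to each of k folds."""
    return all(count >= k * min_per_class_per_fold
               for count in class_counts.values())

# 10-fold CV: 30 minority samples covers 10 folds x 2 per fold, so it is feasible.
print(stratified_cv_feasible({"pos": 30, "neg": 570}, k=10))  # True
print(stratified_cv_feasible({"pos": 15, "neg": 585}, k=10))  # False
```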
4. Confidence intervals
Accuracy is a proportion; CI width ∝ 1/√n. Higher confidence (99% vs 95%) widens the interval.
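A sketch of the normal-approximation interval the calculator estimates (one of several valid proportion intervals; Wilson is preferable for small n or extreme p):

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation half-width of a CI for accuracy p measured on n samples."""
    return z * math.sqrt(p * (1 - p) / n)

# n = 100 at 80% accuracy: roughly +/- 8 points, consistent with
# Varoquaux's observation that +/-10% error bars are common at n = 100.
print(round(ci_half_width(0.80, 100), 3))  # 0.078
```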
5. Model complexity
Complex models (high capacity) need more samples for stable estimates. Low=1×, medium=1.5×, high=2× multiplier.
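Putting the rule and the complexity adjustment together (the multiplier table mirrors the text; it is a heuristic, not a library constant):

```python
import math

# Illustrative multipliers matching the text above.
COMPLEXITY_MULTIPLIER = {"low": 1.0, "medium": 1.5, "high": 2.0}

def min_samples_adjusted(n_features: int, n_classes: int,
                         complexity: str = "medium") -> int:
    """10 x features x classes, scaled up for higher-capacity models."""
    base = 10 * n_features * n_classes
    return math.ceil(base * COMPLEXITY_MULTIPLIER[complexity])

print(min_samples_adjusted(20, 2, "high"))  # 800
```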
🎯 Expert Tips
Use stratification when possible
Preserves class balance across folds. Essential for imbalanced classification.
Repeated CV for stability
5×5 or 10×10 repeated k-fold reduces variance in small-sample settings.
Rare classes need more data
With 1:100 imbalance the minority class is about 1% of the data, so matching a balanced dataset's per-fold minority count takes roughly 50× the total samples.
Nested CV for model selection
Outer loop for evaluation, inner for hyperparameters — avoids optimistic bias.
⚖️ Sample Size by Scenario
| Scenario | Features | Classes | Min n (rule) | Notes |
|---|---|---|---|---|
| Binary, low-dim | 20 | 2 | 400 | Standard ML benchmark |
| Multi-class (10) | 100 | 10 | 10,000 | Image classification |
| Genomics p>>n | 5000 | 2 | 100,000 | Use regularization, holdout |
| Rare disease 1:20 | 30 | 2 | 600+ | Stratification limits |
| NLP sentiment | 500 | 3 | 15,000 | Medium complexity |
❓ Frequently Asked Questions
Why n ≥ 10 × features × classes?
Rule of thumb from ML practice and Varoquaux: ensures enough samples per dimension and per class for stable estimates. For high-dimensional data (p>>n), this often cannot be met — use regularization and holdout.
When to use stratification?
Always for classification when classes are imbalanced. StratifiedKFold preserves class proportions. Requires at least 2 samples per class per fold.
How many folds (k) should I use?
Kohavi 1995: 10–20 folds for accuracy estimation; 5-fold is common for model selection. More folds give each model more training data per split (less pessimistic bias) but cost more compute, and the extreme case, leave-one-out, can have high variance. Fewer folds are cheaper but each model trains on a smaller share of the data.
What if I have very few samples?
Consider leave-one-out (LOO) for regression, or repeated k-fold. Be aware: LOO has high variance. Bootstrap or nested CV can help. Report confidence intervals.
How does class imbalance affect CV?
Minority class limits stratification: every fold needs enough minority samples. With 1:100 imbalance the minority class is about 1% of the data, so the total dataset must be roughly 100× the minority count you need in absolute terms, which works out to about 50× more total samples than a balanced dataset for the same per-fold minority count.
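The arithmetic can be sketched directly (function name is illustrative; `ratio` is the majority:minority ratio):

```python
import math

def total_n_for_minority(minority_per_fold: int, k: int, ratio: float) -> int:
    """Total samples needed so each of k folds holds minority_per_fold minority samples.

    ratio: majority-to-minority ratio, e.g. 100.0 for 1:100 imbalance.
    """
    minority_needed = minority_per_fold * k
    return math.ceil(minority_needed * (1 + ratio))

balanced = total_n_for_minority(2, 10, ratio=1)      # 40 total samples
imbalanced = total_n_for_minority(2, 10, ratio=100)  # 2020: about 50x the balanced case
print(balanced, imbalanced)
```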
What is nested cross-validation?
Outer loop evaluates model; inner loop selects hyperparameters. Prevents optimistic bias from tuning on same data used for evaluation. Computationally expensive.
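The structure can be sketched without any ML library; `evaluate(candidate, train_idx, test_idx)` is a placeholder callback standing in for fitting and scoring a model (in practice, scikit-learn's GridSearchCV inside cross_val_score plays this role):

```python
import random

def kfold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv_score(n, candidates, evaluate, outer_k=5, inner_k=3):
    """Outer loop estimates performance; inner loop picks a hyperparameter."""
    outer_scores = []
    for outer_test in kfold_indices(n, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in range(n) if i not in held_out]

        def inner_score(cand):
            total = 0.0
            for fold in kfold_indices(len(outer_train), inner_k, seed=1):
                test = [outer_train[j] for j in fold]
                in_test = set(test)
                train = [i for i in outer_train if i not in in_test]
                total += evaluate(cand, train, test)
            return total / inner_k

        best = max(candidates, key=inner_score)  # tuned on inner folds only
        # Score the tuned candidate on data the inner loop never saw.
        outer_scores.append(evaluate(best, outer_train, outer_test))
    return sum(outer_scores) / outer_k
```

Because the outer test fold never participates in hyperparameter selection, the averaged outer score avoids the optimistic bias described above.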
When is holdout better than CV?
Very large datasets (millions), or when compute is limited. Single train/test split is simpler. CV preferred for small/medium data.
How does model complexity affect sample size?
Complex models (deep nets, high-capacity) need more samples for stable CV estimates. Simple models (logistic regression) can work with fewer samples.
⚠️ Disclaimer: This calculator provides estimates for educational and planning purposes. The 10×f×c rule is a heuristic; actual requirements depend on problem difficulty, model choice, and noise. For high-dimensional (p>>n) or highly imbalanced data, consult domain-specific guidelines. Always report confidence intervals and consider nested CV for model selection. Verify with scikit-learn or your ML framework.
Related Calculators
Embedding Dimension Calculator
Determine optimal embedding dimensions for LLMs, RAG, classification, and search. Balance memory vs expressiveness.
Machine Learning
RAG Optimizer Calculator
Calculate chunk sizes, vector store memory, and token budgets for retrieval-augmented generation pipelines.
Machine Learning
Training Data Size Estimator
How much training data do you need? Chinchilla ratios, fine-tuning guidelines, and LIMA insights for optimal data requirements.
Machine Learning
Activation Memory Calculator
Estimate activation memory with and without gradient checkpointing. Based on NVIDIA selective recomputation research.
Machine Learning
AI Fairness & Bias Calculator
Calculate demographic parity, equalized odds, equal opportunity, and disparate impact ratio. Based on IBM AIF360 and Microsoft Fairlearn.
Machine Learning
Attention Head Configuration Calculator
Configure MHA, MQA, and GQA attention. Calculate head counts, dimensions, KV cache savings, and memory per attention type.
Machine Learning