Confusion Matrix & Classification Metrics Calculator
Compute Accuracy, Precision, Recall, F1, MCC, Specificity, and Balanced Accuracy from TP, FP, TN, FN. Based on scikit-learn, Chicco & Jurman (2020), and Powers (2020).
Why This ML Metric Matters
Why: Metric choice matters most for imbalanced data. Accuracy can mislead; MCC and F1 are preferred for binary classification.
How: From TP, FP, FN, TN we compute precision, recall, F1, MCC, specificity, and balanced accuracy.
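As a minimal sketch of that whole pipeline in plain Python (the helper name all_metrics is invented for illustration, not from a particular library):

```python
from math import sqrt

def all_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from the four confusion matrix cells.
    (Function name and structure are illustrative, not from scikit-learn.)"""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn)
               / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

print(all_metrics(tp=50, fp=5, fn=10, tn=935))
```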
Confusion Matrix Inputs
Example inputs: TP = 50, FP = 5, FN = 10, TN = 935
Confusion Matrix Heatmap: green = correct (TP, TN) · red = errors (FP, FN)
AI & ML Facts
Accuracy can be misleading with imbalanced data: 99% accuracy may be useless if only 1% of samples are positive.
– Chicco & Jurman 2020
MCC is the gold standard for imbalanced binary classification. It ranges from -1 to +1.
– Chicco & Jurman 2020
F1 is the harmonic mean of precision and recall; it penalizes imbalance between the two.
– Powers 2020
Balanced Accuracy = (Recall + Specificity)/2, better than raw accuracy for imbalanced classes.
– scikit-learn
Key Takeaways
- Accuracy can be misleading with imbalanced data: 99% accuracy may be useless if only 1% of samples are positive
- Precision answers "Of all positive predictions, how many are correct?"
- Recall answers "Of all actual positives, how many did we find?"
- F1 Score balances precision and recall; the harmonic mean penalizes imbalance
- MCC (Matthews Correlation Coefficient) is the gold standard for imbalanced binary classification; it ranges from -1 to +1
- Balanced Accuracy = (Recall + Specificity)/2, better than raw accuracy for imbalanced classes
How It Works
1. The Confusion Matrix
2×2 table of TP, FP, FN, TN. Rows = actual class, columns = predicted class.
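A minimal sketch of extracting the four cells with scikit-learn (the labels here are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predictions

# With labels [0, 1], scikit-learn lays out the matrix as
# [[TN, FP],
#  [FN, TP]]  (rows = actual, columns = predicted)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```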
2. Accuracy vs Balanced Accuracy
Accuracy = (TP+TN)/total. Balanced Accuracy = (Recall + Specificity)/2, which is better for imbalanced data.
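A quick sketch of both metrics, using the example counts from the inputs above (TP=50, FP=5, FN=10, TN=935):

```python
# Plain-Python sketch; counts are the example values from the calculator inputs.
tp, fp, fn, tn = 50, 5, 10, 935

accuracy = (tp + tn) / (tp + fp + fn + tn)      # 0.985 -- looks great...
recall = tp / (tp + fn)                         # 0.833 (sensitivity)
specificity = tn / (tn + fp)                    # 0.995
balanced_accuracy = (recall + specificity) / 2  # 0.914 -- a more honest picture
```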
3. Precision and Recall Tradeoff
Raising the threshold increases precision but lowers recall. Lowering it does the opposite.
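One way to see the tradeoff is to sweep the threshold with scikit-learn's precision_recall_curve; the scores below are hypothetical model probabilities:

```python
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                            # hypothetical labels
y_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60]  # hypothetical probabilities

# precision/recall at each candidate threshold, from lowest to highest
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold >= {t:.2f}: precision={p:.2f}, recall={r:.2f}")
```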
4. F1 Score: Harmonic Mean
F1 = 2PR/(P+R). Penalizes extreme imbalance between precision and recall.
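A two-line sketch shows why the harmonic mean is harsher than the arithmetic mean when precision and recall diverge (the 0.90/0.30 pair is an invented example):

```python
p, r = 0.90, 0.30                 # invented precision/recall pair
f1 = 2 * p * r / (p + r)
print(round(f1, 2))               # 0.45, vs. an arithmetic mean of 0.60
```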
5. MCC: The Gold Standard Metric
MCC ranges from -1 to +1. It is the only common single-value metric that uses all four confusion matrix cells and treats the two classes symmetrically. Use it for imbalanced binary classification.
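For reference, a sketch of the MCC formula applied to the same example counts (TP=50, FP=5, FN=10, TN=935):

```python
from math import sqrt

tp, fp, fn, tn = 50, 5, 10, 935
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc, 3))  # 0.863
```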
Expert Tips
Choose metrics by cost
If false negatives are deadly (cancer), optimize recall. If false positives are costly (spam blocking), optimize precision.
Use MCC for imbalanced data
MCC stays informative when class sizes are very unequal, because it accounts for all four confusion matrix cells.
Always check the confusion matrix
Single metrics hide important details. A model with 90% accuracy might have 0% recall on the minority class.
Threshold tuning
Classification thresholds can be adjusted to trade precision for recall; plot the PR curve to find the optimal point.
Metric Selection by Use Case
| Use Case | Primary Metric | Why |
|---|---|---|
| Medical screening | Recall | Missing a case (FN) is critical |
| Spam filter | Precision | Blocking legitimate email (FP) is costly |
| Fraud detection | F1 or MCC | Both FP and FN matter; imbalanced classes |
| Sentiment analysis | F1 | Balanced precision-recall tradeoff |
| Image classification | Accuracy or F1 | Often balanced; F1 for per-class focus |
Frequently Asked Questions
Why is accuracy misleading for imbalanced datasets?
When 99% of samples are negative, predicting "negative" for everything gives 99% accuracy but 0% recall on positives. Use precision, recall, F1, or MCC instead.
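A quick sketch of that pathological case, using scikit-learn on a synthetic 99:1 split:

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 10 + [0] * 990   # synthetic data: 1% positive
y_pred = [0] * 1000             # always predict "negative"

print(accuracy_score(y_true, y_pred))  # 0.99
print(recall_score(y_true, y_pred))    # 0.0 -- every positive is missed
```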
When should I use F1 vs MCC?
F1 balances precision and recall equally. MCC considers all four confusion matrix cells and is symmetric; use MCC for imbalanced binary classification (Chicco & Jurman 2020).
What is a good F1 score?
F1 > 0.9 is excellent, 0.7–0.9 is good, 0.5–0.7 is moderate, and < 0.5 is poor. Context matters; medical screening may require F1 > 0.95.
How do precision and recall trade off?
Raising the classification threshold increases precision (fewer false positives) but decreases recall (more false negatives). Lowering it does the opposite.
What is MCC and when should I use it?
Matthews Correlation Coefficient ranges -1 to +1. Use it for imbalanced binary classification โ it considers all four confusion matrix cells and is symmetric.
What is Balanced Accuracy?
Balanced Accuracy = (Recall + Specificity)/2. It equals ROC-AUC computed from hard (single-threshold) predictions, and it is better than raw accuracy for imbalanced classes.
What is the difference between sensitivity and specificity?
Sensitivity = Recall = TP/(TP+FN): how well we find positives. Specificity = TN/(TN+FP): how well we find negatives.
Can these metrics be used for multi-class classification?
Yes. Use macro/micro/weighted averaging: macro-averaged F1 is the mean of per-class F1 scores; micro-averaging pools TP, FP, and FN across classes before computing the metric.
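For example, a sketch of the three averaging modes with scikit-learn's f1_score (labels invented for illustration):

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]  # hypothetical 3-class labels
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

print(f1_score(y_true, y_pred, average="macro"))     # mean of per-class F1
print(f1_score(y_true, y_pred, average="micro"))     # pools TP/FP/FN across classes
print(f1_score(y_true, y_pred, average="weighted"))  # per-class F1 weighted by support
```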
Disclaimer: This calculator provides classification metrics for educational and professional reference. For critical applications (medical diagnosis, fraud detection, autonomous systems), verify results against established ML frameworks (scikit-learn, etc.) and consult domain experts. Metrics assume binary classification; multi-class requires macro/micro averaging. ROC-AUC requires probability scores across thresholds; Balanced Accuracy approximates single-threshold performance.
Related Calculators
- Batch Size & Learning Rate Calculator: Calculate optimal learning rates using linear and square-root scaling rules. Visualize warmup and cosine/linear schedules.
- Neural Network Parameter Counter: Count total parameters for neural network architectures. Supports Linear, Conv2D, Embedding, LayerNorm, and MultiHeadAttention layers.
- Activation Memory Calculator: Estimate activation memory with and without gradient checkpointing. Based on NVIDIA selective recomputation research.
- AI Fairness & Bias Calculator: Calculate demographic parity, equalized odds, equal opportunity, and disparate impact ratio. Based on IBM AIF360 and Microsoft Fairlearn.
- Attention Head Configuration Calculator: Configure MHA, MQA, and GQA attention. Calculate head counts, dimensions, KV cache savings, and memory per attention type.
- Compute-Optimal Model Size Calculator (Chinchilla): Find the compute-optimal model size and training tokens given a compute budget using Chinchilla scaling laws.