LoRA / QLoRA Fine-Tuning Parameters
99%+ parameter reduction — fine-tune LLMs on consumer GPUs. From Llama 70B on a single 48GB GPU to Mistral 7B in minutes. Calculate trainable params, memory savings, and adapter sizes.
Why This ML Metric Matters
Why: LoRA enables fine-tuning of large language models on consumer hardware by training only low-rank adapter matrices.
How: LoRA adds ΔW = B×A where B is d×r and A is r×d. Total trainable = L × m × 2dr (L layers, m target modules per layer). QLoRA additionally quantizes the frozen base model to 4-bit NF4.
- 99%+ param reduction vs full fine-tune
- r=16 common default for 7B
- α=2r rule of thumb
- QLoRA enables 70B on 48GB
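The trainable-parameter formula above (L × m × 2dr) can be sketched in a few lines of Python. The Llama-style numbers below (32 layers, 4 attention projections, d=4096) are illustrative assumptions, and the formula assumes square d×d weight matrices, so it is a rough estimate rather than an exact count:

```python
def lora_trainable_params(n_layers, n_modules, d, r):
    """Trainable LoRA params: L layers x m target modules x 2*d*r each.

    Assumes square d x d weights; real models with differing projection
    shapes (e.g. grouped-query k/v heads) will deviate slightly.
    """
    return n_layers * n_modules * 2 * d * r

# Llama-2-7B-like: 32 layers, q/k/v/o targeted, d=4096, r=16
trainable = lora_trainable_params(32, 4, 4096, 16)
full = 7_000_000_000  # approximate full-model parameter count
print(f"trainable: {trainable:,}")           # 16,777,216
print(f"fraction:  {trainable / full:.4%}")  # ~0.24% of 7B
```

Even at r=16 across four projections per layer, the adapters stay well under 1% of the full model, which is where the "99%+ reduction" figure comes from.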
📋 Quick Examples
📊 Charts: Trainable vs Frozen Params · Memory Comparison (GB) · Trainable Params vs Rank · Alpha Scaling Factor
🤖 AI & ML Facts
Microsoft introduced LoRA (Hu et al., 2021) — now used in virtually every LLM fine-tuning pipeline
— Hu et al.
QLoRA's Guanaco models reach 99.3% of ChatGPT's performance on the Vicuna benchmark with a 4-bit base + LoRA adapters
— Dettmers et al.
Guanaco 65B was fine-tuned on a single 48GB A6000 using QLoRA
— QLoRA paper
HuggingFace PEFT has 10M+ downloads — LoRA is the default for parameter-efficient tuning
— PEFT
LoRA (Low-Rank Adaptation) trains small adapter matrices instead of updating the full model, adding 2×d×r trainable parameters per adapted module per layer. QLoRA quantizes the frozen base model to 4-bit (NF4), enabling 70B models on a single 48GB GPU. In practice LoRA reduces trainable parameters by 99%+ vs full fine-tuning.
Key Takeaways
- LoRA reduces trainable parameters by 99%+ vs full fine-tuning
- QLoRA adds 4-bit quantization to the frozen base model, enabling 70B on a single 48GB GPU
- Rank choice matters: r=8–64 typical; higher rank = more capacity but more memory
- Alpha/rank ratio (α/r) controls scaling; α=2r is a common rule of thumb
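The memory claim in the takeaways can be sanity-checked with simple arithmetic. The sketch below estimates weight memory only; it deliberately ignores quantization constants, activations, KV cache, and optimizer state, so real usage will be higher:

```python
def base_weight_memory_gb(n_params, bits):
    """Rough weight-only memory for a model stored at the given bit width.

    Illustrative lower bound: excludes quantization overhead, activations,
    KV cache, gradients, and optimizer state.
    """
    return n_params * bits / 8 / 1024**3

# A 70B-parameter base model, weights only:
print(f"NF4 (4-bit): {base_weight_memory_gb(70e9, 4):.1f} GB")   # ~32.6 GB
print(f"FP16:        {base_weight_memory_gb(70e9, 16):.1f} GB")  # ~130.4 GB
```

At 4-bit the 70B weights fit in roughly 33 GB, leaving headroom on a 48GB card for adapters, activations, and paged optimizer state — which is why QLoRA makes 70B single-GPU fine-tuning feasible where FP16 (~130 GB for weights alone) is not.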
How Does LoRA Work?
1. Low-Rank Decomposition
Instead of updating W directly, LoRA adds ΔW = B×A where B is d×r and A is r×d. Only 2dr parameters per adapted matrix vs d².
2. A and B Matrices
A is initialized randomly (Gaussian) and B with zeros, so ΔW = B×A starts at zero and training begins exactly from the pretrained model. The adapter output is scaled by α/r.
3. QLoRA 4-bit Base
QLoRA quantizes the base model to 4-bit (NF4), keeping it frozen. Only LoRA adapters are FP16. Enables 70B on 48GB.
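Steps 1 and 2 above can be demonstrated in plain Python with tiny, illustrative dimensions (d=4, r=2 — far smaller than any real model):

```python
import random

d, r, alpha = 4, 2, 4  # tiny dims for illustration; alpha = 2r rule of thumb
random.seed(0)

# A (r x d) starts random, B (d x r) starts at zero, so the initial
# update delta_W = B @ A is zero and the model output is unchanged.
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

def delta_w(B, A, alpha, r):
    """delta_W = (alpha / r) * B @ A, a d x d matrix from 2*d*r params."""
    scale = alpha / r
    return [[scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(d)] for i in range(d)]

dW = delta_w(B, A, alpha, r)
print(all(v == 0.0 for row in dW for v in row))  # True: training starts at W
```

Once training updates B away from zero, ΔW becomes a nonzero rank-r matrix scaled by α/r, which is exactly the adapter that gets merged or saved.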
Comparison Table
| Feature | This Calculator | PEFT Config | Manual |
|---|---|---|---|
| LoRA param formula | ✅ | ❌ | ⚠️ |
| QLoRA memory estimate | ✅ | ❌ | ❌ |
| Example presets | ✅ | ❌ | ❌ |
| Charts & visualizations | ✅ | ❌ | ❌ |
Frequently Asked Questions
LoRA vs full fine-tuning?
LoRA trains only low-rank adapter matrices (typically <1% of params) while freezing the base model. Full fine-tuning updates all parameters. LoRA saves 99%+ memory and often matches full fine-tune quality.
How do I choose rank (r)?
Start with r=16 for 7B models. Use r=8 for smaller models or limited data; r=32–64 for complex tasks or larger models. Higher rank = more capacity but more memory.
What is alpha (α)?
Alpha scales the LoRA output. Effective scaling = α/r. α=2r is a common rule of thumb. When α=r, scaling is 1:1.
What is QLoRA NF4?
QLoRA uses 4-bit NormalFloat (NF4) quantization for the base model. The base stays frozen; only LoRA adapters are trained in FP16. Enables 70B on a single 48GB GPU.
Which target modules to use?
For best quality: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj. For speed: q_proj and v_proj only. Attention layers (q/k/v/o) are most impactful.
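As a sketch, the settings discussed in the FAQs above map onto a typical HuggingFace `transformers` + `peft` + `bitsandbytes` QLoRA setup like the following. The model id and hyperparameters are illustrative assumptions, not prescriptions; treat this as a configuration outline under those assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base with double quantization, bf16 compute (QLoRA-style defaults)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative model id
    quantization_config=bnb,
)

# r=16, alpha=2r, attention projections targeted (quality/speed middle ground)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapters are trainable
```

Adding `gate_proj` and `up_proj` to `target_modules` trades extra adapter parameters for the quality gains mentioned above.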
Can I merge LoRA adapters?
Yes. Merge adapters into base weights for zero-overhead inference. Use model.merge_and_unload() in PEFT.
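A minimal merge sketch with the `peft` library (model id and adapter path are illustrative placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative
model = PeftModel.from_pretrained(base, "my-lora-adapter")               # illustrative path

# Fold (alpha/r) * B @ A into the base weights; the result is a plain
# model with no adapter overhead at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
```

Note that merging into a 4-bit-quantized base is lossy; for a clean merge, load the base in FP16/BF16 first.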
⚠️ Disclaimer: This calculator provides estimates for educational and planning purposes. Actual memory usage depends on implementation (PEFT, bitsandbytes), batch size, sequence length, and optimizer choice. For production, validate with your specific framework and hardware.