GPU Comparison
Select two GPUs to see a detailed side-by-side comparison of ML-relevant specs.
"Can It Run?" ML Model Calculator
Select an AI model to see which GPUs have enough VRAM to run it at different quantization levels.
Price/Performance Rankings
Sort GPUs by performance-per-dollar across different ML-relevant metrics.
All GPUs
Browse the complete GPU database. Filter by VRAM, price, and type.
How to Use the GPU Benchmark Comparator
This tool helps machine learning engineers, data scientists, and hobbyists choose the right GPU for their AI workloads. Instead of manually comparing spec sheets across dozens of product pages, you can evaluate GPUs side by side with metrics that actually matter for ML -- VRAM, FP16 throughput, memory bandwidth, and price-to-performance ratios.
Using the Comparison Tool
Select two GPUs from the dropdowns in the Compare section. The tool displays a side-by-side spec table highlighting which card wins each metric. Green highlighting indicates the superior value in each row. This is useful when you are deciding between two specific cards -- for example, whether to buy an RTX 4090 or save up for an A6000.
The "Can It Run?" Calculator
Select a popular AI model (Llama, Mistral, Stable Diffusion, etc.) from the dropdown. The calculator shows VRAM requirements at different quantization levels (FP16, INT8, INT4) and flags which GPUs in the database can handle each configuration. This answers the most common question in ML hardware planning: will this model fit on my GPU?
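The arithmetic behind such a calculator can be sketched in a few lines. The model sizes, the ~20% overhead factor for KV cache and activations, and the function names below are illustrative assumptions, not the tool's actual database values:

```python
# Rough VRAM estimate for running an LLM at a given quantization level.
# Bytes per parameter: FP16 = 2, INT8 = 1, INT4 = 0.5.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def vram_needed_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Weight footprint plus a ~20% allowance for KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

def can_run(params_billion: float, quant: str, gpu_vram_gb: float) -> bool:
    return vram_needed_gb(params_billion, quant) <= gpu_vram_gb

# A 7B model at INT4 on a 24 GB RTX 4090: ~4.2 GB needed
print(can_run(7, "INT4", 24))   # True
# A 70B model at FP16 on the same card: ~168 GB needed
print(can_run(70, "FP16", 24))  # False
```

Real frameworks add further overhead (context length drives KV cache size), so treat estimates like this as a lower bound rather than a guarantee.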
Reading GPU Specs for ML
VRAM is the single most important spec because it determines the largest model you can load. FP16 TFLOPS measures raw compute throughput for training and inference at half precision, which is the standard for modern deep learning. Memory bandwidth matters most for inference workloads, especially large language model text generation, where the GPU spends most of its time reading model weights from memory rather than computing.
Gaming vs. AI/ML Workloads
Gaming GPUs prioritize rasterization performance and display output, while ML workloads care about tensor core throughput, VRAM capacity, and memory bandwidth. A card like the RTX 4090 excels at both, but datacenter GPUs like the A100 or H100 trade gaming features for double or quadruple the VRAM and much faster interconnects for multi-GPU training.
Frequently Asked Questions
What does quantization mean for VRAM?
Quantization reduces the precision of model weights from 16-bit floats (FP16) to 8-bit integers (INT8) or 4-bit integers (INT4). This cuts VRAM usage by 2x or 4x respectively, with a small accuracy trade-off. It is the most practical way to run large models on consumer GPUs.
Is Apple Silicon competitive for ML?
Apple Silicon's unified memory architecture lets you load very large models (up to 192 GB on the M2 Ultra) that would be impossible on consumer NVIDIA GPUs. However, raw compute throughput for training is significantly lower. Apple Silicon is best for local inference and prototyping, not production training.
Should I buy one expensive GPU or two cheaper ones?
For most hobbyists, one powerful GPU is better. Multi-GPU training requires NVLink or high-bandwidth interconnects, and not all frameworks handle multi-GPU setups cleanly. Two RTX 3090s often perform worse than a single RTX 4090 for training due to inter-GPU communication overhead. The exception is if you need more total VRAM for a model that cannot be quantized.
Where do the prices come from?
Prices reflect approximate US retail or list pricing at the time the database was last updated. GPU prices fluctuate frequently, especially on the used market. Use the prices here as a relative guide for comparing value, and check current retailer pricing before purchasing.
What is price-to-performance and how do I use it?
The Price/Performance Rankings section divides a performance metric (like FP16 TFLOPS) by the GPU's price in thousands of dollars. A higher number means you get more compute per dollar. This ranking helps budget-conscious buyers identify the best value cards rather than simply the fastest ones.
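The ranking formula described above can be sketched as follows. The TFLOPS figures and prices in the list are approximate illustrative values, not entries from the tool's database:

```python
# Performance-per-dollar: a metric value divided by price in thousands of USD.
def perf_per_dollar(metric_value: float, price_usd: float) -> float:
    return metric_value / (price_usd / 1000)

# (name, approx. FP16 TFLOPS, approx. price in USD) -- illustrative figures
gpus = [
    ("RTX 4090", 82.6, 1599),
    ("RTX 3090", 35.6, 1499),
    ("A100 80GB", 77.9, 15000),
]

ranked = sorted(gpus, key=lambda g: perf_per_dollar(g[1], g[2]), reverse=True)
for name, tflops, price in ranked:
    print(f"{name}: {perf_per_dollar(tflops, price):.1f} TFLOPS per $1k")
```

Note how the ranking inverts the raw-speed ordering: the A100 has more FP16 throughput than the RTX 3090 but ranks last on value because of its datacenter price.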
GPU Specs That Matter for Machine Learning
VRAM (Video Memory)
VRAM is often the most critical spec for ML workloads. It determines the maximum model size you can load and the batch size you can use during training. Large language models like Llama 2 70B require about 140 GB of VRAM for the weights alone at FP16 precision -- far more than any single consumer GPU offers. Quantization techniques (INT8, INT4) reduce VRAM requirements significantly, enabling smaller GPUs to run larger models with a slight accuracy trade-off.
FP16 and FP32 Performance (TFLOPS)
FP16 (half-precision) performance is the most relevant metric for modern ML training and inference. Most deep learning frameworks use mixed-precision training by default, leveraging FP16 Tensor Cores on NVIDIA GPUs for massive speed gains. FP32 (single-precision) still matters for certain scientific computing workloads and training stability. Datacenter GPUs like the H100 offer dramatically higher FP16 throughput compared to consumer cards.
Memory Bandwidth
Memory bandwidth determines how fast data can be read from and written to VRAM. For inference workloads -- especially autoregressive text generation with LLMs -- memory bandwidth is often the bottleneck, not compute. A GPU with high bandwidth will generate tokens faster. The NVIDIA H100 leads with 3,350 GB/s, while consumer cards like the RTX 4090 offer around 1,008 GB/s.
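The bandwidth bottleneck gives a useful back-of-envelope ceiling on decode speed: each generated token requires reading every model weight once, so tokens per second cannot exceed bandwidth divided by model size in bytes. A minimal sketch, using the figures above (real throughput lands below this ceiling due to compute and overhead):

```python
# Rough upper bound for bandwidth-bound autoregressive generation:
# tokens/sec <= memory bandwidth / size of model weights in bytes.
def max_tokens_per_sec(bandwidth_gb_s: float, params_billion: float,
                       bytes_per_param: float) -> float:
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# 7B model at FP16 (14 GB of weights) on an RTX 4090 (~1,008 GB/s):
print(round(max_tokens_per_sec(1008, 7, 2.0)))  # -> 72 tokens/sec ceiling
```

The same formula shows why quantization speeds up inference: halving bytes per parameter doubles the theoretical token-rate ceiling.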
TDP (Thermal Design Power)
TDP indicates the maximum power the GPU will draw under load. For home setups, lower TDP means less heat and quieter operation. For data centers, TDP directly impacts electricity costs and cooling requirements. Apple Silicon chips stand out here with very low TDP relative to performance, though they trade raw compute power for energy efficiency.
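For home and datacenter planning alike, TDP translates directly into an electricity estimate. A quick sketch, where the $0.15/kWh rate and 24-hour duty cycle are assumptions (actual draw under ML load is often below the rated TDP):

```python
# Back-of-envelope daily electricity cost from a GPU's rated TDP.
def daily_cost_usd(tdp_watts: float, hours_per_day: float = 24,
                   rate_per_kwh: float = 0.15) -> float:
    kwh = tdp_watts / 1000 * hours_per_day
    return kwh * rate_per_kwh

# RTX 4090 (450 W TDP) training around the clock:
print(f"${daily_cost_usd(450):.2f}/day")  # -> $1.62/day
```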
Consumer vs. Datacenter GPUs
Consumer GPUs (like the RTX 4090) offer excellent price-to-performance for ML hobbyists and small-scale experiments. Datacenter GPUs (like the A100 and H100) provide far more VRAM, higher memory bandwidth, support for multi-GPU interconnects (NVLink), and ECC memory for reliability. Apple Silicon provides a unique option with unified memory architecture, allowing very large models to fit in memory at the cost of lower raw throughput.
Choosing the Right GPU for Your Workload
- Fine-tuning 7B parameter models: RTX 4090 (24 GB VRAM) or RTX 3090 for budget builds
- Running inference on 13B+ models: A6000 or RTX 6000 Ada (48 GB) for INT8; A100 80GB for FP16
- Stable Diffusion / image generation: RTX 4070 or above (12+ GB VRAM is comfortable)
- Training large models: H100 or multi-GPU A100 setups
- Quiet home lab with large models: Apple M2 Ultra or M4 Max with unified memory