Cloud GPU list prices for the same model can differ several times over from one provider to another. We compiled the lowest rate and the provider offering it, the market range, and the median for 40+ GPU configurations across all three pricing tiers (on-demand, spot, and 1-year reserved), plus a throughput-per-dollar benchmark on 10 models.
Cloud GPU price per throughput
See the most cost-effective GPU for your workload across 13 hyperscaler and neocloud providers, ranked by throughput per dollar:
Cloud GPU Throughput & Prices
Updated on May 29, 2026
See cloud GPU benchmark methodology for details.
On-demand is the most straightforward pricing model: you pay for compute capacity by the hour or by the second, only for what you use, with no long-term commitment or upfront payment.
These instances suit users who want the flexibility of a cloud GPU platform without locking in a term. On-demand instances usually cost more than spot instances, but their capacity is guaranteed and uninterrupted while the instance runs.
On-demand cloud GPU prices
Ranking: Sponsors are linked and highlighted at the top of the table. The remaining rows are ranked in ascending order by lowest on-demand price. Range shows the spread between the lowest and highest list price for the same SKU across all providers. Median is the middle of the price distribution across every listing for that SKU and serves as a fair-market anchor. Prices reflect the most recent weekly catalog refresh.
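As a minimal sketch of how the Range and Median columns can be derived for one SKU, the snippet below summarizes a list of hourly prices. The function name `summarize_sku` and the prices are hypothetical illustrations, not values from the tables.

```python
from statistics import median

def summarize_sku(prices_per_hour):
    """Return (lowest, highest, median) for one SKU's hourly list prices."""
    if not prices_per_hour:
        raise ValueError("need at least one listing")
    ordered = sorted(prices_per_hour)
    return ordered[0], ordered[-1], median(ordered)

# Hypothetical hourly list prices (USD) for one SKU at five providers.
listings = [2.49, 1.99, 6.88, 2.95, 3.90]
low, high, mid = summarize_sku(listings)
print(f"lowest ${low:.2f}, range ${low:.2f}-${high:.2f}, median ${mid:.2f}")
```

The median is less sensitive than the mean to a single very expensive listing, which is why it works as a fair-market anchor.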
On-demand is the default rental model: you pay per hour with no commitment, and capacity is guaranteed for as long as you keep the instance running. It is usually the most expensive tier, but the only one that carries neither interruption risk nor a term commitment.
Spot cloud GPU prices
Ranking: Rows are ranked by lowest spot price in ascending order. Spot capacity is interruptible. Median is the middle of the spot price distribution for that SKU.
Spot capacity is interruptible: the provider can reclaim the instance with little or no warning, usually when demand for on-demand capacity spikes. Spot rates typically run 30-60% below on-demand at the same provider. Use spot for checkpointable training, batch inference, and evaluation jobs that tolerate restarts. Avoid it for latency-sensitive inference or single-replica services without failover.
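A rough way to check whether a spot discount survives interruptions is to compare the on-demand rate with the spot rate per hour of useful work. The sketch below assumes a fraction of paid spot hours is lost to restarts and to recomputing work since the last checkpoint. The function names, the rates, and the 20% loss figure are hypothetical.

```python
def spot_discount(on_demand_rate, spot_rate):
    """Fraction saved by spot relative to on-demand at the same provider."""
    return 1.0 - spot_rate / on_demand_rate

def effective_spot_rate(spot_rate, wasted_fraction):
    """Cost per hour of useful work when some paid hours are lost to restarts."""
    if not 0.0 <= wasted_fraction < 1.0:
        raise ValueError("wasted_fraction must be in [0, 1)")
    return spot_rate / (1.0 - wasted_fraction)

# Hypothetical rates in USD per GPU-hour.
on_demand, spot = 4.00, 2.00
discount = spot_discount(on_demand, spot)
effective = effective_spot_rate(spot, wasted_fraction=0.20)
print(f"list discount {discount:.0%}, effective spot rate ${effective:.2f}/h")
print("spot still cheaper" if effective < on_demand else "on-demand cheaper")
```

Under these assumptions, frequent checkpoints shrink the wasted fraction and keep the effective rate close to the listed spot price.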
Reserved cloud GPU prices (1-year)
Ranking: Rows are ranked by the lowest 1-year reserved price in ascending order. Reservations lock in capacity for the term. Median is the middle of the reserved price distribution for that SKU.
Reservations lock in capacity for a fixed term in exchange for a discount versus on-demand. One-year contracts typically run 20-40% below the same provider’s on-demand list. In a few cases, reservation rates dip below spot, because the reserving provider isolates inventory from the spot market entirely.
Cloud provider performance comparison
The same GPU model can perform slightly differently across providers because of host CPU choice, network fabric, driver configuration, and virtualization overhead. To quantify this, we ran identical text and image generation workloads on AMD MI300X 192GB at DigitalOcean and Runpod:
Key Observations:
- For text generation, DigitalOcean demonstrated slightly higher throughput, processing approximately 0.4% more tokens per second.
- Conversely, for image generation, Runpod showed a marginal advantage, processing about 0.4% more images per second.
The gap is small enough not to matter for most workloads. For latency-critical inference or large-scale training, where every percentage point compounds across millions of requests or training steps, benchmark the specific provider configuration before committing to a long reservation.
Buy on-prem or rent in the cloud
Owning makes sense when the workload is predictable, the team has the operational know-how, and hardware utilization stays above ~70% across the useful life of the GPU. For variable demand, training spikes, or product experiments, cloud rental wins on capital efficiency and scaling flexibility. As a rough rule, the break-even depends on utilization averaged over 12 months: above 70%, reserved or owned capacity almost always beats on-demand; below 50%, spot or on-demand wins on flexibility; between 50% and 70%, the answer depends on how much capacity disruption your workload tolerates.
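The ~70% threshold can be illustrated with a simple break-even calculation between on-demand and a 1-year reservation. This sketch assumes a reservation is billed for every hour of the year, while on-demand is billed only for hours actually used (instances are shut down when idle). The rates and the 30% discount are hypothetical, and ownership costs such as power, hosting, and staff are left out.

```python
HOURS_PER_YEAR = 8760

def annual_cost_on_demand(rate, utilization):
    """On-demand is assumed to be billed only for the hours actually used."""
    return rate * HOURS_PER_YEAR * utilization

def annual_cost_reserved(rate):
    """A reservation is assumed to be billed for every hour of the term."""
    return rate * HOURS_PER_YEAR

def break_even_utilization(on_demand_rate, reserved_rate):
    """Utilization above which the reservation is the cheaper option."""
    return reserved_rate / on_demand_rate

def cheaper_option(on_demand_rate, reserved_rate, utilization):
    on_demand = annual_cost_on_demand(on_demand_rate, utilization)
    reserved = annual_cost_reserved(reserved_rate)
    return "reserved" if reserved < on_demand else "on-demand"

# Hypothetical: $4.00/h on-demand and a 30% reservation discount.
on_demand_rate = 4.00
reserved_rate = on_demand_rate * (1 - 0.30)
print(f"break-even: {break_even_utilization(on_demand_rate, reserved_rate):.0%}")
for u in (0.5, 0.8):
    print(f"{u:.0%} utilization -> {cheaper_option(on_demand_rate, reserved_rate, u)}")
```

Under these assumptions the break-even utilization equals one minus the reservation discount, so a 20-40% discount puts it between 60% and 80%.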
A practical pattern at scale: own a baseline cluster sized to steady-state demand, rent in the cloud for spikes and exploratory work. Meta announced a multi-year partnership in February 2026 to deploy up to 6 gigawatts of AMD Instinct GPUs, signaling that even hyperscaler-scale operators continue to expand owned capacity while still consuming cloud GPU for variable workloads.
Consumer GPUs (RTX 4090, RTX 5090) deliver the best price per FLOP on paper, but NVIDIA’s EULA restricts their use in commercial data centers. They remain useful for individual workstations and proof-of-concept work, not production deployment.
Cloud GPU benchmark methodology
Throughput benchmarks use 4-bit FP quantization across all tests. The pipeline runs:
- Text finetuning: Llama 3.2 on the first 5,000 conversations from FineTome, 5 epochs, 1M total tokens, Unsloth framework. Throughput = (tokens × epochs) / total time.
- Text inference: 1M tokens generated with llama-cpp-python.
- Image finetuning: YOLOv9 on 100 images from SkyFusion, 4 epochs, Unsloth.
- Image inference: Finetuned YOLOv9 on ~500 images at 640×640.
The throughput-per-dollar metric divides workload output by the instance’s hourly cost. Throughput values are workload-specific and serve as relative guidelines; the same hardware will deliver materially different throughput on your own model.
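The two formulas in this methodology can be sketched as follows. The article does not state the metric's units, so converting per-second throughput into output per hour before dividing by the hourly cost is an assumption, as are the function names and the sample numbers.

```python
def finetune_throughput(tokens, epochs, total_seconds):
    """Tokens processed per second: (tokens x epochs) / total time."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return tokens * epochs / total_seconds

def throughput_per_dollar(units_per_second, hourly_cost):
    """Units of work per dollar, assuming cost is billed per hour."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return units_per_second * 3600 / hourly_cost

# Hypothetical run: 1M tokens per epoch, 5 epochs, 2,500 seconds, $2.00/h.
tps = finetune_throughput(tokens=1_000_000, epochs=5, total_seconds=2500)
tokens_per_dollar = throughput_per_dollar(tps, hourly_cost=2.00)
print(f"{tps:.0f} tokens/s, {tokens_per_dollar:,.0f} tokens per dollar")
```

Because the metric is a ratio, a slower but much cheaper instance can outrank a faster one.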
FAQs
A single GPU model name often covers multiple physical SKUs. H100 ships in PCIe, SXM, SXM5, and NVL variants at different prices and interconnect bandwidths. A100 ships at 40GB and 80GB VRAM; V100 ships at 16GB and 32GB. Within a provider, the listed rate also varies by host CPU class, bundled RAM and storage, and region. The pricing tables above split SKUs by interconnect and VRAM where the source data allows, so each row is a single physical card rather than a model-name aggregate.
The throughput-per-dollar component runs a fixed workload (image or text generation, finetuning, or inference) on each GPU instance and divides the total output by the instance’s hourly cost. A higher number means cheaper output for that workload. The ranking shifts with the workload: a card optimized for FP8 inference can outrank a higher-VRAM card on text generation but lose on a large image-model finetune. Pick the workload tab that matches your job before reading the leaderboard.
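To illustrate how a leaderboard can reorder between workload tabs, the sketch below ranks the same two instances on two workloads. The instance names, output figures, and prices are invented for illustration and are not benchmark results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    instance: str
    workload: str
    output_per_hour: float  # tokens or images produced per hour
    hourly_cost: float      # USD per hour

    @property
    def per_dollar(self):
        return self.output_per_hour / self.hourly_cost

def leaderboard(results, workload):
    """Instances for one workload, best output per dollar first."""
    rows = [r for r in results if r.workload == workload]
    return sorted(rows, key=lambda r: r.per_dollar, reverse=True)

# Hypothetical measurements for two instances on two workloads.
results = [
    Result("gpu-a", "text-inference", 9_000_000, 3.00),
    Result("gpu-b", "text-inference", 10_000_000, 5.00),
    Result("gpu-a", "image-finetune", 6_000, 3.00),
    Result("gpu-b", "image-finetune", 15_000, 5.00),
]

for workload in ("text-inference", "image-finetune"):
    ranked = [r.instance for r in leaderboard(results, workload)]
    print(workload, ranked)
```

In this made-up data the cheaper instance leads on text inference while the more expensive one leads on the image finetune, which is the kind of reordering described above.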
The pricing tables refresh on a weekly catalog crawl.
Further reading
- Multi-GPU Benchmark: B200 vs H200 vs H100 vs MI300X
- Top 30 Cloud GPU Providers & Their GPUs
- GPU Concurrency Benchmark
- Top 25+ AI Chip Makers: NVIDIA & Its Competitors
- Cloud GPU Rental Price Index
- DGX Spark vs Mac Studio & Halo: Benchmarks & Alternatives