Cloud GPU Pricing, Performance & Provider Comparison

updated on Jun 17, 2026

Cloud GPU list prices for the same model can differ several times over from one provider to another. We curated the lowest rate, provider, market range, and median for 40+ GPU configurations across all three pricing tiers, plus a throughput-per-dollar benchmark on 10 models.

Cloud GPU price per throughput

See the most cost-effective GPU for your workload across 13 hyperscaler and neocloud providers, ranked by throughput per dollar:

See cloud GPU benchmark methodology for details.

On-demand is the most straightforward pricing model: you pay for compute capacity by the hour or second, only for what you use, with no long-term commitments or upfront payments.

These instances suit users who want the flexibility of a cloud GPU platform without an upfront payment or long-term commitment. On-demand instances usually cost more than spot instances, but they provide uninterrupted capacity for as long as the instance runs.

On-demand cloud GPU prices

Provider	GPU	Lowest $/hr	Range ($/hr)	Median ($/hr)
IONOS	RTX PRO 6000 96GB Blackwell	$1.79	$0.93 – $17.27	$1.54
IONOS	Tesla T4 16GB	$0.94	$0.20 – $4.35	$1.03
IONOS	A10 24GB	$1.10	$1.00 – $4.52	$1.60
IONOS	Intel Flex 170 16GB	$1.20	NA	NA
Vast.ai	RTX 3090 24GB	$0.08	$0.08 – $0.49	$0.22
Vast.ai	RTX A4000 16GB	$0.09	$0.09 – $0.88	$0.17
Vast.ai	V100 16GB	$0.10	$0.10 – $4.22	$2.57
Salad	RTX A5000 24GB	$0.11	$0.11 – $1.56	$0.44
Vast.ai	P40 24GB	$0.13	$0.13	$0.13
Salad	RTX 4080 16GB	$0.13	$0.13 – $0.28	$0.23

Ranking: Sponsors are linked and highlighted at the top of the table. The remaining rows are ranked in ascending order by lowest on-demand price. Range shows the spread between the lowest and highest list price for the same SKU across all providers. Median is the middle of the price distribution across every listing for that SKU and serves as a fair-market anchor. Prices reflect the most recent weekly catalog refresh.
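
The three summary columns can be sketched in a few lines of Python. This is a minimal illustration of how lowest, range, and median relate to a set of listings for one SKU. The `summarize_listings` helper and the sample prices are hypothetical and are not taken from the tables or from the actual pipeline.

```python
from statistics import median


def summarize_listings(prices_per_hour):
    """Return lowest, range, and median for one SKU's hourly list prices."""
    if not prices_per_hour:
        raise ValueError("need at least one listing")
    ordered = sorted(prices_per_hour)
    return {
        "lowest": ordered[0],                # cheapest listing for the SKU
        "range": (ordered[0], ordered[-1]),  # spread from lowest to highest
        "median": median(ordered),           # middle of the distribution
    }


# Hypothetical listings for a single SKU, in $/hr (illustrative values only).
listings = [0.22, 0.08, 0.49, 0.15, 0.30]
summary = summarize_listings(listings)
print(summary)
```

A SKU with a single listing, such as a row whose range shows one value, yields the same number in all three columns.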

On-demand is the default rental model: you pay per hour with no commitment, and capacity is guaranteed for as long as you keep the instance running. It is the most expensive tier but the only one without trade-offs.

Spot cloud GPU prices

Provider	GPU	Lowest $/hr	Range ($/hr)	Median ($/hr)
IONOS	RTX PRO 6000 96GB Blackwell	$1.79	$0.15 – $9.14	$0.94
Vast.ai	RTX A5000 24GB	$0.05	$0.05 – $0.18	$0.05
Vast.ai	RTX A4000 16GB	$0.05	$0.05 – $0.09	$0.07
Vast.ai	RTX 3090 24GB	$0.05	$0.05 – $0.21	$0.08
Verda	V100 16GB	$0.06	$0.06 – $0.47	$0.32
Vast.ai	P40 24GB	$0.07	$0.07	$0.07
Vast.ai	RTX 5090 32GB	$0.09	$0.09 – $0.37	$0.18
Vast.ai	RTX 5080 16GB	$0.10	$0.10 – $0.20	$0.11
Microsoft Azure	T4 16GB	$0.10	$0.10 – $0.70	$0.29
Vast.ai	A100 SXM 40GB	$0.14	$0.14 – $0.60	$0.45

Ranking: Rows are ranked by the lowest spot price in ascending order. Spot capacity is interruptible. Median is the middle of the spot price distribution for that SKU.

Spot capacity is interruptible: the provider can reclaim the instance with little or no warning, usually when demand for on-demand capacity spikes. Spot rates typically run 35-70% below on-demand at the same provider. Use spot for checkpointable training, batch inference, and evaluation jobs that tolerate restarts. Avoid it for latency-sensitive inference or single-replica services without failover.
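
As a rough check on that discount band, the sketch below compares the lowest on-demand and lowest spot prices for three Vast.ai rows that appear in both tables above. The `discount_pct` helper is hypothetical. The lowest listings in each tier need not come from the same host or region, so treat the result as indicative only.

```python
def discount_pct(on_demand, other):
    """Percent by which `other` undercuts the on-demand hourly rate."""
    if on_demand <= 0:
        raise ValueError("on_demand must be positive")
    return (on_demand - other) / on_demand * 100


# (lowest on-demand $/hr, lowest spot $/hr) for Vast.ai rows in the tables above.
pairs = {
    "RTX 3090 24GB": (0.08, 0.05),
    "RTX A4000 16GB": (0.09, 0.05),
    "P40 24GB": (0.13, 0.07),
}

for gpu, (on_demand, spot) in pairs.items():
    print(f"{gpu}: spot is {discount_pct(on_demand, spot):.1f}% below on-demand")
```

The same helper applies to reserved rates by passing the reserved price as the second argument. A negative result means the other tier is priced above on-demand for that listing.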

Reserved cloud GPU prices (1-year)

Provider	GPU	Lowest $/hr	Range ($/hr)	Median ($/hr)
IONOS	RTX PRO 6000 96GB Blackwell	$1.79	$0.42 – $6.97	$1.14
Vast.ai	RTX A4000 16GB	$0.07	$0.07 – $0.39	$0.28
Vast.ai	RTX 3090 24GB	$0.11	$0.11 – $0.39	$0.23
Database Mart	K80 24GB	$0.13	$0.13	$0.13
Verda	V100 16GB	$0.13	$0.13 – $3.64	$1.51
Vast.ai	RTX 5080 16GB	$0.16	$0.16	$0.16
Vast.ai	RTX 4080 16GB	$0.17	$0.17 – $0.20	$0.18
Runpod	RTX 4000 Ada 20GB	$0.20	$0.20 – $0.22	$0.21
Google Cloud	T4 16GB	$0.21	$0.21 – $2.74	$0.53
Vast.ai	RTX 5090 32GB	$0.22	$0.22 – $0.93	$0.50

Ranking: Rows are ranked by the lowest 1-year reserved price in ascending order. Reservations lock in capacity for the term. Median is the middle of the reserved price distribution for that SKU.

Reservations lock in capacity for a fixed term in exchange for a discount versus on-demand. One-year contracts typically run 19-57% below the same provider’s on-demand list. In a few cases, reservation rates dip below spot because the reserving provider isolates inventory from the spot market entirely.

Cloud provider performance comparison

The same GPU model can perform slightly differently across providers because of host CPU choice, network fabric, driver configuration, and virtualization overhead. To quantify this, we ran identical text and image generation workloads on AMD MI300X 192GB at DigitalOcean and Runpod:

Key observations:

For text generation, DigitalOcean delivered slightly higher throughput, processing approximately 0.4% more tokens per second.
For image generation, Runpod held a marginal advantage, processing about 0.4% more images per second.

The gap is small enough not to matter for most workloads. For latency-critical inference or large-scale training where every percentage point compounds across millions of inferences, benchmark the specific provider configuration before committing to a long reservation.

Buy on-prem or rent in the cloud

Owning makes sense when the workload is predictable, the team has the operational know-how, and hardware utilization stays above ~70% across the useful life of the GPU. For variable demand, training spikes, or product experiments, cloud rental wins on capital efficiency and scaling flexibility. The break-even sits roughly at 12-month utilization: above 70%, reservation or owned capacity almost always beats on-demand; below 50%, spot or on-demand wins on flexibility; the middle band depends on how much capacity disruption your workload tolerates.
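
One way to reason about the utilization threshold is to compute the utilization at which owned hardware costs the same as renting the hours actually used. The sketch below is a simplified model, not the method behind the figures above. It ignores financing, resale value, and staffing, and every number in it is a hypothetical placeholder.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(purchase_price, yearly_opex, useful_life_years,
                           cloud_rate_per_hour):
    """Fraction of wall-clock hours a GPU must be busy for owning to match renting.

    Owned cost is fixed regardless of use; cloud cost scales with hours used.
    """
    if useful_life_years <= 0 or cloud_rate_per_hour <= 0:
        raise ValueError("useful_life_years and cloud_rate_per_hour must be positive")
    total_owned = purchase_price + yearly_opex * useful_life_years
    cloud_at_full_use = cloud_rate_per_hour * HOURS_PER_YEAR * useful_life_years
    return total_owned / cloud_at_full_use


# Hypothetical inputs: $25,000 card, $3,000/yr power and hosting,
# 4-year useful life, $1.50/hr cloud rate for a comparable GPU.
u = break_even_utilization(25_000, 3_000, 4, 1.50)
print(f"break-even utilization: {u:.0%}")
```

A result above 1.0 means owning never pays off at that cloud rate. Substituting a spot or reserved rate for `cloud_rate_per_hour` shifts the threshold accordingly.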

A practical pattern at scale: own a baseline cluster sized to steady-state demand, rent in the cloud for spikes and exploratory work. Meta announced a multi-year partnership in February 2026 to deploy up to 6 gigawatts of AMD Instinct GPUs, signaling that even hyperscaler-scale operators continue to expand owned capacity while still consuming cloud GPU for variable workloads.

Consumer GPUs (RTX 4090, RTX 5090) deliver the best price per FLOP on paper, but NVIDIA’s EULA restricts their use in commercial data centers. They remain useful for individual workstations and proof-of-concept work, not production deployment.

Cloud GPU benchmark methodology

Throughput benchmarks use 4-bit FP quantization across all tests. The pipeline runs:

Text finetuning: Llama 3.2 on the first 5,000 conversations from FineTome, 5 epochs, 1M total tokens, Unsloth framework. Throughput = (tokens × epochs) / total time.
Text inference: 1M tokens generated with llama-cpp-python.
Image finetuning: YOLOv9 on 100 images from SkyFusion, 4 epochs, Unsloth.
Image inference: Finetuned YOLOv9 on ~500 images at 640×640.

The throughput-per-dollar metric divides workload output by the instance’s hourly cost. Throughput values are workload-specific and serve as relative guidelines; the same hardware will deliver materially different throughput on your own model.
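
The two formulas above can be sketched as follows. The function names and run figures are hypothetical, and converting to units of output per dollar (throughput per second times 3,600, divided by the hourly price) is one assumed normalization. The published metric may be scaled differently.

```python
def finetune_throughput(tokens: int, epochs: int, total_seconds: float) -> float:
    """Tokens processed per second: (tokens x epochs) / total time."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return tokens * epochs / total_seconds


def throughput_per_dollar(units_per_second: float, price_per_hour: float) -> float:
    """Units of output produced per dollar of instance time."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return units_per_second * 3600 / price_per_hour


# Hypothetical run: 200,000 tokens per epoch, 5 epochs, 500 s wall-clock, $0.50/hr.
tps = finetune_throughput(tokens=200_000, epochs=5, total_seconds=500)
tokens_per_dollar = throughput_per_dollar(tps, price_per_hour=0.50)
print(f"{tps:.0f} tokens/s, {tokens_per_dollar:,.0f} tokens per dollar")
```

Because the price sits in the denominator, a slower card can still rank higher when its hourly rate is proportionally lower.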

FAQs

Why does the same provider list the same GPU model at multiple prices?

A single GPU model name often covers multiple physical SKUs. H100 ships in PCIe, SXM (SXM5), and NVL variants at different prices and interconnect bandwidths. A100 ships at 40GB and 80GB VRAM; V100 ships at 16GB and 32GB. Within a provider, the listed rate also varies by host CPU class, bundled RAM and storage, and region. The pricing tables above split SKUs by interconnect and VRAM where the source data allows, so each row is a single physical card rather than a model-name aggregate.

How do I read the throughput-per-dollar number in the component above?

The component runs a fixed workload (image or text generation, finetuning, or inference) on each GPU instance and divides the total output by the instance’s hourly cost. A higher number means cheaper output for that workload. The ranking shifts with the workload: a card optimized for FP8 inference can outrank a higher-VRAM card on text generation but lose on a large image-model finetune. Pick the workload tab that matches your job before reading the leaderboard.

How often are these prices updated?

The pricing tables refresh on a monthly catalog crawl.

Cite this research

Pick the format that matches where you're publishing. Pasting the link version into your CMS preserves the backlink.

Cem Dilmegani and Ekrem Sarı (2026) - "Cloud GPU Pricing, Performance & Provider Comparison". Published online at AIMultiple.com. Retrieved June 17, 2026, from: https://aimultiple.com/cloud-gpu-pricing [Online Resource]

Dilmegani, C., & Sarı, E. (2026, June 17). Cloud GPU Pricing, Performance & Provider Comparison. AIMultiple. https://aimultiple.com/cloud-gpu-pricing

@misc{dilmegani2026,
  author = {Dilmegani, Cem and Sarı, Ekrem},
  title  = {{Cloud GPU Pricing, Performance \& Provider Comparison}},
  year   = {2026},
  month  = jun,
  howpublished    = {\url{https://aimultiple.com/cloud-gpu-pricing}},
  note   = {AIMultiple. Retrieved June 17, 2026}
}

Cem Dilmegani

Principal Analyst

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses every month (as per Similarweb), including 60% of the Fortune 500.

Cem's work has been cited by leading global publications including Business Insider, Forbes, and the Washington Post; global firms like Deloitte and HPE; NGOs like the World Economic Forum; and supranational organizations like the European Commission.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement at a telco while reporting to the CEO. He also led the commercial growth of deep tech company Hypatos, which grew from zero to seven-digit annual recurring revenue and a nine-digit valuation within 2 years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.

Technically reviewed by

Ekrem Sarı

AI Researcher

Ekrem is an AI Researcher and Data Analyst at AIMultiple. He designs and runs hands-on benchmarks for AI and LLM systems.

Comments 2

Ashley Jenkinson

Oct 31, 2024 at 08:54

Cem - great article, I'd love to pick your brain on private networking or direct connects to these GPU instances.

Cem Dilmegani

Nov 10, 2024 at 06:58

Hi Ashley, thank you! Sure, happy to chat.

Harsh Sharma

Oct 06, 2024 at 02:19

Hi there, fantastic article and very well-researched. Would you mind checking out Dataoorts at https://dataoorts.com

Cem Dilmegani

Oct 22, 2024 at 03:18

Sure, we'll review to see if we can include Dataoorts in the next edit.
