Cloud GPU Pricing, Performance & Provider Comparison

Cem Dilmegani
updated on May 19, 2026

Cloud GPU list prices for the same model can differ severalfold from one provider to another. We compiled the lowest rate, its provider, the market range, and the median for 40+ GPU configurations across all three pricing tiers (on-demand, spot, and 1-year reserved), plus a throughput-per-dollar benchmark on 10 models.

Cloud GPU price per throughput

See the most cost-effective GPU for your workload across 13 hyperscaler and neocloud providers, ranked by throughput per dollar:

Cloud GPU Throughput & Prices

Updated on May 29, 2026

Showing 12 of 217 listings:

| Provider | Code | Region | GPU | Images/s | Price/h | Images / $ |
|---|---|---|---|---|---|---|
| Verda | 4V100.20V_4_FIN-03 | North Europe | 4 x NVIDIA V100 64 GB | 98 | $0.55 | 641,455 |
| Verda | 8V100.48V_8_FIN-02 | North Europe | 8 x NVIDIA V100 128 GB | 195 | $1.11 | 632,432 |
| Verda | 2V100.10V_2_FIN-03 | North Europe | 2 x NVIDIA V100 32 GB | 49 | $0.28 | 630,000 |
| Verda | 1V100.6V_1_FIN-01 | North Europe | 1 x NVIDIA V100 16 GB | 24 | $0.14 | 617,143 |
| Amazon Web Services | inf1.2xlarge | North America | 1 x Amazon Web Services Inferentia | 35 | $0.24 | 525,000 |
| Microsoft Azure | NC24rs v3 | North America | 4 x NVIDIA V100 16 GB | 98 | $0.73 | 483,288 |
| Verda | 1A100.40S.22V_1_FIN-03 | North Europe | 1 x NVIDIA A100 40 GB | 59 | $0.72 | 295,000 |
| Verda | 8A100.40S.176V_8_FIN-03 | North Europe | 8 x NVIDIA A100 320 GB | 469 | $5.77 | 292,617 |
| Google Cloud | a3-megagpu-8g | North America | 8 x NVIDIA H100 80 GB | 621 | $9.46 | 236,321 |
| Amazon Web Services | g5g.8xlarge | North America | 1 x NVIDIA A10G 16 GB | 28 | $0.43 | 234,419 |
| Microsoft Azure | NV24 | North America | 4 x NVIDIA M60 16 GB | 31 | $0.49 | 227,755 |
| Latitude | g3.a100.large | North America | 8 x NVIDIA A100 80 GB | 469 | $7.82 | 215,908 |
See cloud GPU benchmark methodology for details.

On-demand is the most straightforward pricing model: you pay for compute capacity by the hour or second, with no long-term commitment or upfront payment.

On-demand instances suit users who want the flexibility of a cloud GPU platform without committing in advance. They usually cost more than spot instances, but they provide guaranteed, uninterrupted capacity.

On-demand cloud GPU prices

Ranking: Sponsors are linked and highlighted at the top of the table. The remaining rows are ranked in ascending order by lowest on-demand price. Range shows the spread between the lowest and highest list price for the same SKU across all providers. Median is the middle of the price distribution across every listing for that SKU and serves as a fair-market anchor. Prices reflect the most recent weekly catalog refresh.
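
The lowest rate, range, and median described above can be reproduced with a few lines of Python. This is a minimal sketch, not the site's actual pipeline: the function name `summarize_sku` and the sample prices are hypothetical, chosen only to illustrate the three statistics.

```python
from statistics import median


def summarize_sku(prices_per_hour):
    """Return the lowest price, (low, high) range, and median for one SKU."""
    if not prices_per_hour:
        raise ValueError("need at least one listing")
    ordered = sorted(prices_per_hour)
    return {
        "lowest": ordered[0],
        "range": (ordered[0], ordered[-1]),
        "median": median(ordered),
    }


# Hypothetical hourly list prices (USD) for the same SKU at five providers.
listings = [2.10, 1.85, 3.40, 2.49, 6.00]
summary = summarize_sku(listings)
print(summary)
```

With an even number of listings, `statistics.median` averages the two middle prices, so the median may not match any single provider's rate.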

On-demand is the default rental model: you pay per hour, with no commitment, and capacity is guaranteed for as long as you keep the instance running. It is the most expensive tier, but the only one with neither interruption risk nor a term commitment.

Spot cloud GPU prices

Ranking: Rows are ranked by lowest spot price in ascending order. Spot capacity is interruptible. Median is the middle of the spot price distribution for that SKU.

Spot capacity is interruptible; the provider can reclaim the instance with little or no warning, usually when on-demand demand spikes. Spot rates typically run 30-60% below on-demand at the same provider. Use spot for checkpointable training, batch inference, and evaluation jobs that tolerate restarts. Avoid it for latency-sensitive inference or single-replica services without failover.

Reserved cloud GPU prices (1-year)

Ranking: Rows are ranked by the lowest 1-year reserved price in ascending order. Reservations lock in capacity for the term. Median is the middle of the reserved price distribution for that SKU.

Reservations lock in capacity for a fixed term in exchange for a discount versus on-demand. One-year contracts typically run 20-40% below the same provider’s on-demand list. In a few cases, reservation rates dip below spot, because the reserving provider isolates inventory from the spot market entirely.
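
To turn the typical discount bands quoted above (30-60% for spot, 20-40% for 1-year reserved) into hourly figures, a rough estimate can be sketched as follows. The function name and the $10.00 on-demand rate are hypothetical; actual discounts vary by provider and SKU and can fall outside these bands.

```python
# Typical discount bands versus on-demand, in percent, as quoted in the text.
SPOT_DISCOUNT_PCT = (30, 60)
RESERVED_1Y_DISCOUNT_PCT = (20, 40)


def discounted_range(on_demand_per_hour, discount_pct):
    """Return (low, high) hourly prices for a (min, max) discount band."""
    smallest, largest = discount_pct
    low = on_demand_per_hour * (100 - largest) / 100
    high = on_demand_per_hour * (100 - smallest) / 100
    return (low, high)


# Hypothetical on-demand list price in USD per hour.
on_demand = 10.00
spot_range = discounted_range(on_demand, SPOT_DISCOUNT_PCT)
reserved_range = discounted_range(on_demand, RESERVED_1Y_DISCOUNT_PCT)
print("spot:", spot_range)
print("reserved (1-year):", reserved_range)
```

The two bands overlap, which is consistent with the observation that a reserved rate can occasionally undercut spot.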

Cloud provider performance comparison

The same GPU model can perform slightly differently across providers because of host CPU choice, network fabric, driver configuration, and virtualization overhead. To quantify this, we ran identical text and image generation workloads on AMD MI300X 192GB at DigitalOcean and Runpod:

Key Observations:

  • For text generation, DigitalOcean demonstrated slightly higher throughput, processing approximately 0.4% more tokens per second.
  • Conversely, for image generation, Runpod showed a marginal advantage, processing about 0.4% more images per second.

The gap is small enough not to matter for most workloads. For latency-critical inference or large-scale training, where every percentage point compounds across millions of requests or GPU-hours, benchmark the specific provider configuration before committing to a long reservation.

Buy on-prem or rent in the cloud

Owning makes sense when the workload is predictable, the team has the operational know-how, and hardware utilization stays above ~70% across the useful life of the GPU. For variable demand, training spikes, or product experiments, cloud rental wins on capital efficiency and scaling flexibility. As a rough rule, judge the break-even by utilization over a 12-month horizon: above 70%, reserved or owned capacity almost always beats on-demand; below 50%, spot or on-demand wins on flexibility; between 50% and 70%, the choice depends on how much capacity disruption your workload tolerates.
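
The utilization bands above can be written as a small decision helper. This is a sketch under stated assumptions rather than a costing model: the function names are hypothetical, and `break_even_utilization` assumes reserved capacity is billed for every hour of the term while on-demand is billed only for hours actually used.

```python
def suggest_capacity_model(utilization):
    """Map 12-month utilization (0.0 to 1.0) to the rule-of-thumb bands."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization > 0.70:
        return "reserved or owned"
    if utilization < 0.50:
        return "spot or on-demand"
    return "depends on disruption tolerance"


def break_even_utilization(on_demand_per_hour, reserved_per_hour):
    """Utilization at which always-on reserved cost equals pay-per-use cost."""
    return reserved_per_hour / on_demand_per_hour


print(suggest_capacity_model(0.85))
print(suggest_capacity_model(0.60))
print(suggest_capacity_model(0.30))
# Hypothetical rates: a reservation priced 30% below on-demand.
print(break_even_utilization(10.00, 7.00))
```

Under that billing assumption, a reservation priced 30% below on-demand breaks even at 70% utilization, which lines up with the threshold in the text.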

A practical pattern at scale is to own a baseline cluster sized to steady-state demand and rent in the cloud for spikes and exploratory work. Meta announced a multi-year partnership in February 2026 to deploy up to 6 gigawatts of AMD Instinct GPUs, signaling that even hyperscaler-scale operators continue to expand owned capacity while still consuming cloud GPUs for variable workloads.

Consumer GPUs (RTX 4090, RTX 5090) deliver the best price per FLOP on paper, but NVIDIA’s EULA restricts their use in commercial data centers. They remain useful for individual workstations and proof-of-concept work, not production deployment.

Cloud GPU benchmark methodology

Throughput benchmarks use 4-bit FP quantization across all tests. The pipeline runs:

  • Text finetuning: Llama 3.2 on the first 5,000 conversations from FineTome, 5 epochs, 1M total tokens, Unsloth framework. Throughput = (tokens × epochs) / total time.
  • Text inference: 1M tokens generated with llama-cpp-python.
  • Image finetuning: YOLOv9 on 100 images from SkyFusion, 4 epochs, Unsloth.
  • Image inference: Finetuned YOLOv9 on ~500 images at 640×640.

The throughput-per-dollar metric divides workload output by the instance’s hourly cost. Throughput values are workload-specific and serve as relative guidelines; the same hardware will deliver materially different throughput on your own model.
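
The two formulas in this section can be sketched in Python as below. The function names are hypothetical, and the seconds-to-hours conversion is inferred from the published table (for example, 98 images/s at $0.55/h yields 641,455 images per dollar), not taken from the benchmark code itself.

```python
SECONDS_PER_HOUR = 3600


def finetune_throughput(tokens, epochs, total_seconds):
    """Throughput = (tokens x epochs) / total time, in tokens per second."""
    return tokens * epochs / total_seconds


def output_per_dollar(units_per_second, price_per_hour):
    """Units of output (images or tokens) produced per dollar of rental."""
    return units_per_second * SECONDS_PER_HOUR / price_per_hour


# (instance code, images/s, price per hour in USD) from the table above.
rows = [
    ("4V100.20V_4_FIN-03", 98, 0.55),
    ("inf1.2xlarge", 35, 0.24),
    ("a3-megagpu-8g", 621, 9.46),
]
for code, images_per_second, price in rows:
    print(code, round(output_per_dollar(images_per_second, price)))
```

The rounded results match the Images / $ column for these three rows, which suggests the table uses the same per-hour normalization.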

FAQs

A single GPU model name often covers multiple physical SKUs. H100 ships in PCIe, SXM5, and NVL variants at different prices and interconnect bandwidths. A100 ships with 40GB or 80GB of VRAM; V100 ships with 16GB or 32GB. Within a provider, the listed rate also varies by host CPU class, bundled RAM and storage, and region. The pricing tables above split SKUs by interconnect and VRAM where the source data allows, so each row is a single physical card rather than a model-name aggregate.

The throughput-per-dollar component runs a fixed workload (text or image, finetuning or inference) on each GPU instance and divides the total output by the instance’s hourly cost. A higher number means cheaper output for that workload. The ranking shifts with the workload: a card optimized for FP8 inference can outrank a higher-VRAM card on text generation but lose on a large image-model finetune. Pick the workload tab that matches your job before reading the leaderboard.

The pricing tables refresh on a monthly catalog crawl.

Cem Dilmegani
Principal Analyst
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, and the Washington Post; by global firms like Deloitte and HPE; by NGOs like the World Economic Forum; and by supranational organizations like the European Commission.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement at a telco while reporting to the CEO. He also led the commercial growth of the deep tech company Hypatos, which went from zero to seven-figure annual recurring revenue and a nine-figure valuation within 2 years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
Technically reviewed by
Ekrem Sarı
AI Researcher
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and RAG frameworks.
