
GPU Models

Last updated: Feb 2026

NVIDIA B200 SXM
Released: 2024
Overall Ranking: 1st
Lowest price: $2.89/hr (Vultr)

Benchmark Performance

Rank  Category                        Performance
6th   Image Inference (Efficiency)    81k image/$
1st   Image Inference (Throughput)    110 image/s
5th   Image Finetuning (Efficiency)   84k image/$
1st   Image Finetuning (Throughput)   114 image/s
3rd   Text Inference (Efficiency)     24M token/$
1st   Text Inference (Throughput)     33k token/s
3rd   Text Finetuning (Efficiency)    13M token/$
1st   Text Finetuning (Throughput)    18k token/s

Technical Specifications

Architecture: Blackwell
Memory: 192 GB
Bandwidth: 8.20 TB/s
TDP: 1,000 W
FP32 Performance: 75 TFLOPS
BF16 Performance: 2,250 TFLOPS
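As a rough, back-of-the-envelope illustration (not an official figure), the ratio of BF16 compute to memory bandwidth from the specs above gives the roofline "ridge point": how many FLOPs a kernel must perform per byte moved before it becomes compute-bound rather than bandwidth-bound.

```python
# Rough roofline ridge-point estimate from the spec sheet above.
# Assumes dense BF16 TFLOPS; achievable numbers vary by workload.
bf16_flops = 2250e12       # 2,250 TFLOPS, in FLOP/s
bandwidth_bytes = 8.20e12  # 8.20 TB/s, in bytes/s

ridge_point = bf16_flops / bandwidth_bytes  # FLOPs per byte moved
print(f"~{ridge_point:.0f} FLOPs per byte")
```

At roughly 274 FLOPs per byte, memory-bound workloads (such as small-batch LLM inference) will be limited by the 8.20 TB/s bandwidth long before they exhaust the BF16 compute.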

Provider Pricing by Region

Provider      Region                     Price/hour
Vultr         Not specified              $2.89 (x8 GPUs)
Verda Cloud   North Europe               $4.89 (x1 GPU)
Runpod        North America              $4.99 (x1 GPU)
Lambda        Australia & New Zealand    $5.29 (x1 GPU)
Cirrascale    North America              $5.99 (x8 GPUs)
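To turn these hourly rates into a budget figure, a simple sketch is to multiply by roughly 730 hours per month. This assumes the listed price is a flat per-GPU on-demand rate with no commitment discounts, which may not hold for every provider above.

```python
def monthly_cost(price_per_hour, hours=730):
    # ~730 hours in a month of continuous use; assumes a flat
    # per-GPU on-demand rate with no reserved/commitment discounts.
    return price_per_hour * hours

# e.g. the $2.89/hr listing above, run around the clock:
estimate = monthly_cost(2.89)
print(f"${estimate:,.0f}/month")
```

Continuous use of a single GPU at $2.89/hr works out to roughly $2,110 per month, which is why utilization matters so much when comparing rental options.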

FAQ

What does this page cover?
This page helps you compare the technical specs and pricing of individual GPU models. For a broader market overview based on performance per dollar, you can explore our comprehensive cloud GPU benchmark, which compares providers and pricing models across various AI workloads.

What is a Cloud GPU instance?
A Cloud GPU instance, the focus of this page, involves renting a virtual server with a dedicated GPU by the hour. This gives you continuous access to the hardware, making it ideal for long-running tasks like model training or predictable workloads. You can explore a broader comparison of providers in our main cloud GPU benchmark.

What is a Serverless GPU?
A Serverless GPU is a different model where you pay per second only for the time your code actually runs, without managing any servers. This is highly cost-effective for tasks with variable traffic, like inference APIs. If this model fits your needs, you can compare providers on our dedicated serverless GPU benchmark.
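The trade-off between the two billing models comes down to utilization. A minimal sketch, using hypothetical prices (the $4.89/hr and $0.004/s figures below are illustrative, not quotes from any provider on this page):

```python
def breakeven_utilization(instance_per_hour, serverless_per_second):
    # Fraction of each hour the GPU must be busy before a dedicated
    # hourly instance becomes cheaper than per-second serverless billing.
    serverless_per_hour = serverless_per_second * 3600
    return instance_per_hour / serverless_per_hour

# Hypothetical rates: $4.89/hr instance vs $0.004/s serverless
u = breakeven_utilization(4.89, 0.004)
print(f"break-even at {u:.0%} utilization")
```

Below the break-even utilization, per-second serverless billing wins; above it, the always-on instance is cheaper, which matches the rule of thumb of serverless for bursty inference and instances for sustained training.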

Who makes GPUs besides NVIDIA?
While NVIDIA is the current market leader, companies like AMD and Intel are strong competitors, and cloud providers like AWS and Google also produce their own custom silicon. You can learn more about the top AI chip makers and the broader industry landscape in our detailed report.

Should I buy GPUs or rent them?
The decision depends on factors like your team's expertise, workload predictability, and long-term budget. Our guide on whether to buy or rent GPUs explores the pros and cons of each approach to help you make the right strategic choice for your business.
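One concrete input to that decision is the break-even point: how many rental hours equal the purchase price. A minimal sketch, with a hypothetical $35,000 card price (illustrative only) against the $4.89/hr rate from the pricing table above:

```python
def breakeven_hours(purchase_price, rental_per_hour):
    # Hours of rental at which cumulative rent equals buying outright.
    # Deliberately ignores power, cooling, ops staff, depreciation,
    # and resale value, all of which shift the real break-even point.
    return purchase_price / rental_per_hour

# Hypothetical: $35,000 card vs $4.89/hr on-demand
hours = breakeven_hours(35_000, 4.89)
print(f"break-even after ~{hours:,.0f} rental hours")
```

Around 7,000 hours of continuous use is on the order of ten months, so teams with sustained, predictable workloads tend to favor owning, while intermittent users usually come out ahead renting.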

What does the efficiency score mean?
This score measures the cost-efficiency of a GPU. It tells you how many tokens or images (in thousands, k, or millions, M) you get for every US dollar spent, combining both speed and price into a single performance-per-dollar value. For all benchmarks on this page, a higher score is always better because it means you get more performance for your money.
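The relationship between throughput, price, and the efficiency score can be sketched as follows. This is an illustrative calculation, not the site's exact benchmark methodology, and the example numbers are hypothetical:

```python
def perf_per_dollar(throughput_per_second, price_per_hour):
    # Units (tokens or images) delivered per US dollar:
    # units/s * 3600 s/hr, divided by $/hr.
    return throughput_per_second * 3600 / price_per_hour

# Hypothetical: 33,000 token/s at $4.89/hr
score = perf_per_dollar(33_000, 4.89)
```

This is why a fast GPU can still rank lower on efficiency than on raw throughput: a high hourly price divides away the speed advantage, exactly the pattern visible in the benchmark table above.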

What is the difference between Inference and Training benchmarks?

Inference measures how efficiently the GPU runs a pre-trained model to generate new content (like text or images). A high Inference score is crucial for running applications like chatbots or AI art generators smoothly and affordably.

Training (or Fine-tuning) measures efficiency for customizing an existing model with your own data. A high Training score is important if you need to build specialized models quickly and cost-effectively.

What is the difference between Text and Image benchmarks?

Text Benchmarks (measured in token/$ and token/s): These scores are relevant for language-based workloads. Choose a GPU with high text scores for tasks like running large language models (LLMs), content creation, and code generation.

Image Benchmarks (measured in image/$ and image/s): These scores are relevant for visual workloads. Choose a GPU with high image scores for tasks like generating AI art, object recognition, or creating synthetic image data.