
GPU Models

Last updated: Feb 2026

NVIDIA B200 SXM
Released: 2024
Overall Ranking: 1st
Lowest price: $2.89/hr (Vultr)

Benchmark Performance

Rank  Category                        Performance
6th   Image Inference (Efficiency)    81k image/$
1st   Image Inference (Throughput)    110 image/s
5th   Image Finetuning (Efficiency)   84k image/$
1st   Image Finetuning (Throughput)   114 image/s
3rd   Text Inference (Efficiency)     24M token/$
1st   Text Inference (Throughput)     33k token/s
3rd   Text Finetuning (Efficiency)    13M token/$
1st   Text Finetuning (Throughput)    18k token/s

Technical Specifications

Architecture: Blackwell
Memory: 192 GB
Bandwidth: 8.20 TB/s
TDP: 1,000 W
FP32 Performance: 75 TFLOPS
BF16 Performance: 2,250 TFLOPS
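As a rough, back-of-the-envelope illustration (not an official figure), the ratio of BF16 compute to memory bandwidth from the specs above gives the roofline "ridge point": how many FLOPs a kernel must perform per byte moved before it becomes compute-bound rather than bandwidth-bound.

```python
# Rough roofline ridge-point estimate from the spec sheet above.
# Assumes dense BF16 TFLOPS; achievable numbers vary by workload.
bf16_flops = 2250e12       # 2,250 TFLOPS, in FLOP/s
bandwidth_bytes = 8.20e12  # 8.20 TB/s, in bytes/s

ridge_point = bf16_flops / bandwidth_bytes  # FLOPs per byte moved
print(f"~{ridge_point:.0f} FLOPs per byte")
```

At roughly 274 FLOPs per byte, memory-bound workloads (such as small-batch LLM inference) will be limited by the 8.20 TB/s bandwidth long before they exhaust the BF16 compute.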

Provider Pricing by Region

Provider      Region                     Price/hour
Vultr         Not specified              $2.89 (x8 GPUs)
Verda Cloud   North Europe               $4.89 (x1 GPU)
Runpod        North America              $4.99 (x1 GPU)
Lambda        Australia & New Zealand    $5.29 (x1 GPU)
Cirrascale    North America              $5.99 (x8 GPUs)
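To turn these hourly rates into a budget figure, a simple sketch is to multiply by roughly 730 hours per month. This assumes the listed price is a flat per-GPU on-demand rate with no commitment discounts, which may not hold for every provider above.

```python
def monthly_cost(price_per_hour, hours=730):
    # ~730 hours in a month of continuous use; assumes a flat
    # per-GPU on-demand rate with no reserved/commitment discounts.
    return price_per_hour * hours

# e.g. the $2.89/hr listing above, run around the clock:
estimate = monthly_cost(2.89)
print(f"${estimate:,.0f}/month")
```

Continuous use of a single GPU at $2.89/hr works out to roughly $2,110 per month, which is why utilization matters so much when comparing rental options.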

FAQ

What does this page cover?
This page helps you compare the technical specs and pricing of individual GPU models. For a broader market overview based on performance per dollar, you can explore our comprehensive cloud GPU benchmark, which compares providers and pricing models across various AI workloads.

What is a Cloud GPU instance?
A Cloud GPU instance, the focus of this page, involves renting a virtual server with a dedicated GPU by the hour. This gives you continuous access to the hardware, making it ideal for long-running tasks like model training or predictable workloads. You can explore a broader comparison of providers in our main cloud GPU benchmark.

What is a Serverless GPU?
A Serverless GPU is a different model where you pay per second only for the time your code actually runs, without managing any servers. This is highly cost-effective for tasks with variable traffic, like inference APIs. If this model fits your needs, you can compare providers on our dedicated serverless GPU benchmark.
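The trade-off between the two billing models comes down to utilization. A minimal sketch, using hypothetical prices (the $4.89/hr and $0.004/s figures below are illustrative, not quotes from any provider on this page):

```python
def breakeven_utilization(instance_per_hour, serverless_per_second):
    # Fraction of each hour the GPU must be busy before a dedicated
    # hourly instance becomes cheaper than per-second serverless billing.
    serverless_per_hour = serverless_per_second * 3600
    return instance_per_hour / serverless_per_hour

# Hypothetical rates: $4.89/hr instance vs $0.004/s serverless
u = breakeven_utilization(4.89, 0.004)
print(f"break-even at {u:.0%} utilization")
```

Below the break-even utilization, per-second serverless billing wins; above it, the always-on instance is cheaper, which matches the rule of thumb of serverless for bursty inference and instances for sustained training.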

Who makes GPUs besides NVIDIA?
While NVIDIA is the current market leader, companies like AMD and Intel are strong competitors, and cloud providers like AWS and Google also produce their own custom silicon. You can learn more about the top AI chip makers and the broader industry landscape in our detailed report.

Should I buy GPUs or rent them?
The decision depends on factors like your team's expertise, workload predictability, and long-term budget. Our guide on whether to buy or rent GPUs explores the pros and cons of each approach to help you make the right strategic choice for your business.
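One concrete input to that decision is the break-even point: how many rental hours equal the purchase price. A minimal sketch, with a hypothetical $35,000 card price (illustrative only) against the $4.89/hr rate from the pricing table above:

```python
def breakeven_hours(purchase_price, rental_per_hour):
    # Hours of rental at which cumulative rent equals buying outright.
    # Deliberately ignores power, cooling, ops staff, depreciation,
    # and resale value, all of which shift the real break-even point.
    return purchase_price / rental_per_hour

# Hypothetical: $35,000 card vs $4.89/hr on-demand
hours = breakeven_hours(35_000, 4.89)
print(f"break-even after ~{hours:,.0f} rental hours")
```

Around 7,000 hours of continuous use is on the order of ten months, so teams with sustained, predictable workloads tend to favor owning, while intermittent users usually come out ahead renting.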

What does the efficiency score mean?
This score measures the cost-efficiency of a GPU. It tells you how many tokens or images (in thousands, k, or millions, M) you get for every US dollar spent, combining both speed and price into a single performance-per-dollar value. For all benchmarks on this page, a higher score is always better because it means you get more performance for your money.
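The relationship between throughput, price, and the efficiency score can be sketched as follows. This is an illustrative calculation, not the site's exact benchmark methodology, and the example numbers are hypothetical:

```python
def perf_per_dollar(throughput_per_second, price_per_hour):
    # Units (tokens or images) delivered per US dollar:
    # units/s * 3600 s/hr, divided by $/hr.
    return throughput_per_second * 3600 / price_per_hour

# Hypothetical: 33,000 token/s at $4.89/hr
score = perf_per_dollar(33_000, 4.89)
```

This is why a fast GPU can still rank lower on efficiency than on raw throughput: a high hourly price divides away the speed advantage, exactly the pattern visible in the benchmark table above.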

What is the difference between Inference and Training benchmarks?

Inference measures how efficiently the GPU runs a pre-trained model to generate new content (like text or images). A high Inference score is crucial for running applications like chatbots or AI art generators smoothly and affordably.

Training (or Fine-tuning) measures efficiency for customizing an existing model with your own data. A high Training score is important if you need to build specialized models quickly and cost-effectively.

What is the difference between Text and Image benchmarks?

Text Benchmarks (measured in token/$ and token/s): These scores are relevant for language-based workloads. Choose a GPU with high text scores for tasks like running large language models (LLMs), content creation, and code generation.

Image Benchmarks (measured in image/$ and image/s): These scores are relevant for visual workloads. Choose a GPU with high image scores for tasks like generating AI art, object recognition, or creating synthetic image data.