GPU cloud prices have fallen roughly 40% year over year thanks to new entrants and easing NVIDIA supply constraints. If you're training models, running inference, or rendering 3D, here's where to get the best hardware per dollar in 2026.
The 2026 GPU cloud landscape
The market has split into three clear tiers this year:
- Hyperscalers (AWS, Azure, GCP): convenient, expensive, best if you're already in their ecosystem.
- Specialized AI clouds (RunPod, Paperspace, Lambda Labs, CoreWeave, Together AI): better pricing, purpose-built for ML.
- Community / peer-to-peer: cheapest, occasionally less reliable, great for experiments.
Top GPU providers of 2026
1. RunPod — best price for H100 and A100
RunPod has emerged as the price leader for on-demand H100 and A100 instances. Their Secure Cloud (Tier 3/4 data centers) offers H100 PCIe from roughly $2.49/hour and A100 80GB from $1.19/hour. The Community Cloud is cheaper still, often 40–60% less, with slightly looser SLAs. Pre-built templates cover PyTorch, TensorFlow, Stable Diffusion, and Ollama out of the box.
2. Paperspace (DigitalOcean) — best dev-friendly GPU cloud
Now owned by DigitalOcean, Paperspace combines Gradient (ML notebooks and pipelines) with raw GPU Droplets. You can spin up an A100 or H100 in two clicks, launch a Jupyter Lab, and be training within 90 seconds. The pricing is fair (A100 from $3.09/hour, H100 from $5.95/hour) and integration with DigitalOcean Spaces is excellent.
3. Lambda Labs — best for reserved long-running jobs
Lambda is the darling of serious ML engineers. Their prices on reserved 1-click clusters (DGX H100) are hard to beat if you're running jobs for days or weeks straight. The catch: on-demand capacity frequently sells out, especially for 8x H100 nodes.
4. Vultr Cloud GPU — best for inference at the edge
If your inference needs to run close to users globally, Vultr's GPU footprint across 32 regions is uniquely valuable. They offer A100 and A40 instances with full root access and hourly billing in most regions.
5. CoreWeave — best for enterprise training at scale
CoreWeave specializes in large-scale ML infrastructure: 8x H100 and H200 SuperPODs, InfiniBand networking, Kubernetes-native. If you're training a frontier model, this is where you go, but pricing reflects the specialization.
Which GPU do you actually need?
Don't default to H100. Match the GPU to the workload:
- Hobbyist fine-tuning and SD image generation: RTX 4090 or A6000 (~$0.50–$0.80/hour).
- Fine-tuning models up to 13B parameters: A100 40GB (~$1.20–$1.60/hour).
- Fine-tuning 30B+ or pre-training medium models: A100 80GB or H100 PCIe.
- Training frontier models from scratch: H100 SXM or H200 multi-GPU nodes.
- Real-time inference: L40S or A10 — better tokens-per-second per dollar than H100 for smaller models.
- Video generation and 3D rendering: RTX 4090 or RTX 6000 Ada.
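To pick from the list above, it helps to estimate how much VRAM your job actually needs. A rough back-of-envelope (ignoring activations and framework overhead): full fine-tuning with Adam in mixed precision costs around 16 bytes per parameter (bf16 weights and gradients, plus fp32 optimizer moments and master weights), while QLoRA-style 4-bit loading costs closer to 0.5 bytes per parameter for the base model. The numbers below are illustrative estimates, not vendor figures.

```python
def vram_gb(params_b: float, bytes_per_param: float) -> float:
    """Back-of-envelope VRAM estimate; ignores activations and overhead."""
    return params_b * 1e9 * bytes_per_param / 1024**3

# Full fine-tuning with Adam in mixed precision: ~16 bytes/param
full_13b = vram_gb(13, 16)    # ~194 GB -> needs multiple 80 GB GPUs

# QLoRA-style 4-bit base weights: ~0.5 bytes/param (plus a small adapter)
qlora_13b = vram_gb(13, 0.5)  # ~6 GB base weights; fits a 40 GB A100

print(f"Full FT 13B: {full_13b:.0f} GB, QLoRA 13B base: {qlora_13b:.0f} GB")
```

This is why "fine-tuning up to 13B on an A100 40GB" implicitly assumes parameter-efficient methods; a full fine-tune of the same model needs a multi-GPU node.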
On-demand vs reserved pricing
Rule of thumb: if you'll use the GPU more than 12 hours a day, reserved beats on-demand. Most providers discount 30–60% for 1–12 month commitments. For burst workloads or experiments, always pay on-demand; the flexibility is worth the premium.
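The 12-hours-a-day rule of thumb is easy to sanity-check yourself. A quick sketch, using a hypothetical $2.49/hour on-demand rate and a 40% reservation discount (within the 30–60% range quoted above); reserved capacity bills 24/7 whether you use it or not:

```python
def monthly_cost_on_demand(rate_per_hr: float, hours_per_day: float) -> float:
    # On-demand: pay only for hours actually used
    return rate_per_hr * hours_per_day * 30

def monthly_cost_reserved(rate_per_hr: float, discount: float) -> float:
    # Reserved: discounted rate, but billed 24 hours a day, used or not
    return rate_per_hr * (1 - discount) * 24 * 30

rate, discount = 2.49, 0.40  # hypothetical numbers for illustration

for hours in (6, 12, 18):
    od = monthly_cost_on_demand(rate, hours)
    rv = monthly_cost_reserved(rate, discount)
    print(f"{hours:2d} h/day: on-demand ${od:8.2f} vs reserved ${rv:8.2f}")
```

At a 40% discount the break-even is about 14.4 hours/day; at 50% it is exactly 12, which is where the rule of thumb comes from.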
Bottlenecks beyond the GPU
New ML engineers often over-index on GPU FLOPS and forget everything else. In practice:
- Disk I/O: dataset loading is frequently the bottleneck. Pay for NVMe local storage, not just blob storage.
- CPU: data preprocessing happens on CPU. An H100 paired with 4 wimpy vCPUs will idle. Look for instances with at least 8–16 vCPUs per GPU.
- RAM: plan for at least 2x VRAM in system RAM (so 80 GB A100 = 160 GB RAM minimum).
- Inter-GPU bandwidth: for multi-GPU training, InfiniBand or NVLink matters more than raw GPU count.
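The vCPU and RAM rules of thumb above can be encoded as a quick sanity check to run against any instance listing before you rent it. The specs in the example are hypothetical:

```python
def check_instance(gpus: int, vram_gb: int, vcpus: int, ram_gb: int) -> list[str]:
    """Flag likely bottlenecks using the rules of thumb above:
    8-16 vCPUs per GPU, and system RAM >= 2x total VRAM."""
    warnings = []
    if vcpus / gpus < 8:
        warnings.append(f"only {vcpus / gpus:.0f} vCPUs per GPU; aim for 8-16")
    if ram_gb < 2 * vram_gb * gpus:
        warnings.append(f"RAM {ram_gb} GB < 2x total VRAM ({2 * vram_gb * gpus} GB)")
    return warnings

# Hypothetical listing: 1x A100 80GB with 4 vCPUs and 64 GB RAM
print(check_instance(gpus=1, vram_gb=80, vcpus=4, ram_gb=64))
```

For the hypothetical listing, both checks fire: the GPU would spend much of its time waiting on data loading and preprocessing.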
How to save money on GPU compute
- Use mixed precision training (bfloat16/fp16) to cut wall-clock time by 2–3x.
- Use gradient checkpointing to train larger models on smaller GPUs.
- Use spot / preemptible instances for non-critical batch jobs (often 60–70% cheaper).
- Use serverless GPU endpoints (RunPod Serverless, Modal) for bursty inference — you only pay for active seconds.
- Use tensor parallelism + pipeline parallelism libraries (DeepSpeed, vLLM, TensorRT-LLM) to squeeze more throughput out of each GPU.
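These levers compound multiplicatively, not additively, which is why stacking two or three of them can shrink a bill dramatically. A minimal sketch with a hypothetical $1,000/month baseline, combining a spot discount (~65% off) with a mixed-precision speedup (~2x fewer GPU-hours):

```python
def effective_cost(base_cost: float, *multipliers: float) -> float:
    """Apply cost multipliers in sequence; savings compound multiplicatively."""
    for m in multipliers:
        base_cost *= m
    return base_cost

baseline = 1000.0        # hypothetical monthly GPU bill ($)
spot = 0.35              # spot/preemptible: ~65% cheaper
mixed_precision = 0.5    # bf16: ~2x faster -> half the GPU-hours

print(f"${effective_cost(baseline, spot, mixed_precision):.2f}")
```

Two 50–65% savings stacked together leave you paying under a fifth of the original bill, not half of it.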
Things to check before signing up
Before putting real money down:
- Can you actually reserve capacity? Many providers advertise H100 but routinely run out. Check their public status page for availability patterns.
- Do they charge for stopped instances? Some providers bill for storage even on stopped machines; read the fine print.
- Do they have pre-built ML images? If you want to skip dependency hell, this saves hours.
- Is there a free trial or $5 promotional credit? Almost everyone offers one if you ask.
Where to go from here
If you're just starting out, RunPod Community Cloud is the cheapest way to experiment. For production inference at scale, Vultr Cloud GPU or Paperspace strike the best balance. For serious training, RunPod Secure and Lambda Labs lead. Browse the full GPU cloud ranking to compare specs and prices directly.