AI Providers
Explore the latency and pricing of the leading AI providers.
| Provider | Best Price | Best Latency | # of Models |
|---|---|---|---|
| Deepinfra | 66 models | 59 models | 77 models |

| Model | Latency (s) | Blended $ | Input $ | Output $ |
|---|---|---|---|---|
| llama-3-2-1b | 1st 0.44 | 2nd $0.01 | $0.01 | $0.01 |
| llama-3-1-8b-turbo-fp8 | 1st 0.18 | 1st $0.02 | $0.02 | $0.03 |
| llama-3-2-3b | 1st 0.46 | 1st $0.02 | $0.02 | $0.02 |
| mistral-nemo | 1st 0.19 | 1st $0.03 | $0.02 | $0.04 |
| mistral-7b | 3rd 0.45 | 1st $0.03 | $0.03 | $0.05 |
| llama-3-1-8b | 0.28 | 3rd $0.04 | $0.03 | $0.05 |
| llama-3-8b | 1st 0.31 | 1st $0.04 | $0.03 | $0.06 |
| llama-3-2-11b-vision | 2nd 1.92 | 1st $0.05 | $0.05 | $0.05 |
| gemma-3-4b | 1st 0.35 | 1st $0.05 | $0.04 | $0.08 |
| deepseek-ocr | 1st 0.17 | 2nd $0.05 | $0.03 | $0.10 |
| gpt-oss-20b-high | 2nd 0.18 | 2nd $0.06 | $0.03 | $0.14 |
| gemma-3-12b | 1st 0.29 | 1st $0.06 | $0.04 | $0.13 |
| mistral-small-3 | 3rd 0.39 | 1st $0.06 | $0.05 | $0.08 |
| mistral-small-3-1 | 3rd 0.39 | 1st $0.06 | $0.05 | $0.10 |
| nvidia-nemotron-nano-9b-v2 | 1st 0.42 | 1st $0.07 | $0.04 | $0.16 |
| devstral-small-may | 1st 0.25 | 1st $0.07 | $0.06 | $0.12 |
| gpt-oss-120b-high | 2nd 0.26 | 1st $0.08 | $0.04 | $0.19 |
| qwen2-5-coder-32b | 2nd 0.40 | 1st $0.08 | $0.06 | $0.15 |
| phi-4 | 1st 0.31 | 1st $0.09 | $0.07 | $0.14 |
| nvidia-nemotron-3-nano | 1st 0.24 | 1st $0.10 | $0.06 | $0.24 |
| mistral-small-3-2-fp8 | 1st 0.25 | 1st $0.11 | $0.07 | $0.20 |
| gemma-3-27b | 1st 0.36 | 1st $0.11 | $0.09 | $0.16 |
| qwen3-14b-fp8 | 1st 0.21 | 1st $0.12 | $0.08 | $0.24 |
| devstral-small | 1st 0.26 | 1st $0.12 | $0.07 | $0.28 |
| qwen3-coder-30b-a3b-fp8 | 1st 0.22 | 1st $0.12 | $0.07 | $0.26 |
| qwen3-30b-fp8 | 1st 0.21 | 1st $0.13 | $0.08 | $0.29 |
| llama-4-scout | 0.32 | 2nd $0.14 | $0.08 | $0.30 |
| qwq-32b-preview | 1st 0.37 | 1st $0.14 | $0.12 | $0.18 |
| qwen3-32b-fp8 | 1st 0.54 | 1st $0.15 | $0.10 | $0.30 |
| llama-3-3-70b-turbo-fp8 | 1st 0.52 | 1st $0.15 | $0.10 | $0.32 |
| qwen3-235b-2507 | 2nd 0.41 | 1st $0.17 | $0.07 | $0.46 |
| llama-nemotron-super-49b-v1-5 | 1st 0.44 | 1st $0.17 | $0.10 | $0.40 |
| qwen2-5-72b | 1st 0.31 | 2nd $0.19 | $0.12 | $0.39 |
| qwen3-vl-4b-fp8 | 1st 0.45 | 1st $0.23 | $0.10 | $0.60 |
| deepseek-v3-2-exp | 1st 0.62 | 1st $0.24 | $0.21 | $0.32 |
| qwen3-235b-fp8 | 1st 0.39 | 1st $0.25 | $0.13 | $0.60 |
| gpt-oss-120b-high-turbo | 1st 0.19 | 1st $0.26 | $0.15 | $0.60 |
| llama-4-maverick-fp8 | 1st 0.41 | 1st $0.26 | $0.15 | $0.60 |
| qwen3-vl-30b-a3b-fp8 | 1st 0.19 | 1st $0.26 | $0.15 | $0.60 |
| llama-3-3-70b | 0.55 | 2nd $0.27 | $0.23 | $0.40 |
| deepseek-r1-distill-qwen-32b | 1st 0.37 | 1st $0.27 | $0.27 | $0.27 |
| deepseek-v3-2 | 1st 0.61 | 1st $0.29 | $0.26 | $0.39 |
| hermes-3-llama-3-1-70b | 1st 0.30 | 1st $0.30 | $0.30 | $0.30 |
| nvidia-nemotron-nano-12b-v2-vl-fp8 | 1st 0.21 | 1st $0.30 | $0.20 | $0.60 |
| olmo-3-1-32b-instruct | 1st 0.24 | 1st $0.30 | $0.20 | $0.60 |
| llama-3-70b | 1st 0.35 | 1st $0.33 | $0.30 | $0.40 |
| qwen3-next-80b-a3b | 1st 0.30 | 2nd $0.34 | $0.09 | $1.10 |
| deepseek-v3-1-terminus-fp4 | 1st 0.42 | 1st $0.35 | $0.21 | $0.79 |
| deepseek-v3-1-fp4 | 1st 0.75 | 1st $0.35 | $0.21 | $0.79 |
| llama-3-2-90b-vision | 2nd 0.57 | 1st $0.36 | $0.35 | $0.40 |
| deepseek-v3-0324 | 0.57 | 1st $0.37 | $0.20 | $0.88 |
| llama-3-1-70b | 1st 0.42 | 1st $0.40 | $0.40 | $0.40 |
| llama-3-1-70b-turbo-fp8 | 1st 0.33 | 1st $0.40 | $0.40 | $0.40 |
| glm-4-5-air | 1st 0.17 | 2nd $0.42 | $0.20 | $1.10 |
| minimax-m2 | 1st 0.27 | 1st $0.45 | $0.25 | $1.02 |
| glm-4-6v-fp8 | 1st 0.25 | 1st $0.45 | $0.30 | $0.90 |
| qwen3-vl-235b-a22b-fp8 | 1st 0.34 | 1st $0.45 | $0.20 | $1.20 |
| deepseek-v3-dec | 1st 0.38 | 1st $0.46 | $0.32 | $0.89 |
| llama-4-maverick-turbo-fp8 | 1st 0.47 | 1st $0.50 | $0.50 | $0.50 |
| qwen3-coder-480b-turbo-fp4 | 1st 0.23 | 1st $0.51 | $0.28 | $1.20 |
| minimax-m2-1-fp8 | 1st 0.27 | 1st $0.51 | $0.28 | $1.20 |
| mixtral-8x7b | 2nd 0.32 | 1st $0.54 | $0.54 | $0.54 |
| qwen3-vl-8b-fp8 | 1st 15.00 | 1st $0.66 | $0.18 | $2.09 |
| glm-4-5 | 1st 0.36 | 1st $0.69 | $0.38 | $1.60 |
| qwen3-coder-480b-fp8 | 1st 0.28 | 2nd $0.70 | $0.40 | $1.60 |
| deepseek-r1-distill-llama-70b | 1st 0.36 | 1st $0.75 | $0.60 | $1.20 |
| glm-4-7-fp4 | 1st 0.31 | 1st $0.76 | $0.43 | $1.75 |
| glm-4-6-fp4 | 1st 0.38 | 1st $0.76 | $0.43 | $1.75 |
| qwen3-235b-a22b-2507-fp8 | 1st 0.30 | 2nd $0.77 | $0.23 | $2.39 |
| kimi-k2-0905 | 0.78 | 1st $0.80 | $0.40 | $2.00 |
| kimi-k2-thinking | 0.61 | 1st $0.85 | $0.47 | $2.00 |
| kimi-k2 | 0.76 | 1st $0.88 | $0.50 | $2.00 |
| deepseek-r1-0528 | 1st 0.38 | 1st $0.91 | $0.50 | $2.15 |
| glm-4-6-fp8 | 2nd 35.49 | 1st $0.93 | $0.60 | $1.90 |
| deepseek-r1-jan | 1st 0.35 | 1st $1.13 | $0.70 | $2.40 |
| llama-3-1-nemotron-70b | 1st 0.32 | 1st $1.20 | $1.20 | $1.20 |
| deepseek-r1-jan-turbo-fp4 | 1st 0.41 | 1st $1.50 | $1.00 | $3.00 |
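
The page does not define how the Blended $ column is derived. The listed rows are consistent with a 3:1 weighting of input price to output price, so the sketch below assumes that ratio. The function name `blended_price` and the 3:1 default are assumptions inferred from the rows, not a documented formula.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output price.

    The 3:1 default is an assumption inferred from the table rows.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Rows copied from the table above: (model, input $, output $, listed blended $)
rows = [
    ("qwen3-next-80b-a3b", 0.09, 1.10, 0.34),
    ("qwen3-235b-a22b-2507-fp8", 0.23, 2.39, 0.77),
    ("deepseek-r1-jan-turbo-fp4", 1.00, 3.00, 1.50),
]

for model, input_price, output_price, listed in rows:
    computed = blended_price(input_price, output_price)
    print(f"{model}: computed {computed:.4f}, listed {listed:.2f}")
```

If a workload is more output-heavy than 3:1, passing different weights gives a blended figure that better reflects its actual cost.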
FAQ
AI labs (OpenAI, Anthropic, Google) develop foundation models. AI providers (Baseten, Together AI, Groq, DeepInfra) host and serve those models through infrastructure services, managing compute resources, APIs, and monitoring. AI gateways act as middleware, offering unified API access to multiple providers with intelligent routing and cost optimization.
Compare the providers' token-based pricing structures. For example, in our comparative analysis we found that OpenRouter offers the lowest cost, at US$0.08 per million output tokens for Llama 4 Scout, followed by SambaNova at US$0.11. We also examined model-specific pricing across AI providers, where platforms such as Baseten offer cost tracking per request and per GPU type to make comparison easier.
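
Token-based prices can be turned into a workload estimate with simple arithmetic. The sketch below assumes prices are quoted per million tokens, as in the answer above. The function `workload_cost` and the token counts are hypothetical, and the example prices are the llama-4-scout input and output figures from the table.

```python
TOKENS_PER_MILLION = 1_000_000


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated cost in dollars, assuming prices are per million tokens."""
    input_cost = input_tokens / TOKENS_PER_MILLION * input_price_per_m
    output_cost = output_tokens / TOKENS_PER_MILLION * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 0.5M output tokens,
# priced at $0.08 input and $0.30 output (llama-4-scout row).
cost = workload_cost(2_000_000, 500_000, 0.08, 0.30)
print(f"Estimated cost: ${cost:.2f}")
```

Running the same function with each provider's prices for a given model gives a like-for-like comparison for your own input and output mix.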
Yes, rate limits vary widely. High-performance providers such as Cerebras and SambaNova are optimized for large-scale workloads, while Groq specializes in ultra-low-latency responses. Platforms such as Baseten let you configure autoscaling parameters, although a misconfiguration can affect both cost and latency. Choose based on whether you need peak capacity or steady performance.
Yes, there are several ways to access free AI models. Many open-source models, such as Llama, Mistral, and Gemma, are available through free APIs on platforms such as Hugging Face, OpenRouter, and Together AI, although these usually come with rate limits and usage restrictions. Open-source large language models (LLMs) offer stronger data security because they can be deployed on private infrastructure, and they eliminate licensing fees and vendor lock-in risk. Although the models themselves are free, deployment costs vary depending on whether you run them locally (free if your hardware is compatible), on managed providers, or on cloud infrastructure.
Choose AI providers based on your priorities. For detailed debugging, select Baseten (request-level tracing) or Together AI (per-version metrics). For ultra-low latency, our benchmark found that Groq delivers a first-token latency of 0.13s. For cost optimization, Parasail offers flexible GPU switching. For fast deployment, Fireworks AI provides immediate API endpoints. For enterprise governance, Databricks integrates MLflow tracking and data lineage. Consider using AI gateways if you need flexibility across multiple providers.
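
The same priority-based choice can be applied to the table rows. The sketch below is a minimal illustration, assuming the latency column is in seconds. The `Row` type and the helpers `fastest` and `cheapest_within` are hypothetical names, and the sample rows are copied from the table.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    model: str
    latency: float  # assumed to be seconds
    blended: float  # blended price in dollars


# A few rows copied from the table on this page.
ROWS = [
    Row("llama-3-2-1b", 0.44, 0.01),
    Row("llama-3-1-8b-turbo-fp8", 0.18, 0.02),
    Row("glm-4-5-air", 0.17, 0.42),
    Row("kimi-k2", 0.76, 0.88),
]


def fastest(rows: List[Row]) -> Row:
    """Latency-first priority: the row with the lowest latency."""
    return min(rows, key=lambda r: r.latency)


def cheapest_within(rows: List[Row], max_latency: float) -> Optional[Row]:
    """Cost-first priority under a latency budget; None if nothing qualifies."""
    candidates = [r for r in rows if r.latency <= max_latency]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.blended)


print(fastest(ROWS).model)
print(cheapest_within(ROWS, max_latency=0.20))
```

Filtering on the hard constraint first and then minimizing the other metric keeps the trade-off explicit, which is easier to reason about than a single combined score.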