Experience the fastest proprietary and flagship AI models on the market, powered by next-gen chips.
Achieve high-quality performance at a fraction of the cost of other LLM APIs.
Ninja’s models are rigorously tested against leading AI benchmarks, demonstrating near state-of-the-art performance across diverse domains.
Ninja's Compound AI Models
Ninja's proprietary LLMs are the easy choice for developers looking for the best performance. Our compound AI models combine multiple flagship LLMs from OpenAI, Anthropic, Google, DeepSeek, and others with cutting-edge, inference-level optimizations.
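To illustrate the general idea behind a compound model (not Ninja's actual implementation), here is a minimal routing sketch: a heuristic scores each prompt's complexity and dispatches it to a cheaper or stronger underlying model. All tier names, thresholds, and the heuristic itself are hypothetical.

```python
# Hypothetical sketch of compound-AI routing: pick an underlying model
# based on a simple complexity heuristic. Tiers, thresholds, and the
# heuristic are illustrative only, not Ninja's actual routing logic.

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: long prompts and math/code markers score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(marker in prompt for marker in ("prove", "derive", "def ")):
        score = max(score, 0.8)
    return score

def route(prompt: str) -> str:
    """Map a prompt to a hypothetical underlying model tier."""
    complexity = estimate_complexity(prompt)
    if complexity < 0.3:
        return "fast-tier"      # cheap, low-latency model
    if complexity < 0.7:
        return "standard-tier"  # balanced cost and quality
    return "complex-tier"       # strongest model for hard tasks

print(route("Summarize this paragraph in one sentence."))  # -> fast-tier
```

In practice the routing decision could also weigh latency budgets and per-token cost, which is what makes the compound approach cheaper than always calling the strongest model.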
Ninja’s Pricing & Future Offerings
Ninja makes it possible to access the world’s best AI models at unbeatable prices. Along with offering APIs for our proprietary models, we’re expanding with external models tailored to diverse industries and specialized tasks.
Per-task pricing:

| Mode | Price / task |
| --- | --- |
| Qwen 3 Coder 480B (Cerebras) | $1.50 |
| Standard mode | $1.00 |
| Complex mode | $1.50 |
| Fast mode | $1.50 |
Per-token pricing:

| Mode | Input price (per M tokens) | Output price (per M tokens) |
| --- | --- | --- |
| Qwen 3 Coder 480B (Cerebras) | $3.75 | $3.75 |
| Standard mode | $1.50 | $1.50 |
| Complex mode | $4.50 | $22.50 |
| Fast mode | $3.75 | $3.75 |
| Model | Input price (per M tokens) | Output price (per M tokens) |
| --- | --- | --- |
| Turbo 1.0 | $0.11 | $0.42 |
| Apex 1.0 | $0.88 | $7.00 |
| Reasoning 2.0 | $0.38 | $1.53 |
| Deep Research 2.0 | $1.40 | $5.60 |
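To make the per-token pricing concrete, here is a minimal sketch that computes the cost of a single request from the table above. The prices match the table; the example token counts are made up for illustration.

```python
# Minimal cost calculator for Ninja's per-token pricing (table above).
# Prices are USD per million tokens; example token counts are made up.

PRICING = {
    "Turbo 1.0":         {"input": 0.11, "output": 0.42},
    "Apex 1.0":          {"input": 0.88, "output": 7.00},
    "Reasoning 2.0":     {"input": 0.38, "output": 1.53},
    "Deep Research 2.0": {"input": 1.40, "output": 5.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-M-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Example: a Turbo 1.0 call with 12,000 input and 1,500 output tokens.
print(f"${request_cost('Turbo 1.0', 12_000, 1_500):.6f}")  # -> $0.001950
```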
Rate Limits
Ninja AI enforces per-model rate limits on inference requests so that every developer can experience the fastest inference.
| Model | Requests per minute (RPM) |
| --- | --- |
| Turbo 1.0 | 50 |
| Apex 1.0 | 20 |
| Reasoning 2.0 | 30 |
| Deep Research 2.0 | 5 |
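A common way to stay within these limits client-side is to space requests out and back off on rate-limit errors. The sketch below is a generic pattern, not an official Ninja client: `send_request` and the `Response` class are hypothetical stand-ins for a real HTTP call, and the 50 RPM figure is the Turbo 1.0 limit from the table above.

```python
import random
import time

RPM_LIMIT = 50                   # Turbo 1.0 limit from the table above
MIN_INTERVAL = 60.0 / RPM_LIMIT  # minimum seconds between requests

class Response:
    """Tiny stand-in for an HTTP response object."""
    def __init__(self, status_code: int):
        self.status_code = status_code

def send_request(payload: dict) -> Response:
    """Hypothetical API call; sometimes returns 429 to exercise retries."""
    return Response(429 if random.random() < 0.1 else 200)

_last_request_time = 0.0

def throttled_call(payload: dict) -> Response:
    """Space calls at least MIN_INTERVAL apart; back off on 429s."""
    global _last_request_time
    wait = MIN_INTERVAL - (time.monotonic() - _last_request_time)
    if wait > 0:
        time.sleep(wait)
    for attempt in range(5):
        _last_request_time = time.monotonic()
        response = send_request(payload)
        if response.status_code != 429:  # 429 = Too Many Requests
            return response
        time.sleep(2 ** attempt)         # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after 5 attempts")

print(throttled_call({"model": "Turbo 1.0", "prompt": "hello"}).status_code)
```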
Ninja API Performance
Flagship Models: Turbo 1.0 & Apex 1.0
Apex 1.0 scored highest on the industry-standard Arena-Hard-Auto (Chat) benchmark, which measures how well an AI handles complex, real-world conversations that require nuanced understanding and contextual awareness.
The models also excel on other benchmarks: MATH-500, AIME 2024 (Reasoning), GPQA (Reasoning), LiveCodeBench (Coding), and LiveCodeBench Hard (Coding).

Last updated: 04/15/2025
Reasoning 2.0
Reasoning 2.0 outperformed OpenAI o1 and Claude 3.7 Sonnet in competitive math on the AIME benchmark, which assesses an AI's ability to handle problems requiring logic and advanced reasoning.
Reasoning 2.0 also surpassed human PhD-level accuracy on the GPQA benchmark, which evaluates general reasoning through complex, multi-step questions requiring factual recall, inference, and problem-solving.
Last updated: 04/15/2025
Deep Research 2.0
Deep Research achieved 91.2% accuracy on the SimpleQA test, one of the best proxies for measuring a model's hallucination rate. This highlights Deep Research's exceptional ability to identify factual information accurately, surpassing leading models in the field.
In the GAIA test, Deep Research scored 57.64%, indicating superior performance in navigating real-world information environments, synthesizing data from multiple sources, and producing factual, concise answers.
Deep Research also achieved a significant breakthrough with a 17.47% score on the HLE (Humanity's Last Exam) test, widely recognized as a rigorous benchmark that evaluates AI systems across more than 100 subjects. Deep Research scored notably higher than several other leading models, including o3-mini, o1, and DeepSeek-R1.

Last updated: 04/15/2025
| Provider (Pass@1) | Level 1 | Level 2 | Level 3 | Average |
| --- | --- | --- | --- | --- |
| OpenAI's Deep Research | 74.29 | 69.06 | 47.6 | 67.36 |
| Ninja's Deep Research | 69.81 | 56.97 | 46.15 | 57.64 |
Data source: OpenAI blog post.
Last updated: 04/15/2025
You can sign up for free or subscribe to an Ultra or Business tier. Ultra and Business give you unlimited access to the playground to experiment with flagship, reasoning, and Deep Research models.