DeepSeek-R1 671B is now running on SambaNova Cloud at 198 tokens per second (t/s), achieving speeds and efficiency that no other platform can match, according to SambaNova, the generative AI startup that delivers the fastest models and most efficient AI chips.
Despite DeepSeek-R1’s 10X reduction in AI training costs, significant inference costs and inefficiencies have until now prevented its broad adoption. SambaNova has eliminated this obstacle, enabling developers and enterprises to run real-time, affordable inference at scale.
SambaNova Solves DeepSeek’s Biggest Challenge: Inference at Scale
DeepSeek-R1 transformed AI by cutting training costs tenfold, but its reasoning capabilities demand far more computation at inference time, making production deployments expensive and stalling mainstream adoption. In practice, the inefficiency of GPU-based inference has put DeepSeek-R1 out of reach for most developers.
SambaNova has overcome this problem.
SambaNova’s SN40L Reconfigurable Dataflow Unit (RDU) chips, built on a proprietary dataflow architecture with a three-tier memory design, cut the hardware needed to run DeepSeek-R1 671B from 40 racks (320 of the latest GPUs) to a single rack of 16 RDUs, enabling cost-effective inference at unparalleled efficiency.
The Most Efficient DeepSeek API in the World — 100X Current Global Capacity
SambaNova is rapidly scaling its capacity to meet anticipated demand, and by the end of the year it will offer more than 100X the current global capacity for DeepSeek-R1. This makes its RDUs the most efficient enterprise solution for reasoning models.
Get Early Access to R1 on SambaNova Cloud
The full DeepSeek-R1 671B model is available now on SambaNova Cloud for all users to experience, with API access for select users. To try it today, visit cloud.sambanova.ai.
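For developers with API access, a request can be sketched as follows. This is a minimal sketch only: it assumes an OpenAI-compatible chat-completions endpoint at api.sambanova.ai and a model identifier of "DeepSeek-R1", neither of which is confirmed by this announcement, so consult the SambaNova Cloud documentation for the actual endpoint, model name, and parameters.

```python
# Hedged sketch: endpoint URL and model name below are assumptions,
# not confirmed by this announcement. Check the SambaNova Cloud docs.
import json
import os
import urllib.request

API_URL = "https://api.sambanova.ai/v1/chat/completions"  # assumed endpoint


def build_r1_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat-completion payload for DeepSeek-R1."""
    return {
        "model": "DeepSeek-R1",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask_r1(prompt: str) -> str:
    """Send the request; requires SAMBANOVA_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_r1_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible response shape (assumed): first choice's message text.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Inspect the payload without making a network call.
    print(build_r1_request("Why does inference speed matter for reasoning models?")["model"])
```

Because reasoning models emit long chains of reasoning tokens before their final answer, a generous `max_tokens` budget is typically needed.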
Leadership Comments
“Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek at 198 tokens per second per user,” stated Rodrigo Liang, CEO and co-founder of SambaNova. “This will increase to 5X faster than the latest GPU speed on a single rack — and by year end, we will offer 100X capacity for DeepSeek-R1.”
“Being able to run the full DeepSeek-R1 671B model — not a distilled version — at SambaNova’s blazingly fast speed is a game changer for developers. Reasoning models like R1 need to generate a lot of reasoning tokens to come up with a superior output, which makes them take longer than traditional LLMs. This makes speeding them up especially important,” stated Dr. Andrew Ng, Founder of DeepLearning.AI, Managing General Partner at AI Fund, and an Adjunct Professor at Stanford University’s Computer Science Department.
“Artificial Analysis has independently benchmarked SambaNova’s cloud deployment of the full 671 billion parameter DeepSeek-R1 Mixture of Experts model at over 195 output tokens/s, the fastest output speed we have ever measured for DeepSeek-R1. High output speeds are particularly important for reasoning models, as these models use reasoning output tokens to improve the quality of their responses. SambaNova’s high output speeds will support the use of reasoning models in latency sensitive use cases,” said George Cameron, Co-Founder, Artificial Analysis.
“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova. “That changes today. We’re bringing the next major breakthrough — collapsing inference costs and reducing hardware requirements from 40 racks to just one — to offer DeepSeek-R1 at the fastest speeds, efficiently.”
“More than 10 million users and engineering teams at Fortune 500 companies rely on Blackbox AI to transform how they write code and build products. Our partnership with SambaNova plays a critical role in accelerating our autonomous coding agent workflows. SambaNova’s chip capabilities are unmatched for serving the full R1 671B model, which provides much better accuracy than any of the distilled versions. We couldn’t ask for a better partner to work with to serve millions of users,” stated Robert Rizk, CEO of Blackbox AI.
Sumti Jairath, Chief Architect, SambaNova, explained: “DeepSeek-R1 is the perfect match for SambaNova’s three-tier memory architecture. With 671 billion parameters, R1 is the largest open-source large language model released to date, which means it needs a lot of memory to run. GPUs are memory constrained, but SambaNova’s unique dataflow architecture means we can run the model efficiently to achieve 20,000 tokens/s of total rack throughput in the near future — unprecedented efficiency when compared to GPUs due to their inherent memory and data communication bottlenecks.”
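A back-of-the-envelope check of the figures quoted in this release — the per-user speed, the projected rack throughput, and the chip counts — shows what they imply. These derived numbers are not stated by SambaNova; they simply follow from the quoted figures.

```python
# Sanity-check arithmetic on the numbers quoted in this release.
per_user_tps = 198        # tokens/s per user (SambaNova's measured claim)
rack_total_tps = 20_000   # projected total rack throughput (tokens/s)

# Implied number of concurrent full-speed streams a single rack could serve:
concurrent_streams = rack_total_tps / per_user_tps
print(round(concurrent_streams))  # ≈ 101 simultaneous users at full speed

# Implied hardware reduction: 320 GPUs across 40 racks vs. 16 RDUs in 1 rack.
gpu_count, rdu_count = 320, 16
print(gpu_count // rdu_count)     # 20x fewer chips
print(40 // 1)                    # 40x fewer racks
```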