Google is quietly rewriting its AI hardware playbook by opening discussions with Marvell Technology to co-develop a new generation of custom chips focused squarely on AI inference, according to a report by The Information.

These talks, said to be still in an early, exploratory phase, could bring two new chips into Google’s Tensor Processing Unit (TPU) ecosystem, with the goal of running AI workloads more efficiently and trimming long-term infrastructure costs.
The first chip under discussion would be a memory-processing unit (MPU) designed to work alongside Google’s existing TPUs.
It would handle memory-heavy operations usually done by the accelerator die, helping to ease on-chip bandwidth pressure, lower latency, and improve cache efficiency for AI workloads, especially large-scale inference, where moving data around is one of the biggest bottlenecks.
The second chip would be a new TPU variant built specifically for inference, the phase in which trained models serve live queries to users. Inference already makes up the bulk of Google’s cloud AI compute, so even modest efficiency improvements at this layer can add up to significant cost savings at scale.
Nothing is finalized yet, but the move fits a clear trend: Google is broadening its TPU supply chain beyond Broadcom and MediaTek. Broadcom remains committed as a long-term TPU partner through 2031, while MediaTek has already stepped in with cost-optimized TPU designs.
Adding Marvell as a third independent design-services supplier would give Google more options to tune power, performance, and economics across different parts of its AI stack.
This diversification matters all the more as the market for custom ASICs and inference-specific chips is expected to grow sharply over the next few years, with analysts projecting a triple-digit-billion-dollar industry by the early 2030s.
For Marvell, even a potential design-services role tied to Google’s TPUs carries real strategic value. The company has long been known for its data-center and networking silicon; a high-profile AI inference contract with Google would position it as a core player in the AI-accelerator space, not just a supplier in the connectivity layer.
For Google, the math is more straightforward: more design partners mean more flexibility, stronger negotiating leverage, and a better shot at landing chips that are finely tuned to the specific mix of workloads running in its cloud data centers.
The timing is also significant. Google’s talks with Marvell come just as the search giant’s TPU-based cloud services are becoming a major growth driver for its cloud division, helping Google Cloud stand out from competitors that lean heavily on Nvidia-based solutions.
At the same time, inference costs are rapidly overtaking training as the largest chunk of AI infrastructure spending. Squeezing even a few percentage points out of per-inference energy use or latency, across millions of daily queries, could translate into billions of dollars in annual savings.
That is exactly the kind of pressure the proposed memory-processing unit and the new inference-focused TPU would be designed to relieve.
Still, it’s important to treat the talks as an early-stage signal rather than a final deal. No binding contracts have been announced, and given typical chip-design timelines, any Marvell-designed silicon for Google is likely years away from mass production.
The fact that Google is already weighing multiple partners (Broadcom, MediaTek, TSMC for fabrication, and now Marvell) shows it sees AI inference as a long-term, multi-wave initiative, not a one-off product cycle.





