Microsoft has unveiled its first bespoke chips for artificial intelligence in the cloud, as developers clamour for alternative suppliers to Nvidia, which dominates the market for AI processors.
Two new processors — a general-purpose chip based on Arm designs called Cobalt and a specialised AI “accelerator” named Maia — will be deployed in Microsoft’s Azure data centres next year, supporting its services including OpenAI and Copilot.
Microsoft’s entry into the AI processor market, announced at its Ignite developer conference on Wednesday, follows renewed efforts by Intel and AMD to compete with Nvidia, which holds a virtual monopoly on the high-powered graphics processing units, or GPUs, needed to train large AI models.
Demand for Nvidia’s A100 and H100 chips has far outstripped supply over the past year, with even Microsoft tapping rival cloud providers such as Oracle for extra GPU capacity to support its own AI services. In another sign of computing capacity constraints in AI, OpenAI — which is backed by Microsoft and relies heavily on its infrastructure — was forced on Tuesday to “pause” new sign-ups to its ChatGPT Plus service after a “surge in usage”.
Microsoft’s chip initiative comes years after its cloud computing rivals Google and Amazon first introduced their own AI accelerators. Its investments in silicon show how it is doubling down on a multibillion-dollar bet that generative AI — the technology that powers OpenAI’s ChatGPT and is capable of creating humanlike text, code and imagery — will define the tech industry in the 2020s.
“It’s going to be these fundamental investments that we’re making that are going to help set up the next decade of innovation in the [AI] space,” Scott Guthrie, executive vice-president of Microsoft’s cloud and AI group, told the Financial Times. He said both chips were the first in a series, with follow-up versions in development.
Microsoft, which this year committed to invest $10bn in OpenAI as part of a multi-year partnership, has designed its Maia chip to work best with the AI company’s large language model, GPT.
Since 2016, Microsoft has increasingly developed its data centre hardware in-house. Custom silicon is its most ambitious step yet to tie its hardware and software “stack” more tightly together, promising improved performance and efficiencies. That would help boost value for Azure’s customers, Guthrie said, and help grow Microsoft’s margins, which improved in its last quarter despite mounting capital expenditure among cloud companies.
“In this era of AI, there’s going to be more workloads and more needs, by multiple orders of magnitude, from where we are today,” said Guthrie. “And so, every amount of optimisation, every amount of performance improvement we can get is going to help everyone.”
The chips have already received an endorsement from Sam Altman, chief executive of OpenAI, which worked with Microsoft to develop and test them.
“Azure’s end-to-end AI architecture, now optimised down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers,” he said in a statement. Until now, OpenAI has relied on Nvidia’s chips to train GPT models.
Developing custom silicon is estimated to cost hundreds of millions of dollars, making it a project that only the tech industry’s richest companies are able to take on.
Microsoft’s data centre AI chips, which in industry lingo are called application-specific integrated circuits or ASICs, follow Google’s debut of its own TPU accelerators in 2015 and Amazon Web Services’ Trainium processors, unveiled in 2020. Anthropic, OpenAI’s largest start-up rival, said in the past month that it planned to use both Trainium and TPU chips.
When Maia launches next year, Microsoft customers will encounter it indirectly, through the company’s Bing, Microsoft 365 and Azure OpenAI services, rather than tapping it directly to run their own cloud-based applications. “We internally can automatically use the silicon without any customer actually having to change anything,” said Guthrie.
Rani Borkar, corporate vice-president for Azure Hardware Systems and Infrastructure, told the FT that Maia would be rolled out to public cloud customers at some point in the future. “Because these are the first generation, we are making sure that we are testing them,” Borkar said. “It won’t be just limited internally — we are just starting with that.”
While Microsoft has developed chips for its Xbox and HoloLens devices for more than a decade, its effort to create custom silicon for Azure began in 2020.
When Microsoft started working on the project, there was a “very high bar for anyone entering the silicon space”, Guthrie said, because of the wide diversity in the kinds of tasks that an AI processor would need to support. Developing a purpose-built accelerator became more straightforward once it could target a single model, GPT. OpenAI’s model is gaining momentum among AI developers, though competitors include Google’s PaLM, Anthropic’s Claude and Meta’s LLaMA.
Microsoft said Maia was designed for both “training” AI models — the most compute-intensive part of the AI process involving vast amounts of data — and “inference” or delivery of AI services, including ChatGPT and GitHub Copilot, which uses AI to help write software. The AI chip is made by Taiwan Semiconductor Manufacturing Company using the foundry’s 5nm process.
Even as Microsoft creates alternatives to existing chipmakers, Borkar said it would “continue our strong partnerships across the industry”, including by making the latest GPUs from AMD and Nvidia available to Azure customers. “At the scale we operate, it’s important to optimise and integrate every layer of the stack to maximise performance, but it’s also important to diversify and give our customers choices,” she added.
Ben Bajarin, tech industry analyst at consultancy Creative Strategies, said that Microsoft’s ability to develop its chips in concert with its software and other parts of its data centre infrastructure was a “real differentiator” against other cloud providers.
He compared it to Apple’s longstanding efforts to develop custom processors for its iPhones and Macs or Google’s dedicated video chips for YouTube, which help deliver a smoother experience for customers.
“Not everybody can do this. Not everybody will do this,” Bajarin said. “But for those who control enough of the stack and have the resources, this [custom silicon] trend will continue.”