Microsoft is doubling down on AI models that aren’t large language models. The company announced the launch of three new models on Thursday: all-new models for voice and text transcription, and the second generation of its in-house image model.
The voice and text transcription models are the first of their kind offered by Microsoft. The transcription model can transcribe recordings into text in 25 different languages. It is built for video subtitling, meeting transcription and voice agents. The voice model can create audio recordings up to 60 seconds long. The company claims that its second-generation image model generates images faster and renders them more realistically than its previous model. The models are now available in Microsoft’s Foundry Playground and MAI, and there are plans to bring MAI-Image-2 to Bing and PowerPoint. Developers can view pricing information here.
These new models are a clear indication that Microsoft is looking to expand its offerings in the AI market. Microsoft’s Copilot is one of the most popular chatbots among businesses, especially those already using Microsoft’s Office 365 suite and the Azure cloud service. Aside from the now-obsolete original image model, Microsoft has primarily focused on text-based models, trying to stand out among its many competitors as a secure and business-friendly option. Its new AI tools, Copilot Cowork and Copilot Health, are proof of this.
These models also serve as a reminder that Microsoft, as an incumbent technology company, has the money and compute to dedicate to these kinds of “side quests” that even billion-dollar startups like OpenAI can’t always afford. Last week, OpenAI confirmed that it would be abandoning its Sora AI video app, saying that it will refocus on its core business. The AI industry in 2026 aims to prove that its tools are useful in the workplace, especially with Anthropic’s Claude Code surpassing the competition.
Generative media, like the models that power AI image and video generation, requires a lot of compute and energy to run, resources that could be spent elsewhere. Google, as another legacy tech company with billions in budget allocated to AI research, indicated this week that it would not abandon generative media but would try to make the models more cost- and energy-efficient, such as with its new Veo 3.1 Lite video model.