Microsoft Build 2026: Top Announcements for Agent Developers

Issue #66 | How do we get LLM Agents to Plan, Review and Monitor Progress on complex tasks?

Jun 22, 2026

Developer conferences are energizing (the new announcements, the labs, the learning), but they can be hard to navigate - there is a lot going on.

Microsoft Build 2026 (June 2-3, San Francisco) was no different, with a whopping 460 sessions. Disclosure up front: I work at Microsoft on agent tooling, including one of the agent features covered below - so read this as an informed but invested view.

In this post, I will mostly walk through sessions focused on agents and agent development. There were lots of other excellent sessions - cloud-native PostgreSQL rebuilt for scale with Azure HorizonDB, a GPU-accelerated Fabric Data Warehouse with up to 7x faster analytics, and Arm-based Cobalt 200 VMs for cost-efficient AI workloads - but for today I will focus on the 106 sessions tagged “Agents & apps” and “Agents”. This covers announcements on new models, the hosted runtime, and the supporting layer (memory, retrieval, agent optimization, governance). For each, the useful question is the same: what was missing before, what landed now, and where does it bring Azure to parity with the broader ecosystem?

If you are interested in building AI agents, and would like a book that covers the topic in depth, I wrote one: Designing Multi-Agent Systems. It is written in four parts that follow a theory -> build -> optimize -> apply arc: foundations and patterns; building multi-agent systems from scratch; evaluation, optimization, and responsible AI; and real-world applications. The agent optimization theme in this post maps directly to Part III.

New in-house models from Microsoft AI (reasoning, code, image, transcription, text-to-speech)

There are now seven new Microsoft AI models, trained from scratch on “clean and commercially licensed data.” They span five areas: reasoning, code, image generation and editing, transcription, and text-to-speech.

These models are built to address a requirement many enterprise customers share: models where the training data meets compliance requirements and the tools to adapt them to specific business use cases are built in.

The training data is traceable and, per the announcement at Build, “enterprise-grade,” and it doesn’t distill from other labs, so you can fine-tune the weights without inheriting unknown data provenance or IP risk. With Frontier Tuning, “your institutional knowledge becomes part of the model, and it stays yours.” Owning the full data lineage is also how Microsoft reduces dependence on any single outside lab.

The lineup:

MAI-Thinking-1 is the reasoning flagship. A 35B active-parameter mixture-of-experts model with a 256K context window. Microsoft reports 53% on SWE-bench Pro and 97% on AIME 2025, and says independent human raters on Surge preferred it over Claude Sonnet 4.6 in blind side-by-side quality comparisons.
MAI-Code-1-Flash is a small agentic coding model: 5B active parameters inside a roughly 137B-total mixture-of-experts, 51% on SWE-bench Pro, positioned “closer to Haiku in size but cheaper in cost” (only a fraction of the network activates per token). It is rolling out as one of the default models in VS Code.
MAI-Image-2.5 (and a Flash variant) is an image generation and editing model Microsoft places at #2 on the public arena leaderboards, ahead of Nano Banana 2 on image editing.
MAI-Transcribe-1.5 claims state-of-the-art transcription across 43 languages, up to 5x faster than rival models.
MAI-Voice-2 (with a Flash variant coming) does text-to-speech in 15 languages with voice adaptation, emotional control, output watermarking, and protections against unauthorized cloning.
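
To put the "only a fraction of the network activates per token" point in numbers, here is a quick calculation using the parameter counts quoted above for MAI-Code-1-Flash. The 137B total is described as approximate, so treat the result as approximate too. The function name is my own, for illustration only.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Figures quoted in this post for MAI-Code-1-Flash: 5B active, roughly 137B total.
share = active_fraction(5, 137)
print(f"{share:.1%} of parameters active per token")
```

Per-token compute scales with the active parameters rather than the total, which is the basis for the "cheaper in cost" positioning.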

You can try some of the MAI models yourself at no cost in the MAI Playground, or deploy them from the Foundry model catalog.

Watch: Mustafa Suleyman unveils the seven new MAI models, Build 2026 keynote

Hosted agents

Hosted agents in Foundry Agent Service first entered public preview in April 2026 (an earlier version was shown at Ignite 2025). They solve one problem well: you build your agent with any harness or code - LangGraph, Microsoft Agent Framework, the Claude Agent SDK, the OpenAI Agents SDK, the GitHub Copilot SDK, or your own - and run it on Azure with the benefits of a managed enterprise platform: managed scaling, observability, and a per-agent Entra identity. Each session runs in its own sandbox with a persistent filesystem and VM-level isolation, on Azure Container Apps Sandboxes (the same primitive behind GitHub’s cloud sandboxes).

You package the agent as a container, push it to Azure Container Registry, and Foundry runs it behind a managed endpoint. It speaks two protocols - the OpenAI-compatible Responses API and a schema-free Invocations protocol - and there is a quickstart to deploy one today.
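
As a rough illustration of what "OpenAI-compatible Responses API" means for a caller, the sketch below builds (but does not send) a request in the shape of the public OpenAI Responses API, where the body carries `model` and `input` fields. The endpoint URL, the `/responses` path suffix, the agent name, and the bearer token are all placeholders I made up. The real hosted-agent URL format and auth flow should come from the Foundry quickstart, not from this sketch.

```python
import json
import urllib.request


def build_responses_request(endpoint: str, token: str, user_input: str,
                            model: str = "my-hosted-agent") -> urllib.request.Request:
    """Build (without sending) a POST shaped like an OpenAI Responses API call.

    `endpoint`, `token`, and `model` are hypothetical placeholders.
    """
    body = {"model": model, "input": user_input}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/responses",  # assumed path, not confirmed
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_responses_request(
    "https://example.invalid/agents/demo", "PLACEHOLDER_TOKEN", "Summarize open incidents"
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing `req` to `urllib.request.urlopen` would send it. That step is left out here because the placeholder host does not exist.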

Hosted agents reach general availability in July 2026. Two companion features were also announced: Routines (public preview) run an agent automatically on a schedule or trigger, and Toolboxes (public preview) give the agent one managed MCP endpoint for a curated bundle of tools, with auth and governance handled by Foundry.

Comparable managed agent runtimes: AWS Bedrock AgentCore and OpenAI’s hosted agents.

Watch: From prototype to production: build and run agents at scale, BRK241

Agent Optimization

Foundry’s Agent Optimization (public preview) targets the step after you ship. Building and deploying an agent with tools like hosted agents above is often the first step; getting a good performance-and-cost tradeoff is harder, and takes skill and effort many teams do not have to spare, more so across hundreds of agents in production. An agent’s behavior depends on its instructions, its tools and tool descriptions, its skills, and its model choice, and tuning those by hand does not scale.

P.S. I worked on this service. It’s in public preview - try it out and share feedback.

Agent Optimization starts from evals: you define an eval set that encodes what good performance looks like, and if you do not have one, the service can help you auto-generate an initial set (read: initial, because only you, as the subject-matter expert, truly know what good performance looks like). From there it searches for better instructions, skills, tools, and model choices, and returns ranked agent candidates you can review and ship to production.
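
To make the eval-then-rank loop concrete, here is a toy sketch of the general idea: score each candidate configuration against an eval set, trade accuracy off against cost, and return candidates best-first. This is not the service's algorithm or API. The class names, the exact-match scoring, the `cost_weight` penalty, and the fake agent are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    instructions: str
    model: str
    cost_per_call: float  # illustrative cost units, not real pricing


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def rank_candidates(
    candidates: List[Candidate],
    eval_set: List[EvalCase],
    run_agent: Callable[[Candidate, str], str],
    cost_weight: float = 0.1,
) -> List[Tuple[float, float, Candidate]]:
    """Return (score, accuracy, candidate) rows sorted best-first."""
    if not eval_set:
        raise ValueError("eval_set must contain at least one case")
    rows = []
    for cand in candidates:
        passed = sum(run_agent(cand, case.prompt) == case.expected for case in eval_set)
        accuracy = passed / len(eval_set)
        # Simple performance-vs-cost tradeoff: penalize expensive candidates.
        rows.append((accuracy - cost_weight * cand.cost_per_call, accuracy, cand))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def fake_agent(cand: Candidate, prompt: str) -> str:
    """Stand-in for a real agent call: behavior depends only on the instructions."""
    return prompt.upper() if "uppercase" in cand.instructions else prompt


evals = [EvalCase("ok", "OK"), EvalCase("done", "DONE")]
cands = [
    Candidate("baseline", "Answer briefly.", "small", 0.5),
    Candidate("tuned-small", "Answer in uppercase.", "small", 0.5),
    Candidate("tuned-large", "Answer in uppercase.", "large", 2.0),
]
for score, accuracy, cand in rank_candidates(cands, evals, fake_agent):
    print(f"{cand.name}: accuracy={accuracy:.2f} score={score:.2f}")
```

In this toy run the two tuned candidates tie on accuracy, and the cheaper one ranks first. A real optimizer would generate candidates rather than take a fixed list, and would use richer graders than exact match.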

Watch: Agent Optimization in the Build 2026 keynote

Memory

Foundry agent memory (public preview) now spans three scopes: procedural (reusing successful execution patterns, which can lift task success 7-14%), user, and session. There is now a memory management view in the Foundry portal to inspect and edit what an agent has stored (CRUD on individual memories), plus multimodal memories, time-to-live so stale memories retire automatically, and direct “remember this” / “forget that” commands. This is the same ground covered by dedicated agent-memory libraries like Mem0 and Zep, now offered as a managed service.
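
The concepts above (scoped memories, time-to-live, explicit remember and forget) can be sketched as a small in-process store. This is a conceptual toy, not the Foundry memory API: the class, method names, and expiry semantics are my own assumptions, and a managed service would persist and retrieve memories very differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

SCOPES = {"procedural", "user", "session"}


@dataclass
class Memory:
    text: str
    scope: str
    expires_at: Optional[float]  # None means the memory never expires


class MemoryStore:
    """Toy scoped memory store with TTL; all names here are hypothetical."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items: Dict[int, Memory] = {}
        self._next_id = 1

    def remember(self, text: str, scope: str, ttl_seconds: Optional[float] = None) -> int:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        expires = None if ttl_seconds is None else self._clock() + ttl_seconds
        mem_id = self._next_id
        self._next_id += 1
        self._items[mem_id] = Memory(text, scope, expires)
        return mem_id

    def forget(self, mem_id: int) -> bool:
        """Delete one memory; returns False if it was already gone."""
        return self._items.pop(mem_id, None) is not None

    def recall(self, scope: str) -> List[str]:
        """Return live memories for a scope, retiring expired ones first."""
        now = self._clock()
        expired = [i for i, m in self._items.items()
                   if m.expires_at is not None and m.expires_at <= now]
        for i in expired:
            del self._items[i]
        return [m.text for m in self._items.values() if m.scope == scope]


store = MemoryStore()
pref = store.remember("User prefers metric units", "user")
store.remember("Draft report in progress", "session", ttl_seconds=3600)
print(store.recall("user"), store.recall("session"))
store.forget(pref)
print(store.recall("user"))
```

Injecting the clock keeps expiry testable without sleeping, which matters once stale-memory retirement is something you want to verify rather than assume.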

Watch: Using Microsoft Agent Framework with Foundry managed memory

Retrieval

Foundry IQ (public preview) is a serverless, SLA-backed retrieval endpoint that unifies Work IQ, Fabric IQ, Azure SQL, and file search behind one interface, with Web IQ (limited access) adding sub-200ms web grounding with zero data retention. Each knowledge base is also exposed as an MCP server, so an agent grounds on it by adding one MCP endpoint. Comparable to managed retrieval and vector-database tooling.
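
For readers new to MCP: a tool invocation is a JSON-RPC 2.0 message with the method `tools/call`, carrying a tool name and an arguments object. The sketch below only serializes such a message. The tool name `knowledge_base_retrieve` and its `query` argument are hypothetical, since the tools a given knowledge base actually exposes are discovered from the server, not assumed.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id to match responses


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
print(mcp_tool_call("knowledge_base_retrieve", {"query": "Q3 refund policy"}))
```

In practice an MCP client library handles this framing along with initialization and tool discovery. The point is that "adding one MCP endpoint" gives the agent a uniform call shape regardless of what sits behind it.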

Watch: Foundry IQ: fuel agents with enterprise knowledge and agentic retrieval, BRK246

Model routing and governance

The AI Gateway in Azure API Management brings a Unified Model API that gives you one contract across model providers, plus policy-driven routing, model fallback, semantic caching to cut token spend, and end-to-end logging of prompts, completions, and MCP tool calls. It also governs MCP servers and A2A endpoints, can wrap external or on-prem MCP servers, and can turn your existing REST APIs into MCP servers so you do not rebuild them. Comparable to model-router and LLM-gateway tools like OpenRouter and LiteLLM, in the API layer many Azure shops already run.
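
A minimal sketch of the gateway pattern described above: try backends in priority order, fall back on failure, cache answers, and log which backend served each request. It is not Azure API Management policy syntax. The class and backend names are hypothetical, and the cache key is a normalized-text match standing in for true semantic caching, which would compare embeddings instead.

```python
from typing import Callable, Dict, List, Tuple

Backend = Callable[[str], str]


class Gateway:
    """Toy model gateway: ordered fallback, cache, and request log."""

    def __init__(self, backends: List[Tuple[str, Backend]]):
        self._backends = backends  # tried in the order given
        self._cache: Dict[str, Tuple[str, str]] = {}
        self.log: List[dict] = []

    @staticmethod
    def _cache_key(prompt: str) -> str:
        # Stand-in for semantic similarity: case- and whitespace-insensitive match.
        return " ".join(prompt.lower().split())

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_name, text), using the cache or the first healthy backend."""
        key = self._cache_key(prompt)
        if key in self._cache:
            name, text = self._cache[key]
            self.log.append({"prompt": prompt, "served_by": name, "cache_hit": True})
            return name, text
        errors: List[str] = []
        for name, backend in self._backends:
            try:
                text = backend(prompt)
            except Exception as exc:  # broad on purpose in this sketch
                errors.append(f"{name}: {exc}")
                continue
            self._cache[key] = (name, text)
            self.log.append({"prompt": prompt, "served_by": name,
                             "cache_hit": False, "failed": errors})
            return name, text
        raise RuntimeError("all backends failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def steady_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


gw = Gateway([("primary", flaky_primary), ("secondary", steady_secondary)])
print(gw.complete("What is our refund policy?"))
print(gw.complete("what is our   refund policy?"))  # served from cache
print(gw.log)
```

Recording `served_by` on every call is the useful habit here: when a fallback fires, the log shows that a different model answered, not just that the request succeeded.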

Watch: Govern AI models, tools, and agents with Azure API Management, OD831

The rest of Build, in one scan

Other launches worth noting if you run on Azure:

Azure Cobalt 200 VMs: Arm-based, roughly 50% better performance for AI workloads.
Foundry Local: sovereign, on-prem AI with retrieval and multi-node vLLM inference, plus agentic retrieval over local M365 data for air-gapped environments.
Azure Container Apps sandboxes: VM-level isolation aimed specifically at running untrusted agent code, complemented by a Microsoft Execution Containers SDK for policy-driven, least-privilege execution.
Agent 365 (GA): governance across local, SaaS, and cloud agents.
Serverless agents on Azure Functions: a markdown-first agent runtime with an MCP extension and 1,400+ managed connectors.
Purview sensitivity labels in AI Search: policy-aware retrieval so grounding respects data classification.
Data for agents: GPU-accelerated Fabric Data Warehouse (up to 7x faster analytics), Azure HorizonDB (enterprise PostgreSQL for AI patterns), a Cosmos DB Agent Kit, and DocumentDB gaining full-text and hybrid search.
GitHub Copilot SDK (GA) and a new agent-native Copilot desktop app with worktrees for running parallel coding agents.

What it means for builders

What’s new. Hosted runtime, managed memory, serverless retrieval, and a first-party model family were either early or missing on Azure a year ago. If you were reaching outside Azure for an agent runtime or a memory service, there are now native options.

References

Building a hill-climbing machine: Launching seven new MAI models, Microsoft AI (microsoft.ai)
Microsoft Build 2026: MAI keynote transcript (microsoft.ai)
Build and run agents at scale with Microsoft Foundry (devblogs.microsoft.com)
Agent Optimizer in Foundry Agent Service (devblogs.microsoft.com)
Agent Optimization referenced in the Build 2026 keynote (youtube)
What’s new in Microsoft Foundry, Build Edition (devblogs.microsoft.com)
Foundry IQ: unified knowledge and serverless retrieval (devblogs.microsoft.com)
Making agent memory more reliable and production-ready (devblogs.microsoft.com)
Microsoft Build 2026 session catalog (build.microsoft.com/sessions)

Michael Lopez Chiesa

Jun 22

Useful walk-through, and the "what was missing, what landed, what's at parity" frame is the right way to read 460 sessions. The thread I'd surface: the consequential launches aren't the models, they're the isolation and governance primitives, and they answer a problem the rest of the stack creates.

Once you have hosted agents running arbitrary code, managed memory they write to, and MCP/A2A endpoints, failures stop being SRE-shaped and turn trust-boundary-shaped: untrusted execution, write paths into memory, tool calls with authority. So the untrusted-code sandboxes, the least-privilege Execution Containers SDK, and the Gateway logging every tool call are what I'd lead with, defense-in-depth as product. Two boundary questions in that spirit: does memory's "remember/forget" CRUD go through a verification gate or can the agent mutate its own memory freely (unconstrained writes are how persistent memory rots silently), and does Gateway model-fallback flag that it fails over to a different trust profile, different refusals and blind spots, making routing a security decision, not just uptime.

Worth pressing because you've got a great posture, agent-as-untrusted by default, and those are the two seams where it either holds or quietly leaks. The parity read is the most clarifying I've seen on where Azure sits now.
