GPT-5 in Microsoft 365 Copilot -- What Changes for Developers
Last week I was demoing a declarative agent to a customer. The agent worked fine – solid responses, decent reasoning, nothing to complain about. Then on August 7th, Microsoft flipped the switch on GPT-5 in M365 Copilot. I ran the exact same agent with the exact same prompts, and the responses were noticeably different. Better, mostly. But different. And “different” in production means you’d better understand what changed.
So here is what actually happened, what it means if you’re building on this stuff, and what you should go test right now.
How GPT-5 is wired into Copilot
Microsoft did not just swap out GPT-4o for GPT-5 and call it a day. The integration is more interesting than that. GPT-5 ships with a real-time router that picks the right model variant for each individual prompt.
There are two modes under the hood:
- High-throughput model – fast, lightweight, optimized for routine questions. Think “summarize this email” or “what’s on my calendar tomorrow.” Speed is the priority here.
- Deep reasoning model – slower, more deliberate, built for complex multi-step tasks. Planning, analysis, ambiguous situations. This is where the new intelligence shows up.
The router decides automatically which mode to use based on prompt complexity. You don’t control it directly in Copilot Chat – but in Copilot Studio you can.
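To make the routing idea concrete, here is a deliberately naive sketch. This is not Microsoft’s actual router – the real one is a model-based classifier inside the service – but it illustrates the concept: cheap signals about prompt complexity decide which variant handles the request. The keyword list and length threshold are invented for illustration.

```python
# Conceptual sketch of per-prompt routing -- NOT Microsoft's router,
# just an illustration: cheap complexity signals decide whether a
# prompt goes to the fast model or the deep reasoning model.

REASONING_SIGNALS = ("analyze", "compare", "plan", "why", "trade-off", "comply")

def route(prompt: str) -> str:
    """Return which hypothetical model variant would handle the prompt."""
    text = prompt.lower()
    complex_wording = any(signal in text for signal in REASONING_SIGNALS)
    long_prompt = len(text.split()) > 40  # arbitrary illustrative threshold
    if complex_wording or long_prompt:
        return "deep-reasoning"
    return "high-throughput"

print(route("Summarize this email"))                     # high-throughput
print(route("Analyze these contract clauses for risk"))  # deep-reasoning
```

The point of the sketch: routing is per prompt, not per session, so one conversation can bounce between both variants.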
Users see a “Try GPT-5” button in Copilot Chat. Once they activate it, GPT-5 handles that session. Microsoft has said GPT-5 will become the default model going forward.
What this means for declarative agents
Declarative agents run on top of the Copilot orchestrator, so they automatically get the GPT-5 upgrade without any code changes on your side. Sounds great, right? Mostly, yes.
The catch: GPT-5 follows instructions with what OpenAI calls “surgical precision.” It is way more sensitive to contradictory or vague instructions than GPT-4o was. Where the old model would just pick one interpretation and run with it, GPT-5 actually spends reasoning tokens trying to reconcile conflicts in your instructions. So if your declarative agent instructions have any ambiguity or contradictions, you might see worse performance, not better.
I went back through three of my declarative agent manifests and found at least one conflicting instruction in each. One agent had a rule saying “always respond in the user’s language” and another rule saying “format output in English.” GPT-4o just ignored the conflict. GPT-5 struggled with it.
Step one after the rollout: audit your agent instructions for conflicts.
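There is no substitute for reading the manifest, but a script can flag the obvious suspects before a human pass. Here is a crude illustrative audit; the conflict keyword pairs are hypothetical examples, not a vetted list.

```python
# Crude, illustrative audit for contradictory agent instructions.
# The keyword pairs below are hypothetical examples -- a real audit
# means reading the manifest, but a script can flag obvious suspects.

CONFLICT_PAIRS = [
    ("user's language", "in english"),
    ("be concise", "explain in detail"),
    ("always cite sources", "never include links"),
]

def find_conflicts(instructions: list[str]) -> list[tuple[str, str]]:
    """Return keyword pairs where both halves appear across the instructions."""
    joined = " ".join(instructions).lower()
    return [pair for pair in CONFLICT_PAIRS if pair[0] in joined and pair[1] in joined]

instructions = [
    "Always respond in the user's language.",
    "Format output in English.",
]
print(find_conflicts(instructions))  # [("user's language", 'in english')]
```

This catches exactly the kind of language/format clash described above; anything it flags still needs a human decision about which rule wins.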
Custom engine agents – less affected, more opportunity
Custom engine agents use their own models and orchestration, so GPT-5 in Copilot does not directly change how they work. Your custom engine agent still uses whatever model you configured.
That said, the opportunity here is indirect. If you are building custom engine agents because Copilot’s reasoning was not good enough for your scenario, it might be worth re-evaluating. GPT-5’s deep reasoning model handles multi-step logic and ambiguous situations much better. Some scenarios that previously needed a custom engine agent might now work with a well-built declarative agent – which is cheaper and easier to maintain.
What developers should test right now
If you have agents or Copilot extensions in production, here is my checklist:
- Run your existing prompts unchanged. Compare GPT-5 responses against what you had before. Look for behavioral differences, not just quality differences.
- Check for instruction conflicts. Go through every instruction in your declarative agent manifests. Remove contradictions. Set up clear hierarchies when rules might clash.
- Test edge cases harder. GPT-5 is better at reasoning but also more literal. Prompts that relied on the model “just figuring it out” might need to be more explicit now.
- Validate multi-turn conversations. GPT-5 has improved coherence across conversation turns, which is great, but it also carries more context forward – and that can occasionally lead to unexpected behavior if earlier turns contained errors.
- Monitor response times. The deep reasoning model is slower. If your users are used to sub-second responses and GPT-5 routes their prompt to the reasoning model, they will notice the latency.
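For the first checklist item, a simple regression harness helps. The sketch below assumes you captured baseline responses before the rollout and compares them to fresh responses for the same prompts; the similarity threshold is arbitrary, and the goal is to surface behavioral drift for human review, not to judge quality automatically.

```python
import difflib

# Illustrative regression check: compare stored baseline responses
# (captured before the rollout) against fresh GPT-5 responses for the
# same prompts. The 0.6 threshold is arbitrary; the point is to surface
# behavioral drift for human review, not to score quality.

def drifted(baseline: str, current: str, threshold: float = 0.6) -> bool:
    """Flag a prompt whose response changed substantially."""
    ratio = difflib.SequenceMatcher(None, baseline, current).ratio()
    return ratio < threshold

baseline = "Your meeting with Contoso is at 10:00 tomorrow."
current = "Your meeting with Contoso is at 10:00 tomorrow in Room 4."
print(drifted(baseline, current))  # False -- minor wording change
```

Flagged prompts are the ones worth reading side by side; unchanged ones you can skim.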
Performance and accuracy improvements
The improvements are real. I tested across several enterprise scenarios and here is what I saw:
Instruction following got noticeably better. GPT-5 does what you tell it to do. Anyone who has built agents knows how much prompt engineering went into getting GPT-4o to reliably follow complex instructions, so this alone is a big deal.
Multi-turn coherence improved too – the model maintains context across longer conversations without losing track of earlier constraints.
The high-throughput model has longer context support, which means better results when your agent reasons over large SharePoint documents or lengthy email threads. And ambiguous queries get better handling because the reasoning model actually works through ambiguity instead of guessing. This matters a lot in business scenarios where user queries are rarely clean.
GPT-5 in Copilot Studio
For Copilot Studio builders, the August update is honestly a bigger deal than the Copilot Chat upgrade. Microsoft’s Copilot Studio blog post covers the details, but the headline is that you now have explicit model selection:
- GPT-5 Auto – uses the real-time router to pick between high-throughput and deep reasoning per prompt. Good default for most agents.
- GPT-5 Reasoning – forces the agent to primarily use the deep reasoning model. Use this when your agent handles complex business logic, planning, or analysis.
You can also use GPT-5 in custom prompt actions. Previously, custom prompts were limited to whatever model Copilot Studio defaulted to. Now you can point a specific prompt node at GPT-5 Reasoning for the steps that need heavy thinking, while the rest of the agent flow uses the faster model. That is a big deal for anything with mixed complexity.
A practical example: I built an agent that reviews contract clauses. The initial classification step (what type of clause is this?) works perfectly with the high-throughput model. But the compliance analysis step (does this clause comply with our internal policies?) benefits massively from the reasoning model. With GPT-5 Auto in Copilot Studio, the router handles this split automatically. With GPT-5 Reasoning, you can force it for the critical steps.
Note that as of August 2025, GPT-5 in Copilot Studio launched as “experimental” in early release cycle environments first, with general availability rolling out later. Check your tenant’s release cycle setting.
Cost implications – message packs and PAYG
Now for the money side. GPT-5 is more capable, but the reasoning model is also more expensive to run.
In Copilot Studio, billing works through Copilot credits:
- A standard generative answer costs 2 credits.
- When an agent uses a reasoning model, there is an additional premium charge: 100 credits per 10 responses for the “Text and generative AI tools (premium)” meter.
- One credit equals $0.01 in pay-as-you-go pricing.
For message packs, you get 25,000 credits for $200 per month. Pay-as-you-go bills at $0.01 per credit consumed.
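The arithmetic is worth doing before deployment. A back-of-the-envelope model using the numbers above (the traffic volume and reasoning share below are made-up example inputs):

```python
# Back-of-the-envelope cost model using the published numbers:
# 2 credits per generative answer, +100 credits per 10 reasoning
# responses (i.e. +10 credits each), 1 credit = $0.01 PAYG.
# The traffic mix below is a made-up example.

def monthly_credits(responses: int, reasoning_share: float) -> int:
    base = responses * 2
    premium = int(responses * reasoning_share) * 10
    return base + premium

def payg_cost(credits: int) -> float:
    return credits * 0.01

responses = 10_000
credits = monthly_credits(responses, reasoning_share=0.2)  # 20% hit reasoning
print(credits)             # 40000
print(payg_cost(credits))  # 400.0
```

At that example mix, the reasoning premium alone matches the base cost – which is why the routing split matters so much.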
In practice? If your agent routes most prompts to the reasoning model, costs go up. If the router uses the high-throughput model for routine queries and only escalates to reasoning when needed, the increase is modest. GPT-5 Auto is designed to be cost efficient by default.
My recommendation: start with GPT-5 Auto and monitor your credit consumption in the admin center before switching to GPT-5 Reasoning across the board. Microsoft also provides a usage estimator tool to help you model costs before deployment.
Prompt engineering in the GPT-5 era
This is probably the biggest change in how you work day to day. GPT-5 rewards precision and punishes ambiguity. Some practical stuff:
Strip the filler. Phrases like “You are a world-class expert” or “Take a deep breath and think step by step” are noise to GPT-5. It does not need encouragement; it needs a clear spec.
Be explicit about output format. GPT-5 follows formatting instructions closely. If you want bullet points, say so. If you want a specific JSON schema, provide it. The model will match your specification more tightly than GPT-4o did.
Watch for over-conciseness. GPT-5 is naturally less verbose. If your old prompts included “be concise,” you might now get responses that are too brief. Adjust accordingly.
Use structured tags. XML-style tags like <instructions> or <context> improve instruction adherence. Not required, but it helps the model parse complex prompts.
Resolve instruction hierarchies. When rules conflict, GPT-5 does not just pick one. It tries to reconcile them, which costs reasoning tokens and often produces worse results. Set explicit priority: “If rule A and rule B conflict, rule A takes precedence.”
Use the verbosity parameter. In Copilot Studio custom prompts, you can now control answer length separately from reasoning effort. This gives you more granular control than before.
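Putting several of these tips together, here is what a spec-style prompt can look like: structured tags, an explicit output format, and a stated priority for the one place rules could conflict. The tag names and rules are illustrative, not a required schema.

```python
# Illustrative GPT-5-era prompt: structured tags, explicit output
# format, and a stated priority when rules could conflict.
# Tag names and rules are examples, not a required schema.

def build_prompt(context: str, question: str) -> str:
    return f"""<instructions>
Respond in the user's language.
Keep internal policy names in English.
If these two rules conflict, the first rule takes precedence.
Answer as a bulleted list with at most five bullets.
</instructions>
<context>
{context}
</context>
<question>
{question}
</question>"""

prompt = build_prompt("Q3 sales summary...", "What drove the change?")
print("<instructions>" in prompt)  # True
```

Note how the third line pre-resolves the language/English conflict from earlier in this post, so the model never spends reasoning tokens reconciling it.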
Looking ahead
GPT-5 in M365 Copilot is a bigger deal than a typical model swap. The real-time router changes how Copilot processes prompts, and the Copilot Studio model selection gives builders actual control over which intelligence level their agents use. That combination matters.
If you’re a developer, go audit your existing agents and test them. Then look at scenarios that were previously too complex for declarative agents – some of those might be within reach now. I think prompt engineering is moving from “coax the model into doing what you want” to “write a precise specification and let the model execute it.” That shift is already happening.
Microsoft rolled this out on the same day OpenAI released the model, which tells you something about the pace. Test your agents, and keep an eye on those credit consumption numbers.
Read more
- Available today: GPT-5 in Microsoft 365 Copilot – Microsoft’s official August 7 announcement
- Available today: GPT-5 in Microsoft Copilot Studio – model routing and GPT-5 Auto/Reasoning details
- Copilot Studio credit and billing management – credit consumption, reasoning model premium, PAYG pricing
Enjoyed this post? Let's connect on LinkedIn.

