The Copilot Retrieval API -- Accessing Enterprise Knowledge Without a Copilot License
I have been waiting for something like this. Not because I needed another API to learn, but because building RAG on top of Microsoft 365 data has always meant either paying for full Copilot licenses or rolling your own retrieval stack. Neither option was great.
Microsoft has been opening up the components that power Microsoft 365 Copilot as standalone APIs. The first wave landed at Build 2025 – Retrieval, Interactions Export, Change Notifications, Meeting Insights, and Chat. The Retrieval API is the one I keep coming back to, because it solves a problem that has been bothering me for a while: how do you ground your custom AI solutions with enterprise knowledge without rebuilding the entire search and indexing stack yourself?
And here is the kicker – you can now use it without a Copilot add-on license. Pay-as-you-go. Ten cents per API call. That changes the math for a lot of custom app scenarios.
What the Retrieval API actually does
Let me be blunt about what this is. The Retrieval API gives you programmatic access to the same hybrid index that powers Microsoft 365 Copilot. You send it a natural language query, and it returns relevant text chunks from SharePoint, OneDrive, and Copilot connectors – permission trimmed, with sensitivity labels, ready to feed into your LLM.
This is RAG without the infrastructure. You do not need to stand up a vector database, build a chunking pipeline, replicate the permissions model, or duplicate data. The content stays where it is, the security stays intact, and you get the extracts you need.
The API performs query transformations under the hood – the same ones Copilot uses – so it does better than basic lexical search or a naive RAG implementation you might cobble together yourself.
Instead of building your own retrieval pipeline on top of Microsoft Graph search results, you get to use the one Microsoft already built for Copilot. Same engine, same quality, your app.
Pay-as-you-go without a Copilot license
Here is where it gets interesting. Until now, using any Copilot capability required a Microsoft 365 Copilot add-on license – roughly 30 dollars per user per month. That was a hard sell for custom apps where you only need the grounding layer, not the full Copilot experience.
The pay-as-you-go preview changes this. Users without a Copilot license can now access the Retrieval API at $0.10 per API call, billed through your Azure subscription. There is one prerequisite that might trip you up: your tenant still needs at least one Copilot license. But you do not need one for every user calling the API.
A few things to know about the pay-as-you-go model:
- Pay-as-you-go only covers tenant level sources – SharePoint and Copilot connectors. OneDrive (user level) is not available without a full Copilot license.
- The preview runs until January 31, 2027 or 30 days after GA, whichever comes first. No SLA during preview.
- You turn it on in the Microsoft 365 admin center under Copilot > Billing & usage > Pay-as-you-go. Expect about two hours for propagation.
- Billing is metered through your Azure subscription as “Pay as you go Copilot Credit” under the Microsoft Copilot Studio meter category.
For a custom agent that makes maybe 50-100 retrieval calls per user per month, you are looking at $5 to $10 per user instead of $30 – far easier to justify.
What you can build with this
Once retrieval is decoupled from per-user Copilot licenses, you can do things that did not make financial sense before.
An HR support agent that answers policy questions by grounding responses in SharePoint-hosted policy documents, for instance. Only the agent makes API calls, not every employee. Or a legal review tool that retrieves relevant contract clauses from your document library, scoped to specific sites with KQL filters. Finance teams pulling context from quarterly reports. Compliance grounding audit responses in internal docs.
You can also combine SharePoint content with data from Copilot connectors – ServiceNow, Confluence, whatever you have indexed – into one retrieval call. One query, multiple sources via batch requests.
Internal developer portals are another good fit. Ground documentation answers in your actual SharePoint-hosted architecture docs and runbooks instead of getting generic responses from an LLM that has never seen your codebase.
Step by step: calling the Retrieval API
Enough background. Here is what an actual implementation looks like.
1. App registration and permissions
Register an app in Microsoft Entra ID (the Azure portal app registrations blade). You need these delegated permissions:
| Data Source | Required Permissions |
|---|---|
| SharePoint | Files.Read.All + Sites.Read.All |
| OneDrive | Files.Read.All + Sites.Read.All |
| Copilot connectors | ExternalItem.Read.All |
Important: the Retrieval API only supports delegated permissions. Application permissions are not supported. The API always runs in the context of a signed-in user, and results are permission trimmed to what that user can access. This is a feature, not a limitation – it is how Microsoft ensures the security model holds.
2. Acquire an access token
Standard OAuth 2.0 authorization code flow since we need delegated permissions. If you are using MSAL:
const { ConfidentialClientApplication } = require('@azure/msal-node');

const msalConfig = {
  auth: {
    clientId: '{your-client-id}',
    authority: 'https://login.microsoftonline.com/{tenant-id}',
    clientSecret: '{your-client-secret}'
  }
};

const cca = new ConfidentialClientApplication(msalConfig);

const tokenRequest = {
  scopes: ['https://graph.microsoft.com/Files.Read.All',
           'https://graph.microsoft.com/Sites.Read.All'],
  code: authorizationCode, // from the redirect
  redirectUri: '{your-redirect-uri}'
};

const response = await cca.acquireTokenByCode(tokenRequest);
const accessToken = response.accessToken;
3. Make the retrieval call
The endpoint lives under the Microsoft Graph namespace. Both v1.0 and beta are available:
POST https://graph.microsoft.com/v1.0/copilot/retrieval
Content-Type: application/json
Authorization: Bearer {access-token}
{
  "queryString": "What is our company policy on remote work?",
  "dataSource": "sharePoint",
  "resourceMetadata": [
    "title",
    "author"
  ],
  "maximumNumberOfResults": 10
}
That is it. A natural language query, a data source, and optionally which metadata you want back.
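In code, the same request is a plain HTTPS POST. Here is a minimal sketch from Node.js 18+ (which has a global `fetch`); `buildRetrievalRequest` and `retrieve` are illustrative helper names, not part of any SDK:

```javascript
// Build the request body for the Retrieval API (fields as documented above).
function buildRetrievalRequest(queryString, dataSource = 'sharePoint', maxResults = 10) {
  return {
    queryString,
    dataSource,
    resourceMetadata: ['title', 'author'],
    maximumNumberOfResults: maxResults
  };
}

// Call the endpoint with a delegated-permission access token (see step 2).
async function retrieve(accessToken, queryString, dataSource) {
  const res = await fetch('https://graph.microsoft.com/v1.0/copilot/retrieval', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(buildRetrievalRequest(queryString, dataSource))
  });
  if (!res.ok) throw new Error(`Retrieval failed: ${res.status}`);
  return res.json(); // { retrievalHits: [...] }
}
```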
4. Parse the response
The response gives you an array of retrieval hits, each containing text extracts with relevance scores:
{
  "retrievalHits": [
    {
      "webUrl": "https://contoso.sharepoint.com/sites/HR/RemoteWorkPolicy.docx",
      "extracts": [
        {
          "text": "Employees may work remotely up to three days per week with manager approval. Remote work arrangements must be documented...",
          "relevanceScore": 0.8374
        },
        {
          "text": "All remote workers must use the corporate VPN and comply with the data handling policy outlined in section 4.2.",
          "relevanceScore": 0.7465
        }
      ],
      "resourceType": "listItem",
      "resourceMetadata": {
        "title": "Remote Work Policy 2025",
        "author": "HR Department"
      },
      "sensitivityLabel": {
        "sensitivityLabelId": "f71f1f74-bf1f-4e6b-b266-c777ea76e2s8",
        "displayName": "Confidential",
        "priority": 4
      }
    }
  ]
}
A few things to notice:
- The extracts are actual text chunks, not just snippets. Graph search gives you tiny previews; these are long enough to ground an LLM response directly.
- relevanceScore is a cosine similarity between your query and the extract, normalized to 0-1.
- sensitivityLabel comes back automatically. Your orchestrator can use this to decide what to surface.
- Results are unordered. Do not assume the first hit is the best one – send all extracts to your LLM.
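Since every extract should go to the LLM anyway, a small flattening step makes the response easy to feed into a prompt while keeping the citation and sensitivity data attached. A sketch, with field names taken from the sample response above (`toGroundingContext` is a hypothetical helper):

```javascript
// Flatten retrievalHits into grounding pieces for the LLM prompt,
// keeping webUrl, title, and sensitivity label alongside each extract.
function toGroundingContext(retrievalResponse) {
  const pieces = [];
  for (const hit of retrievalResponse.retrievalHits ?? []) {
    for (const extract of hit.extracts ?? []) {
      pieces.push({
        text: extract.text,
        source: hit.webUrl,
        title: hit.resourceMetadata?.title,
        sensitivity: hit.sensitivityLabel?.displayName ?? null
      });
    }
  }
  return pieces;
}
```

Your orchestrator can then filter on `sensitivity` before deciding what to pass downstream.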
5. Scope with KQL filters
You can use KQL expressions to narrow retrieval to specific sites, file types, date ranges, or authors. This is where it gets really useful:
POST https://graph.microsoft.com/v1.0/copilot/retrieval
Content-Type: application/json
Authorization: Bearer {access-token}
{
  "queryString": "quarterly revenue targets",
  "dataSource": "sharePoint",
  "filterExpression": "path:\"https://contoso.sharepoint.com/sites/Finance/\" AND FileExtension:\"pptx\" AND LastModifiedTime>=2025-07-01",
  "resourceMetadata": ["title", "author"],
  "maximumNumberOfResults": 15
}
Supported SharePoint filter properties: Author, FileExtension, Filename, FileType, InformationProtectionLabelId, LastModifiedTime, ModifiedBy, Path, SiteID, and Title. You can combine them with AND, OR, and NOT.
One gotcha: if your KQL syntax is invalid, the API silently executes without filtering instead of throwing an error. Double-check your expressions.
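Because a bad expression silently disables filtering, it is safer to assemble KQL from a small allow-list than to hand-write strings. A sketch (`buildKqlFilter` is a hypothetical helper; the property names come from the supported list above):

```javascript
// Filter properties the Retrieval API supports for SharePoint.
const SUPPORTED = new Set([
  'Author', 'FileExtension', 'Filename', 'FileType',
  'InformationProtectionLabelId', 'LastModifiedTime',
  'ModifiedBy', 'Path', 'SiteID', 'Title'
]);

// Build a KQL filterExpression from structured conditions, rejecting
// unsupported properties instead of letting the API silently ignore them.
function buildKqlFilter(conditions, operator = 'AND') {
  const parts = conditions.map(({ property, comparison = ':', value }) => {
    if (!SUPPORTED.has(property)) {
      throw new Error(`Unsupported filter property: ${property}`);
    }
    // Quote values for the ':' operator; range comparisons stay bare.
    const rendered = comparison === ':' ? `"${value}"` : value;
    return `${property}${comparison}${rendered}`;
  });
  return parts.join(` ${operator} `);
}
```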
Retrieval API vs. Graph Search vs. Copilot Search
This is the question I keep getting. Microsoft now has three different ways to search enterprise content programmatically. Here is how I think about them:
| Aspect | Graph Search API | Copilot Retrieval API | Copilot Search API |
|---|---|---|---|
| Purpose | Document discovery, CRUD | RAG grounding for LLMs | Semantic document search |
| Returns | Metadata + tiny snippets | Substantial text extracts | Documents with relevance |
| Data sources | Broad M365 coverage | SharePoint, OneDrive, connectors | OneDrive (more coming) |
| Query type | KQL, keyword | Natural language | Natural language |
| Permission model | Standard Graph | Delegated only, permission trimmed | Delegated only |
| Optimized for | Finding documents | Context recall for AI | Contextual relevance |
| License | M365 standard | Copilot or pay-as-you-go | Copilot license |
| Status | GA | GA (pay-as-you-go in preview) | Preview |
The short version: Graph Search is for finding documents. The Retrieval API is for feeding content to an LLM. Copilot Search is the newer semantic search specifically for OneDrive, with more data sources coming later.
If you are building a custom AI agent or RAG pipeline, the Retrieval API is almost certainly what you want. It returns enough text to actually ground a response, and it handles the hard parts – chunking, semantic ranking, permission trimming – for you.
Batch requests for multi-source retrieval
Since the API requires you to specify one data source per request, you will often want to query SharePoint and Copilot connectors in parallel. The Graph batch endpoint handles this nicely – up to 20 requests per batch:
POST https://graph.microsoft.com/v1.0/$batch
Content-Type: application/json
Authorization: Bearer {access-token}
{
  "requests": [
    {
      "id": "1",
      "method": "POST",
      "url": "/copilot/retrieval",
      "body": {
        "queryString": "What is our data retention policy?",
        "dataSource": "sharePoint",
        "maximumNumberOfResults": 10
      },
      "headers": { "Content-Type": "application/json" }
    },
    {
      "id": "2",
      "method": "POST",
      "url": "/copilot/retrieval",
      "body": {
        "queryString": "What is our data retention policy?",
        "dataSource": "externalItem",
        "maximumNumberOfResults": 10
      },
      "headers": { "Content-Type": "application/json" }
    }
  ]
}
```
Your orchestrator receives both result sets in a single round trip, merges the extracts, and sends them to the LLM.
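The fan-out and merge are mechanical enough to sketch. The helper names here (`fanOutQuery`, `mergeBatchHits`) are illustrative, not from any SDK:

```javascript
// Build a $batch body that sends one query to several data sources.
function fanOutQuery(queryString, dataSources, maxResults = 10) {
  return {
    requests: dataSources.map((dataSource, i) => ({
      id: String(i + 1),
      method: 'POST',
      url: '/copilot/retrieval',
      headers: { 'Content-Type': 'application/json' },
      body: { queryString, dataSource, maximumNumberOfResults: maxResults }
    }))
  };
}

// $batch responses can arrive in any order; collect hits from every
// successful sub-response and skip failures (e.g. throttled requests).
function mergeBatchHits(batchResponse) {
  return (batchResponse.responses ?? [])
    .filter(r => r.status === 200)
    .flatMap(r => r.body?.retrievalHits ?? []);
}
```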
Cost math
Let me do some back-of-the-napkin math for the pay-as-you-go model.
Say you build an internal knowledge assistant used by 200 employees. Each employee asks an average of 5 questions per day, and each question triggers 2 retrieval calls (one to SharePoint, one to Copilot connectors via batch). That is 200 x 5 x 2 = 2,000 API calls per day, or roughly 44,000 per month at about 22 working days.
At $0.10 per call, that is $4,400 per month.
Compare that to licensing all 200 users with Copilot at $30/user/month: $6,000 per month. And that is just for the retrieval piece – with Copilot licenses, users would get the full Copilot experience, which might be more than you need.
The break-even point depends entirely on your usage pattern. For light use (a few calls per user per day), pay-as-you-go wins. For apps that hammer the API constantly, per-user licenses might actually be cheaper.
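The napkin math above is simple enough to put in a helper you can rerun with your own numbers. A sketch, assuming 22 working days per month and the preview pricing quoted in this post:

```javascript
// Monthly pay-as-you-go cost. Price is taken in cents so the
// arithmetic stays exact ($0.10/call = 10 cents).
function monthlyPaygCost(users, callsPerUserPerDay, workingDays = 22, priceCentsPerCall = 10) {
  return (users * callsPerUserPerDay * workingDays * priceCentsPerCall) / 100;
}

// Compare against licensing every user with Copilot ($30/user/month).
function cheaperOption(users, callsPerUserPerDay, licensePrice = 30) {
  const payg = monthlyPaygCost(users, callsPerUserPerDay);
  const licensed = users * licensePrice;
  return payg <= licensed ? 'pay-as-you-go' : 'per-user licenses';
}
```

With the scenario above (200 users, 10 calls per user per day), `monthlyPaygCost(200, 10)` gives the $4,400 figure, and `cheaperOption` flips to per-user licensing somewhere past 13 calls per user per day.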
Watch out for the 200 requests per user per hour throttle limit. For burst heavy workloads, you will need to design around that.
Limitations
Before you go all in, here are the practical constraints:
- 1,500 characters max for queryString.
- Maximum 25 results per call.
- One data source per request. No interleaved results. Use batch requests to work around this.
- File size limits: 512 MB for .docx, .pptx, .pdf files; 150 MB for everything else.
- Semantic retrieval only works for .doc, .docx, .pptx, .pdf, .aspx, and .one files. Other extensions fall back to lexical retrieval.
- No image or chart content. Only text is retrieved. Tables in .doc, .docx, and .pptx are supported.
- Global service only. No US Government or China (21Vianet) deployments yet.
- Pay-as-you-go does not include OneDrive access. SharePoint and Copilot connectors only.
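The request-shape limits are easy to check client-side before you spend a metered call. A sketch (`validateRetrievalRequest` is a hypothetical helper based on the limits listed above):

```javascript
// Pre-flight check of a retrieval request body against documented limits.
function validateRetrievalRequest(body) {
  const errors = [];
  if (!body.queryString || body.queryString.length > 1500) {
    errors.push('queryString is required and limited to 1,500 characters');
  }
  if (body.maximumNumberOfResults != null && body.maximumNumberOfResults > 25) {
    errors.push('maximumNumberOfResults cannot exceed 25');
  }
  if (Array.isArray(body.dataSource)) {
    errors.push('only one data source per request; use $batch for several');
  }
  return errors; // empty array means the body passes these checks
}
```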
How it fits together
The typical integration flow looks like this:
- User asks a question in your app.
- Your orchestrator sends the query to the Retrieval API (potentially a batch request across multiple data sources).
- The API returns permission trimmed text extracts.
- Your orchestrator sends those extracts as context to your LLM (Azure OpenAI, or whatever you are using).
- The LLM generates a grounded response.
- Your app displays the answer with source citations (using the webUrl from each hit).
This works with Azure AI Foundry, Semantic Kernel, LangChain, or any custom orchestration. The Retrieval API is just a REST endpoint – it does not care what is on the other side.
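The whole loop can be sketched in a few lines. Here `retrieve` is the Retrieval API call from the earlier steps and `callLlm` is whatever completion function your stack provides (Azure OpenAI or anything else); both are assumed, and `buildGroundedPrompt`/`answerQuestion` are illustrative names:

```javascript
// Assemble the grounding prompt: every extract, tagged with its webUrl
// so the model can cite sources.
function buildGroundedPrompt(question, retrievalHits) {
  const context = retrievalHits
    .flatMap(hit => (hit.extracts ?? []).map(e => `[${hit.webUrl}] ${e.text}`))
    .join('\n\n');
  return `Answer using only the context below. Cite sources by URL.\n\n${context}\n\nQuestion: ${question}`;
}

// One turn of the flow: retrieve, ground, generate, return with sources.
async function answerQuestion(question, accessToken, retrieve, callLlm) {
  const { retrievalHits = [] } = await retrieve(accessToken, question, 'sharePoint');
  const answer = await callLlm(buildGroundedPrompt(question, retrievalHits));
  return { answer, sources: [...new Set(retrievalHits.map(h => h.webUrl))] };
}
```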
For Microsoft Foundry users, the SharePoint tool already uses the Retrieval API under the hood. If you are building agents in Foundry, pay-as-you-go users get this automatically.
What I think this means
The pricing model is what matters most here, not the API itself. Having Microsoft handle chunking, ranking, and permissions is a nice time saver, sure. But the per-user Copilot license requirement was what kept most custom app scenarios off the table. Removing that barrier – well, mostly removing it, you still need one Copilot license in the tenant – is what makes this actually usable.
I expect the pay-as-you-go model to expand to other Copilot APIs over time. And as the Copilot Search API adds SharePoint and connector support, the line between retrieval and search will get blurry.
If you have been building RAG pipelines with Graph search and stitching together snippets that were never designed for LLM grounding – give this a look. Same data, better extracts, proper security, and pricing that does not require a spreadsheet to justify.
Try it in Graph Explorer. Five minutes and you will see what I mean.
Read more
- Microsoft DevBlog: Unlocking Enterprise Knowledge for AI with the Retrieval API – the official announcement by Zakiullah Khan with use cases and data source overview
- Microsoft Learn: Copilot Retrieval API reference – full API documentation including request/response schemas and permissions
- Microsoft Learn: Pay-as-you-go billing for Copilot APIs – setup instructions and pricing details for the consumption model
- Microsoft Learn: Copilot APIs overview – the broader set of Copilot APIs beyond Retrieval
Enjoyed this post? Let's connect on LinkedIn:
Follow on LinkedIn →

