The Copilot Retrieval API -- Accessing Enterprise Knowledge Without a Copilot License


I have been waiting for something like this. Not because I needed another API to learn, but because building RAG on top of Microsoft 365 data has always meant either paying for full Copilot licenses or rolling your own retrieval stack. Neither option was great.

Microsoft has been opening up the components that power Microsoft 365 Copilot as standalone APIs. The first wave landed at Build 2025 – Retrieval, Interactions Export, Change Notifications, Meeting Insights, and Chat. The Retrieval API is the one I keep coming back to, because it solves a problem that has been bothering me for a while: how do you ground your custom AI solutions with enterprise knowledge without rebuilding the entire search and indexing stack yourself?

And here is the kicker – you can now use it without a Copilot add-on license. Pay-as-you-go. Ten cents per API call. That changes the math for a lot of custom app scenarios.

What the Retrieval API actually does

Let me be blunt about what this is. The Retrieval API gives you programmatic access to the same hybrid index that powers Microsoft 365 Copilot. You send it a natural language query, and it returns relevant text chunks from SharePoint, OneDrive, and Copilot connectors – permission trimmed, with sensitivity labels, ready to feed into your LLM.

This is RAG without the infrastructure. You do not need to stand up a vector database, build a chunking pipeline, replicate the permissions model, or duplicate data. The content stays where it is, the security stays intact, and you get the extracts you need.

The API performs query transformations under the hood – the same ones Copilot uses – so it does better than basic lexical search or a naive RAG implementation you might cobble together yourself.

Instead of building your own retrieval pipeline on top of Microsoft Graph search results, you get to use the one Microsoft already built for Copilot. Same engine, same quality, your app.

Pay-as-you-go without a Copilot license

Here is where it gets interesting. Until now, using any Copilot capability required a Microsoft 365 Copilot add-on license – roughly 30 dollars per user per month. That was a hard sell for custom apps where you only need the grounding layer, not the full Copilot experience.

The pay-as-you-go preview changes this. Users without a Copilot license can now access the Retrieval API at $0.10 per API call, billed through your Azure subscription. There is one prerequisite that might trip you up: your tenant still needs at least one Copilot license. But you do not need one for every user calling the API.

A few things to know about the pay-as-you-go model:

  • Pay-as-you-go only covers tenant-level sources – SharePoint and Copilot connectors. OneDrive (user-level) is not available without a full Copilot license.
  • The preview runs until January 31, 2027 or 30 days after GA, whichever comes first. No SLA during preview.
  • You turn it on in the Microsoft 365 admin center under Copilot > Billing & usage > Pay-as-you-go. Expect about two hours for propagation.
  • Billing is metered through your Azure subscription as “Pay as you go Copilot Credit” under the Microsoft Copilot Studio meter category.

For a custom agent that makes maybe 50-100 retrieval calls per user per month, you are looking at 5 to 10 dollars per head instead of 30. Much easier to justify.

What you can build with this

Once retrieval is decoupled from per-user Copilot licenses, you can do things that did not make financial sense before.

An HR support agent that answers policy questions by grounding responses in SharePoint-hosted policy documents, for instance. Only the agent makes API calls, not every employee. Or a legal review tool that retrieves relevant contract clauses from your document library, scoped to specific sites with KQL filters. Finance teams pulling context from quarterly reports. Compliance grounding audit responses in internal docs.

You can also combine SharePoint content with data from Copilot connectors – ServiceNow, Confluence, whatever you have indexed – into one retrieval call. One query, multiple sources via batch requests.

Internal developer portals are another good fit. Ground documentation answers in your actual SharePoint-hosted architecture docs and runbooks instead of getting generic responses from an LLM that has never seen your codebase.

Step by step: calling the Retrieval API

Enough background. Here is what an actual implementation looks like.

1. App registration and permissions

Register an app in Microsoft Entra ID (the Azure portal app registrations blade). You need these delegated permissions:

Data source          Required permissions
SharePoint           Files.Read.All + Sites.Read.All
OneDrive             Files.Read.All + Sites.Read.All
Copilot connectors   ExternalItem.Read.All

Important: the Retrieval API only supports delegated permissions. Application permissions are not supported. The API always runs in the context of a signed-in user, and results are permission trimmed to what that user can access. This is a feature, not a limitation – it is how Microsoft ensures the security model holds.

2. Acquire an access token

Standard OAuth 2.0 authorization code flow since we need delegated permissions. If you are using MSAL:

const { ConfidentialClientApplication } = require('@azure/msal-node');

const msalConfig = {
  auth: {
    clientId: '{your-client-id}',
    authority: 'https://login.microsoftonline.com/{tenant-id}',
    clientSecret: '{your-client-secret}'
  }
};

const cca = new ConfidentialClientApplication(msalConfig);

const tokenRequest = {
  scopes: ['https://graph.microsoft.com/Files.Read.All',
           'https://graph.microsoft.com/Sites.Read.All'],
  code: authorizationCode, // from the redirect
  redirectUri: '{your-redirect-uri}'
};

const response = await cca.acquireTokenByCode(tokenRequest);
const accessToken = response.accessToken;

3. Make the retrieval call

The endpoint lives under the Microsoft Graph namespace. Both v1.0 and beta are available:

POST https://graph.microsoft.com/v1.0/copilot/retrieval
Content-Type: application/json
Authorization: Bearer {access-token}

{
  "queryString": "What is our company policy on remote work?",
  "dataSource": "sharePoint",
  "resourceMetadata": [
    "title",
    "author"
  ],
  "maximumNumberOfResults": 10
}

That is it. A natural language query, a data source, and optionally which metadata you want back.
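Since the request body is small, it is worth validating it client-side before spending ten cents on the call. Here is a minimal sketch: the helper function and its name are my own, but the limits it enforces (1,500-character query, 25 results max) are the documented constraints.

```javascript
// Sketch: build and validate a Retrieval API request body before sending it.
// buildRetrievalRequest is a hypothetical helper, not part of any SDK.
function buildRetrievalRequest(queryString, dataSource, options = {}) {
  if (queryString.length > 1500) {
    throw new Error('queryString exceeds the 1,500 character limit');
  }
  const max = options.maximumNumberOfResults ?? 10;
  if (max < 1 || max > 25) {
    throw new Error('maximumNumberOfResults must be between 1 and 25');
  }
  const body = { queryString, dataSource, maximumNumberOfResults: max };
  if (options.resourceMetadata) body.resourceMetadata = options.resourceMetadata;
  if (options.filterExpression) body.filterExpression = options.filterExpression;
  return body;
}

const body = buildRetrievalRequest(
  'What is our company policy on remote work?',
  'sharePoint',
  { resourceMetadata: ['title', 'author'] }
);
console.log(JSON.stringify(body, null, 2));
```

POST that body to the endpoint above with your bearer token and you are done.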

4. Parse the response

The response gives you an array of retrieval hits, each containing text extracts with relevance scores:

{
  "retrievalHits": [
    {
      "webUrl": "https://contoso.sharepoint.com/sites/HR/RemoteWorkPolicy.docx",
      "extracts": [
        {
          "text": "Employees may work remotely up to three days per week with manager approval. Remote work arrangements must be documented...",
          "relevanceScore": 0.8374
        },
        {
          "text": "All remote workers must use the corporate VPN and comply with the data handling policy outlined in section 4.2.",
          "relevanceScore": 0.7465
        }
      ],
      "resourceType": "listItem",
      "resourceMetadata": {
        "title": "Remote Work Policy 2025",
        "author": "HR Department"
      },
      "sensitivityLabel": {
        "sensitivityLabelId": "f71f1f74-bf1f-4e6b-b266-c777ea76e2s8",
        "displayName": "Confidential",
        "priority": 4
      }
    }
  ]
}

A few things to notice:

  • The extracts are actual text chunks, not just snippets. Graph search gives you tiny previews. These are long enough to ground an LLM response directly.
  • relevanceScore is a cosine similarity between your query and the extract, normalized to 0-1.
  • sensitivityLabel comes back automatically. Your orchestrator can use this to decide what to surface.
  • Results are unordered. Do not assume the first hit is the best one – send all extracts to your LLM.
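In practice you will want to flatten the hits into a list of (text, source) pairs before handing them to the LLM, keeping the webUrl for citations. A quick sketch, assuming the response shape shown above; the helper is my own:

```javascript
// Sketch: flatten a retrieval response into prompt-ready extracts.
// Order is not guaranteed, so every extract is kept; webUrl travels
// along for citations and the sensitivity label for filtering.
function flattenHits(response) {
  return (response.retrievalHits ?? []).flatMap(hit =>
    hit.extracts.map(extract => ({
      text: extract.text,
      source: hit.webUrl,
      score: extract.relevanceScore,
      label: hit.sensitivityLabel?.displayName ?? null
    }))
  );
}

// Trimmed version of the sample response above.
const sample = {
  retrievalHits: [{
    webUrl: 'https://contoso.sharepoint.com/sites/HR/RemoteWorkPolicy.docx',
    extracts: [
      { text: 'Employees may work remotely up to three days per week...', relevanceScore: 0.8374 },
      { text: 'All remote workers must use the corporate VPN...', relevanceScore: 0.7465 }
    ],
    sensitivityLabel: { displayName: 'Confidential' }
  }]
};

console.log(flattenHits(sample).length); // 2 extracts, each with a citation URL
```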

5. Scope with KQL filters

You can use KQL expressions to narrow retrieval to specific sites, file types, date ranges, or authors. This is where it gets really useful:

POST https://graph.microsoft.com/v1.0/copilot/retrieval
Content-Type: application/json
Authorization: Bearer {access-token}

{
  "queryString": "quarterly revenue targets",
  "dataSource": "sharePoint",
  "filterExpression": "path:\"https://contoso.sharepoint.com/sites/Finance/\" AND FileExtension:\"pptx\" AND LastModifiedTime>= 2025-07-01",
  "resourceMetadata": ["title", "author"],
  "maximumNumberOfResults": 15
}

Supported SharePoint filter properties: Author, FileExtension, Filename, FileType, InformationProtectionLabelId, LastModifiedTime, ModifiedBy, Path, SiteID, and Title. You can combine them with AND, OR, and NOT.

One gotcha: if your KQL syntax is invalid, the API silently executes without filtering instead of throwing an error. Double-check your expressions.
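Because of that silent-failure behavior, I prefer assembling filter expressions from known-good pieces rather than hand-writing strings. A minimal sketch; the helper and its property whitelist are my own, built from the supported-properties list above:

```javascript
// Sketch: assemble KQL filter expressions from validated clauses,
// since a typo in a property name fails silently on the server side.
const SUPPORTED = new Set([
  'Author', 'FileExtension', 'Filename', 'FileType', 'InformationProtectionLabelId',
  'LastModifiedTime', 'ModifiedBy', 'Path', 'SiteID', 'Title'
]);

function kqlClause(property, operator, value) {
  if (!SUPPORTED.has(property)) {
    throw new Error(`Unsupported filter property: ${property}`);
  }
  // Quote string-match values; leave date comparisons bare.
  const v = operator === ':' ? `"${value}"` : value;
  return `${property}${operator}${v}`;
}

const filter = [
  kqlClause('Path', ':', 'https://contoso.sharepoint.com/sites/Finance/'),
  kqlClause('FileExtension', ':', 'pptx'),
  kqlClause('LastModifiedTime', '>=', '2025-07-01')
].join(' AND ');

console.log(filter);
```

Catching the typo locally beats debugging an unfiltered result set.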

How it compares to Graph Search and Copilot Search

This is the question I keep getting. Microsoft now has three different ways to search enterprise content programmatically. Here is how I think about them:

Aspect            Graph Search API           Copilot Retrieval API               Copilot Search API
Purpose           Document discovery, CRUD   RAG grounding for LLMs              Semantic document search
Returns           Metadata + tiny snippets   Substantial text extracts           Documents with relevance
Data sources      Broad M365 coverage        SharePoint, OneDrive, connectors    OneDrive (more coming)
Query type        KQL, keyword               Natural language                    Natural language
Permission model  Standard Graph             Delegated only, permission trimmed  Delegated only
Optimized for     Finding documents          Context recall for AI               Contextual relevance
License           M365 standard              Copilot or pay-as-you-go            Copilot license
Status            GA                         GA (pay-as-you-go in preview)       Preview

The short version: Graph Search is for finding documents. The Retrieval API is for feeding content to an LLM. Copilot Search is the newer semantic search specifically for OneDrive, with more data sources coming later.

If you are building a custom AI agent or RAG pipeline, the Retrieval API is almost certainly what you want. It returns enough text to actually ground a response, and it handles the hard parts – chunking, semantic ranking, permission trimming – for you.

Batch requests for multi-source retrieval

Since the API requires you to specify one data source per request, you will often want to query SharePoint and Copilot connectors in parallel. The Graph batch endpoint handles this nicely – up to 20 requests per batch:

POST https://graph.microsoft.com/v1.0/$batch
Content-Type: application/json
Authorization: Bearer {access-token}

{
  "requests": [
    {
      "id": "1",
      "method": "POST",
      "url": "/copilot/retrieval",
      "body": {
        "queryString": "What is our data retention policy?",
        "dataSource": "sharePoint",
        "maximumNumberOfResults": 10
      },
      "headers": { "Content-Type": "application/json" }
    },
    {
      "id": "2",
      "method": "POST",
      "url": "/copilot/retrieval",
      "body": {
        "queryString": "What is our data retention policy?",
        "dataSource": "externalItem",
        "maximumNumberOfResults": 10
      },
      "headers": { "Content-Type": "application/json" }
    }
  ]
}

Your orchestrator receives both result sets in a single round trip, merges the extracts, and sends them to the LLM.
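The merge step is straightforward. A sketch, assuming the standard Graph $batch response shape ({ responses: [{ id, status, body }] }); the helper is my own:

```javascript
// Sketch: merge per-source result sets from a $batch response into one
// extract list, skipping any sub-request that did not return 200.
function mergeBatchResults(batchResponse) {
  return batchResponse.responses
    .filter(r => r.status === 200)
    .flatMap(r => (r.body.retrievalHits ?? []).flatMap(hit =>
      hit.extracts.map(e => ({ text: e.text, source: hit.webUrl }))
    ));
}

// Illustrative batch response: one SharePoint hit, one connector hit.
const batchResponse = {
  responses: [
    { id: '1', status: 200, body: { retrievalHits: [{
        webUrl: 'https://contoso.sharepoint.com/sites/HR/Retention.docx',
        extracts: [{ text: 'Records are retained for seven years...' }] }] } },
    { id: '2', status: 200, body: { retrievalHits: [{
        webUrl: 'https://contoso.service-now.com/kb/KB0012345',
        extracts: [{ text: 'Ticket data is purged after 24 months...' }] }] } }
  ]
};

console.log(mergeBatchResults(batchResponse).length); // 2
```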

Cost math

Let me do some back-of-the-napkin math for the pay-as-you-go model.

Say you build an internal knowledge assistant used by 200 employees. Each employee asks an average of 5 questions per day, and each question triggers 2 retrieval calls (one to SharePoint, one to Copilot connectors via batch). That is 200 x 5 x 2 = 2,000 API calls per day, or roughly 44,000 per month.

At $0.10 per call, that is $4,400 per month.

Compare that to licensing all 200 users with Copilot at $30/user/month: $6,000 per month. And that is just for the retrieval piece – with Copilot licenses, users would get the full Copilot experience, which might be more than you need.
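The napkin math above is easy to turn into a function you can run against your own numbers. A sketch assuming roughly 22 working days per month; illustrative only, not official pricing guidance:

```javascript
// Sketch: compare pay-as-you-go metering against per-seat licensing.
const PRICE_PER_CALL = 0.10;  // $ per Retrieval API call
const COPILOT_SEAT = 30;      // $ per user per month (approximate)

function monthlyPaygCost(users, questionsPerDay, callsPerQuestion, workingDays = 22) {
  return users * questionsPerDay * callsPerQuestion * workingDays * PRICE_PER_CALL;
}

const payg = monthlyPaygCost(200, 5, 2); // the 200-employee scenario above
const licensed = 200 * COPILOT_SEAT;     // 200 seats at $30
console.log({ payg, licensed });         // payg: 4400, licensed: 6000
```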

The break-even point depends entirely on your usage pattern. For light use (a few calls per user per day), pay-as-you-go wins. For apps that hammer the API constantly, per-user licenses might actually be cheaper.

Watch out for the 200 requests per user per hour throttle limit. For burst-heavy workloads, you will need to design around that.

Limitations

Before you go all in, here are the practical constraints:

  • 1,500 characters max for queryString.
  • Maximum 25 results per call.
  • One data source per request. No interleaved results. Use batch requests to work around this.
  • File size limits: 512 MB for .docx, .pptx, .pdf files; 150 MB for everything else.
  • Semantic retrieval only works for .doc, .docx, .pptx, .pdf, .aspx, and .one files. Other extensions fall back to lexical retrieval.
  • No image or chart content. Only text is retrieved. Tables in .doc, .docx, and .pptx are supported.
  • Global service only. No US Government or China (21Vianet) deployments yet.
  • Pay-as-you-go does not include OneDrive access. SharePoint and Copilot connectors only.

How it fits together

The typical integration flow looks like this:

  1. User asks a question in your app.
  2. Your orchestrator sends the query to the Retrieval API (potentially a batch request across multiple data sources).
  3. The API returns permission-trimmed text extracts.
  4. Your orchestrator sends those extracts as context to your LLM (Azure OpenAI, or whatever you are using).
  5. The LLM generates a grounded response.
  6. Your app displays the answer with source citations (using the webUrl from each hit).
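Steps 4 and 6 boil down to prompt assembly with citation markers. A sketch of that glue code; the prompt template and extract shape are my own assumptions, not a prescribed format:

```javascript
// Sketch: turn permission-trimmed extracts into a grounded prompt,
// numbering each extract so the LLM can cite sources as [n].
function buildGroundedPrompt(question, extracts) {
  const context = extracts
    .map((e, i) => `[${i + 1}] ${e.text}\n(Source: ${e.source})`)
    .join('\n\n');
  return `Answer using only the context below. Cite sources as [n].\n\n` +
         `Context:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildGroundedPrompt('What is our remote work policy?', [
  {
    text: 'Employees may work remotely up to three days per week...',
    source: 'https://contoso.sharepoint.com/sites/HR/RemoteWorkPolicy.docx'
  }
]);
console.log(prompt);
```

Your app then maps each [n] back to the corresponding webUrl when rendering citations.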

This works with Azure AI Foundry, Semantic Kernel, LangChain, or any custom orchestration. The Retrieval API is just a REST endpoint – it does not care what is on the other side.

For Microsoft Foundry users, the SharePoint tool already uses the Retrieval API under the hood. If you are building agents in Foundry, pay-as-you-go users get this automatically.

What I think this means

The pricing model is what matters most here, not the API itself. Having Microsoft handle chunking, ranking, and permissions is a nice time saver, sure. But the per-user Copilot license requirement was what kept most custom app scenarios off the table. Removing that barrier – well, mostly removing it, you still need one Copilot license in the tenant – is what makes this actually usable.

I expect the pay-as-you-go model to expand to other Copilot APIs over time. And as the Copilot Search API adds SharePoint and connector support, the line between retrieval and search will get blurry.

If you have been building RAG pipelines with Graph search and stitching together snippets that were never designed for LLM grounding – give this a look. Same data, better extracts, proper security, and pricing that does not require a spreadsheet to justify.

Try it in Graph Explorer. Five minutes and you will see what I mean.
