<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://maledias.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://maledias.github.io/" rel="alternate" type="text/html" /><updated>2026-03-02T22:18:32+00:00</updated><id>https://maledias.github.io/feed.xml</id><title type="html">maledias</title><subtitle>Personal blog</subtitle><entry><title type="html">Fine-Grained Tool Authorization for AI Agents</title><link href="https://maledias.github.io/2026/03/02/fine-grained-tool-authorization-for-ai-agents.html" rel="alternate" type="text/html" title="Fine-Grained Tool Authorization for AI Agents" /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>https://maledias.github.io/2026/03/02/fine-grained-tool-authorization-for-ai-agents</id><content type="html" xml:base="https://maledias.github.io/2026/03/02/fine-grained-tool-authorization-for-ai-agents.html"><![CDATA[<p>AI agents have tools. Which tools any given user should be able to invoke — and under what conditions — is an authorization problem. It’s one many agent developers don’t encounter until they’re deep in implementation, and one the industry has been solving for decades in traditional software.</p>

<p>This post applies those solutions to agents.</p>

<p>There are no implementation tutorials here. What follows is conceptual: a grounding in the authorization models — RBAC, ABAC, and ReBAC — and how each applies to agent tool authorization. Along the way, we’ll cover where enforcement happens in an agent architecture and the mechanisms available for implementing it: OAuth scopes, pre-dispatch middleware, and policy engines such as OPA, Cedar, and Cerbos. The goal is a clear mental model — what each approach does, when it’s the right choice, and how the pieces fit together.</p>

<p>If you haven’t had to think about this yet, this post is for you too. Many agent developers ship their first version without tool-level authorization and reach for these patterns only when a compliance requirement, a security concern, or a product decision forces the question. The concepts are easier to apply when you’ve seen the full map first.</p>

<h1 id="fine-grained-tool-authorization-for-ai-agents">Fine-Grained Tool Authorization for AI Agents</h1>

<h2 id="1-the-problem-hiding-in-plain-sight">1. The problem hiding in plain sight</h2>

<h3 id="11-the-agent-and-its-tools">1.1 The agent and its tools</h3>

<p>Agents have access to tools. But which tools an agent should be able to invoke depends on who is talking with it — not every user should have access to every capability.</p>

<p>To make this concrete, imagine you’re building an internal operations assistant for your company. It can search documentation, file tickets, approve expenses, pull financial reports, query databases, and manage user accounts. Useful across the whole organization — but not every one of those capabilities should be available to every user. An employee searching docs is fine. That same employee approving expenses or running arbitrary database queries is not.</p>

<p>Throughout this post, we’ll use this assistant as a running example. It has eight tools that span a natural sensitivity spectrum:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">search_docs</code>, <code class="language-plaintext highlighter-rouge">create_ticket</code>, <code class="language-plaintext highlighter-rouge">send_notification</code></li>
  <li><code class="language-plaintext highlighter-rouge">approve_expense</code>, <code class="language-plaintext highlighter-rouge">view_financials</code>, <code class="language-plaintext highlighter-rouge">export_report</code></li>
  <li><code class="language-plaintext highlighter-rouge">query_database</code>, <code class="language-plaintext highlighter-rouge">manage_users</code></li>
</ul>

<p>Three types of users will interact with it: employees, managers, and admins — each with a different scope of responsibility. Who should be able to invoke which tools, and how do you enforce that? That’s what this post is about.</p>

<h3 id="12-meet-the-users">1.2 Meet the users</h3>

<p>The three user types aren’t arbitrary. They reflect real differences in responsibility and trust.</p>

<p><strong>Employees</strong> are the general user base. They use the assistant for day-to-day tasks: finding information, filing tickets, sending notifications. Nothing they do carries significant risk.</p>

<p><strong>Managers</strong> can do everything employees can, plus actions that carry real-world consequences: approving expenses, accessing financial data, exporting reports. These aren’t tools you want every employee to have by default.</p>

<p><strong>Admins</strong> have the broadest access. On top of what managers can do, they can query internal databases and manage user accounts — capabilities that, if misused, could affect the whole system.</p>

<pre><code class="language-mermaid">graph TD
    subgraph Admin
        subgraph Manager
            subgraph Employee
                sd[search_docs]
                ct[create_ticket]
                sn[send_notification]
            end
            ae[approve_expense]
            vf[view_financials]
            er[export_report]
        end
        qd[query_database]
        mu[manage_users]
    end
</code></pre>

<p>Each level inherits the tools of the one below it and adds its own. It’s a simple hierarchy, but it captures something important: access should reflect responsibility, and different users genuinely need different things from the same agent.</p>

<h3 id="13-the-gap">1.3 The gap</h3>

<p>The hierarchy we just described is what you’d want. The problem is that it’s not what you get by default.</p>

<p>When you give an agent a set of tools, every user who interacts with it can invoke every tool. There’s no built-in enforcement. The agent doesn’t know that employees shouldn’t be approving expenses, or that database queries should be restricted to admins. It just has tools, and it will use them for whoever asks.</p>

<pre><code class="language-mermaid">graph LR
    E([Employee])
    M([Manager])
    A([Admin])
    Agent[[Agent]]

    E &amp; M &amp; A --&gt; Agent

    Agent --&gt; sd[search_docs]
    Agent --&gt; ct[create_ticket]
    Agent --&gt; sn[send_notification]
    Agent --&gt; ae[approve_expense]
    Agent --&gt; vf[view_financials]
    Agent --&gt; er[export_report]
    Agent --&gt; qd[query_database]
    Agent --&gt; mu[manage_users]
</code></pre>

<p>All three users. Same agent. Same tools. No differentiation.</p>

<p>Closing the gap requires answering two questions — and the order matters.</p>

<p>The first question is non-negotiable: <strong>which tools is this user allowed to invoke?</strong> This is answered before the agent starts, and the answer determines exactly which tools the agent is given. If a user can’t access a tool, the agent shouldn’t know the tool exists. Checking permissions at call time and returning an error is not a substitute — it still means the agent has the tool, can reason about it, and can attempt to use it. The access boundary must be set at instantiation, not enforced reactively.</p>

<p>The second question is additive: <strong>with these specific parameters, is this particular invocation allowed?</strong> This governs fine-grained enforcement within the tools the user already has access to — conditions like approval thresholds, time windows, or resource ownership that go beyond whether the tool is available at all.</p>

<p>The rest of this post is about how to answer both.</p>
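<p>To make the two questions concrete, here is a minimal sketch in Python. Everything in it is illustrative: <code>ROLE_TOOLS</code>, <code>allowed_tools</code>, and <code>authorize_invocation</code> are hypothetical names standing in for your permission store and enforcement layer, not a framework API.</p>

```python
# Illustrative sketch of the two authorization questions.
# Roles, tools, and the $1,000 condition mirror the running example.

ROLE_TOOLS = {
    "employee": {"search_docs", "create_ticket", "send_notification"},
    "manager":  {"search_docs", "create_ticket", "send_notification",
                 "approve_expense", "view_financials", "export_report"},
}

def allowed_tools(role: str) -> set[str]:
    """Question 1, answered before the agent starts: which tools does
    this user get at all? The agent is instantiated with exactly this
    set and never sees the rest."""
    return ROLE_TOOLS.get(role, set())

def authorize_invocation(role: str, tool: str, params: dict) -> bool:
    """Question 2, answered at call time: is this specific invocation,
    with these specific parameters, allowed?"""
    if tool not in allowed_tools(role):
        return False  # defense in depth; normally unreachable
    if tool == "approve_expense":
        return params.get("amount", 0) < 1000  # example condition
    return True
```

<p>Note the asymmetry: the first function runs once, at instantiation, and shapes the agent itself; the second runs on every call, inside the tools the user already has.</p>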

<h2 id="2-why-this-is-worth-solving">2. Why this is worth solving</h2>

<h3 id="21-compliance">2.1 Compliance</h3>

<p>In regulated industries, access controls aren’t a best practice — they’re an audit requirement. Frameworks like SOC 2, HIPAA, and GDPR share a common thread: users should only be able to access what their role justifies, and every access event should be traceable to a specific person.</p>

<p>Agents complicate both of these.</p>

<p>SOC 2 requires demonstrating that sensitive data is accessed only by authorized personnel. If your ops assistant lets any employee call <code class="language-plaintext highlighter-rouge">view_financials</code> or <code class="language-plaintext highlighter-rouge">export_report</code>, you can’t make that demonstration — regardless of what your role system looks like elsewhere in your application.</p>

<p>HIPAA is more explicit. Its “minimum necessary” standard requires systems to limit access to the information strictly needed for a given task. An agent with a flat tool set has no concept of minimum necessary. It will use whatever tools seem helpful.</p>

<p>GDPR’s data minimization principle follows the same logic. An agent that can access more data than the invoking user is entitled to violates the spirit of the regulation, even if it isn’t actively misused.</p>

<p>The operational risk compounds all of this. When an agent acts on a user’s behalf, audit logs tend to record the agent as the actor — not the user who initiated the conversation. Without tool-level authorization tied to user identity, it becomes difficult to reconstruct who triggered what, which is precisely the question auditors ask.</p>

<p>Tool-level authorization isn’t just about preventing misuse. In regulated contexts, it’s what makes the system auditable at all.</p>

<h3 id="22-business-tiers-and-feature-gates">2.2 Business tiers and feature gates</h3>

<p>Not every reason to limit tool access is about compliance or security. Sometimes it’s purely a product decision.</p>

<p>If you’re building an agent as a feature of a SaaS product, the tools the agent can invoke are a direct expression of what each customer tier is paying for. A free user gets <code class="language-plaintext highlighter-rouge">search_docs</code>. A Pro customer gets <code class="language-plaintext highlighter-rouge">export_report</code>. An Enterprise customer gets the full set. The access model isn’t enforcing security — it’s enforcing the product.</p>
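<p>Sketched as data, a tier gate is nothing more than a mapping from plan to tool names. The tiers and assignments below are made up for illustration:</p>

```python
# Hypothetical tier-to-tools mapping for a SaaS product.
TIER_TOOLS = {
    "free":       ["search_docs"],
    "pro":        ["search_docs", "create_ticket", "export_report"],
    "enterprise": ["search_docs", "create_ticket", "send_notification",
                   "approve_expense", "view_financials", "export_report",
                   "query_database", "manage_users"],
}

def tools_for_customer(tier: str) -> list[str]:
    # Unknown tiers fall back to the free set rather than failing open.
    return TIER_TOOLS.get(tier, TIER_TOOLS["free"])
```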

<p>The same pattern applies to internal tools. An organization might roll out a powerful capability like <code class="language-plaintext highlighter-rouge">query_database</code> gradually, starting with one team before expanding access. Or it might restrict tools that trigger expensive operations — certain capabilities carry real infrastructure costs — to the users whose work justifies them.</p>

<p>In all of these cases, tool-level authorization is doing the same job: making sure the right users have access to the right capabilities, for reasons that have nothing to do with threats or regulations. It’s product design, enforced at the agent layer.</p>

<h3 id="23-security-and-least-privilege">2.3 Security and least privilege</h3>

<p>The principle of least privilege has been a cornerstone of security for decades: give a system only the permissions it needs to do its job, and nothing more. If something goes wrong, the damage is bounded by what the system can actually do.</p>

<p>Agents need this more than most.</p>

<p>Unlike traditional software, agents interpret natural language. That means the boundary between intended and unintended behavior is fuzzier. An agent can be manipulated through its inputs — a technique called prompt injection — into taking actions its operator never intended. It can also make reasoning mistakes that lead it to invoke tools in ways the developer didn’t anticipate.</p>

<p>In both cases, the blast radius is determined by which tools the agent has access to. An agent that can only call <code class="language-plaintext highlighter-rouge">search_docs</code> and <code class="language-plaintext highlighter-rouge">create_ticket</code> can’t do much damage if something goes wrong. An agent that also has access to <code class="language-plaintext highlighter-rouge">manage_users</code> and <code class="language-plaintext highlighter-rouge">query_database</code> is a different story.</p>

<p>This is why authorization for agents has to be an enforcement problem, not a reasoning problem. The agent’s tool set must be defined by deterministic code that reads the user’s permissions — not by the agent reasoning about what it should or shouldn’t do. You can’t rely on a model to implement a security boundary. That boundary belongs in the authorization layer, before the agent is ever instantiated.</p>

<h2 id="3-the-challenge-agents-introduce">3. The challenge agents introduce</h2>

<h3 id="31-how-traditional-apps-enforce-authorization">3.1 How traditional apps enforce authorization</h3>

<p>Before getting into what makes agents different, it helps to understand how authorization works in a traditional application — because the patterns are well established and agents can learn from them.</p>

<p>In a typical web application, authorization operates at two levels. The first is the UI: the application knows who is logged in, and it renders only the actions that user is allowed to take. Buttons are disabled. Menu items are hidden. An employee using the ops assistant we described earlier would never see an “Approve Expense” button — it simply isn’t there for them.</p>

<p>The second level is the backend. The server independently validates permissions before executing any action. This isn’t redundancy for its own sake: the UI layer only controls what a well-behaved client displays, and nothing stops a user from sending requests directly to the API. The server-side check is the boundary that actually holds.</p>

<pre><code class="language-mermaid">graph LR
    U([User]) --&gt; UI
    UI --&gt;|"renders permitted actions"| View[Permitted UI]
    View --&gt;|"triggers action"| API
    API --&gt;|"validates permission"| Auth[Authorization Layer]
    Auth --&gt;|allow / deny| Action[Action]
</code></pre>

<p>The specific mechanism that powers these checks — role-based rules, attribute conditions, policy engines — is something we’ll cover in detail later. But regardless of implementation, the structure is consistent: authorization shapes both what the user sees and what they can actually do.</p>

<p>The same two-layer structure applies to agents. The first layer determines which tools the agent is instantiated with — the equivalent of the UI rendering only permitted actions. The second layer handles fine-grained enforcement at invocation time: not whether the user can access the tool (that was already decided at instantiation), but whether this specific invocation with these specific parameters is permitted. Getting both right is what tool-level authorization for agents is about.</p>

<h3 id="32-what-changes-with-agents">3.2 What changes with agents</h3>

<p>In a traditional application, the user’s intent is explicit. Clicking “Approve Expense” means exactly one thing: invoke the approve-expense endpoint. The action is discrete, the mapping is direct, and the authorization check is straightforward.</p>

<p>With an agent, none of that is guaranteed.</p>

<p>The user sends a message in natural language: “Can you take care of the pending expenses from last week?” The agent interprets that message, decides what it means, and determines which tools to call to fulfill it. It might call <code class="language-plaintext highlighter-rouge">view_financials</code> to look up pending expenses, then <code class="language-plaintext highlighter-rouge">approve_expense</code> for each one. Or it might do something slightly different depending on how it reasons about the request. The developer didn’t specify the action — the agent inferred it.</p>

<pre><code class="language-mermaid">graph LR
    U([User]) --&gt;|"natural language intent"| Agent
    Agent --&gt;|"interprets and decides"| Tools
    Tools --&gt; t1[view_financials]
    Tools --&gt; t2[approve_expense]
    Tools --&gt; t3[...]
</code></pre>

<p>This indirection is what makes authorization harder. In the traditional model, you authorize a specific action the user explicitly asked to perform. In the agent model, you authorize a tool set — a range of capabilities the agent might decide to invoke on the user’s behalf, based on its own interpretation of their intent.</p>

<p>The user is no longer making discrete requests. They’re delegating to the agent, which means the agent’s capabilities become the effective scope of what that user can do. And if those capabilities aren’t scoped to the user’s permissions, the agent can do far more than the user should be allowed to.</p>

<h3 id="33-the-delegation-problem">3.3 The delegation problem</h3>

<p>Here is the crux of it. When a user talks to an agent, they are delegating — handing off the execution of their intent to a system that will act on their behalf. The question is: with whose permissions?</p>

<p>In most implementations, the answer is the agent’s own. The agent runs with a service account, API keys, or a token provisioned by the developer. When it calls a tool, it authenticates with those credentials — not the user’s. The tool has no inherent knowledge of who initiated the conversation.</p>

<p>This creates a direct mismatch. The user might be an employee who isn’t allowed to approve expenses. But if the agent has credentials that permit <code class="language-plaintext highlighter-rouge">approve_expense</code>, and the user asks it to handle last week’s pending approvals, the agent will do it — successfully.</p>

<pre><code class="language-mermaid">graph TD
    U([Employee User]) --&gt;|"Can you handle last week's expenses?"| Agent
    Agent --&gt;|"invokes"| T[approve_expense]
    U -. "not authorized to approve expenses" .-&gt; T
</code></pre>

<p>The user didn’t do anything wrong. The agent didn’t malfunction. The system worked exactly as designed — and that is the problem.</p>

<p>Solving this requires making the user’s identity and permissions a first-class part of the agent’s execution context. The agent needs to know not just what tools exist, but which tools are available for the specific user it’s serving right now. That’s what tool-level authorization is about — and as we’ve seen in traditional applications, the industry has been solving this class of problem for a long time.</p>

<h2 id="4-this-problem-is-not-new">4. This problem is not new</h2>

<h3 id="41-a-brief-history">4.1 A brief history</h3>

<p>Authorization isn’t a new problem. Long before AI agents existed, engineers were wrestling with the same fundamental question: who should be able to do what, and how do you enforce it at scale?</p>

<p>The earliest solutions were access control lists — explicit tables mapping users to permissions. They worked for small systems but became unmanageable fast. In the 1990s, Role-Based Access Control (RBAC) emerged as a cleaner answer: instead of assigning permissions directly to users, you assign them to roles, and users inherit permissions through their role. More auditable, easier to reason about, and it scaled.</p>

<p>But RBAC had limits. It couldn’t express conditional access — things like “managers can approve expenses, but only under $1,000.” Attribute-Based Access Control (ABAC) addressed this by factoring in context: who the user is, what resource they’re accessing, and the circumstances under which the request is being made.</p>

<p>More recently, Relationship-Based Access Control (ReBAC) emerged for collaborative, graph-structured access patterns — “this user owns that document, and owners can share with editors.” Google published the Zanzibar paper in 2019, describing the system that powers authorization across Drive, YouTube, and other products, and the model has since influenced a generation of authorization systems.</p>

<p>Each model was built to solve the problems the previous one couldn’t handle. Together, they form a toolkit that agents can draw directly from — without reinventing anything.</p>

<h3 id="42-the-key-reframe">4.2 The key reframe</h3>

<p>The mental model that makes all of this tractable is simple.</p>

<p>Traditional access control asks: can this <strong>subject</strong> perform this <strong>action</strong> on this <strong>resource</strong>? Can Alice read this document? Can Bob delete this record? Can this service write to this bucket?</p>

<p>For agent tools, the same structure applies directly:</p>

<ul>
  <li><strong>Subject</strong>: the user talking to the agent</li>
  <li><strong>Action</strong>: invoking the tool</li>
  <li><strong>Resource</strong>: the tool itself</li>
</ul>

<p>Can this employee invoke <code class="language-plaintext highlighter-rouge">approve_expense</code>? Can this manager invoke <code class="language-plaintext highlighter-rouge">export_report</code>? Can this admin invoke <code class="language-plaintext highlighter-rouge">manage_users</code>?</p>

<p>But tool invocations don’t happen in a vacuum — they come with parameters. And parameters matter. A manager might be authorized to invoke <code class="language-plaintext highlighter-rouge">approve_expense</code>, but only when the <code class="language-plaintext highlighter-rouge">amount</code> is below a certain threshold. The tool is permitted; certain invocations of it are not. So the authorization question has two layers:</p>

<ol>
  <li>Can this user invoke this tool at all?</li>
  <li>With these specific parameters, is this particular invocation allowed?</li>
</ol>

<p>Tools are resources. Invocations are actions. Parameters are part of the context. The user is the subject. Once you see it that way, the models in the next section apply directly — no new mental model required.</p>

<h3 id="43-a-map-of-the-models">4.3 A map of the models</h3>

<p>The models we’ll cover next all answer the same subject/action/resource question, but from different angles — and each is better suited to certain kinds of problems.</p>

<ul>
  <li><strong>Role-based (RBAC)</strong>: users are assigned roles, roles carry specific permissions. The most widely used model and the right starting point for most agent tool authorization.</li>
  <li><strong>Attribute-based (ABAC)</strong>: decisions factor in attributes of the user, the resource, and the environment. Expressive enough to handle conditions like amount thresholds or time windows — things RBAC can’t express.</li>
  <li><strong>Relationship-based (ReBAC)</strong>: authorization derives from a graph of relationships between entities. Well-suited for delegation and resource-specific access patterns.</li>
</ul>

<pre><code class="language-mermaid">graph TD
    R[RBAC]
    A[ABAC]
    Re[ReBAC]
    H([Hybrid])

    R --&gt;|"+ context"| A
    R --&gt;|"+ relationships"| Re
    A --&gt; H
    Re --&gt; H
</code></pre>

<p>These models aren’t used in isolation — they operate at different points in the flow. RBAC is what you query at instantiation time to determine the agent’s tool set. ABAC and ReBAC come in at invocation time, enforcing fine-grained conditions on how those tools are used. We’ll see how they work together in the hybrid section. For now, let’s go through each model in turn.</p>

<h2 id="5-the-authorization-models">5. The authorization models</h2>

<h3 id="51-role-based-rbac">5.1 Role-based (RBAC)</h3>

<p>In RBAC, permissions are attached to roles, and users are assigned to roles. A role is an abstract description of a job function — what someone in that position is responsible for doing. It exists independently of any specific user.</p>

<p>This separation matters. You define what the <code class="language-plaintext highlighter-rouge">Manager</code> role can do once, as a standalone policy. Then separately, you assign users to that role. Updating the role’s permissions — adding a new tool, removing an old one — doesn’t require touching user assignments. And assigning a new manager to the system doesn’t require duplicating any policy logic. The two concerns are cleanly decoupled.</p>

<p>Roles also support hierarchy. Rather than granting a manager access to employee tools by assigning them to multiple buckets, you define that <code class="language-plaintext highlighter-rouge">Manager</code> inherits everything <code class="language-plaintext highlighter-rouge">Employee</code> can do and adds its own permissions on top:</p>

<pre><code class="language-mermaid">graph TD
    subgraph "Role definitions"
        AR[Admin] --&gt;|inherits| MR[Manager]
        MR --&gt;|inherits| ER[Employee]
    end

    Alice([Alice]) --&gt; ER
    Bob([Bob]) --&gt; MR
    Carol([Carol]) --&gt; AR

    ER --- sd[search_docs]
    ER --- ct[create_ticket]
    ER --- sn[send_notification]
    MR --- ae[approve_expense]
    MR --- vf[view_financials]
    MR --- er[export_report]
    AR --- qd[query_database]
    AR --- mu[manage_users]
</code></pre>

<p>Each role defines only its own permissions. The inherited ones come from the roles below it. A Manager can invoke <code class="language-plaintext highlighter-rouge">approve_expense</code> because the Manager role grants it — and also <code class="language-plaintext highlighter-rouge">search_docs</code> because the Employee role grants it, and Manager inherits from Employee. Carol, as an Admin, gets everything.</p>

<p>For agent tool authorization, RBAC maps cleanly. You define a role per user type, assign tools to each role, enforce role hierarchy, and at agent instantiation time you query which tools the user’s role permits. The agent is built with exactly that set.</p>
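<p>As a sketch, the hierarchy above can be resolved with a simple walk up the inheritance chain. The structure is hypothetical, standing in for whatever role store you actually use; each role lists only its own tools plus the role it inherits from:</p>

```python
# Sketch of RBAC with role inheritance. Each role declares only its
# own permissions; inherited tools come from the chain below it.

ROLES = {
    "employee": {"inherits": None,
                 "tools": {"search_docs", "create_ticket", "send_notification"}},
    "manager":  {"inherits": "employee",
                 "tools": {"approve_expense", "view_financials", "export_report"}},
    "admin":    {"inherits": "manager",
                 "tools": {"query_database", "manage_users"}},
}

def tools_for_role(role: str) -> set[str]:
    """Walk the inheritance chain, collecting tools at each level."""
    tools: set[str] = set()
    while role is not None:
        entry = ROLES[role]
        tools |= entry["tools"]
        role = entry["inherits"]
    return tools

# At instantiation time, the agent is built with exactly this set,
# e.g. (hypothetical): Agent(tools=[REGISTRY[t] for t in tools_for_role(user.role)])
```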

<p>RBAC is the right starting point for most systems. The policies are readable, the model is easy to audit — you can always answer “what can a Manager do?” with a direct lookup — and it covers the majority of tool authorization cases.</p>

<p>Where it runs into a wall is conditions. “Managers can invoke <code class="language-plaintext highlighter-rouge">approve_expense</code>” is expressible. “Managers can invoke <code class="language-plaintext highlighter-rouge">approve_expense</code>, but only when the <code class="language-plaintext highlighter-rouge">amount</code> parameter is below $1,000” is not — not in pure RBAC. Some teams try to work around this by creating more granular roles: <code class="language-plaintext highlighter-rouge">junior_manager</code>, <code class="language-plaintext highlighter-rouge">senior_manager</code>, each with different tool sets. But that path leads to role explosion: a proliferating set of roles that becomes hard to manage and harder to reason about.</p>

<p>When you start needing conditions, that’s the signal to reach for the next model.</p>

<h3 id="52-attribute-based-abac">5.2 Attribute-based (ABAC)</h3>

<p>RBAC answers the question “what role does this user have?” ABAC asks something richer: given everything we know about this user, this tool, these parameters, and the current context — should this invocation be allowed?</p>

<p>Where RBAC expresses permission as membership, ABAC expresses it as a policy evaluated against attributes across multiple dimensions:</p>

<ul>
  <li><strong>User attributes</strong>: role, department, clearance level, subscription tier</li>
  <li><strong>Tool and parameter attributes</strong>: which tool is being invoked, what values are being passed</li>
  <li><strong>Environment attributes</strong>: time of day, network location, session context</li>
</ul>

<pre><code class="language-mermaid">graph TD
    UA["User attributes&lt;br/&gt;role: Manager&lt;br/&gt;department: Finance"] --&gt; D{Policy evaluation}
    PA["Parameter attributes&lt;br/&gt;tool: approve_expense&lt;br/&gt;amount: $800"] --&gt; D
    EA["Environment attributes&lt;br/&gt;time: 10:30 AM&lt;br/&gt;day: Tuesday"] --&gt; D
    D --&gt;|allow| Allow[Invoke tool]
    D --&gt;|deny| Deny[Blocked]
</code></pre>

<p>This is what lets ABAC express the condition RBAC couldn’t. “Managers can approve expenses, but only under $1,000” becomes a policy rule evaluated at invocation time:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>allow if:
  user.role == "Manager"
  AND tool == "approve_expense"
  AND params.amount &lt; 1000
</code></pre></div></div>

<p>For the ops assistant, ABAC unlocks a range of controls that RBAC alone can’t handle:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">approve_expense</code> is available to managers, but only for amounts within their approval limit</li>
  <li><code class="language-plaintext highlighter-rouge">query_database</code> is available to admins, but only during business hours</li>
  <li><code class="language-plaintext highlighter-rouge">export_report</code> is available to managers in the finance department, not operations</li>
</ul>

<p>Each of these depends on context — and context is exactly what ABAC is designed to evaluate.</p>
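<p>Inlined as Python, the three controls above might look like the sketch below. A real deployment would express these rules in a policy engine such as OPA, Cedar, or Cerbos rather than in application code; the <code>approval_limit</code> attribute and the 9-to-5 window are assumptions made for illustration.</p>

```python
# Minimal ABAC-style evaluation mirroring the three example rules.
from datetime import time

def abac_allow(user: dict, tool: str, params: dict, env: dict) -> bool:
    if tool == "approve_expense":
        # Managers only, and only within their personal approval limit.
        return (user["role"] == "Manager"
                and params["amount"] < user.get("approval_limit", 1000))
    if tool == "query_database":
        # Admins only, and only during (assumed) business hours.
        return (user["role"] == "Admin"
                and time(9) <= env["time"] <= time(17))
    if tool == "export_report":
        # Managers in the finance department only.
        return user["role"] == "Manager" and user["department"] == "Finance"
    return False  # deny by default
```

<p>Notice that the decision consumes all three attribute dimensions: who the user is, what parameters the invocation carries, and the environment it arrives in.</p>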

<p>ABAC also addresses the second layer from section 4.2: not just “can this user invoke this tool?” but “can this user invoke this tool <em>with these parameters</em>?” That distinction matters as soon as any of your tools take inputs that carry their own sensitivity.</p>

<p>The trade-off is auditability. With RBAC, “what can a Manager do?” is a direct lookup. With ABAC, that same question requires evaluating every policy rule against every possible combination of attribute values — which is computationally and conceptually harder. ABAC policies can also grow complex over time: a rule that starts simple can accumulate conditions until it’s difficult to inspect, test, or explain to an auditor.</p>

<p>A common pattern is to layer ABAC on top of RBAC: roles establish which tools are in scope for a given user, and attribute checks refine which specific invocations are permitted within that scope. But this isn’t a universal prescription — simpler systems often use RBAC alone, and as we’ll see, some access patterns are better expressed with an entirely different model.</p>

<h3 id="53-relationship-based-rebac">5.3 Relationship-based (ReBAC)</h3>

<p>RBAC asks: what role does this user have? ABAC asks: what are the attributes of this user, this tool, and this context? ReBAC asks a different question entirely: what is the relationship between this user and this specific resource?</p>

<p>The distinction matters more than it might seem. Consider the <code class="language-plaintext highlighter-rouge">approve_expense</code> tool. Under RBAC, the policy is: “Managers can approve expenses” — meaning any manager can approve any expense. Under ReBAC, the policy becomes: “A user can approve expenses submitted by people they directly manage.” The tool is the same. The role is the same. But the authorization is now tied to a specific relationship in the organizational graph.</p>

<pre><code class="language-mermaid">graph LR
    Alice(["Alice&lt;br/&gt;Manager"]) --&gt;|manages| Bob(["Bob&lt;br/&gt;Employee"])
    Bob --&gt;|submitted| E42[Expense #42]
    Alice -.-&gt;|"can approve"| E42

    Carol(["Carol&lt;br/&gt;Manager"]) --&gt;|manages| Dave(["Dave&lt;br/&gt;Employee"])
    Dave --&gt;|submitted| E99[Expense #99]
    Carol -.-&gt;|"can approve"| E99

    Alice -. "cannot approve" .-&gt; E99
</code></pre>

<p>Alice manages Bob, who submitted Expense #42 — so Alice can approve it. Carol manages Dave, who submitted Expense #99 — so Carol can approve it. Alice cannot approve Expense #99, because she has no management relationship to Dave, even though they share the same role.</p>

<p>This is something neither RBAC nor ABAC can express cleanly. RBAC doesn’t know about the specific relationship between Alice and Bob. ABAC could approximate it with a <code class="language-plaintext highlighter-rouge">direct_report_ids</code> attribute on the user, but that attribute would need to be kept in sync with the org structure and injected into every authorization decision — fragile and hard to maintain. ReBAC makes the relationship itself the authorization primitive.</p>

<p>Authorization decisions in ReBAC work by checking whether a path exists in the relationship graph from the user to the resource through a chain of authorized relationship types. “Can Alice approve Expense #42?” becomes: does a path exist from Alice to Expense #42 through <code class="language-plaintext highlighter-rouge">manages → submitted</code>? If yes, allow. If no, deny.</p>
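<p>A toy version of that path check, with relationships stored as a set of (subject, relation, object) tuples — roughly the shape Zanzibar-style systems persist, minus the indexing and scale:</p>

```python
# Toy ReBAC check: allow if a manages -> submitted path exists
# from the user to the expense in the relationship graph.

EDGES = {
    ("alice", "manages", "bob"),
    ("bob", "submitted", "expense_42"),
    ("carol", "manages", "dave"),
    ("dave", "submitted", "expense_99"),
}

def can_approve(user: str, expense: str) -> bool:
    reports = {obj for (subj, rel, obj) in EDGES
               if subj == user and rel == "manages"}
    return any((r, "submitted", expense) in EDGES for r in reports)

can_approve("alice", "expense_42")  # True: alice -manages-> bob -submitted-> expense_42
can_approve("alice", "expense_99")  # False: no path from alice to expense_99
```

<p>The check is pure graph lookup: no role or attribute is consulted, only whether the required path exists at the moment of the call.</p>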

<p>One property this enables that RBAC and ABAC cannot match: <strong>atomic revocation</strong>. When Bob moves to a different team and the <code class="language-plaintext highlighter-rouge">manages</code> relationship between Alice and Bob is removed, Alice immediately loses the ability to approve Bob’s future expenses. There’s no role to update, no attribute to recalculate, no permission to explicitly revoke. Removing the relationship is sufficient — all access derived from it disappears in the same operation.</p>
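<p>To make the path check and the revocation concrete, here is a minimal in-memory sketch in Python. It is illustrative only: production ReBAC systems store relationships in a dedicated engine (a Zanzibar-style service such as SpiceDB or OpenFGA), and the tuple names here are assumptions, not any engine's API.</p>

```python
# Relationships as (subject, relation, object) tuples: the ReBAC primitive.
# Illustrative in-memory sketch only; real systems use a relationship store.
relations = {
    ("alice", "manages", "bob"),
    ("bob", "submitted", "expense:42"),
    ("carol", "manages", "dave"),
    ("dave", "submitted", "expense:99"),
}

def can_approve(user: str, expense: str) -> bool:
    """Allow if a manages -> submitted path exists from user to expense."""
    reports = {obj for (subj, rel, obj) in relations
               if subj == user and rel == "manages"}
    return any((report, "submitted", expense) in relations
               for report in reports)

print(can_approve("alice", "expense:42"))  # True: alice manages bob
print(can_approve("alice", "expense:99"))  # False: no path through dave

# Atomic revocation: removing the relationship removes all derived access.
relations.discard(("alice", "manages", "bob"))
print(can_approve("alice", "expense:42"))  # False
```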

<p>The trade-off is relationship management overhead. Every authorization-relevant relationship between users and resources needs to be stored and kept current. In an ops assistant with a small team, this is straightforward. At the scale of Google Drive — billions of documents, millions of users, continuous changes — it requires specialized infrastructure, which is exactly what Zanzibar was built to provide.</p>

<p>In agent tool authorization, ReBAC is the right model when your access decisions are fundamentally about who has what relationship to whom or to what — approval hierarchies, team ownership, resource-specific delegation. If your authorization logic says “any manager can do X,” that’s RBAC. If it says “this manager can do X for these specific resources because of how they’re related to them,” that’s ReBAC.</p>

<h3 id="54-hybrid-models">5.4 Hybrid models</h3>

<p>Before going further, there is one principle that sits above all model choices: an agent must never have access to a tool it shouldn’t call. This isn’t a preference — it’s a security requirement. The tool set the agent is instantiated with must be determined by deterministic code that reads the user’s permissions from the authorization layer. Not approximated. Not filtered by the agent after the fact. Defined upfront, before the agent starts.</p>

<p>With that established, the three models we’ve covered aren’t alternatives — they address different aspects of the same authorization problem, and they operate at different points in the agent’s lifecycle.</p>

<p><strong>RBAC answers the instantiation question</strong>: which tools does this user get at all? This is evaluated before the agent starts. An employee’s agent is provisioned with <code class="language-plaintext highlighter-rouge">search_docs</code>, <code class="language-plaintext highlighter-rouge">create_ticket</code>, and <code class="language-plaintext highlighter-rouge">send_notification</code>. It knows nothing about <code class="language-plaintext highlighter-rouge">approve_expense</code> — the tool is simply not there.</p>

<p><strong>ABAC answers the invocation question</strong>: given the tools the agent has, is this specific invocation permitted? A manager’s agent has <code class="language-plaintext highlighter-rouge">approve_expense</code>, but when the agent tries to call it with an <code class="language-plaintext highlighter-rouge">amount</code> of $1,500 — exceeding the approval limit — the ABAC check denies it.</p>

<p><strong>ReBAC answers the resource question</strong>: does the right relationship exist between this user and the specific resource being acted on? A manager’s agent invokes <code class="language-plaintext highlighter-rouge">approve_expense</code> within their limit, but for an expense submitted by someone outside their direct reports — the relationship check fails.</p>

<pre><code class="language-mermaid">flowchart LR
    subgraph "Agent instantiation"
        RBAC["RBAC&lt;br/&gt;Which tools does&lt;br/&gt;this user get?"] --&gt; Agent[[Agent with permitted tools]]
    end

    subgraph "Tool invocation"
        Agent --&gt; ABAC{"ABAC&lt;br/&gt;Are parameters&lt;br/&gt;and context valid?"}
        ABAC --&gt;|No| D1([Denied])
        ABAC --&gt;|Yes| ReBAC{"ReBAC&lt;br/&gt;Does the required&lt;br/&gt;relationship exist?"}
        ReBAC --&gt;|No| D2([Denied])
        ReBAC --&gt;|Yes| Allow([Tool invoked])
    end
</code></pre>

<p>The models handle what they’re each best suited for, at the right moment in the flow.</p>
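<p>A sketch of the layered flow in Python. Every name here (the role table, the approval limit, the relationship set) is a hypothetical stand-in for whatever your authorization layer actually provides, and the three checks are collapsed into one function for readability; in a real system the RBAC check runs once, at instantiation.</p>

```python
# Hypothetical data; in practice this comes from your authorization layer.
ROLE_TOOLS = {
    "employee": {"search_docs", "create_ticket", "send_notification"},
    "manager": {"search_docs", "create_ticket", "send_notification",
                "approve_expense"},
}
manages = {("alice", "bob")}          # ReBAC relationship tuples
submitted_by = {"expense:42": "bob"}  # expense -> submitter

def authorize_approval(user, role, approval_limit, amount, expense):
    # RBAC (instantiation): was the tool provisioned for this role at all?
    if "approve_expense" not in ROLE_TOOLS[role]:
        return "denied: tool not provisioned"
    # ABAC (invocation): do the parameters satisfy the user's conditions?
    if amount > approval_limit:
        return "denied: over approval limit"
    # ReBAC (resource): does the required relationship exist?
    if (user, submitted_by[expense]) not in manages:
        return "denied: not the submitter's manager"
    return "allowed"

print(authorize_approval("alice", "manager", 1000, 400, "expense:42"))
# allowed
print(authorize_approval("alice", "manager", 1000, 1500, "expense:42"))
# denied: over approval limit
```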

<p>This isn’t a prescription to implement all three from the start. Many agent systems need only RBAC, and that’s the right call — don’t add complexity your requirements don’t justify. Attribute conditions are the most common next need, and ABAC extends naturally when they appear. ReBAC enters the picture when authorization depends on specific relationships between users and resources that are meaningful enough to model explicitly.</p>

<p>Think of the models as tools you reach for as your requirements grow, not a stack to implement all at once. Section 7 has a more concrete guide for choosing where to start.</p>

<h2 id="6-where-enforcement-happens">6. Where enforcement happens</h2>

<h3 id="60-the-two-enforcement-points">6.0 The two enforcement points</h3>

<p>Understanding the authorization models is one thing. Knowing where to enforce them in the actual architecture is another.</p>

<p>There are two enforcement points, and they serve different purposes. The first is mandatory. The second is complementary and applies in specific cases.</p>

<p><strong>Enforcement point 1: agent instantiation (mandatory)</strong></p>

<p>Before the agent starts, something in your architecture must determine which tools this user is permitted to invoke, and the agent must be built with exactly that set. The mechanism varies: it might be a JWT containing permission scopes, a role lookup against your authorization system, or OAuth token claims that indicate which capabilities the user has. The specific approach depends on your architecture. What doesn’t vary is the requirement: the agent’s tool set must reflect the user’s permissions before the agent starts reasoning.</p>

<pre><code class="language-mermaid">flowchart LR
    U([User]) --&gt;|identity + permissions| Component["Authorization Component"]
    Component --&gt;|"permitted tools for this user"| Factory[Agent Factory]
    Factory --&gt;|"instantiates with permitted tools"| Agent[[Agent]]
</code></pre>

<p>This is non-negotiable. Giving the agent all tools and hoping it won’t use the ones it shouldn’t is not a security control. The agent will reason about every tool it has, and it can be manipulated through prompt injection into using tools it knows about. The only safe position is for unauthorized tools to not exist in the agent’s context at all.</p>
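<p>In code, instantiation-time provisioning reduces to a deterministic filter that runs before the agent exists. The tool registry, claim shape, and agent representation below are assumptions for illustration; the pattern applies to any framework.</p>

```python
# Full tool registry. The agent never sees this dict directly.
ALL_TOOLS = {
    "search_docs": lambda query: f"results for {query!r}",
    "create_ticket": lambda title: f"created ticket {title!r}",
    "approve_expense": lambda expense_id: f"approved {expense_id}",
}

def build_agent(claims: dict) -> dict:
    """Instantiate an agent with exactly the tools the claims permit.

    `claims` stands in for whatever your architecture provides: JWT
    permission scopes, a role lookup, or OAuth token claims.
    """
    permitted = {name: fn for name, fn in ALL_TOOLS.items()
                 if name in claims["tools"]}
    # Unauthorized tools are not "hidden" from the agent; they simply
    # do not exist in its context.
    return {"user": claims["sub"], "tools": permitted}

agent = build_agent({"sub": "bob", "tools": ["search_docs", "create_ticket"]})
print(sorted(agent["tools"]))  # ['create_ticket', 'search_docs']
```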

<p><strong>Enforcement point 2: somewhere in the invocation path (complementary)</strong></p>

<p>There are cases where two users have access to the same tool but with different permission scopes. Consider two managers: one with an approval limit of $1,000 and another with $10,000. Instantiation-time provisioning can’t distinguish between them — both get <code class="language-plaintext highlighter-rouge">approve_expense</code>. What differs is how they’re allowed to invoke it.</p>

<p>In these cases, a parameter-level authorization check needs to happen somewhere in the invocation path. Where exactly depends on your architecture:</p>

<ul>
  <li><strong>Inside the tool</strong>: the tool reads user context from agent state and checks permissions before executing. Most agentic frameworks support agent state — a context object accessible inside tool calls — where user identity and attributes can be injected at startup.</li>
  <li><strong>In an external API</strong>: the tool calls an API that enforces its own authorization. The API validates the request against the user’s permissions and rejects it if it’s out of scope. In this case, the tool itself doesn’t need to implement any authorization logic.</li>
  <li><strong>In a middleware layer</strong>: a gateway or interceptor sits between the agent and the tool, evaluating the invocation against policy before allowing it through.</li>
</ul>

<pre><code class="language-mermaid">flowchart LR
    Agent[[Agent]] --&gt;|"invokes tool with parameters"| Tool[Tool]
    Tool --&gt;|"calls"| API["External API / Middleware"]
    API --&gt;|"enforces authorization"| Check{Allowed?}
    Check --&gt;|yes| Action[Execute]
    Check --&gt;|no| Reject[Rejected]
</code></pre>

<p>The important thing is that it happens somewhere before the action executes. The architecture is flexible; the requirement is not.</p>

<p><strong>The two points work together.</strong> Instantiation-time provisioning ensures the agent can’t reach tools it shouldn’t have. Invocation-time enforcement ensures it can’t misuse the tools it does have. The sections that follow cover the specific mechanisms — OAuth scopes, middleware, and policy engines — that you can use to implement each point.</p>

<h3 id="61-oauth-scopes">6.1 OAuth scopes</h3>

<p>OAuth scopes are the most familiar authorization primitive for developers building applications that call external APIs. When a user authenticates, they consent to a set of scopes — a declaration of what the application is allowed to do on their behalf. The authorization server encodes those scopes into an access token, and every downstream API call is gated by whether the required scope is present.</p>

<p>For agent tools that call external services — an expense platform, a CRM, a data warehouse — scopes are a natural and standards-compliant access layer. If the token doesn’t carry the <code class="language-plaintext highlighter-rouge">expenses:write</code> scope, the expense API rejects the call. That check happens at the API level, without any extra logic in your agent.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant User
    participant Agent
    participant Auth as Auth Server
    participant API as External API

    User-&gt;&gt;Auth: Authenticate
    Auth--&gt;&gt;User: Access token (scopes: expenses:read, expenses:write)
    User-&gt;&gt;Agent: Start session (token passed through)
    Agent-&gt;&gt;API: Call approve_expense (with token)
    API-&gt;&gt;API: Scope present?
    API--&gt;&gt;Agent: Allowed / Rejected
</code></pre>

<p>If you map each tool to a scope, scopes can also drive instantiation-time provisioning. At startup, read the scopes from the user’s token, and build the agent with only the tools those scopes permit. For simpler systems — where the authorization question is purely “does this user have access to this tool at all?” with no parameter-level conditions — this approach can be sufficient on its own.</p>
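<p>That mapping can be as simple as a lookup table consulted at startup. The scope names and the one-scope-per-tool mapping below are assumptions for this sketch; real providers define their own vocabularies.</p>

```python
# Hypothetical scope-to-tool mapping for instantiation-time provisioning.
SCOPE_TO_TOOL = {
    "expenses:read": "list_expenses",
    "expenses:write": "approve_expense",
    "calendar:write": "schedule_meeting",
}

def tools_from_scopes(scope_claim: str) -> set:
    """Derive the permitted tool set from a token's scope claim.

    Per RFC 6749, the scope claim is a space-delimited string.
    """
    granted = set(scope_claim.split())
    return {tool for scope, tool in SCOPE_TO_TOOL.items() if scope in granted}

print(sorted(tools_from_scopes("expenses:read calendar:write")))
# ['list_expenses', 'schedule_meeting']
```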

<p>Where scopes run into limits is when fine-grained conditions enter the picture. Scopes are coarse — they gate access to a tool or operation category, but they can’t express rules like “this user can approve expenses up to $1,000” or “this user can only export reports for their own department.” Tokens are also static after issuance: the scopes encoded at login time can’t change mid-session, and revoking a scope requires token expiry or active introspection.</p>

<p>For systems where tool-level gating is enough, scopes are a clean and low-overhead solution. When you need parameter-level enforcement on top of that, scopes handle the outer layer and you add a complementary mechanism for the rest.</p>

<h3 id="62-pre-dispatch-middleware">6.2 Pre-dispatch middleware</h3>

<p>A middleware gate is a layer of code that sits between the agent’s decision to invoke a tool and the tool’s actual execution. The agent calls a central dispatcher; the dispatcher reads user context, evaluates permission rules, and either forwards the call to the tool or rejects it.</p>

<pre><code class="language-mermaid">flowchart LR
    Agent[[Agent]] --&gt;|"invoke tool(params)"| MW[Middleware Gate]
    MW --&gt;|reads| State["Agent State&lt;br/&gt;user context"]
    MW --&gt;|allow| Tool[Tool]
    MW --&gt;|deny| Reject([Rejected])
    Tool --&gt; Action[Execute]
</code></pre>

<p>The middleware can serve both enforcement points. At instantiation time, it can filter the list of available tools based on the user’s role before passing them to the agent. At invocation time, it evaluates parameter-level conditions — approval limits, department checks, time windows — before the tool runs.</p>

<p>A simple implementation looks something like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def dispatch(tool_name, params, user_context):
    # Tool-level check: is this tool in the user's permitted set at all?
    if tool_name not in permitted_tools(user_context.role):
        raise AuthorizationError("tool not permitted")
    # Parameter-level check: does this invocation satisfy the conditions?
    if not check_conditions(user_context, tool_name, params):
        raise AuthorizationError("invocation not permitted")
    return tools[tool_name](params)
</code></pre></div></div>

<p>This approach has a lot going for it early on. It requires no external dependencies, keeps authorization logic in one place, and gives you full control over how rules are evaluated. For smaller systems or early-stage products where the permission model is still evolving, it’s often the right place to start.</p>

<p>The friction appears as the rule set grows. Authorization logic encoded in application code must be redeployed every time a rule changes. More importantly, it tends to drift — conditions accumulate, edge cases get added inline, and what started as a clean central gate becomes a tangle of conditionals spread across the codebase. At that point, the policy is hard to inspect, hard to test independently, and hard to hand to an auditor.</p>

<p>When you find yourself wanting to manage authorization logic separately from application code — version it, test it in isolation, update it without a deploy — that’s the signal to consider moving to a policy engine.</p>

<h3 id="63-policy-engines">6.3 Policy engines</h3>

<p>A policy engine externalizes authorization logic into declarative policies that live outside your application code. Instead of embedding permission rules in a dispatcher or a tool, the application asks the policy engine a question — “is this user allowed to invoke this tool with these parameters?” — and the engine evaluates the current policy and returns a decision.</p>

<pre><code class="language-mermaid">flowchart LR
    subgraph "Instantiation"
        F[Agent Factory] --&gt;|"which tools for this user?"| PE[(Policy Engine)]
        PE --&gt;|permitted tool list| F
    end

    subgraph "Invocation"
        A[[Agent]] --&gt;|"is this invocation allowed?"| PE
        PE --&gt;|allow / deny| A
    end
</code></pre>

<p>The policy itself is written in a declarative language and stored separately from the application — in a file, a repository, or the engine’s own storage. A rule that governs <code class="language-plaintext highlighter-rouge">approve_expense</code> might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>allow if:
    input.user.role == "manager"
    input.params.amount &lt;= input.user.approval_limit
    input.time.hour &gt;= 9
    input.time.hour &lt;= 17
</code></pre></div></div>

<p>This is the same logic that would otherwise live in application code — but now it’s in a policy file that can be read, reviewed, and updated independently. Change the approval limit threshold, adjust the time window, add a department condition: the policy changes, the application doesn’t.</p>

<p>This separation is what makes policy engines valuable in regulated or audited environments. The authorization rules are inspectable as a standalone artifact. They can be version-controlled alongside the rest of your codebase, tested in isolation, and handed to a compliance team without requiring them to navigate application logic.</p>

<p>Tools like OPA (Open Policy Agent), Cedar, and Cerbos are common choices, each with its own policy language and evaluation model. They differ in expressiveness, performance characteristics, and how well they handle ABAC versus ReBAC patterns — but the architectural pattern is the same: your application becomes a policy decision client, and the engine is the policy decision point.</p>
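<p>Here is what being a policy decision client looks like in practice: the application ships the decision inputs to the engine and acts on the answer. The sketch below targets OPA's REST data API; the server URL and the policy path <code class="language-plaintext highlighter-rouge">agent/tools/allow</code> are assumptions specific to this example.</p>

```python
import json
from urllib import request

# OPA serves policy decisions over POST /v1/data/<policy-path>; the URL
# and policy path here are assumptions for this sketch.
OPA_URL = "http://localhost:8181/v1/data/agent/tools/allow"

def build_input(user: dict, tool: str, params: dict) -> dict:
    """Shape the question as the policy's `input` document."""
    return {"input": {"user": user, "tool": tool, "params": params}}

def is_allowed(user: dict, tool: str, params: dict) -> bool:
    body = json.dumps(build_input(user, tool, params)).encode()
    req = request.Request(OPA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        # Treat a missing result as a deny: default-closed.
        return json.load(resp).get("result", False)

# With an OPA instance running, this would evaluate a policy like the
# one above:
# is_allowed({"role": "manager", "approval_limit": 1000},
#            "approve_expense", {"amount": 400})
```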

<p>The trade-off is operational overhead. A policy engine is an additional component to deploy, operate, and integrate. Policy languages have a learning curve, and debugging a failed authorization decision in a declarative language is a different skill than debugging application code. For small systems with simple, stable rules, this overhead may not be justified. As rules grow more complex and more people — developers, security teams, auditors — need to reason about them, the investment pays off.</p>

<h2 id="7-choosing-your-approach">7. Choosing your approach</h2>

<h3 id="71-start-with-the-right-question">7.1 Start with the right question</h3>

<p>Before reaching for a model or a mechanism, it’s worth spending a moment on what your authorization requirements actually are. The right starting point varies significantly depending on the nature of your access control problem, and the wrong choice creates friction that compounds over time.</p>

<p>Two questions cut through most of the decision:</p>

<p><strong>What determines whether a user can invoke a tool?</strong></p>

<p>If the answer is purely their role or user type — “managers can approve expenses, employees cannot” — RBAC is likely sufficient. If the answer involves conditions on the invocation itself — “managers can approve expenses, but only under their approval limit” — you need ABAC on top. If the answer involves the specific relationship between the user and the resource being acted on — “managers can approve expenses submitted by their direct reports” — ReBAC is the right fit.</p>

<p><strong>How stable and auditable do your rules need to be?</strong></p>

<p>If your rules are small in number, unlikely to change frequently, and don’t need to be reviewed by anyone outside your team, middleware is a reasonable place to start. If your rules need to be updated independently of application deployments, inspectable by a compliance or security team, or testable in isolation, a policy engine is the better long-term foundation.</p>

<p>These two questions map to the two enforcement points from section 6: the first shapes how you build the agent’s tool set at instantiation; the second shapes how you implement invocation-time checks. Answer them separately, because the right choice for each doesn’t have to be the same.</p>

<h3 id="72-a-decision-map">7.2 A decision map</h3>

<p><strong>Choosing your authorization model:</strong></p>

<pre><code class="language-mermaid">flowchart TD
    Q1{"Does tool access depend&lt;br/&gt;on user type or role?"}

    Q1 --&gt;|Yes| RBAC[Start with RBAC]
    Q1 --&gt;|"No — relationship-based"| ReBAC

    RBAC --&gt; Q2{"Do invocations need&lt;br/&gt;conditions beyond role?"}
    Q2 --&gt;|No| UseRBAC([RBAC])
    Q2 --&gt;|Yes| Q3{"Resource-specific&lt;br/&gt;relationship checks?"}
    Q3 --&gt;|No| UseABAC([RBAC + ABAC])
    Q3 --&gt;|Yes| UseHybrid([RBAC + ABAC + ReBAC])

    ReBAC --&gt; Q4{"Conditions on&lt;br/&gt;invocations?"}
    Q4 --&gt;|No| UseReBAC([ReBAC])
    Q4 --&gt;|Yes| UseHybrid2([ReBAC + ABAC])
</code></pre>

<p><strong>Choosing your enforcement mechanism:</strong></p>

<pre><code class="language-mermaid">flowchart TD
    M1{"Tools call external APIs&lt;br/&gt;that enforce OAuth?"}
    M1 --&gt;|"Yes — tool-level gating is sufficient"| Scopes([OAuth Scopes])
    M1 --&gt;|Need more control| M2{"Rules simple, stable,&lt;br/&gt;and team-internal?"}
    M2 --&gt;|Yes| MW([Middleware])
    M2 --&gt;|"No — need auditability or compliance review"| PE([Policy Engine])
</code></pre>

<p>The two maps are independent. You might use RBAC + ABAC for your authorization model and middleware as your enforcement mechanism. Or ReBAC with a policy engine. The model choice is about the logic of your authorization rules; the mechanism choice is about where and how you evaluate them.</p>

<h3 id="73-design-for-growth">7.3 Design for growth</h3>

<p>The most common mistake isn’t picking the wrong model — it’s embedding authorization logic directly in individual tools, scattered across the codebase, with no central enforcement point. Once you’re there, you can’t answer basic questions like “what can a manager do?” without reading every tool’s implementation. Every new tool requires manually adding the same checks. Auditing becomes archaeology.</p>

<p>The central enforcement point is the first thing to get right. Whether it’s an OAuth scope check, a middleware gate, or a policy engine matters less than having a single, deliberate place where authorization decisions are made. Everything else can be improved incrementally.</p>

<p>From there, the signals for evolving are usually clear:</p>

<ul>
  <li><strong>Role explosion</strong>: you keep creating new roles to handle edge cases. That’s the signal for ABAC — conditions should be expressed in policies, not as role variants.</li>
  <li><strong>Rules drifting into tools</strong>: authorization logic has started accumulating across individual tool implementations rather than staying centralized. Time to establish or reinforce the middleware gate.</li>
  <li><strong>Audit requests</strong>: someone asks you to prove which users can do what, or compliance requires an independent review of your authorization rules. If those rules live in application code, that conversation is painful. A policy engine makes it tractable.</li>
  <li><strong>Relationship-dependent access</strong>: you find yourself maintaining lists of “authorized users per resource” and keeping them in sync by hand. That’s the shape of a ReBAC problem.</li>
</ul>

<p>Start with what your current requirements justify. Add complexity only when the signals are clear. And from the beginning, hold the one principle this post keeps returning to: the agent’s tool set is determined by your authorization layer, not by the agent’s own reasoning. That boundary is the foundation everything else is built on.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[AI agents have tools. Which tools any given user should be able to invoke — and under what conditions — is an authorization problem. It’s one many agent developers don’t encounter until they’re deep in implementation, and one the industry has been solving for decades in traditional software.]]></summary></entry><entry><title type="html">OAuth 2.0 from the AI Engineer’s Perspective</title><link href="https://maledias.github.io/2026/02/28/oauth-2-from-the-ai-engineers-perspective.html" rel="alternate" type="text/html" title="OAuth 2.0 from the AI Engineer’s Perspective" /><published>2026-02-28T00:00:00+00:00</published><updated>2026-02-28T00:00:00+00:00</updated><id>https://maledias.github.io/2026/02/28/oauth-2-from-the-ai-engineers-perspective</id><content type="html" xml:base="https://maledias.github.io/2026/02/28/oauth-2-from-the-ai-engineers-perspective.html"><![CDATA[<p>This post is a practical guide to OAuth 2.0 for AI engineers.
It covers the core concepts — roles, scopes, access tokens, and JWTs — and then goes deep on the grant types most relevant to AI and agentic systems: Authorization Code, Authorization Code with PKCE, Client Credentials, Refresh Token, Device Authorization, and Token Exchange. Along the way, it maps each grant type to real agentic scenarios, explains how OAuth 2.0 relates to OpenID Connect, and gives you a decision framework for choosing the right grant type in your own systems.</p>

<h1 id="oauth-20-from-the-ai-engineer-perspective">OAuth 2.0 from the AI Engineer Perspective</h1>

<h2 id="introduction">Introduction</h2>

<p>Imagine you’re building an AI-powered secretary — a SaaS application that joins your users’ calls with their clients and, based on what was discussed, schedules follow-up meetings in their Google Calendar. To do that, your application needs permission to access each user’s calendar. Your users probably won’t be thrilled about sharing their Google account passwords with your application, no matter how honest your intentions are. And even if they were, you’d want your application to only be able to manage calendar events, not access everything else in their Google account. That permission should also be revocable at any time. How do you design that system?</p>

<p>This is exactly the problem OAuth 2.0 was built to solve. OAuth 2.0 is the industry standard protocol for authorization. It defines a set of rules and workflows — called grant types — that cover scenarios such as a client application obtaining delegated permission to access user-owned resources on behalf of that user, without ever handling their credentials, or a client application accessing its own resources directly. Each grant type is designed for a specific scenario: a user authorizing a web app, a backend service calling another service, a CLI tool running on a device without a browser, or an agent delegating access to another agent.</p>

<p>As an AI engineer, you’ll encounter these scenarios constantly. An agent accessing user data, a pipeline of agents where one needs to act on behalf of another, a background service making API calls without any user present — each of these requires a different authorization approach, and picking the wrong one leads to systems that are either insecure, brittle, or both. In this post, we’ll cover the OAuth 2.0 grant types most relevant to AI engineering: Authorization Code, Authorization Code with PKCE, Client Credentials, Refresh Token, Device Authorization, and Token Exchange. Along the way, we’ll look at how each one maps to real agentic scenarios.</p>

<p>Authorization is one of those things that’s easy to get approximately right and hard to get exactly right. As AI agents become more autonomous, longer-lived, and more deeply integrated with sensitive systems, the cost of getting it wrong compounds. A poorly scoped token, a credential stored in the wrong place, or the wrong grant type chosen for the wrong scenario can quietly become a serious vulnerability. The goal of this post is not just to explain how OAuth 2.0 works, but to give you the mental model to design authorization into your AI systems deliberately — from the start.</p>

<h2 id="oauth-20-roles">OAuth 2.0 Roles</h2>

<p>OAuth 2.0 defines four roles that together describe who owns what, who wants access, and who mediates the whole interaction. Let’s ground them in the AI secretary scenario from earlier.</p>

<p>The <strong>Resource Owner</strong> is the entity that owns the protected resource and can grant access to it. In the secretary example, that’s your user — the person whose Google Calendar holds their meetings.</p>

<p>The <strong>Client</strong> is the application that wants access to the protected resource, acting on behalf of the Resource Owner with their authorization. In the secretary example, that’s your SaaS application. In an agentic system, the agent itself often plays this role: it’s the party making API calls and requesting permission to act.</p>

<p>The <strong>Authorization Server</strong> is what mediates trust. It authenticates the Resource Owner, obtains their consent, and issues access tokens to the Client. In the secretary example, that’s Google’s OAuth infrastructure — the system behind the consent screen your users see when they connect their calendar.</p>

<p>The <strong>Resource Server</strong> is where the protected resources live. It accepts requests from the Client, validates the access token, and either serves the resource or rejects the request. In the secretary example, that’s the Google Calendar API. We’ll look at exactly how that enforcement works once we cover tokens.</p>

<p>These roles can feel abstract in isolation, so it’s worth pausing on a few things that commonly cause confusion.</p>

<p>First, the Authorization Server and Resource Server are logically distinct roles, but they don’t have to be run by different systems. In many implementations — including Google’s — they’re operated by the same provider. What matters is the separation of concerns: the Authorization Server decides whether the Client is allowed to access something; the Resource Server enforces that decision at request time.</p>

<p>Second, and more importantly for AI engineering: the same entity can play different roles depending on which resource access you’re looking at. Consider the AI secretary again. When it accesses a user’s Google Calendar, the agent is the Client and the user is the Resource Owner — the agent is acting on someone else’s behalf. But suppose that same agent also pulls in weather forecasts to help schedule outdoor meetings. The weather API doesn’t belong to any user; the agent is accessing it on its own behalf, not delegating from anyone. That’s a different kind of interaction entirely — different Resource Server, different Authorization Server, different role configuration.</p>

<p>This is the normal state of an agentic application. An agent typically has many resource access interactions happening across its lifetime, and each one has its own set of roles. One interaction might involve user-delegated access to a calendar. Another might involve the agent accessing a third-party data API directly. A third might involve one agent calling another. Each of these is a separate OAuth interaction with its own Client, Resource Owner, Authorization Server, and Resource Server — and the same entity can appear in different roles across different interactions. Keeping this in mind will make the grant types that follow much easier to reason about: each grant type is really a description of how one particular interaction is authorized, not how an entire system works.</p>

<h2 id="scopes">Scopes</h2>

<p>When your AI secretary requests access to a user’s Google Calendar, it doesn’t just ask for “access to Google.” It asks for access to something specific — in Google’s case, something like <code class="language-plaintext highlighter-rouge">https://www.googleapis.com/auth/calendar.events</code>, which grants permission to read and write calendar events. That string is a scope, and it defines the boundaries of what the token the Client receives will be permitted to do (more on tokens in the next section).</p>

<p>Scopes flow through the OAuth process in a predictable way. The Client declares which scopes it needs when it initiates the authorization request. The Authorization Server presents those scopes to the Resource Owner on a consent screen — this is why you see prompts like “This app wants to: view and edit your calendar events.” If the user consents, the resulting token is restricted to exactly those scopes.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client (AI Secretary)
    participant AS as Authorization Server (Google OAuth)
    participant RO as Resource Owner (User)
    participant RS as Resource Server (Calendar API)

    C-&gt;&gt;AS: Authorization request (scopes: calendar.events)
    AS-&gt;&gt;RO: Consent screen — "This app wants to: view and edit your calendar events"
    RO-&gt;&gt;AS: Approves
    AS-&gt;&gt;C: Token restricted to calendar.events
    C-&gt;&gt;RS: API request + token
    RS-&gt;&gt;C: Calendar data
</code></pre>

<p>That’s the user-delegated flow. But recall the weather API example from the previous section — the agent accessing forecast data on its own behalf, with no user involved. Scopes still apply here: the agent requests specific scopes when it authenticates with the Authorization Server, and the resulting token is similarly restricted to those scopes. The difference is that there’s no consent screen and no user approving anything at runtime. Instead, the scopes the application is allowed to request are pre-configured when the application is registered with the Authorization Server. The token the agent receives is still scoped — just scoped to what the application itself is authorized for, not what a user has delegated.</p>
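<p>On the wire, that app-only request is a client credentials exchange — a grant type we'll cover in depth later. As a hedged sketch of the shape of the request, with a placeholder token endpoint, client, and scope name:</p>

```python
import base64
from urllib import parse, request

# Placeholder endpoint and scope; real values come from the provider.
TOKEN_URL = "https://auth.example.com/oauth/token"

def client_credentials_request(client_id: str, client_secret: str,
                               scopes: list) -> request.Request:
    """Build an RFC 6749 client credentials token request."""
    body = parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),  # space-delimited, per the spec
    }).encode()
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return request.Request(TOKEN_URL, data=body, headers={
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    })

req = client_credentials_request("weather-agent", "s3cr3t", ["forecast:read"])
print(b"scope=forecast%3Aread" in req.data)  # True
```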

<p>One thing worth knowing: OAuth doesn’t define what scope values should look like or what they mean. The specification only says they’re space-delimited strings. Every provider defines their own vocabulary. Google’s scopes are long URLs. GitHub’s look like <code class="language-plaintext highlighter-rouge">repo</code> or <code class="language-plaintext highlighter-rouge">read:user</code>. A custom internal API might use <code class="language-plaintext highlighter-rouge">reports:read</code> or <code class="language-plaintext highlighter-rouge">calendar:write</code>. There’s no universal scope language — when integrating with a new API, you’ll need to consult that provider’s documentation to understand what scopes exist and what each one covers.</p>

<p>For AI engineering, scopes are your primary tool for least-privilege access. An agent should request only the scopes it needs for the specific task it’s performing — not a broad set “just in case.” This matters more than it might seem. An agent with overly permissive access can do more damage if it’s compromised, behaves unexpectedly, or is manipulated by a malicious prompt. A narrowly scoped token limits the blast radius. If your AI secretary only needs to create calendar events, it should request <code class="language-plaintext highlighter-rouge">calendar.events</code> — not full calendar access, and certainly not access to the user’s email or drive. The same principle applies to the weather API: even though the agent is acting on its own behalf, it should still request only the scopes it actually needs.</p>

<p>It’s also worth noting that scopes declare what a token is permitted to do — but they don’t enforce it by themselves. Think of it like a driver’s license. The licensing authority — the Authorization Server — issues your license and specifies on it what you’re authorized to operate: a car, but not a bus. The license itself is the token, and it carries that permission with it wherever you go. But the licensing authority isn’t present every time you drive. When a police officer pulls you over at a checkpoint, they’re the Resource Server: they check your license, read what it says, and decide whether you’re in compliance. The licensing authority trusted you enough to issue the license; the officer enforces what it says in the real world. We’ll cover exactly how that enforcement works in practice when we get to tokens.</p>

<h2 id="access-tokens--jwt">Access Tokens &amp; JWT</h2>

<p>Every time the AI secretary makes a request to the Google Calendar API, it includes a credential in the HTTP request — an access token. The Resource Server reads that token to decide whether the request is authorized. But what exactly is an access token, and what does it contain?</p>

<p>The OAuth spec deliberately doesn’t mandate a format. A token is just a string the Client presents and the Resource Server validates. What matters is the validation model, and there are two main approaches.</p>

<p>The first produces what’s known as a <strong>by-reference token</strong> — a random, opaque string with no inherent meaning. The Resource Server can’t read anything from it directly; instead, it calls the Authorization Server to look up what permissions that string maps to. This works, but it means every API call requires a network round-trip to the Authorization Server.</p>

<p>The second produces a <strong>by-value token</strong> — a self-contained token that encodes all the information the Resource Server needs to validate it, signed so that the contents can be trusted without any external lookup. JWT (JSON Web Token) is the most widely used format for by-value tokens, and it’s what most OAuth implementations use in practice.</p>

<p>The diagrams below show what validation looks like for each approach:</p>

<p><strong>By-reference token (opaque)</strong></p>
<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client
    participant RS as Resource Server
    participant AS as Authorization Server

    C-&gt;&gt;RS: API request + opaque token
    RS-&gt;&gt;AS: Introspection request — is this token valid?
    AS-&gt;&gt;RS: Token info (sub, scope, exp...)
    RS-&gt;&gt;RS: Check scope, apply app logic
    RS-&gt;&gt;C: Response
</code></pre>
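<p>To make the introspection round-trip concrete, here is a hedged Python sketch of the request body and a typical response, modeled on RFC 7662. No endpoint is actually called; the token value and response fields are placeholders:</p>

```python
from urllib.parse import urlencode

# Body the Resource Server would POST to the AS's introspection endpoint
# (RFC 7662). The token value is a placeholder.
body = urlencode({"token": "a9f2-opaque-token", "token_type_hint": "access_token"})

# A typical introspection response, matching the fields in the diagram above:
response = {"active": True, "sub": "user_8472", "scope": "calendar.events", "exp": 1740000000}

if not response["active"]:
    raise PermissionError("token revoked, expired, or unknown")
print(response["sub"])
```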

<p><strong>By-value token (JWT)</strong></p>
<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client
    participant RS as Resource Server

    note over RS: AS public key fetched once at startup and cached
    C-&gt;&gt;RS: API request + JWT
    RS-&gt;&gt;RS: Verify signature using cached AS public key
    RS-&gt;&gt;RS: Check claims (exp, aud, scope)
    RS-&gt;&gt;RS: Read sub, apply app logic
    RS-&gt;&gt;C: Response
</code></pre>

<p><strong>JWT structure</strong></p>

<p>A JWT is three Base64URL-encoded strings joined by dots: <code class="language-plaintext highlighter-rouge">header.payload.signature</code>.</p>

<p>The <strong>header</strong> specifies the token type and the signing algorithm — for example, RS256 (RSA with SHA-256).</p>

<p>The <strong>payload</strong> contains claims — statements about the token and the entity it represents. Some standard claims you’ll encounter regularly:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">sub</code> (subject): who the token represents, typically a user ID</li>
  <li><code class="language-plaintext highlighter-rouge">iss</code> (issuer): which Authorization Server issued the token</li>
  <li><code class="language-plaintext highlighter-rouge">exp</code> (expiration): a Unix timestamp after which the token is no longer valid</li>
  <li><code class="language-plaintext highlighter-rouge">aud</code> (audience): which Resource Server this token is intended for</li>
  <li><code class="language-plaintext highlighter-rouge">scope</code>: the permissions the token carries</li>
</ul>

<p>The <strong>signature</strong> is generated by the Authorization Server using its private key. When the Resource Server receives the token, it verifies the signature using the Authorization Server’s public key — typically fetched once from a well-known endpoint and cached locally. If verification passes, the Resource Server knows the token hasn’t been tampered with and came from a trusted source, without making any network call.</p>

<p>Here’s what a decoded JWT payload might look like in the AI secretary scenario:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"user_8472"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://accounts.google.com"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"aud"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://www.googleapis.com/"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"calendar.events"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"exp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1740000000</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
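<p>The whole sign-and-verify cycle can be sketched with the standard library alone. For brevity this uses HS256 (a shared secret) rather than the RS256 public-key scheme described above, but the structure (encode, sign, verify, read claims) is the same:</p>

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # Base64URL without padding, as the JWT spec requires
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

# HS256 with a shared secret for brevity; real OAuth servers typically
# sign with an RSA private key (RS256) and publish the public key.
secret = b"demo-shared-secret"

header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user_8472", "iss": "https://accounts.google.com",
           "aud": "https://www.googleapis.com/", "scope": "calendar.events",
           "exp": 1740000000}

signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
token = signing_input + "." + signature

# Resource Server side: verify the signature locally, then read the claims
head_b64, body_b64, sig_b64 = token.split(".")
expected = b64url(hmac.new(secret, (head_b64 + "." + body_b64).encode(), hashlib.sha256).digest())
assert hmac.compare_digest(expected, sig_b64)
claims = json.loads(b64url_decode(body_b64))
print(claims["sub"], claims["scope"])  # user_8472 calendar.events
```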

<p><strong>How the Resource Server actually enforces authorization</strong></p>

<p>Back in the Roles section we said the Resource Server “enforces the authorization decision at request time” and deferred the details. Here’s what that actually looks like in practice.</p>

<p>When the Calendar API receives a request, it validates the JWT signature, checks that the token hasn’t expired, confirms the <code class="language-plaintext highlighter-rouge">scope</code> covers the operation being requested, and then reads the <code class="language-plaintext highlighter-rouge">sub</code> claim — the user’s identifier — and uses it in application code to scope the data it returns, querying only calendar events that belong to that user. That last step is plain application logic, not OAuth. OAuth tells the Resource Server who the request is for and what it’s allowed to do; what to actually return is up to the application. This is why a narrowly scoped token can’t be redirected to access another user’s data — the <code class="language-plaintext highlighter-rouge">sub</code> claim pins it to a specific identity.</p>
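<p>Here is a minimal sketch of those checks as application code. The function name and error messages are illustrative, not from any real API:</p>

```python
import time

def authorize_request(claims: dict, required_scope: str, audience: str) -> str:
    """Sketch of the Resource Server's checks after signature verification.
    Function name and error messages are illustrative."""
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if claims["aud"] != audience:
        raise PermissionError("token not intended for this API")
    if required_scope not in claims["scope"].split():
        raise PermissionError("insufficient scope")
    return claims["sub"]  # handed to application code to scope the query

claims = {"sub": "user_8472", "aud": "https://www.googleapis.com/",
          "scope": "calendar.events", "exp": time.time() + 3600}
user_id = authorize_request(claims, "calendar.events", "https://www.googleapis.com/")
# Application logic, not OAuth, takes over from here: query only this
# user's events, e.g. SELECT * FROM events WHERE owner = :user_id
print(user_id)
```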

<p><strong>The trade-off: revocation</strong></p>

<p>Because JWTs are self-contained and validated locally, the Authorization Server has no visibility into whether a token is being used after it’s issued. If a token is compromised — or a user explicitly revokes access — the token remains valid until its <code class="language-plaintext highlighter-rouge">exp</code> timestamp passes.</p>

<p>Revocation is technically possible: you can maintain a server-side blocklist of revoked token identifiers and check incoming tokens against it on every request. But this reintroduces the server-side lookup that by-value tokens were meant to avoid, and it’s not standardized in OAuth 2.0 — it requires custom coordination between the Authorization Server and Resource Server. The honest summary: it’s achievable, but the cost is high enough that most implementations either accept the gap or switch to shorter token lifetimes as a mitigation. (<a href="https://stackoverflow.com/questions/31919067/how-can-i-revoke-a-jwt-token">More on the trade-offs here.</a>)</p>
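<p>A hedged sketch of what such a blocklist might look like, keyed on the standard <code class="language-plaintext highlighter-rouge">jti</code> (JWT ID) claim from RFC 7519 so entries can be purged once the token would have expired anyway. The structure is illustrative, not a standardized mechanism:</p>

```python
import time

# Illustrative blocklist: map revoked jti (JWT ID, RFC 7519) -> token exp,
# so entries can be dropped once the token would have expired on its own.
revoked: dict = {}

def revoke(jti: str, exp: int) -> None:
    revoked[jti] = exp

def is_revoked(claims: dict) -> bool:
    now = time.time()
    for j in [j for j, e in revoked.items() if e < now]:
        del revoked[j]  # purge entries that expired naturally
    return claims.get("jti") in revoked

revoke("tok-123", exp=int(time.time()) + 3600)
print(is_revoked({"jti": "tok-123"}))  # True
```

Note that `is_revoked` has to run on every request, which is exactly the server-side lookup that by-value tokens were designed to avoid.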

<p>For AI agents this matters more than it might seem. Agents can be long-running, operate autonomously, and hold tokens across many interactions. If an agent is compromised or starts behaving unexpectedly, you want to cut off its access immediately — and with JWTs, “immediately” has an asterisk. The practical answer is to keep token lifetimes short and rely on refresh tokens to maintain access over time, which the Refresh Token Grant handles — covered in the next section.</p>

<h2 id="grant-types">Grant Types</h2>

<p>Each grant type below follows the same structure: how it works, an AI scenario, and security notes.</p>

<h3 id="authorization-code-grant">Authorization Code Grant</h3>

<p>The Authorization Code Grant is the most widely used OAuth flow. It’s designed for scenarios where a user is present and needs to authorize a client application to access their resources — which is exactly what happens the first time one of your users connects their Google Calendar to the AI secretary.</p>

<p><strong>How it works</strong></p>

<p>The flow starts when the Client redirects the user’s browser to the Authorization Server, including the requested scopes and a redirect URI — the URL the Authorization Server should send the user back to after they approve. The user authenticates with the Authorization Server and sees the consent screen. If they approve, the Authorization Server redirects them back to the Client’s redirect URI with a short-lived authorization code in the URL. The Client then takes that code and makes a direct, server-to-server request to the Authorization Server’s token endpoint — authenticating itself with its client ID and secret — and exchanges the code for an access token.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant U as User (Browser)
    participant C as Client (AI Secretary)
    participant AS as Authorization Server
    participant RS as Resource Server (Calendar API)

    C-&gt;&gt;U: Redirect to Authorization Server with requested scopes
    U-&gt;&gt;AS: User authenticates and sees consent screen
    AS-&gt;&gt;U: Redirect back to Client with authorization code
    U-&gt;&gt;C: Authorization code delivered via redirect
    C-&gt;&gt;AS: Exchange code for token (client ID + secret)
    AS-&gt;&gt;C: Access token (+ refresh token)
    C-&gt;&gt;RS: API request + access token
    RS-&gt;&gt;C: Calendar data
</code></pre>

<p>A detail worth pausing on: why the two-step dance? Why not just return the token directly after the user approves? The answer is that the authorization code travels through the browser — via URL redirects — which is a less controlled environment. The token exchange, by contrast, happens in a direct server-to-server call that never touches the browser, and it requires the client to authenticate with its client secret. This means that even if an attacker intercepts the authorization code, they can’t use it without also having the client secret. The token itself never passes through the browser at all.</p>
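<p>A sketch of the two requests in Python. The endpoint, client ID, secret, and code values are all placeholders, not real Google configuration:</p>

```python
from urllib.parse import urlencode

# Step 1: the redirect URL the Client sends the browser to.
# Endpoint and client values are placeholders.
authorization_url = "https://auth.example.com/authorize?" + urlencode({
    "response_type": "code",
    "client_id": "ai-secretary",
    "redirect_uri": "https://app.example.com/callback",
    "scope": "calendar.events",
    "state": "af0ifjsldkj",  # random value echoed back on redirect, for CSRF protection
})

# Step 2: the server-to-server exchange; it never touches the browser
# and authenticates the Client with its secret.
token_request = urlencode({
    "grant_type": "authorization_code",
    "code": "SplxlOBeZQQYbYS6WxSbIA",  # illustrative code from the redirect
    "redirect_uri": "https://app.example.com/callback",
    "client_id": "ai-secretary",
    "client_secret": "s3cr3t-demo",    # stored server-side only
})
print(authorization_url)
```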

<p><strong>AI scenario</strong></p>

<p>This is the right flow for the moment a user first connects their Google Calendar to the AI secretary. The user is present, the consent screen is shown, and the app ends up with an access token — and typically a refresh token — that it can use going forward. The Refresh Token flow (covered below) is what keeps that access alive after the access token expires, without requiring the user to go through this process again.</p>

<p><strong>Security notes</strong></p>

<p>This flow requires a <strong>confidential client</strong> — an application that can securely store a client secret on a server. That rules out browser-based apps (where any JavaScript can be inspected) and native mobile or desktop apps (where secrets can be extracted from the binary). If your client can’t safely store a secret, the base Authorization Code flow isn’t sufficient on its own — which is exactly the problem PKCE solves, covered next.</p>

<h3 id="authorization-code-grant--pkce">Authorization Code Grant + PKCE</h3>

<p>The base Authorization Code flow has one dependency that not every client can meet: a client secret. Secrets work well on a server you control, but a mobile app is a different story — the app binary is distributed to users’ devices, and anything embedded in it can be extracted. There’s no safe place to put a secret in a mobile app. The same is true for browser-based single-page apps and CLI tools.</p>

<p>Without a client secret, the token exchange step loses its authentication: anyone who intercepts the authorization code can exchange it for a token themselves. PKCE (Proof Key for Code Exchange) closes this gap without requiring a secret. Instead of authenticating the client with something it knows (a secret), it proves that the party completing the exchange is the same one that started it.</p>

<p><strong>How it works</strong></p>

<p>Before initiating the authorization request, the client generates a random string called the <strong>code verifier</strong>. It then hashes it to produce the <strong>code challenge</strong>, which gets sent along with the authorization request. The Authorization Server stores the challenge. When the client later exchanges the authorization code for a token, it includes the original code verifier. The Authorization Server hashes it and checks it against the stored challenge. If they match, it knows the exchange is coming from the same party that started the flow — no secret required.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant U as User (Mobile App)
    participant C as Client (AI Secretary Mobile)
    participant AS as Authorization Server
    participant RS as Resource Server (Calendar API)

    C-&gt;&gt;C: Generate code_verifier, derive code_challenge = hash(code_verifier)
    C-&gt;&gt;U: Redirect to Authorization Server with code_challenge
    U-&gt;&gt;AS: User authenticates and sees consent screen
    AS-&gt;&gt;AS: Store code_challenge
    AS-&gt;&gt;U: Redirect back to Client with authorization code
    U-&gt;&gt;C: Authorization code delivered via redirect
    C-&gt;&gt;AS: Exchange code + code_verifier for token (no client secret)
    AS-&gt;&gt;AS: Verify hash(code_verifier) matches stored code_challenge
    AS-&gt;&gt;C: Access token (+ refresh token)
    C-&gt;&gt;RS: API request + access token
    RS-&gt;&gt;C: Calendar data
</code></pre>
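<p>The verifier/challenge derivation is simple enough to show directly. This sketch follows the S256 method from RFC 7636, using only the standard library:</p>

```python
import base64, hashlib, secrets

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Client side, before the authorization request (S256 method, RFC 7636):
code_verifier = b64url(secrets.token_bytes(32))
code_challenge = b64url(hashlib.sha256(code_verifier.encode("ascii")).digest())

# Authorization Server side, at token exchange time: repeat the hash and compare.
def verify(stored_challenge: str, presented_verifier: str) -> bool:
    return b64url(hashlib.sha256(presented_verifier.encode("ascii")).digest()) == stored_challenge

print(verify(code_challenge, code_verifier))     # True
print(verify(code_challenge, "wrong-verifier"))  # False
```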

<p><strong>AI scenario</strong></p>

<p>Your users want a mobile version of the AI secretary. The app needs to request access to their Google Calendar just like the web version does — but it can’t store a client secret. PKCE is what makes this possible. The user taps “Connect Calendar,” gets redirected to Google’s consent screen, approves, and the app ends up with a token — the same end result as the web flow, achieved without a secret.</p>

<p><strong>Security notes</strong></p>

<p>PKCE was originally designed for public clients, but it’s now considered best practice for all clients regardless of whether they can store a secret. Even for confidential clients, PKCE provides an additional layer of protection against authorization code interception. OAuth 2.1 — the in-progress update to the spec — requires PKCE for all clients, public or not.</p>

<h3 id="client-credentials-grant">Client Credentials Grant</h3>

<p>Every flow we’ve covered so far has involved a user — someone who authenticates, sees a consent screen, and approves. The Client Credentials Grant removes the user entirely. It’s designed for machine-to-machine scenarios where the client is acting on its own behalf, not delegating from anyone.</p>

<p><strong>How it works</strong></p>

<p>The client sends its client ID and client secret directly to the Authorization Server’s token endpoint. The Authorization Server validates the credentials and returns an access token. That’s the entire flow — no redirects, no consent screen, no user interaction of any kind.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client (AI Secretary Backend)
    participant AS as Authorization Server
    participant RS as Resource Server (Weather API)

    C-&gt;&gt;AS: Token request (client ID + secret, requested scopes)
    AS-&gt;&gt;AS: Validate client credentials
    AS-&gt;&gt;C: Access token
    C-&gt;&gt;RS: API request + access token
    RS-&gt;&gt;C: Weather data
</code></pre>
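<p>A sketch of what the token request looks like on the wire. The scope name and credentials are placeholders:</p>

```python
import base64
from urllib.parse import urlencode

# Request body for the token endpoint; no user, no redirect, no consent screen.
# Scope name and credentials are placeholders.
body = urlencode({
    "grant_type": "client_credentials",
    "scope": "weather.read",
})

# The client authenticates the request itself, commonly via HTTP Basic auth
# over client_id:client_secret (some providers accept them in the body instead).
auth_header = "Basic " + base64.b64encode(b"ai-secretary:s3cr3t-demo").decode()
print(body)
```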

<p><strong>AI scenario</strong></p>

<p>This is the flow for the weather API example from the Roles section. The AI secretary wants to pull in forecast data to help schedule outdoor meetings — but that data doesn’t belong to any user. The agent is accessing it on its own behalf, as the application. Client Credentials is also the right choice for any background processing your system does without a user present: a pipeline that runs overnight to summarize meetings, a scheduled job that syncs data, or a service that calls another internal service.</p>

<p><strong>Security notes</strong></p>

<p>Because there’s no user in the loop, the scopes granted here represent what the application itself is authorized to do — not what any particular user has delegated. This means scope configuration happens at registration time, when the client is set up with the Authorization Server. Getting this right matters: an overly permissive client registered with broad scopes creates a standing risk, because those scopes are available to anyone who obtains the client credentials.</p>

<p>Refresh tokens are typically not issued for this flow. Since the client has its own credentials and can authenticate directly at any time, there’s no need for a long-lived refresh token — when the access token expires, the client simply requests a new one. That’s a meaningful difference from the Authorization Code flows, where re-authenticating would require pulling the user back in. Which is exactly the problem the next grant type solves.</p>

<h3 id="refresh-token-grant">Refresh Token Grant</h3>

<p>Access tokens are intentionally short-lived — typically valid for an hour or less. This is a feature, not a limitation: a short-lived token that gets compromised stops being useful quickly. But it creates a practical problem. The AI secretary was authorized by the user once, and it needs to keep accessing their calendar for weeks or months. Requiring the user to re-authorize every time the token expires would be a terrible experience. The Refresh Token Grant is what bridges that gap.</p>

<p>Unlike the previous flows, this isn’t something you choose as an authorization strategy. It’s a companion to Authorization Code (and Device Authorization, covered next) — the mechanism that keeps access alive after the initial authorization without requiring the user to come back.</p>

<p><strong>How it works</strong></p>

<p>When the Authorization Server issues an access token at the end of the Authorization Code flow, it typically also issues a refresh token. The refresh token is longer-lived and stored securely by the client. When the access token expires, the client sends the refresh token to the Authorization Server’s token endpoint. If the refresh token is still valid — and the user hasn’t revoked access — the Authorization Server issues a new access token. No user interaction required.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client (AI Secretary)
    participant AS as Authorization Server
    participant RS as Resource Server (Calendar API)

    C-&gt;&gt;RS: API request + access token
    RS-&gt;&gt;C: 401 Unauthorized (token expired)
    C-&gt;&gt;AS: Refresh request (refresh token + client credentials)
    AS-&gt;&gt;AS: Validate refresh token
    AS-&gt;&gt;C: New access token (+ new refresh token)
    C-&gt;&gt;RS: Retry API request + new access token
    RS-&gt;&gt;C: Calendar data
</code></pre>

<p><strong>AI scenario</strong></p>

<p>The user connected their Google Calendar to the AI secretary two weeks ago. Since then, the agent has been quietly joining calls, parsing transcripts, and scheduling follow-ups — all without the user thinking about authorization again. Each time the access token expires, the agent uses the refresh token to get a new one in the background. From the user’s perspective, it just works.</p>

<p><strong>Security notes</strong></p>

<p>Refresh tokens are high-value credentials — they’re long-lived and can be used to generate new access tokens repeatedly. Losing one is more serious than losing an access token, which will at least expire on its own.</p>

<p>Token rotation is the main mitigation: each time a refresh token is used, the Authorization Server should issue a new one and invalidate the old. This means a stolen refresh token can only be used until the legitimate client uses it first — at which point the stolen token becomes worthless. Better implementations also detect reuse: if a refresh token that’s already been rotated shows up again, it’s a signal that something may be compromised, and the Authorization Server can invalidate the entire token family.</p>
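<p>Rotation and reuse detection can be sketched with a toy in-memory Authorization Server. Everything here is illustrative; real implementations persist this state and track it per token family:</p>

```python
# In-memory sketch of refresh token rotation with reuse detection.
# Class name and token values are illustrative.
class AuthServer:
    def __init__(self):
        self.valid = {"rt-1"}  # currently valid refresh tokens
        self.seen = set()      # tokens that have already been rotated
        self.n = 1

    def refresh(self, rt: str):
        if rt in self.seen:
            # A rotated token came back: assume compromise, kill the family.
            self.valid.clear()
            raise PermissionError("refresh token reuse detected")
        if rt not in self.valid:
            raise PermissionError("unknown refresh token")
        # Rotation: invalidate the old refresh token, issue a new pair.
        self.valid.discard(rt)
        self.seen.add(rt)
        self.n += 1
        new_rt = f"rt-{self.n}"
        self.valid.add(new_rt)
        return f"access-{self.n}", new_rt

auth = AuthServer()
access, rt = auth.refresh("rt-1")  # normal rotation
try:
    auth.refresh("rt-1")           # replay of the already-rotated token
except PermissionError as e:
    print(e)                       # the whole family is now invalid
```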

<p>It’s also worth knowing that most Authorization Servers set a maximum lifetime on the refresh token chain — after a certain number of rotations, or after a certain total duration, the refresh token expires and the user must re-authenticate. This is intentional: it ensures that long-term access can’t persist forever purely on automation, and that users periodically reconfirm they still want the integration active.</p>

<p>For AI agents, refresh tokens deserve particular care. An agent that holds a refresh token effectively has long-lived access to a user’s resources — as long as the token isn’t revoked and keeps getting rotated within the Authorization Server’s limits. That’s a significant trust surface. Store refresh tokens with the same care you’d give a password, and make sure your system handles token revocation (when a user disconnects the integration) and re-authentication prompts cleanly.</p>

<h3 id="device-authorization-grant">Device Authorization Grant</h3>

<p>The Authorization Code flow assumes the device initiating the request has a browser and can handle redirects. That assumption breaks down quickly in agentic contexts. A CLI tool running in a terminal can’t open a browser window and receive a redirect. A headless agent running on a server has no UI at all. The Device Authorization Grant is designed for exactly these situations — devices that need user authorization but can’t complete a browser-based redirect flow.</p>

<p>The solution is to decouple the authorization from the device making the request. The device gets a code and a URL, and the user completes the authorization on a different device — typically their phone or laptop — while the original device waits.</p>

<p><strong>How it works</strong></p>

<p>The client sends a request to the Authorization Server’s device authorization endpoint. The AS responds with three things: a <code class="language-plaintext highlighter-rouge">device_code</code> (used internally by the client), a short <code class="language-plaintext highlighter-rouge">user_code</code> (meant for the user), and a <code class="language-plaintext highlighter-rouge">verification_uri</code> where the user should go to enter it. The client displays the URL and code to the user and starts polling the AS at regular intervals. Meanwhile, the user opens the URL on another device, authenticates, and enters the code. Once the AS sees that the user has approved, the next poll from the client returns an access token.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant C as Client (AI Secretary CLI)
    participant AS as Authorization Server
    participant U as User (Phone or Laptop)
    participant RS as Resource Server (Calendar API)

    C-&gt;&gt;AS: Device authorization request
    AS-&gt;&gt;C: device_code, user_code, verification_uri
    C-&gt;&gt;U: Display "Go to example.com/activate and enter: XKCD-42"
    loop Poll until approved or expired
        C-&gt;&gt;AS: Poll with device_code
        AS-&gt;&gt;C: authorization_pending...
    end
    U-&gt;&gt;AS: Opens verification_uri, authenticates, enters user_code
    AS-&gt;&gt;C: Access token (+ refresh token) on next poll
    C-&gt;&gt;RS: API request + access token
    RS-&gt;&gt;C: Calendar data
</code></pre>
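<p>The client side of this flow is essentially a polling loop. In this sketch, <code class="language-plaintext highlighter-rouge">poll_token</code> is a placeholder for the real HTTP call to the token endpoint, and the simulated responses are illustrative:</p>

```python
import time

# Sketch of the device-flow polling loop. poll_token stands in for the real
# HTTP call to the token endpoint; all values are illustrative.
def wait_for_authorization(device_code, interval, expires_in, poll_token):
    deadline = time.time() + expires_in
    while time.time() < deadline:
        result = poll_token(device_code)
        if result == "authorization_pending":
            time.sleep(interval)
        elif result == "slow_down":
            interval += 5        # the AS asked the client to back off
            time.sleep(interval)
        else:
            return result        # token response: the user approved
    raise TimeoutError("user_code expired; restart the flow")

# Simulated AS that approves on the second poll:
replies = iter(["authorization_pending", {"access_token": "demo-token"}])
result = wait_for_authorization("dc-1", interval=0, expires_in=60,
                                poll_token=lambda dc: next(replies))
print(result["access_token"])
```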

<p><strong>AI scenario</strong></p>

<p>The AI secretary ships a terminal-based version for engineers who prefer working without a GUI. When a user sets it up for the first time, the CLI prints something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>To connect your Google Calendar, open this URL on your phone or browser:
  https://accounts.google.com/device

And enter the code: XKCD-42

Waiting for authorization...
</code></pre></div></div>

<p>The user opens the link, logs in, enters the code, and the CLI receives its token — no browser on the local machine required. From there, the Refresh Token flow takes over to keep access alive without repeating this process.</p>

<p><strong>Security notes</strong></p>

<p>The <code class="language-plaintext highlighter-rouge">user_code</code> is intentionally short and human-typeable, which means it has a limited character space. To compensate, it expires quickly — typically in 15 minutes or less. If the user doesn’t complete authorization in that window, the flow must be restarted.</p>

<p>The polling interval matters too: clients must respect the interval returned by the AS and not poll more aggressively. Some AS implementations will return a <code class="language-plaintext highlighter-rouge">slow_down</code> response if a client polls too frequently, temporarily increasing the required interval.</p>

<p>For agentic use, this flow is well-suited for one-time setup of a long-running agent that will then maintain access via refresh tokens. The user experience is a single authorization moment at setup time, after which the agent operates autonomously within the bounds of what was approved.</p>

<h3 id="token-exchange-rfc-8693">Token Exchange (RFC 8693)</h3>

<p>The flows covered so far all involve a single client obtaining a token and using it. Multi-agent systems introduce a different problem: what happens when one agent needs to delegate work to another?</p>

<p>Consider the AI secretary in a more complex configuration. The main orchestrator agent receives a user’s token after the Authorization Code flow — it’s authorized to access the user’s calendar. Now it needs to hand off a sub-task to a specialized writing agent: draft a follow-up email based on what was discussed in the meeting. The writing agent needs to act in the context of that user — but it doesn’t have a token, and the orchestrator can’t just hand its own token over. That would give the writing agent the same level of access as the orchestrator, with no record of the delegation happening.</p>

<p>Token Exchange (RFC 8693) solves this. It lets a client present an existing token to the Authorization Server and request a new one — scoped down, targeted at a specific audience, and cryptographically encoding who is acting on behalf of whom.</p>

<p><strong>How it works</strong></p>

<p>The client sends a token exchange request to the Authorization Server’s token endpoint, including a <code class="language-plaintext highlighter-rouge">subject_token</code> (the token representing the user) and an <code class="language-plaintext highlighter-rouge">actor_token</code> (the token representing the agent making the request). The AS validates both tokens and issues a new one. The resulting token contains the standard <code class="language-plaintext highlighter-rouge">sub</code> claim identifying the user, plus an <code class="language-plaintext highlighter-rouge">act</code> claim identifying the agent that is acting — creating a verifiable record of the delegation chain.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant OA as Orchestrator Agent
    participant AS as Authorization Server
    participant WA as Writing Agent
    participant RS as Resource Server (Email API)

    OA-&gt;&gt;AS: Token exchange request (subject_token: user token, actor_token: orchestrator token)
    AS-&gt;&gt;AS: Validate both tokens, issue delegated token
    AS-&gt;&gt;OA: Delegated token (sub: user, act: orchestrator agent)
    OA-&gt;&gt;WA: Call with delegated token
    WA-&gt;&gt;RS: API request + delegated token
    RS-&gt;&gt;WA: Access granted (scoped to user, audit trail preserved)
</code></pre>
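<p>A sketch of the exchange request and the resulting claims. The <code class="language-plaintext highlighter-rouge">grant_type</code> and token-type URNs come from RFC 8693; the token values, scope, and audience are placeholders:</p>

```python
from urllib.parse import urlencode

# Token exchange request body per RFC 8693. The URNs are from the spec;
# the token values, scope, and audience are placeholders.
body = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": "jwt-for-user_8472",       # token representing the user
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "actor_token": "jwt-for-orchestrator",      # token representing the delegating agent
    "actor_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "scope": "email.compose",                   # scoped down to the sub-task
    "audience": "email-api",
})

# What the delegated token's decoded payload might contain:
delegated_claims = {
    "sub": "user_8472",                    # still the user
    "act": {"sub": "agent:orchestrator"},  # the agent acting on their behalf
    "scope": "email.compose",
}
print(delegated_claims["act"]["sub"])
```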

<p><strong>AI scenario</strong></p>

<p>The orchestrator agent has a token for the user — it’s allowed to read calendar events and manage scheduling. It delegates the email drafting task to a writing agent, using Token Exchange to request a new token scoped only to <code class="language-plaintext highlighter-rouge">email.compose</code> — the minimum the writing agent needs. The AS issues a token where <code class="language-plaintext highlighter-rouge">sub</code> is still the user and <code class="language-plaintext highlighter-rouge">act</code> identifies the orchestrator as the delegating party. If anything goes wrong, the audit trail shows exactly which agent did what and under whose authority.</p>

<p><strong>Security notes</strong></p>

<p>The <code class="language-plaintext highlighter-rouge">act</code> claim is what makes delegation auditable. Each hop in a multi-agent pipeline can be recorded in the token, creating a chain: Agent C acting on behalf of Agent B acting on behalf of User X. Resource Servers can inspect this chain to enforce policies — for example, refusing to honor a token that has passed through more than two hops.</p>

<p>The most important principle here is scope reduction. The delegated token should carry only the scopes needed for the specific sub-task — not the full set of permissions from the original token. Passing down a maximally-permissive token through a chain of agents is exactly the kind of thing that turns a compromised sub-agent into a wide-open breach. Token Exchange gives you the mechanism to prevent that; it’s worth using it deliberately.</p>

<h2 id="oauth-20-vs-openid-connect">OAuth 2.0 vs. OpenID Connect</h2>

<p>OAuth 2.0 and OpenID Connect are closely related, often used together, and frequently confused. The distinction is conceptually clean: OAuth 2.0 is about authorization — what a token is allowed to do. OpenID Connect is about authentication — who the user is. OIDC is built directly on top of OAuth 2.0 and extends it with a small set of additions specifically designed to convey user identity.</p>

<p><strong>What OIDC adds</strong></p>

<p>When a Client includes the <code class="language-plaintext highlighter-rouge">openid</code> scope in an OAuth authorization request, it signals that it wants OIDC. The Authorization Server responds not just with an access token, but also with an <strong>ID token</strong> — a separate JWT whose purpose is to tell the Client who just authenticated and how. Where an access token is intended for the Resource Server (“here’s your authorization to act”), the ID token is intended for the Client itself (“here’s who logged in”).</p>

<p>OIDC also defines a <strong>UserInfo endpoint</strong> — an API the Client can call using the access token to retrieve additional profile claims about the user. Rather than packing everything into the ID token, OIDC keeps the ID token lean and lets the Client fetch richer profile data separately when needed.</p>

<p><strong>Claims: what’s new and what you already know</strong></p>

<p>The <code class="language-plaintext highlighter-rouge">sub</code> claim appeared back in the JWT section, and it’s present in both access tokens and ID tokens. In an access token, <code class="language-plaintext highlighter-rouge">sub</code> identifies the user for the Resource Server — it’s the identifier the application uses in its own database queries to scope what gets returned. In an ID token, <code class="language-plaintext highlighter-rouge">sub</code> serves the same identifying role, but the audience is the Client application itself, which uses it to know who just signed in.</p>

<p>OIDC introduces a separate set of profile claims that don’t appear in access tokens by default. These come in the ID token or from the UserInfo endpoint, and they describe the person rather than their authorization:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">name</code>, <code class="language-plaintext highlighter-rouge">given_name</code>, <code class="language-plaintext highlighter-rouge">family_name</code> — the user’s display and legal name</li>
  <li><code class="language-plaintext highlighter-rouge">email</code> — the user’s email address</li>
  <li><code class="language-plaintext highlighter-rouge">picture</code> — a URL to the user’s profile photo</li>
  <li><code class="language-plaintext highlighter-rouge">auth_time</code> — when the authentication event occurred</li>
  <li><code class="language-plaintext highlighter-rouge">amr</code> — the authentication methods used (e.g., password, hardware key)</li>
</ul>

<p>Which of these you receive depends on the scopes requested. The <code class="language-plaintext highlighter-rouge">profile</code> scope grants access to name and picture; <code class="language-plaintext highlighter-rouge">email</code> grants access to the email address. The <code class="language-plaintext highlighter-rouge">openid</code> scope alone gives you only the minimum: <code class="language-plaintext highlighter-rouge">sub</code> and the authentication metadata.</p>

<p>The practical difference: an access token says “this token can read calendar events for user <code class="language-plaintext highlighter-rouge">user_8472</code>.” An ID token says “user <code class="language-plaintext highlighter-rouge">user_8472</code> is Jane Doe, her email is jane@example.com, and she authenticated two minutes ago using a hardware key.” One is about permission; the other is about identity.</p>
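<p>Side by side, the difference is easy to see in the decoded claims. These payloads are invented for illustration — issuer URLs, audiences, and timestamps are all hypothetical — but the shape matches what real providers issue.</p>

```python
# Decoded claims of a hypothetical ACCESS token: aimed at the Resource
# Server, says what the bearer may do.
access_token_claims = {
    "iss": "https://auth.example.com",       # who issued it
    "sub": "user_8472",                      # whose authority it carries
    "aud": "https://calendar.example.com",   # the API it is valid for
    "scope": "calendar.events.read",         # what it permits
    "exp": 1767312000,                       # when it stops working
}

# Decoded claims of a hypothetical ID token: aimed at the Client,
# says who just signed in and how.
id_token_claims = {
    "iss": "https://auth.example.com",
    "sub": "user_8472",                      # same identifier, different audience
    "aud": "ai-secretary-app",               # the Client that requested sign-in
    "name": "Jane Doe",
    "email": "jane@example.com",
    "auth_time": 1767311880,                 # when the user authenticated
    "amr": ["hwk"],                          # hardware-key authentication
}
```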

<p><strong>In practice, the lines blur</strong></p>

<p>The conceptual separation between authentication and authorization is clean on paper. Real-world identity providers — Google, Okta, Auth0, Cognito — implement both OIDC and OAuth 2.0, and they deliver both tokens in the same flow. When a user signs into the AI secretary and connects their Google Calendar, a single authorization request produces an ID token (who this user is) and an access token (what the app can do on their behalf). The protocols are conceptually separate; the infrastructure that delivers them is combined.</p>

<p>There’s a second, deeper source of blurriness: authorization doesn’t only happen at the OAuth layer. The Authorization Server controls token-level access — which scopes a token carries, which audiences it’s valid for, when it expires. But the application itself also makes authorization decisions that OAuth knows nothing about: is this user an admin? Do they own this record? Are they allowed to see data from this specific organization? These decisions live in application code, and they typically use the token’s claims as inputs — <code class="language-plaintext highlighter-rouge">sub</code> to identify the user, <code class="language-plaintext highlighter-rouge">email</code> or group membership claims to check roles.</p>

<p>This means authorization in a real system happens at (at least) two layers: what OAuth 2.0 permits at the token level, and what the application permits at the business logic level. A token with <code class="language-plaintext highlighter-rouge">calendar.events</code> scope doesn’t mean the user can see all calendar events — it means the application is allowed to query the calendar API on their behalf. Which events are actually returned is an application decision, informed by the <code class="language-plaintext highlighter-rouge">sub</code> claim and whatever business rules apply. OAuth 2.0 is one layer of authorization, not the whole story.</p>
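<p>A request handler makes the two layers concrete. This is a minimal sketch, not a real framework handler: the function name, the in-memory event list, and the <code class="language-plaintext highlighter-rouge">owner</code> field are all assumptions made for illustration.</p>

```python
def handle_list_events(token_claims: dict, events_db: list) -> list:
    """Two authorization layers: OAuth scope check, then application logic."""
    # Layer 1 (OAuth): does the token permit calling this endpoint at all?
    scopes = token_claims.get("scope", "").split()
    if "calendar.events" not in scopes:
        raise PermissionError("token lacks calendar.events scope")

    # Layer 2 (application): which events is *this user* allowed to see?
    # OAuth knows nothing about this rule; it comes from the sub claim
    # plus the app's own data model.
    user_id = token_claims["sub"]
    return [e for e in events_db if e["owner"] == user_id]
```

The scope check answers "may this token query the calendar at all?"; the <code class="language-plaintext highlighter-rouge">owner</code> filter answers "which rows belong to this user?" — the second question is entirely the application's.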

<h2 id="choosing-the-right-grant-type">Choosing the Right Grant Type</h2>

<pre><code class="language-mermaid">flowchart TD
    A([New resource access scenario]) --&gt; B{Is a user present and authorizing?}
    B --&gt;|No| C[Client Credentials Grant]
    B --&gt;|Yes| D{Can the client use a browser redirect?}
    D --&gt;|No| E[Device Authorization Grant]
    D --&gt;|Yes| F{Can the client securely store a client secret?}
    F --&gt;|Yes| G[Authorization Code Grant]
    F --&gt;|No| H[Authorization Code + PKCE]
    G --&gt; I{Need ongoing access without user re-auth?}
    H --&gt; I
    E --&gt; I
    I --&gt;|Yes| J[+ Refresh Token Grant]
    I --&gt;|No| K([Done])
    J --&gt; K
    C --&gt; K
</code></pre>

<blockquote>
  <p><strong>Agent-to-agent delegation:</strong> when an agent needs to call another agent while preserving the user’s identity, apply <strong>Token Exchange (RFC 8693)</strong> on top of whichever flow issued the original token.</p>
</blockquote>

<p><strong>When to use each path</strong></p>

<p><strong>Client Credentials</strong> is the right choice whenever there’s no user in the loop — a background pipeline, a scheduled job, or any service-to-service call where the client is acting on its own behalf. The agent authenticates directly with its credentials and gets a token back. No redirects, no consent screen.</p>
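<p>The wire format for Client Credentials is a single form-encoded POST to the token endpoint. This sketch only builds the headers and body — sending them is left to whatever HTTP client you use — and the client ID, secret, and scope are invented examples.</p>

```python
import base64

def client_credentials_request(client_id: str, client_secret: str, scope: str):
    """Build the headers and form body for a Client Credentials token request."""
    # HTTP Basic auth over client_id:client_secret identifies the client itself.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = {"grant_type": "client_credentials", "scope": scope}
    return headers, body

headers, body = client_credentials_request("weather-agent", "s3cret", "weather.read")
```

POSTing that body to the Authorization Server's token endpoint returns an access token for the agent itself — no user, no consent screen.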

<p><strong>Device Authorization</strong> handles the cases where a user needs to authorize something but the client can’t complete a redirect — a CLI tool, a headless agent, or any environment without a browser. The user approves on a separate device while the client polls for the result.</p>
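<p>The client's side of the Device Authorization flow is a polling loop that honors the two standard error responses. In this sketch, <code class="language-plaintext highlighter-rouge">poll_once</code> is a stand-in for one POST to the token endpoint with the device-code grant type; the function and its return shape follow RFC 8628, but the names here are hypothetical.</p>

```python
import time

def poll_for_token(poll_once, interval: float = 5.0, max_attempts: int = 60):
    """Poll the token endpoint until the user approves on their other device."""
    for _ in range(max_attempts):
        resp = poll_once()
        if "access_token" in resp:
            return resp                  # user approved; we have a token
        error = resp.get("error")
        if error == "authorization_pending":
            time.sleep(interval)         # user hasn't approved yet; keep waiting
        elif error == "slow_down":
            interval += 5                # server asked us to back off
            time.sleep(interval)
        else:
            raise RuntimeError(f"device flow failed: {error}")
    raise TimeoutError("user never approved the request")
```

The only states a well-behaved client reacts to are "keep waiting", "wait longer", and a terminal error such as <code class="language-plaintext highlighter-rouge">expired_token</code> or <code class="language-plaintext highlighter-rouge">access_denied</code>.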

<p><strong>Authorization Code</strong> is for server-side applications that can store a client secret and need to request user-delegated access. The two-step code exchange keeps the token out of the browser.</p>

<p><strong>Authorization Code + PKCE</strong> covers the same user-delegated scenario for clients that can’t store a secret — mobile apps, single-page apps, desktop tools. PKCE is also recommended on top of the base Authorization Code flow even for confidential clients.</p>
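<p>PKCE itself is small enough to show in full: a random <code class="language-plaintext highlighter-rouge">code_verifier</code>, and its SHA-256 hash as the <code class="language-plaintext highlighter-rouge">code_challenge</code>, per RFC 7636's S256 method. This is a sketch of the derivation only; a real client also sends the challenge in the authorization request and the verifier in the token request.</p>

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-char base64url verifier, within the 43-128 char spec range
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # challenge = BASE64URL(SHA256(verifier)), padding stripped
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

Because only the hash travels through the browser, an attacker who intercepts the authorization code still can't redeem it without the verifier, which never left the client.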

<p><strong>Refresh Token</strong> isn’t a standalone choice — it’s the companion to Authorization Code and Device Authorization that keeps access alive after the initial token expires, without requiring the user to re-authorize.</p>

<p><strong>Token Exchange</strong> sits outside the main tree because it’s not how a client gets its first token — it’s what happens when an agent needs to delegate work to another agent mid-flow. The calling agent exchanges its token for a new one that is scoped down and carries the delegation trail.</p>
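<p>On the wire, Token Exchange is one more POST to the token endpoint, distinguished by its grant-type URN. The sketch below builds the RFC 8693 request body for the orchestrator-to-writing-agent hand-off; the function name and the scope value are illustrative.</p>

```python
def token_exchange_request(subject_token: str, scope: str) -> dict:
    """Build the form body for an RFC 8693 Token Exchange request."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,   # the orchestrator's current token
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": scope,                   # the reduced scope for the sub-agent
    }

# Ask the AS for a narrower token to hand to the writing agent
body = token_exchange_request("<orchestrator-access-token>", "email.compose")
```

The Authorization Server responds with a new token whose <code class="language-plaintext highlighter-rouge">sub</code> is still the user, whose <code class="language-plaintext highlighter-rouge">scope</code> is only what was requested, and whose <code class="language-plaintext highlighter-rouge">act</code> claim records the orchestrator as the delegating party.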

<p>Most real-world agentic applications use several of these at once. The AI secretary uses Authorization Code + PKCE for the mobile app, Refresh Token to maintain calendar access, Client Credentials for the weather API, and Token Exchange when it delegates tasks to sub-agents. Each resource access interaction has its own flow, its own token, and its own scope.</p>

<p><strong>Protecting the agent itself</strong></p>

<p>Choosing the right grant type covers how your agent accesses things. But there’s another side to this: what happens when something accesses your agent.</p>

<p>Throughout this post we’ve been looking at the agent as a Client — the party requesting access to external resources. But agents also receive requests. A frontend calls your agent on behalf of a user. An orchestrator delegates a task to your agent. Another service calls your agent directly. When that happens, your agent is the Resource Server, and it needs to protect itself with the same rigor we’ve applied everywhere else.</p>

<p>Every incoming request to your agent should carry a token, and your agent should validate it fully before acting. The checklist:</p>

<ul>
  <li><strong>Verify the signature</strong> — confirm the token was signed by a trusted Authorization Server using its public key</li>
  <li><strong>Check <code class="language-plaintext highlighter-rouge">iss</code></strong> — confirm it was issued by an Authorization Server you actually trust</li>
  <li><strong>Check <code class="language-plaintext highlighter-rouge">aud</code></strong> — confirm the token was issued specifically for your agent. This is the guard against token forwarding attacks, where a token obtained for one service is replayed against another. If <code class="language-plaintext highlighter-rouge">aud</code> doesn’t match your agent’s identifier, reject the token</li>
  <li><strong>Check <code class="language-plaintext highlighter-rouge">exp</code></strong> — confirm the token hasn’t expired</li>
  <li><strong>Check <code class="language-plaintext highlighter-rouge">scope</code></strong> — confirm the caller has the permissions required for the specific operation they’re requesting</li>
</ul>
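<p>The claims portion of that checklist translates almost line for line into code. This sketch assumes signature verification has already been done by a JWT library against the Authorization Server's published keys, and operates on the decoded claims; the function name and parameter names are hypothetical.</p>

```python
import time

def validate_claims(claims: dict, *, trusted_issuer: str,
                    my_audience: str, required_scope: str) -> None:
    """Apply the checklist to an already-signature-verified set of claims."""
    # iss: only honor tokens from an Authorization Server we trust
    if claims.get("iss") != trusted_issuer:
        raise PermissionError("untrusted issuer")

    # aud: the guard against token forwarding -- aud may be a string or a list
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if my_audience not in audiences:
        raise PermissionError("token not issued for this agent")

    # exp: expired tokens are dead tokens
    if claims.get("exp", 0) <= time.time():
        raise PermissionError("token expired")

    # scope: the caller must hold the permission this operation requires
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError("missing required scope")
```

Every check raises rather than warns: a token that fails any one of them should never reach your agent's tool-dispatch logic.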

<p>Beyond the baseline, different caller types warrant different handling.</p>

<p><strong>A frontend calling on behalf of a user</strong> — the token carries a <code class="language-plaintext highlighter-rouge">sub</code> (the user’s identity) and the scopes the user authorized. Your agent operates in that user’s context and applies its own application-layer authorization accordingly.</p>

<p><strong>Another agent calling with a delegated token</strong> — the token carries both <code class="language-plaintext highlighter-rouge">sub</code> (the original user) and an <code class="language-plaintext highlighter-rouge">act</code> claim (the calling agent’s identity). Inspect the <code class="language-plaintext highlighter-rouge">act</code> claim to understand who is delegating, and consider enforcing a delegation depth limit — refusing tokens that have passed through more hops than your policy allows.</p>

<p><strong>Another agent calling on its own behalf</strong> — the token won’t carry a user <code class="language-plaintext highlighter-rouge">sub</code>. Be explicit about which operations are available to machine-to-machine callers versus user-delegated ones, and restrict accordingly.</p>

<p>Finally: define your agent’s own scopes. Just like any Resource Server, your agent should require callers to request specific permissions to invoke it. This makes access intentional and auditable — not a free-for-all for anyone who holds a valid token from a trusted Authorization Server.</p>

<h2 id="tools--libraries">Tools &amp; Libraries</h2>

<p>OAuth 2.0 is a protocol — not something you implement from scratch. In production, the Authorization Server role is almost always handled by a managed identity provider, and the client side by well-maintained libraries. Your job as an AI engineer is to understand the protocol well enough to configure these tools correctly, not to reimplement them.</p>

<p><strong>Identity Providers</strong></p>

<p>Services like Auth0, Okta, AWS Cognito, Google Identity, and Azure Entra ID act as your Authorization Server out of the box. They handle token issuance, scope enforcement, refresh token rotation, consent screens, and more. Which one you use typically comes down to your existing infrastructure: Cognito is a natural fit for AWS-heavy stacks, Entra ID for Microsoft ecosystems, and Auth0 or Okta for teams that want a provider-agnostic solution with strong developer tooling. Keycloak is worth knowing as a self-hosted open-source option for teams that need to keep everything in-house.</p>

<p><strong>Client Libraries</strong></p>

<p>When your application needs to act as an OAuth Client, most identity providers ship their own SDKs that abstract away the raw protocol work. For cases where you’re integrating with a provider that doesn’t have a dedicated SDK, or when you want more control, general-purpose libraries like Authlib (Python) and openid-client (Node.js) cover the full OAuth 2.0 and OIDC surface area.</p>

<p>The libraries handle the mechanics. The protocol knowledge you now have is what lets you configure them correctly, pick the right grant type, scope tokens appropriately, and diagnose problems when something breaks.</p>

<hr />

<p>The AI secretary we started with — joining calls, scheduling meetings, accessing calendars — is a useful lens because it’s not a contrived example. It’s the kind of system AI engineers are building right now, and it touches almost every concept in this post: user-delegated access, machine-to-machine calls, token lifetimes, scope design, multi-agent delegation, and the responsibility of protecting your own service from callers.</p>

<p>OAuth 2.0 doesn’t make these problems disappear. What it gives you is a standard, well-understood vocabulary for solving them — one that your tools, your libraries, and your teammates all share. The more deliberately you apply it, the more secure and durable your systems will be.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This post is a practical guide to OAuth 2.0 for AI engineers. It covers the core concepts — roles, scopes, access tokens, and JWTs — and then goes deep on the grant types most relevant to AI and agentic systems: Authorization Code, Authorization Code with PKCE, Client Credentials, Refresh Token, Device Authorization, and Token Exchange. Along the way, it maps each grant type to real agentic scenarios, explains how OAuth 2.0 relates to OpenID Connect, and gives you a decision framework for choosing the right grant type in your own systems.]]></summary></entry></feed>