Governing Agentic AI: Why Workflow Oversight Must Replace Tool-by-Tool Approval

AI governance was already difficult when systems mostly generated text, recommendations, or predictions. It becomes more difficult when AI systems can act. The emerging generation of agentic AI can call tools, operate software, retrieve records, draft messages, update systems, and continue a workflow after a human approves the next step. This does not mean that AI has become truly autonomous in the human sense. It means that the object of governance has changed.

When AI only advises, governance can focus heavily on model behavior, data protection, bias, and user guidance. When AI acts across systems, governance must also cover workflow behavior. What can the system access? What can it do without approval? How are actions logged? Who reviews exceptions? What happens when the agent fails, hallucinates, or triggers the wrong tool? Who is accountable when a decision pathway includes both human judgment and machine execution?

This shift is already visible in the technical direction of the market. OpenAI’s Operator demonstrated a computer-using agent that can interact with websites through a browser.¹ Anthropic’s computer-use capability allows Claude to perceive a screen, move a cursor, click, and type, while acknowledging that such capabilities create distinctive safety issues.² LangGraph emphasizes long-running, stateful agents with human-in-the-loop control, persistence, and observability.³ These are not merely new user interfaces. They are early signs of an AI execution layer.

For serious institutions, the lesson is clear: approving AI one tool at a time is no longer enough. A writing assistant, a customer service bot, a procurement analysis workflow, and a browser-controlling agent may all be “AI tools,” but they do not carry the same risk. Governance must become workflow-specific, risk-tiered, and operationally enforceable.

The mistake: treating agentic systems like ordinary software

Many institutions have inherited software governance processes that assume predictable systems. A tool is reviewed, approved, configured, and deployed. Users are trained. Security teams monitor access. Procurement manages the vendor. This approach still matters, but it is incomplete for agentic AI because the system may generate intermediate plans and actions that were not explicitly scripted in advance.

The institution therefore needs to govern not only the software product, but the conditions under which the system can act. A model that drafts an internal memo is different from a system that can send that memo. A tool that recommends a supplier risk score is different from a workflow that updates a supplier file. A chatbot that answers policy questions is different from an agent that collects data from multiple platforms, creates a decision brief, and routes it to an official for approval.

AI use case	Primary governance focus	Why it matters
Drafting and summarization	Accuracy, confidentiality, review norms	The output may be wrong, but action still depends on the user.
Decision support	Bias, explainability, evidence quality, human review	The system can influence consequential judgments.
Tool-using assistant	Permissions, data access, tool boundaries	The system can retrieve and combine information across environments.
Agentic workflow	Logs, escalation, approvals, rollback, process ownership	The system can move work through institutional processes.
Multi-agent orchestration	Coordination failure, compound risk, auditability	Errors may emerge from interactions between agents and tools.

The governance burden increases as the AI system gains permission to act. That burden is not a reason to avoid adoption. It is a reason to design adoption properly.

Workflow oversight as the new governance unit

A workflow is a sequence of tasks that moves work from one state to another. It includes inputs, decision points, tools, people, handoffs, approvals, outputs, and records. In an agentic AI environment, the workflow becomes the right unit of governance because risk emerges from the interaction between the model, the tools, the data, the human reviewers, and the institutional context.

NIST’s AI Risk Management Framework provides a useful foundation because it organizes risk management around four core functions: Govern, Map, Measure, and Manage.⁴ For agentic AI, these functions should be applied to the workflow rather than only to the model. Institutions should map each workflow in which AI acts, measure both performance and risk, govern roles and permissions, and manage incidents through clear escalation procedures.

The EU AI Act also points toward a more structured deployment environment by classifying AI systems according to risk and imposing obligations on providers and deployers for certain uses.⁵ Institutions do not need to copy the Act mechanically in every jurisdiction, but its risk-based logic is valuable. A workflow that supports internal knowledge management should not be governed the same way as a workflow that screens applicants, prioritizes inspections, advises citizens, or affects access to public services.

CentPol view

The question is not whether an AI tool has been approved. The question is whether the workflow in which it acts has been designed, bounded, monitored, and made accountable.

The five controls every agentic workflow needs

The first control is permission design. Agentic systems should operate under the principle of least privilege. They should access only the data and tools the workflow requires, and they should be allowed to perform only the actions appropriate to their risk tier. A research assistant may read documents and draft summaries. A procurement workflow may retrieve vendor information and flag anomalies. A system that can submit forms, send official notices, or update records requires stronger approval gates.
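To make permission design concrete, here is a minimal Python sketch of a deny-by-default access check. The workflow names, tool names, and the split between ordinary and approval-gated tools are all hypothetical. In practice this enforcement would sit in the platform’s identity and access layer, not in a standalone function.

```python
# Hypothetical per-workflow allowlists: each workflow may call only the
# tools listed for it, and any tool not listed is denied by default.
WORKFLOW_TOOLS = {
    "research_assistant": {"read_document", "draft_summary"},
    "procurement_review": {"fetch_vendor_record", "flag_anomaly"},
    "records_update": {"fetch_record", "update_record"},
}

# Hypothetical set of tools that change institutional state; these need an
# approval gate even when the workflow is otherwise permitted to use them.
GATED_TOOLS = {"submit_form", "send_notice", "update_record"}


def authorize(workflow: str, tool: str) -> str:
    """Return 'deny', 'require_approval', or 'allow' for a requested tool call."""
    allowed = WORKFLOW_TOOLS.get(workflow, set())
    if tool not in allowed:
        return "deny"  # unknown workflow or unlisted tool: least privilege
    if tool in GATED_TOOLS:
        return "require_approval"
    return "allow"
```

With these tables, the research assistant can draft summaries but is denied `send_notice` outright, while the records workflow may call `update_record` only after approval. Keeping the allowlist per workflow, rather than per product, is what ties permissions to the governance unit the article argues for.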

The second control is human approval at consequential points. Human-in-the-loop governance should not mean that a person vaguely supervises the system after the fact. It should specify the exact points at which a human must review, approve, reject, or revise an AI-generated action. LangGraph’s design emphasis on human-in-the-loop workflows reflects the practical need to pause execution at defined checkpoints.³
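The idea of pausing at defined checkpoints can be sketched in plain Python. This is an illustration under simple assumptions, not LangGraph’s own API: each step is flagged as consequential or not, a hypothetical `approve` callback stands in for a real review queue, and a rejection halts the run before the action is taken.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    consequential: bool  # True if a named human must approve before it runs


@dataclass
class Run:
    completed: List[str] = field(default_factory=list)
    # Each approval entry records (step name, reviewer, decision).
    approvals: List[Tuple[str, str, str]] = field(default_factory=list)
    halted_at: Optional[str] = None


def execute(steps: List[Step], reviewer: str, approve: Callable[[Step], bool]) -> Run:
    """Run steps in order, pausing at each consequential step for a decision."""
    run = Run()
    for step in steps:
        if step.consequential:
            approved = approve(step)  # in practice: a review queue, not a callback
            decision = "approved" if approved else "rejected"
            run.approvals.append((step.name, reviewer, decision))
            if not approved:
                run.halted_at = step.name  # rejected: stop before acting
                return run
        run.completed.append(step.name)  # placeholder for performing the step
    return run
```

The design choice worth noting is that the approval record is kept even when the reviewer rejects the step, so the run leaves evidence of who intervened and where.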

The third control is traceability. Institutions need records of what the agent did, what data it used, what tools it called, what output it produced, which human approved it, and what changed afterward. Without logs, there can be no meaningful audit, learning, or accountability. Traceability is also essential for procurement: vendors should be evaluated on whether their systems expose sufficient logs and operational evidence for institutional oversight.
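A minimal sketch of such a record, assuming an in-memory store and hypothetical field names, is shown below. Each entry captures the items the paragraph lists (data used, tool called, output, approver, resulting change). Hash-chaining each entry to the previous one is our addition: it is one common way to make silent edits to an audit trail detectable, not a requirement the article states.

```python
import hashlib
import json
from datetime import datetime, timezone


class ActionLog:
    """Append-only, hash-chained log of agent actions (illustrative sketch)."""

    def __init__(self):
        self.entries = []

    def record(self, workflow, tool, data_sources, output, approved_by=None, change=None):
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "workflow": workflow,
            "tool": tool,
            "data_sources": list(data_sources),
            "output": output,
            "approved_by": approved_by,
            "change": change,
            "prev_hash": prev_hash,
        }
        # Hash the entry (including the previous hash) so that editing or
        # deleting an earlier entry breaks the chain.
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash; return False if any entry was altered."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A vendor evaluation could ask, in the same spirit, whether a system exports records with at least these fields and whether altering them after the fact would be detectable.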

The fourth control is escalation and stop conditions. Agentic systems should know when not to continue. Ambiguous inputs, missing records, conflicting sources, low confidence, sensitive categories, and unusual requests should trigger escalation. In high-stakes domains, the safest design may be one where the agent prepares work but cannot complete the final action.
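Stop conditions are easiest to audit when they are written down as explicit rules. The sketch below turns the triggers named above into checks that return every reason to escalate. The confidence threshold, the sensitive-category list, and the idea that upstream components set flags such as `ambiguous_input` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

MIN_CONFIDENCE = 0.8  # hypothetical threshold; calibrate from evaluation data
SENSITIVE_CATEGORIES = {"health", "immigration_status", "disability"}  # illustrative


@dataclass
class StepContext:
    confidence: float
    ambiguous_input: bool = False  # assumed to be set by an upstream check
    missing_records: int = 0
    sources_agree: bool = True
    categories: Set[str] = field(default_factory=set)
    unusual_request: bool = False
    high_stakes: bool = False
    final_action: bool = False


def stop_reasons(ctx: StepContext) -> List[str]:
    """Return every reason to escalate; an empty list means the step may proceed."""
    reasons = []
    if ctx.confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    if ctx.ambiguous_input:
        reasons.append("ambiguous input")
    if ctx.missing_records > 0:
        reasons.append("missing records")
    if not ctx.sources_agree:
        reasons.append("conflicting sources")
    if ctx.categories & SENSITIVE_CATEGORIES:
        reasons.append("sensitive category")
    if ctx.unusual_request:
        reasons.append("unusual request")
    if ctx.high_stakes and ctx.final_action:
        # In high-stakes domains the agent prepares work but a human completes it.
        reasons.append("final action reserved for a human")
    return reasons
```

Returning all reasons rather than only the first gives the human who receives the escalation the full picture, and the list doubles as an exception category for incident records.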

The fifth control is continuous evaluation. AI workflows should be measured over time, not approved once and forgotten. Performance can degrade as models, data, users, vendors, and institutional needs change. Evaluation should include accuracy, error rates, cycle time, user satisfaction, fairness, security incidents, escalation frequency, and impact on staff workload.
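As an illustration of measuring a workflow over time, the sketch below summarizes a review period and flags rates that have worsened against a baseline. It covers only three of the listed measures (error rate, escalation frequency, cycle time); the record fields and the 5-point tolerance are hypothetical, and a rising escalation rate is treated here as a signal to investigate, not as proof of failure.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class RunRecord:
    correct: bool       # outcome as judged by review or audit sampling
    escalated: bool     # whether the run was handed to a human
    cycle_hours: float  # time from intake to completion


def summarize(runs: List[RunRecord]) -> Dict[str, float]:
    if not runs:
        raise ValueError("no runs in this review period")
    n = len(runs)
    return {
        "error_rate": sum(not r.correct for r in runs) / n,
        "escalation_rate": sum(r.escalated for r in runs) / n,
        "median_cycle_hours": median([r.cycle_hours for r in runs]),
    }


def degraded(current: Dict[str, float], baseline: Dict[str, float],
             tolerance: float = 0.05) -> List[str]:
    """Rates that rose by more than `tolerance` (absolute) since the baseline."""
    return [key for key in ("error_rate", "escalation_rate")
            if current[key] - baseline[key] > tolerance]
```

Comparing each period against a stored baseline, instead of against the figures from the original approval, is what turns a one-time sign-off into continuous evaluation.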

Control	Governance question	Evidence the institution should retain
Permission design	What can the agent access and do?	Access matrix, tool permissions, data classification.
Human approval	Where must people intervene?	Approval logs, reviewer identity, decision rationale.
Traceability	Can we reconstruct the workflow?	Action logs, prompts, tool calls, outputs, version history.
Escalation	When does the system stop or hand over?	Escalation rules, incident records, exception categories.
Evaluation	Is the workflow still safe and useful?	Performance metrics, audits, user feedback, risk reviews.

Procurement must become governance-aware

One of the most important governance decisions happens before deployment: procurement. If procurement teams buy AI systems without requiring auditability, data controls, performance evidence, model-update policies, and exit rights, governance teams will inherit risk they cannot fully manage.

AI procurement should therefore ask different questions from ordinary software procurement. Does the vendor support role-based permissions? Can the institution inspect logs? How are model updates communicated? Where is data processed and stored? Does the tool use customer data for training? Can workflows be exported or migrated? What human approval features are available? How are incidents reported? What happens if the system performs poorly in a specific language, domain, or population group?

These questions should be part of the contract, not only the sales conversation. Serious institutions should prefer vendors that make governance easier. A system that cannot be audited may be unsuitable for public-interest contexts even if its demo is impressive.

Why this matters for public trust

Agentic AI can improve institutional capacity. It can reduce delays, support staff, improve documentation, identify risks earlier, and help smaller teams deliver more complex work. But public trust depends on more than efficiency. People need to know that when AI is involved in institutional processes, there is still responsibility, review, appeal, and explanation.

This is especially important in the public sector, education, development, healthcare, finance, migration, and social services. In these domains, workflows affect real opportunities and rights. An AI system that helps process information may be useful. An AI system that quietly shapes outcomes without transparency may be unacceptable. The line between assistance and decision influence must be made explicit.

The public does not need every technical detail of a model. But affected communities deserve clarity about where AI is used, what it does, what it does not do, who supervises it, and how errors can be corrected. Governance is therefore not only a compliance function. It is part of institutional legitimacy.

A practical roadmap for institutions

Institutions should begin by classifying existing and proposed AI uses into risk tiers. Low-risk internal productivity uses can move faster with basic controls. Medium-risk workflows require stronger documentation, approved data sources, human review, and monitoring. High-risk workflows that affect rights, eligibility, safety, resources, or public trust require formal governance review, legal assessment, testing, auditability, and ongoing oversight.
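A tiering rule can be written down so that it is applied consistently across teams. In the sketch below, the high-risk impact areas and the per-tier controls come from the paragraph above. The boundary between low and medium risk is our own assumption: a use is treated as low-risk only if it is internal and cannot act on institutional systems.

```python
from typing import List, Set

# Impact areas the article names as marking a high-risk workflow.
HIGH_RISK_IMPACTS = {"rights", "eligibility", "safety", "resources", "public_trust"}

# Minimum controls per tier, following the article's tier descriptions.
REQUIRED_CONTROLS = {
    "low": ["basic controls"],
    "medium": ["documentation", "approved data sources", "human review", "monitoring"],
    "high": ["formal governance review", "legal assessment", "testing",
             "auditability", "ongoing oversight"],
}


def classify(impacts: Set[str], internal_only: bool, can_act: bool) -> str:
    """Assign a risk tier. The low/medium boundary here is an assumption."""
    if impacts & HIGH_RISK_IMPACTS:
        return "high"
    if internal_only and not can_act:
        return "low"  # assumed: internal productivity use that only reads or drafts
    return "medium"


def controls_for(impacts: Set[str], internal_only: bool, can_act: bool) -> List[str]:
    return REQUIRED_CONTROLS[classify(impacts, internal_only, can_act)]
```

Any high-risk impact overrides the other inputs, so an internal tool that affects eligibility is still classified as high-risk.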

They should then create workflow cards for each AI-enabled process. A workflow card is a concise record that explains the purpose of the workflow, users, data sources, AI functions, allowed actions, human approval points, risks, metrics, vendor dependencies, and review schedule. This makes governance concrete and reusable.
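A workflow card can be kept as a simple structured record, which makes it easy to check for completeness and to schedule reviews. The sketch below mirrors the fields listed above. The field names, the empty-field check, and the day-count review interval are illustrative choices, not a prescribed format.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta
from typing import List


@dataclass
class WorkflowCard:
    purpose: str
    users: List[str]
    data_sources: List[str]
    ai_functions: List[str]
    allowed_actions: List[str]
    approval_points: List[str]
    risks: List[str]
    metrics: List[str]
    vendor_dependencies: List[str]
    review_every_days: int
    last_reviewed: date

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, so incomplete cards can be caught."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", [], None)]

    def review_due(self, today: date) -> bool:
        return today >= self.last_reviewed + timedelta(days=self.review_every_days)
```

For example, a card for a supplier-brief workflow with an empty `risks` list would be reported as incomplete before it reaches a governance review.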

Institutions should also build an AI incident process. Most organizations will experience AI errors. What matters is whether they detect them, learn from them, and correct them. Incident processes should cover harmful outputs, unauthorized actions, privacy concerns, security events, discrimination risks, and significant performance failures.
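An incident process is easier to enforce when its stages are explicit. The sketch below uses the incident categories named above and a hypothetical lifecycle (detected, triaged, corrected, reviewed, closed). Under these assumptions, an incident cannot be closed until a correction and a lessons-learned review have both been recorded.

```python
from enum import Enum


class IncidentType(Enum):
    HARMFUL_OUTPUT = "harmful output"
    UNAUTHORIZED_ACTION = "unauthorized action"
    PRIVACY = "privacy concern"
    SECURITY = "security event"
    DISCRIMINATION = "discrimination risk"
    PERFORMANCE = "significant performance failure"


# Hypothetical lifecycle: each state lists the only states it may move to.
TRANSITIONS = {
    "detected": {"triaged"},
    "triaged": {"corrected"},
    "corrected": {"reviewed"},  # "reviewed" = lessons learned recorded
    "reviewed": {"closed"},
    "closed": set(),
}


class Incident:
    def __init__(self, kind: IncidentType, description: str):
        self.kind = kind
        self.description = description
        self.state = "detected"
        self.history = ["detected"]

    def advance(self, new_state: str) -> None:
        """Move to the next state, refusing any step the lifecycle does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

Refusing to skip states is the point of the design: it keeps "learn from them" from being dropped once the immediate error has been fixed.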

Finally, governance should be connected to capability building. If staff are not trained, they may bypass controls or over-trust outputs. If leaders are not trained, they may approve systems without understanding operational risk. If procurement teams are not trained, they may buy tools that cannot be governed. Workflow oversight requires organizational literacy.

Conclusion: accountability must move at the speed of capability

Agentic AI is not science fiction. It is emerging through browser agents, tool-using models, orchestration frameworks, multi-agent platforms, and enterprise workflow systems. Many of these systems remain imperfect, but their direction is clear: AI will increasingly participate in the movement of work through institutions.

Governance must therefore evolve. Tool approval is necessary but insufficient. Model evaluation is necessary but insufficient. Acceptable-use policies are necessary but insufficient. The next governance frontier is workflow oversight: knowing where AI acts, what it can do, who supervises it, how it is logged, when it stops, and how the institution remains accountable.

CentPol’s position is direct. Institutions do not need to choose between innovation and responsibility. They need operating models that make both possible. The goal is not to slow AI adoption for its own sake. The goal is to ensure that as AI systems gain the ability to act, human institutions retain the ability to govern.