From AI Pilots to AI Operating Models: Why Institutions Struggle to Scale AI

AI Strategy

Operating Models

Implementation

Institutional Capability

Many institutions now have an AI pilot. Far fewer have an AI operating model. That gap explains why the public conversation around artificial intelligence often feels more advanced than the institutional reality. Leaders see demonstrations that appear transformative, but inside the organization the work still depends on fragmented data, manual approvals, uncertain ownership, weak procurement discipline, and staff who are expected to adopt tools without being given a redesigned way to work.

The result is the familiar AI scaling problem. A promising pilot produces a useful demo. A small team proves that a model can summarize documents, classify requests, draft responses, or analyze procurement data. Senior leaders become interested. Then the project slows down. It is not connected to core systems. The legal team has unresolved questions. The data is incomplete. The procurement route is unclear. Frontline workers do not trust the output. No one owns the workflow after the pilot budget ends.

This is not mainly a technology failure. It is an operating-model failure. McKinsey has argued that organizations need to rewire themselves around distributed digital and AI capabilities rather than treating technology as a side function.¹ Boston Consulting Group has made a similar point, emphasizing that AI value depends on the combination of technology, people, process, and governance rather than the model alone.² For CentPol’s audiences, the lesson is direct: institutions should stop asking only, “Which AI tool should we use?” They should also ask, “What operating model would allow AI to become useful, trusted, and accountable inside real work?”

The pilot trap

A pilot is useful when it tests a genuine question. It becomes a trap when it allows an institution to postpone harder decisions. Many AI pilots are designed to demonstrate possibility, not to test operational readiness. They are often built around a narrow use case with exceptional support, clean sample data, and a small group of enthusiastic users. These conditions rarely represent the institution as a whole.

The problem becomes visible when the pilot moves toward scale. At that point, the organization must answer practical questions that the demonstration did not resolve. Who maintains the model or workflow? How does the system access approved data? Which decisions require human review? What happens when the AI output conflicts with professional judgment? How will the institution measure value beyond usage statistics? What procurement terms protect the institution if the vendor changes pricing, performance, or data policies?

Pilot question	Scale question
Can the model perform this task in a controlled setting?	Can the institution run this workflow safely, repeatedly, and accountably?
Can a small team get value from the tool?	Can the tool fit into the organization’s operating rhythm?
Does the output look impressive?	Does it improve cycle time, quality, cost, compliance, or service delivery?
Can the project pass initial review?	Can governance, procurement, legal, security, and frontline operations support it over time?

The difference is not semantic. A pilot asks whether AI can do something. An operating model asks whether the institution can absorb AI into how it plans, decides, delivers, and learns.

What an AI operating model actually includes

An AI operating model is the set of structures, roles, processes, standards, data practices, procurement rules, and measurement systems that allow AI to be used responsibly at scale. It is not a slide deck and it is not a center of excellence that sits apart from the rest of the organization. It is the practical architecture of institutional adoption.

At minimum, a serious AI operating model includes six layers. First, it defines ownership. Every AI-enabled workflow needs a business owner, a technical owner, a risk owner, and a user owner. Without named responsibility, the system becomes everyone’s innovation and no one’s accountability.
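One way to make the ownership layer concrete is a small record that refuses to treat a workflow as ready until all four roles are named. The sketch below is illustrative only: the class name, field names, and example role titles are assumptions, not a CentPol template.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four ownership roles named in the text above.
OWNER_ROLES = ("business_owner", "technical_owner", "risk_owner", "user_owner")


@dataclass
class WorkflowOwnership:
    """Hypothetical ownership record for one AI-enabled workflow."""
    workflow: str
    business_owner: Optional[str] = None
    technical_owner: Optional[str] = None
    risk_owner: Optional[str] = None
    user_owner: Optional[str] = None

    def missing_roles(self) -> List[str]:
        """Return the ownership roles that have no named person."""
        return [role for role in OWNER_ROLES if not getattr(self, role)]


# Example: a pilot that has sponsors but no risk or user owner yet.
record = WorkflowOwnership(
    workflow="grant reporting support",
    business_owner="Head of Programs",
    technical_owner="Data Platform Lead",
)
print(record.missing_roles())  # ['risk_owner', 'user_owner']
```

A non-empty result is a signal that the workflow still belongs to the pilot team rather than to the institution.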

Second, it establishes data foundations. AI systems are only as useful as the information they can access and the context they are allowed to use. Fragmented records, inconsistent taxonomies, missing metadata, and uncertain data rights will limit even strong models. NIST’s AI Risk Management Framework organizes this work around four functions (govern, map, measure, and manage) that apply across the AI lifecycle.³ That lifecycle begins with knowing what data the system uses and why.

Third, it redesigns workflows. AI cannot simply be inserted into a process designed for manual execution. If a model drafts a document but every review step remains unchanged, the institution may gain little. If an AI tool flags risk but no one is responsible for acting on the signal, the risk remains. Workflow redesign means specifying which tasks are automated, which are augmented, which require human approval, and which remain outside the system.
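The four categories in this paragraph can be written down as an explicit design for each workflow step. The following is a minimal sketch, assuming a hypothetical reporting workflow; the step names and the helper functions are invented for illustration.

```python
from enum import Enum


class TaskMode(Enum):
    """The four treatment options described in the text."""
    AUTOMATED = "automated"            # AI performs the step without review
    AUGMENTED = "augmented"            # AI assists a person who does the step
    HUMAN_APPROVAL = "human_approval"  # AI output needs sign-off before use
    OUT_OF_SCOPE = "out_of_scope"      # step stays outside the AI system


# Hypothetical design for a reporting workflow.
workflow_design = {
    "retrieve prior reports": TaskMode.AUTOMATED,
    "draft summary": TaskMode.AUGMENTED,
    "publish final report": TaskMode.HUMAN_APPROVAL,
    "decide funding": TaskMode.OUT_OF_SCOPE,
}


def unassigned_steps(steps, design):
    """Steps in the process that nobody has classified yet."""
    return [step for step in steps if step not in design]


def steps_requiring_approval(design):
    """Steps where a named reviewer must sign off."""
    return [step for step, mode in design.items() if mode is TaskMode.HUMAN_APPROVAL]


all_steps = list(workflow_design) + ["archive report"]
print(unassigned_steps(all_steps, workflow_design))  # ['archive report']
print(steps_requiring_approval(workflow_design))     # ['publish final report']
```

The value of the exercise lies less in the code than in forcing a decision for every step, including the ones the pilot never touched.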

Fourth, it integrates procurement and vendor governance. The EU AI Act, adopted in 2024, formalized a risk-based approach to AI regulation and created obligations for providers and deployers of certain AI systems.⁴ Even where the Act does not directly apply, its structure signals the direction of travel: institutions will be expected to understand risk, document use, and govern deployment. Procurement teams therefore need AI-specific criteria covering data handling, auditability, interoperability, security, model updates, incident reporting, and exit rights.
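The criteria listed above lend themselves to a simple gap check against a vendor's proposed terms. This is a sketch under stated assumptions: the criterion keys mirror the list in the text, while the `vendor_terms` structure and the example values are hypothetical.

```python
# Criteria taken from the paragraph above; the key names are illustrative.
PROCUREMENT_CRITERIA = (
    "data_handling",
    "auditability",
    "interoperability",
    "security",
    "model_updates",
    "incident_reporting",
    "exit_rights",
)


def uncovered_criteria(vendor_terms):
    """Return the criteria the proposed contract does not yet address.

    `vendor_terms` maps a criterion name to True when the contract
    contains an acceptable clause for it. Missing keys count as gaps.
    """
    return [c for c in PROCUREMENT_CRITERIA if not vendor_terms.get(c)]


# Hypothetical review of one vendor proposal.
vendor_terms = {
    "data_handling": True,
    "security": True,
    "interoperability": True,
    "model_updates": False,  # vendor may change the model without notice
}
print(uncovered_criteria(vendor_terms))
# ['auditability', 'model_updates', 'incident_reporting', 'exit_rights']
```

In practice each criterion would need a written standard behind it; the boolean here only records whether that standard was met.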

Fifth, it invests in workforce capability. Workers need more than tool access. They need role-specific AI literacy: how to frame tasks, verify outputs, identify risk, protect sensitive information, and escalate uncertainty. Microsoft’s research on AI at work points toward a future of human-agent teams, in which workers increasingly supervise and coordinate digital assistants rather than simply using standalone applications.⁵ That future requires training in judgment and workflow design, not only prompting.

Sixth, it measures institutional outcomes. Tool usage is not transformation. The right metrics depend on the workflow: reduced cycle time, improved decision quality, fewer errors, better compliance, lower administrative burden, faster service delivery, or more consistent documentation. Measurement must also capture negative effects, including review overload, overreliance, exclusion, privacy risk, and staff distrust.

Why institutions struggle to build the model

Institutions struggle because AI cuts across boundaries that most organizations manage separately. Technology teams think in systems. Legal teams think in risk. Procurement teams think in contracts. Program teams think in delivery. Funders think in outputs and evidence. Senior leaders think in strategy and reputation. AI touches all of these at once.

This creates a coordination problem. If the technology team leads alone, adoption may be technically competent but operationally weak. If the program team leads alone, the use case may be relevant but insecure or hard to maintain. If legal or compliance leads alone, the institution may avoid unacceptable risk but also avoid useful innovation. If leadership announces ambition without funding implementation, staff may experience AI as another unfunded mandate.

The operating-model approach addresses this by treating AI as a shared institutional capability. It does not remove specialist roles; it connects them. A good model creates a clear path from idea to deployment: intake, risk classification, data review, workflow design, pilot testing, user training, procurement approval, evaluation, monitoring, and continuous improvement.
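The path from idea to deployment can be tracked as an ordered list of stages, which makes skipped steps visible. The sketch below assumes a strictly linear sequence for simplicity; real institutions may run some stages in parallel, and the function names are hypothetical.

```python
from typing import List, Optional, Set

# Stage order as listed in the text above.
DELIVERY_PATH = [
    "intake",
    "risk_classification",
    "data_review",
    "workflow_design",
    "pilot_testing",
    "user_training",
    "procurement_approval",
    "evaluation",
    "monitoring",
    "continuous_improvement",
]


def next_stage(completed: Set[str]) -> Optional[str]:
    """Return the earliest stage that has not been completed, if any."""
    for stage in DELIVERY_PATH:
        if stage not in completed:
            return stage
    return None


def skipped_stages(completed: Set[str]) -> List[str]:
    """Return stages that were passed over before a later completed stage."""
    done = [i for i, stage in enumerate(DELIVERY_PATH) if stage in completed]
    if not done:
        return []
    return [s for s in DELIVERY_PATH[: max(done)] if s not in completed]


# Example: a project that jumped from intake straight to a pilot.
completed = {"intake", "pilot_testing"}
print(next_stage(completed))      # risk_classification
print(skipped_stages(completed))  # ['risk_classification', 'data_review', 'workflow_design']
```

A pilot that shows skipped stages is the pattern described earlier: a demonstration that succeeded without an operational path.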

CentPol view

Scaling AI is not a matter of moving from one pilot to many pilots. It is a matter of moving from experimentation to institutional capability.

The public-sector and policy challenge

The operating-model problem is especially acute in the public and civic sectors. Public institutions often face legacy systems, procurement constraints, limited technical staff, high accountability requirements, and politically sensitive service environments. A poorly governed AI tool can damage trust quickly. Yet avoiding AI entirely may also carry costs: slower services, weaker evidence, higher administrative burden, and missed opportunities to improve policy delivery.

This is why public-sector AI adoption should be framed less as “digital modernization” and more as institutional reform. The question is not whether ministries, agencies, universities, or civic organizations can purchase AI tools. Many can. The harder question is whether they can create the conditions under which AI supports lawful, fair, transparent, and effective work.

For funders, this has implications for program design. Funding isolated AI pilots may produce case studies but not capability. Better support would help institutions build reusable foundations: data governance, responsible procurement templates, AI literacy curricula, evaluation methods, workflow libraries, and cross-functional implementation teams.

For researchers, the operating-model frame opens a more useful evaluation agenda. Instead of asking only whether a model performs a task, research should ask how AI changes organizational behavior, professional judgment, delivery speed, error patterns, access, trust, and accountability. The institution, not only the model, becomes the object of analysis.

A practical maturity model

Institutions can use a maturity model to determine where they actually are. The point is not to produce a flattering score. The point is to decide what capability must be built next.

Maturity stage	Typical behavior	Main risk	Next move
Exploration	Staff experiment with public AI tools informally.	Shadow AI, data leakage, uneven quality.	Issue clear guidance and identify priority workflows.
Pilot	A controlled team tests a defined use case.	Demo success without operational path.	Define owners, data access, risk tier, and evaluation criteria.
Integration	AI is embedded in selected workflows.	Governance and support lag behind adoption.	Build monitoring, training, procurement controls, and escalation paths.
Scaling	Multiple workflows use shared AI infrastructure.	Complexity, inconsistent standards, accountability gaps.	Establish portfolio governance and reusable implementation patterns.
Institutional capability	AI is part of strategy, budgeting, workforce development, and service design.	Complacency or over-automation.	Continuously evaluate impact, risk, and public value.

Most organizations are earlier on this curve than they believe. The goal is not to appear advanced. The goal is to move deliberately.
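For teams that want to keep the maturity model next to their project tracking, the table can be held as a small lookup. This is a minimal sketch: the stage names, risks, and next moves are copied from the table above, while the `conservative_stage` helper is an assumption that simply takes the earlier of a claimed and an evidenced stage.

```python
# Stage data copied from the maturity table; dict order is the stage order.
MATURITY_MODEL = {
    "Exploration": {
        "main_risk": "Shadow AI, data leakage, uneven quality.",
        "next_move": "Issue clear guidance and identify priority workflows.",
    },
    "Pilot": {
        "main_risk": "Demo success without operational path.",
        "next_move": "Define owners, data access, risk tier, and evaluation criteria.",
    },
    "Integration": {
        "main_risk": "Governance and support lag behind adoption.",
        "next_move": "Build monitoring, training, procurement controls, and escalation paths.",
    },
    "Scaling": {
        "main_risk": "Complexity, inconsistent standards, accountability gaps.",
        "next_move": "Establish portfolio governance and reusable implementation patterns.",
    },
    "Institutional capability": {
        "main_risk": "Complacency or over-automation.",
        "next_move": "Continuously evaluate impact, risk, and public value.",
    },
}
STAGES = list(MATURITY_MODEL)


def conservative_stage(claimed: str, evidenced: str) -> str:
    """Return whichever stage comes earlier on the curve."""
    return min(claimed, evidenced, key=STAGES.index)


def next_move(stage: str) -> str:
    """Look up the recommended next move for a stage."""
    return MATURITY_MODEL[stage]["next_move"]


stage = conservative_stage(claimed="Scaling", evidenced="Pilot")
print(stage)             # Pilot
print(next_move(stage))  # Define owners, data access, risk tier, and evaluation criteria.
```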

What leaders should do next

Leaders should begin by creating an AI workflow inventory. This is a simple but powerful exercise. It maps where AI is already being used, where staff are experimenting informally, which workflows have high pain and high value, and which processes are too risky for early automation. The inventory should include data sources, affected users, decision points, legal constraints, and expected outcomes.
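An inventory of this kind can start as a structured list rather than a new system. The sketch below shows one possible shape for an entry, using the fields named in the paragraph above. The 1 to 5 scoring scale, the status labels, the threshold, and the example workflows are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InventoryEntry:
    """Hypothetical record for one workflow in the AI inventory."""
    workflow: str
    ai_status: str  # "in_use", "informal_experiment", or "none"
    pain: int       # 1 (low) to 5 (high); assumed scale
    value: int      # 1 (low) to 5 (high); assumed scale
    too_risky_for_early_automation: bool
    data_sources: List[str] = field(default_factory=list)
    affected_users: List[str] = field(default_factory=list)
    decision_points: List[str] = field(default_factory=list)
    legal_constraints: List[str] = field(default_factory=list)
    expected_outcomes: List[str] = field(default_factory=list)


def early_candidates(inventory, threshold=4):
    """Workflows with high pain and high value that are not too risky."""
    return [
        entry.workflow
        for entry in inventory
        if entry.pain >= threshold
        and entry.value >= threshold
        and not entry.too_risky_for_early_automation
    ]


inventory = [
    InventoryEntry("policy evidence summaries", "informal_experiment", 4, 5, False,
                   data_sources=["published reports"]),
    InventoryEntry("benefits eligibility decisions", "none", 5, 5, True),
    InventoryEntry("internal meeting notes", "in_use", 2, 2, False),
]
print(early_candidates(inventory))  # ['policy evidence summaries']
```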

Second, institutions should create a risk-tiering framework. Not every AI use case needs the same level of oversight. Drafting internal meeting notes is different from advising citizens, ranking applicants, screening vendors, or supporting decisions about benefits. A tiered framework allows innovation to proceed without pretending all risks are equal.
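A tiering framework can begin as a few explicit rules that are easy to argue about and revise. The sketch below is a placeholder, not a recommended policy: the three attributes, the tier names, and the rule order are assumptions, and the example use cases are taken from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of an AI use case for tiering."""
    name: str
    public_facing: bool
    informs_decisions_about_people_or_vendors: bool
    uses_sensitive_data: bool


def risk_tier(use_case: UseCase) -> str:
    """Assign an oversight tier using simple illustrative rules."""
    if use_case.informs_decisions_about_people_or_vendors:
        return "high"    # e.g. ranking applicants, benefits decisions
    if use_case.public_facing or use_case.uses_sensitive_data:
        return "medium"  # e.g. advising citizens
    return "low"         # e.g. internal drafting support


cases = [
    UseCase("draft internal meeting notes", False, False, False),
    UseCase("advise citizens", True, False, False),
    UseCase("rank applicants", False, True, True),
]
for case in cases:
    print(case.name, "->", risk_tier(case))
# draft internal meeting notes -> low
# advise citizens -> medium
# rank applicants -> high
```

Each tier would then map to a defined level of review, so that low-risk uses move quickly while high-risk uses receive the scrutiny they need.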

Third, leaders should redesign one or two priority workflows end to end. The best early candidates are important enough to matter but not so sensitive that failure would cause serious harm. Examples might include internal knowledge management, grant reporting support, procurement market scans, policy evidence summaries, or program monitoring dashboards. These use cases create value while teaching the institution how to govern AI.

Fourth, organizations should fund the unglamorous layer. Data cleaning, integration, security review, staff training, process documentation, evaluation, and change management often determine whether AI succeeds. They rarely receive the same attention as model selection, but they are where institutional value is created.

Finally, leaders should communicate honestly. AI should not be sold internally as a miracle tool or externally as a branding exercise. Staff need to know what is changing, what is not changing, how their expertise remains central, and what protections exist. Serious communication builds trust; inflated claims create resistance.

Conclusion: the real work begins after the demo

The next stage of AI adoption will reward institutions that can translate promise into operating discipline. Better models will help, but they will not fix unclear ownership, broken workflows, weak data, or absent governance. In fact, more capable models may expose those weaknesses faster.

The institutions that succeed will treat AI as a capability to be designed, governed, measured, and improved. They will connect strategy to workflow, procurement to accountability, data to service delivery, and workforce development to institutional learning. They will understand that the real question is not whether AI can generate impressive outputs. The real question is whether the institution can become more capable, more responsive, and more trustworthy because of it.

For CentPol, this is the strategic message: AI scale is not a technology milestone. It is an institutional achievement.

Related CentPol insights

AI Agent Workflows and Orchestration: What Agentic Systems Mean for the Future of Work

Governing Agentic AI: Why Workflow Oversight Must Replace Tool-by-Tool Approval
