Token costs are dropping, but enterprise AI spending is exploding. West Monroe CIO Kevin Rooney explains why measuring AI activity misses the point and how to measure what actually matters.
For decades, IT organizations managed software licenses and compute capacity. Now they're managing something fundamentally different: intelligence, consumed in small increments. Every prompt, retrieval, agent action, and workflow step burns tokens, the small chunks of text that AI systems process and vendors charge for. And, as CIOs are learning, those tokens add up fast.
Token pricing has dropped significantly in the last year, but enterprise AI spending keeps rising. Agentic AI workflows can require 5 to 30 times as many tokens per task as simple chatbots, according to Gartner. The resulting AI bills can surprise even the CIOs who approved the deployments.
At West Monroe, our adoption curve exploded as we moved from experimentation and individual chats to real applications and workflows. Based on usage during the initial rollout, we decided to revisit our contract to protect against overruns. We also did a deep dive into the top 2% of users, who accounted for 50% of usage. We found that heavy users weren't wasting tokens; they were generating real business impact across sales, delivery, and core workflows. The spending was worth it, but only because we could see what it was producing.
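A concentration check of this kind can be sketched in a few lines of Python. This is an illustrative sketch only, not West Monroe's actual analysis: the function name `top_user_share`, the user names, and the token counts are all hypothetical, chosen so the result lines up with the 2%/50% split described above.

```python
def top_user_share(tokens_by_user: dict[str, int], top_fraction: float) -> float:
    """Share of total tokens consumed by the heaviest `top_fraction` of users."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if not tokens_by_user:
        return 0.0
    usage = sorted(tokens_by_user.values(), reverse=True)
    # Always include at least one user so small populations still give an answer.
    top_n = max(1, round(len(usage) * top_fraction))
    total = sum(usage)
    return sum(usage[:top_n]) / total if total else 0.0


# Hypothetical data: 98 light users and two heavy users.
usage = {f"user{i}": 1_000 for i in range(98)}
usage["power_user_a"] = 60_000
usage["power_user_b"] = 38_000

print(f"Top 2% of users consumed {top_user_share(usage, 0.02):.0%} of tokens")
# prints: Top 2% of users consumed 50% of tokens
```

The number alone says nothing about waste or value; it only identifies whose usage is worth examining more closely.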
That visibility is exactly what most organizations lack. The problem isn't that organizations are using AI. It's that most IT leaders still can't answer a basic question: What did we get for what we spent?
For years, CIOs measured technology consumption in licenses and compute hours—units that were easy to track and tie to budgets. With AI, the unit has shifted to tokens: small, variable, and harder to connect to outcomes. CIOs need a new metric that bridges that gap. Call it Return on Tokens: business value created divided by tokens consumed. It's the difference between measuring AI activity and measuring AI impact.
Return on Tokens isn't just a single number, though. It's a discipline. It requires connecting every AI deployment to a workflow and measuring what that workflow produces, whether that's revenue boosted, costs reduced, quality improved, risk mitigated, or cycle time shortened. Tokens without workflow context are noise. Deployed in the right workflows, tokens are leverage.
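As a rough illustration, the ratio can be computed per workflow. The sketch below assumes each workflow already has an estimated business value attributed to it, which is the hard part in practice. The `WorkflowResult` class, the dollar figures, and the token counts are hypothetical; only the workflow names come from this article.

```python
from dataclasses import dataclass


@dataclass
class WorkflowResult:
    name: str
    tokens_consumed: int
    business_value_usd: float  # estimated value attributed to the workflow (assumption)


def return_on_tokens(result: WorkflowResult, per_tokens: int = 1_000_000) -> float:
    """Business value created per `per_tokens` tokens consumed."""
    if result.tokens_consumed <= 0:
        raise ValueError("tokens_consumed must be positive")
    return result.business_value_usd * per_tokens / result.tokens_consumed


# Hypothetical figures for illustration only.
results = [
    WorkflowResult("delivery", 250_000_000, 200_000.0),
    WorkflowResult("lead-to-close", 40_000_000, 120_000.0),
]

for r in sorted(results, key=return_on_tokens, reverse=True):
    print(f"{r.name}: ${return_on_tokens(r):,.0f} per million tokens")
# prints:
# lead-to-close: $3,000 per million tokens
# delivery: $800 per million tokens
```

In this made-up example the workflow that consumes the most tokens is not the one that returns the most per token, which is the kind of distinction a raw consumption report hides.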
The Problem with Current AI Metrics
Most organizations measure AI the way they measured early cloud adoption: by activity. Prompts sent. Users onboarded. Copilots deployed. Adoption rates. These metrics tell you AI is being used, but they don't tell you whether it's creating real value.
Salesforce introduced "Agentic Work Units" earlier this year to help customers quantify what their AI agents do. The metric counts how many "things" AI did, but those things aren't directly tied to business outcomes. Analysts were skeptical.
Activity metrics like these answer the question "Are we using AI?" but dodge the harder one: "What changed because of it?" And it's that more difficult question that boards and CEOs are starting to ask. Directives to implement AI often came without clearly defined financial targets or accountability models. Now the bills are coming due, and CIOs need to present more than adoption curves.
What High-R.O.T. Organizations Do Differently
Organizations seeing real returns on their AI investments share a few characteristics:
- They connect AI to workflows, not experiments. Standalone AI pilots are easy to launch and hard to measure. High-R.O.T. organizations embed AI into existing business processes (order management, customer service, claims processing) where the inputs and outputs are already defined and measurable. At West Monroe, we focused on our three largest workflows: lead-to-close, hire-to-retire, and delivery. We inserted AI into our platforms where it made sense and used existing processes and terminology to accelerate adoption. We track everything, tagged by how it's being applied, and correlate it with efficiencies and gains in core KPIs.
- They baseline before they deploy. You can't assess improvement without knowing where you started. That means documenting cycle times, error rates, costs, and quality metrics before AI enters the workflow. But analysis paralysis is a killer. You don't need perfect baselines to start. We focus on the core business measures and hold ourselves accountable there: Is our pipeline growing? Are sales accelerating? Are we winning more? Is client satisfaction moving?
- They measure across multiple dimensions. Speed alone doesn't equal value. Neither does cost reduction in isolation. High-R.O.T. organizations track quality, speed, risk, cost, and experience together because optimizing one at the expense of the others is a false economy. We measure at the highest level first: OKRs that confirm we're making an actual impact, which then cascade down to leading indicators for quality, speed, risk, cost, and experience.
- They treat learning as an enterprise asset. The first deployment rarely delivers the highest return. Organizations that build feedback loops (capturing what works, scaling that, and killing what doesn't) compound their returns over time. Experimentation isn't a waste; it's an investment, but only if the learning is applied. We started with centers of excellence and now run firmwide programs across sales, delivery, and a builder lab for technical teams. Our AI champions share what's working across practices and bring insights back to their teams. The next frontier will be turning the individual knowledge that power users have built into something the whole enterprise can leverage.
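Baselining and multi-dimensional measurement can be combined in a simple before-and-after comparison. The sketch below is one possible shape for it, under assumptions: the metric names, the `LOWER_IS_BETTER` set, and all values are hypothetical and would be replaced by whatever a given workflow already tracks.

```python
# Metrics where a lower number is the better outcome (hypothetical names).
LOWER_IS_BETTER = {"cycle_time_days", "error_rate", "cost_per_case_usd"}


def percent_improvement(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Percent change per metric, signed so that positive always means 'got better'."""
    changes = {}
    for metric, before in baseline.items():
        if metric not in current or before == 0:
            continue  # skip metrics that cannot be compared
        change = (current[metric] - before) / before * 100
        changes[metric] = -change if metric in LOWER_IS_BETTER else change
    return changes


def regressions(changes: dict[str, float]) -> list[str]:
    """Metrics that got worse after AI entered the workflow."""
    return sorted(m for m, c in changes.items() if c < 0)


# Hypothetical baseline captured before deployment, and a later measurement.
baseline = {"cycle_time_days": 10.0, "error_rate": 0.04,
            "cost_per_case_usd": 50.0, "satisfaction_score": 8.0}
current = {"cycle_time_days": 8.0, "error_rate": 0.05,
           "cost_per_case_usd": 40.0, "satisfaction_score": 8.4}

changes = percent_improvement(baseline, current)
for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}%")
print("Regressed:", regressions(changes))
```

In this made-up case, speed and cost both improve while the error rate gets worse, which is the false economy a single-dimension view would miss.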
The New Governance Question
Measuring Return on Tokens isn't just an accounting exercise—it's a governance question. The old model was all about restrictions: who could use AI, what data it could access, and what it wasn't allowed to do. Those guardrails still matter. But governance is increasingly about direction: where should intelligence spending flow to create the most enterprise value?
This is as much a resource allocation issue as a risk management problem. CIOs who can answer "Where are we getting the highest return on tokens?" can make better investment decisions than those who can only answer "How much are we spending?"
The shift requires visibility that most organizations don't have yet. Real-time monitoring of token consumption by workflow, tied to outcome metrics, is becoming the new table stakes. FinOps discipline—already essential for cloud cost management—will become mandatory for AI.
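One way to picture that visibility: tag every AI call with the workflow it serves, then roll spending up by tag and compare it to a budget. The sketch below is a minimal illustration under stated assumptions. The `UsageEvent` record, the per-million-token prices, and the budgets are hypothetical, and a real deployment would read usage from vendor billing exports or telemetry rather than a hard-coded list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    workflow: str       # tag applied when the AI call is made
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per million tokens; real vendor pricing varies.
PRICE_PER_MILLION = {"input": 3.0, "output": 15.0}


def cost_by_workflow(events: list[UsageEvent]) -> dict[str, float]:
    """Total spend in USD for each workflow tag."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        totals[e.workflow] += (
            e.input_tokens * PRICE_PER_MILLION["input"]
            + e.output_tokens * PRICE_PER_MILLION["output"]
        ) / 1_000_000
    return dict(totals)


def over_budget(costs: dict[str, float], budgets: dict[str, float]) -> list[str]:
    """Workflows whose spend exceeds budget; workflows with no budget are flagged too."""
    return sorted(w for w, c in costs.items() if c > budgets.get(w, 0.0))


# Hypothetical events and budgets for illustration.
events = [
    UsageEvent("lead-to-close", 2_000_000, 200_000),
    UsageEvent("delivery", 10_000_000, 1_000_000),
    UsageEvent("lead-to-close", 1_000_000, 0),
]
budgets = {"lead-to-close": 20.0, "delivery": 40.0}

costs = cost_by_workflow(events)
print(costs)
print("Over budget:", over_budget(costs, budgets))
```

Spend by workflow is only the cost side of the ledger. It becomes a governance tool once it sits next to the outcome metrics for the same workflow.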
Stop Celebrating Consumption. Start Proving Impact.
Token costs will keep falling. Enterprise consumption will keep rising. Cheap consumption without measurable value won't hold up forever.
Boards and CEOs are shifting from "Are we using AI?" to "What are we getting for it?" CIOs who can answer that question with data will earn continued investment. Those who can't will find their AI budgets under the same scrutiny that cloud spending faced a decade ago.
Written by Kevin Rooney
Kevin Rooney is the Chief Information Officer at West Monroe, where he leads the firm's technology strategy and operations. Since becoming CIO in 2022, he has spearheaded initiatives like West Monrobot, a conversational AI chatbot that streamlines internal tech support, and driven significant cost savings through automation, data platforms, and upskilling. Kevin joined West Monroe in 2019 as Chief Administrative Officer, creating the Shared Services function, and previously served as an Executive Partner at Gartner. He holds a degree in Management Information Systems from the University of Notre Dame and was named CIO+ of the Year by SIM Chicago.