March 31, 2026

Where AI Budgets Go


        Most organizations still try to understand AI costs the way they understand cloud spend: look at a usage dashboard, multiply tokens by a rate card, and assume you now know what AI costs. That framing is already failing.

        Chip Huyen offers one of the clearest definitions of what an AI engineer actually does. It is not training models, and it is not clever prompt writing. It is the end-to-end work of building, evaluating, deploying, monitoring, and evolving systems that use models in real products. AI engineering is about making models operational, not just intelligent.

        Once you accept that definition, cost stops being something you can read off a bill. It becomes a property of how teams and systems are designed. This is where Team Topologies (how you structure teams determines how work flows) becomes more useful than GPU pricing. AI does not live in an API call. It lives inside an organization.

        The immediate cost drivers of an AI program sit in its first-order relationships. Product, data, platform, legal, security, and finance all have to interact with the AI team. Every handoff, every approval, and every dependency creates friction. When those relationships are poorly designed, cost increases even if model usage stays flat.

        There is also a baseline staffing cost that most companies underestimate. A real AI team needs a product manager who understands probabilistic systems and failure modes. It needs a technical program manager who can coordinate experiments, evaluations, and releases. It needs a tech lead who can design architectures that survive constant model churn. It also needs systems engineers who can design and run the compute stack and plan capacity for GPU demand. Without those roles, teams burn time rediscovering the same problems and undoing their own work.

        Planning and estimation are harder in AI because outcomes are uncertain. A prompt change might improve quality by 20 percent or degrade it. A new embedding model might reduce storage costs or double them. Switching model versions might boost accuracy but increase latency and GPU cost. A new safety filter might lower risk but increase false positives and block legitimate requests. That uncertainty forces teams to run more experiments, perform more evaluations, and hold more alignment conversations. You are paying not just for delivery, but for learning.

        This is where traditional cost models break. In conventional software, cost is roughly people × time. In AI engineering, cost is people × time × uncertainty. That uncertainty shows up as retries, reprocessing, reindexing, and revalidation.

        AI spend resembles cloud costs on the surface, showing up as a usage line item, but it behaves more like a blended budget with three distinct buckets. The first two are familiar: fixed costs (teams, data pipelines, evaluation infrastructure) and variable costs (tokens, GPUs, vector search, storage). The third is what most dashboards miss: the costs of uncertainty. This is what you pay when model behavior shifts, requirements change, or data drifts, triggering reruns, revalidation, and redesign. That bucket tends to be larger than anyone expects, because it scales with organizational complexity, not just usage.
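        The three buckets can be expressed as a toy cost model. The function name and every figure below are illustrative assumptions, not benchmarks; the point is only that the usage line item is one term among three, and often not the largest.

        ```python
        # Toy sketch of the three-bucket AI budget described above.
        # All names and figures are illustrative assumptions, not benchmarks.

        def monthly_ai_spend(fixed, usage, rate, rework_events, cost_per_rework):
            """Blended monthly cost: fixed + variable + uncertainty."""
            fixed_cost = fixed                                   # teams, data pipelines, eval infra
            variable_cost = usage * rate                         # tokens, GPUs, vector search, storage
            uncertainty_cost = rework_events * cost_per_rework   # reruns, revalidation, redesign
            return fixed_cost + variable_cost + uncertainty_cost

        # Hypothetical month: the token bill is a small slice of the total.
        total = monthly_ai_spend(
            fixed=120_000,           # salaries and platform overhead
            usage=800_000_000,       # tokens processed
            rate=0.000002,           # dollars per token
            rework_events=6,         # model or data shifts that trigger rework
            cost_per_rework=15_000,  # engineering time per rework cycle
        )
        ```

        With these made-up numbers, the variable bucket comes to a few thousand dollars while the uncertainty bucket alone is tens of thousands, which is exactly why a dashboard showing only the usage term misreads the budget.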

        Prompt engineering is a good illustration. Every meaningful change to a prompt triggers downstream work. Test cases have to be regenerated. Evaluation runs have to be repeated. Results have to be compared and reviewed. That is not free. It is engineering labor, just expressed in a different form.

        Context construction is even more expensive. Retrieval systems require data pipelines, indexing jobs, schema decisions, and monitoring. A single bad document can degrade thousands of responses. Finding that root cause usually involves data engineers, platform engineers, and product teams. None of that shows up in a token invoice.

        When you look at what AI engineering teams actually do, a pattern emerges. Most of their time is spent on designing and maintaining prompt and context architectures, building evaluation frameworks, curating and cleaning data for retrieval, managing embeddings and vector stores, selecting and routing models, handling regressions, working with legal and security on data exposure, coordinating behavior changes with product, monitoring live usage, and refactoring pipelines as models evolve.

        The moment AI touches customer data, pricing, health, finance, or regulated decisions, governance becomes a core cost driver rather than an afterthought. Model and data risk reviews, privacy impact assessments, audit trails, and clearly defined ownership of system claims are non-negotiable. They determine whether the system can ship, where it can operate, and how quickly it can evolve.

        Those activities are where the real money goes.

        This is why CTO and CFO conversations about AI often feel misaligned. Finance sees a variable usage bill. Engineering sees a growing system that requires people, process, and constant attention to keep it stable. Both are correct, but they are looking at different parts of the same cost structure.

        The way forward is to manage AI like the complex socio-technical system it is. You invest in architectures that reduce uncertainty. You invest in evaluation so you do not pay for mistakes in production. You invest in team design and workflows instead of getting trapped in coordination loops.

        When AI feels expensive, token costs tend to draw the most attention. However, they are rarely the whole story. Organizations are also paying for friction, rework, and unmanaged uncertainty.