Your AI Doesn’t Know What “Revenue” Means. That’s a Bigger Problem Than You Think.

08 May 2026

Here’s a scenario that plays out constantly in enterprise software teams. A product manager asks the company’s AI assistant: “Who are our top customers this quarter?” The system returns a clean, ranked list. It looks right. Everyone moves on.

Except the product team defines “top” by engagement. Finance defines it by net revenue. Sales defines it by deal size. The AI picked one interpretation, presented it with full confidence, and nobody noticed until a strategy decision got made based on numbers that meant something different to every person in the room.

This isn’t hallucination in the way people usually talk about it. The system didn’t make anything up. It simply made a choice about meaning that was never its choice to make.

The Real Problem Isn’t the Model

There’s a common assumption in enterprise AI adoption that if you pick the right model, tune it carefully, and feed it good data, you’ll get reliable outputs. That assumption misses the actual failure mode.

LLMs are extraordinarily good at language. They are not good at organizational meaning. Ask your AI what your churn rate is, and watch what happens. The model doesn’t know whether you measure churn at the subscription level or the customer level. It doesn’t know whether you count downgrades or ignore them. It doesn’t know if enterprise accounts with multiple seats are handled differently. These are not answers buried in a document somewhere. They’re organizational decisions that live in tribal knowledge, team agreements, and data model comments written two years ago by someone who has since left the company.
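To make the ambiguity concrete, here is a minimal sketch using invented data. The same four subscriptions produce three different churn rates depending on whether churn is counted per subscription or per customer, and on whether downgrades count. The field names, status labels, and the three definitions are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    customer_id: str
    status: str  # "active", "cancelled", or "downgraded" (hypothetical labels)


# Invented sample data: customer A holds two subscriptions, B and C hold one each.
SUBSCRIPTIONS = [
    Subscription("A", "active"),
    Subscription("A", "cancelled"),
    Subscription("B", "downgraded"),
    Subscription("C", "cancelled"),
]


def subscription_churn(subs, count_downgrades=False):
    """Share of subscriptions that churned, optionally counting downgrades."""
    churned = {"cancelled", "downgraded"} if count_downgrades else {"cancelled"}
    return sum(s.status in churned for s in subs) / len(subs)


def customer_churn(subs):
    """Share of customers whose subscriptions are all cancelled."""
    by_customer = {}
    for s in subs:
        by_customer.setdefault(s.customer_id, []).append(s.status)
    lost = [c for c, statuses in by_customer.items()
            if all(st == "cancelled" for st in statuses)]
    return len(lost) / len(by_customer)


print(subscription_churn(SUBSCRIPTIONS))                         # 0.5
print(subscription_churn(SUBSCRIPTIONS, count_downgrades=True))  # 0.75
print(customer_churn(SUBSCRIPTIONS))                             # one in three customers
```

All three numbers are defensible answers to “what is our churn rate?”. Nothing in the data says which one the organization means.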

The model will infer. And inference, presented with confidence, is a liability.

Embeddings Don’t Fix This

The standard response to this problem is better retrieval. Embed your documentation, pull the most relevant chunks, give the model more context. It’s a reasonable intuition and a partial improvement. But it doesn’t solve the underlying issue.

Embeddings measure how close two pieces of text are in vector space; they say nothing about whether a given interpretation is actually correct for your organization. “Revenue” and “profit” are neighbors in embedding space because they appear together constantly in financial writing. In your financial reporting system, conflating them is a serious error. No amount of retrieval resolves that, because the correct answer isn’t in any document. It’s in a decision your finance team made about how to define things, probably years ago, probably never written down in a form a machine can use.

The same structural problem shows up everywhere. “Active user” means something different to your engineering team (an API call) than to your product team (a completed transaction). “Conversion” means a successful HTTP request to one team and a signup-to-paid progression to another. “Engagement” is event frequency in one dashboard and session depth in another. Retrieval doesn’t resolve definitional ambiguity. It just retrieves more text that contains the ambiguity.

Figure 1: Without a semantic layer, LLM outputs are plausible but inconsistent. With one, they’re grounded and correct.

What Actually Needs to Happen

The answer is a semantic layer: a structured, machine-readable representation of what your organization’s terms actually mean. Not a glossary. Not better documentation. A formal encoding of entities, relationships, metrics, and disambiguation rules that sits between your data and your AI system, so that when someone asks about churn or active accounts or top customers, the system isn’t guessing.
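One way to picture such a layer is a small registry that stores each definition with its owner and refuses to pick one when a term is ambiguous. The sketch below is illustrative only: `MetricDefinition`, `SemanticLayer`, and `AmbiguousTermError` are hypothetical names, and the three sample definitions are invented for the example.

```python
from dataclasses import dataclass, field


class AmbiguousTermError(LookupError):
    """Raised when a term has several definitions and no context selects one."""


@dataclass(frozen=True)
class MetricDefinition:
    term: str        # business term as people say it, stored in lowercase
    owner: str       # team that owns this definition
    entity: str      # what is being measured
    expression: str  # the rule, in a form both people and machines can read


@dataclass
class SemanticLayer:
    definitions: list = field(default_factory=list)

    def register(self, definition):
        self.definitions.append(definition)

    def resolve(self, term, owner=None):
        """Return the single matching definition, or refuse to guess."""
        matches = [d for d in self.definitions if d.term == term.lower()]
        if owner is not None:
            matches = [d for d in matches if d.owner == owner]
        if not matches:
            raise KeyError(f"no definition for {term!r}")
        if len(matches) > 1:
            owners = sorted(d.owner for d in matches)
            raise AmbiguousTermError(f"{term!r} is defined by {owners}; specify one")
        return matches[0]


layer = SemanticLayer()
layer.register(MetricDefinition("top customer", "finance", "customer", "rank by net revenue"))
layer.register(MetricDefinition("top customer", "sales", "customer", "rank by deal size"))
layer.register(MetricDefinition("top customer", "product", "customer", "rank by engagement"))

print(layer.resolve("Top Customer", owner="finance").expression)
```

The design choice that matters is the failure path: calling `layer.resolve("top customer")` without an owner raises `AmbiguousTermError` instead of silently choosing an interpretation.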

This isn’t a new idea in the data world. Tools like dbt and Looker have applied it to business intelligence for years. What’s new is the pressure to extend it into AI pipelines, and the tooling is catching up: the dbt Semantic Layer now supports direct AI pipeline integration, and platforms like Cube are building native LLM connections for exactly this purpose.

The practical starting point for most teams is a schema-based approach: YAML or JSON configuration files, version-controlled in git, injected at inference time. Less rigorous than formal ontologies, but dramatically more maintainable, and usually sufficient. If you already have a BI semantic layer, your definitional work is largely done. The challenge is making it queryable when the AI needs it.
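As a rough sketch of that schema-based approach, the snippet below parses a JSON definitions document and prepends the relevant entries to a question before it would be sent to a model. Everything here is an assumption for illustration: the file name, the schema fields, the sample definitions, and the `build_context` helper. The model call itself is omitted because it depends on your provider.

```python
import json

# In practice this would live in a version-controlled file such as
# definitions.json (hypothetical name); it is inlined here to stay runnable.
DEFINITIONS_JSON = """
{
  "churn_rate": {
    "owner": "finance",
    "grain": "customer",
    "counts_downgrades": false,
    "description": "Customers with no active subscription at period end, divided by customers at period start."
  },
  "active_user": {
    "owner": "product",
    "grain": "user",
    "description": "A user who completed at least one transaction in the last 30 days."
  }
}
"""


def build_context(definitions, question):
    """Render definitions whose term appears in the question as prompt text."""
    normalized = question.lower().replace("-", " ")
    lines = []
    for name, spec in definitions.items():
        # Naive keyword match; a real system would need something sturdier.
        if name.replace("_", " ") in normalized:
            lines.append(
                f"- {name} (owner: {spec['owner']}, grain: {spec['grain']}): "
                f"{spec['description']}"
            )
    if not lines:
        return question
    return "Use these definitions exactly:\n" + "\n".join(lines) + "\n\nQuestion: " + question


definitions = json.loads(DEFINITIONS_JSON)
prompt = build_context(definitions, "What is our churn rate this quarter?")
print(prompt)
```

Because the definitions are plain files, a change to what “churn rate” means goes through the same review and history as any other code change.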

The Harder Problem Is Organizational

Here’s what most architecture posts miss: the technical implementation is the easy part. Getting three departments to agree on what “active” means is not. Building and maintaining a semantic layer forces conversations that organizations routinely avoid, and it surfaces disagreements that have been quietly producing inconsistent results for years. That’s uncomfortable. It’s also the point.

There’s a simple test I use: if a new hire would need to read internal documentation to understand what a key business term means, that term belongs in a semantic layer, not in a prompt.

The next phase of enterprise AI isn’t about which model you use. It’s about how well your organization has systematized its own knowledge for machine consumption. From prompt engineering to context engineering. From data pipelines to meaning pipelines. The teams that get this right will produce AI outputs that aren’t just fluent; they’ll be correct. In enterprise systems, being fluent is not enough. If your AI is not definitionally correct, it’s operationally unreliable.

Instead of asking “Who are our top customers?”, define it:

TopCustomer = revenue_last_90_days > $50K AND active_subscription = true
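Written as executable code, the rule above might look like the following sketch. The `Customer` record and the sample rows are invented for illustration; only the two conditions and the $50K threshold come from the definition itself.

```python
from dataclasses import dataclass

REVENUE_THRESHOLD_USD = 50_000  # the $50K cutoff from the rule above


@dataclass(frozen=True)
class Customer:
    name: str
    revenue_last_90_days: float  # USD
    active_subscription: bool


def is_top_customer(c):
    """TopCustomer = revenue_last_90_days > $50K AND active_subscription = true"""
    return c.revenue_last_90_days > REVENUE_THRESHOLD_USD and c.active_subscription


# Invented sample records.
customers = [
    Customer("Acme", 72_000, True),
    Customer("Globex", 95_000, False),  # high revenue but no active subscription
    Customer("Initech", 50_000, True),  # exactly at the cutoff, so not above it
]

print([c.name for c in customers if is_top_customer(c)])  # ['Acme']
```

The edge cases are where the definition earns its keep: whether a customer at exactly $50K qualifies is now a decision someone made, not something the model inferred.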
