Reference Architecture for Enterprise Internal AI System

Jun 15

Enterprise Architecture · AI Strategy

Building an Enterprise Internal AI System: A Reference Architecture

A five-layer reference model for enterprise architects designing LLM, chat, memory, and skills infrastructure — with governance baked in from the start.

Published June 2026

Sources Gartner · LangChain · Anthropic · Microsoft

Reading time ~9 min

Executive Summary

Enterprises building internal AI systems need more than a chat interface bolted onto an LLM API. A durable architecture requires five distinct layers (experience, orchestration, intelligence, memory, and infrastructure), each with clear ownership boundaries and governance hooks. The most consequential design decision sits in the orchestration layer, where the choice between single-agent, multi-agent, and workflow-based patterns determines how well the system scales, how easily it can be audited, and how readily it adapts. Governance, data classification, and project-scoped memory isolation are not afterthoughts: they are the preconditions for regulatory compliance and controlled rollout across an enterprise.

🏗️ The five-layer reference architecture

Most enterprise AI projects begin with a single use case — a chatbot, a document search tool, a code assistant — and evolve organically from there. Without a deliberate architectural frame, these point solutions accumulate into a fragmented estate that is hard to govern, expensive to operate, and impossible to audit. A layered reference architecture solves this by giving each concern a home.

The five layers, from user-facing to infrastructure, are:

Experience layer: every channel through which users reach the AI (chat interfaces, IDE plugins, Slack bots, embedded widgets).
Orchestration layer: the routing, planning, and tool-dispatch logic that sits between a user request and the models.
Intelligence layer: foundation LLMs, embedding models, rerankers, and classifiers.
Memory and storage layer: the vector stores, document stores, and graph and relational databases that give the system persistent knowledge.
Infrastructure layer: inference compute, API gateways, model registries, message buses, and the observability stack.

A vertical governance band (identity, audit, cost controls, and security policies) cuts across all five layers simultaneously.

Experience layer — keep it thin

The experience layer should be channel-agnostic. Build a shared SDK or API contract that all front-ends consume; never let channel-specific logic (Slack formatting, HTML rendering, voice turn-taking) leak into lower layers. Enforce SSO and identity propagation from day one — every request must carry an authenticated principal that the downstream audit log can reference.
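As a minimal sketch of what such a shared contract could look like, the Python below defines a channel-agnostic request envelope that always carries an authenticated principal. The names `Principal`, `AIRequest`, `from_slack_event`, and `audit_record`, along with all field names, are illustrative assumptions rather than part of any particular SDK.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass(frozen=True)
class Principal:
    """Authenticated identity propagated from SSO (fields are illustrative)."""
    subject: str                      # e.g. the subject claim from the SSO token
    groups: tuple[str, ...] = ()


@dataclass(frozen=True)
class AIRequest:
    """Channel-agnostic request that every front-end sends to the lower layers."""
    principal: Principal
    text: str
    channel: str                      # "slack", "web", "ide"; recorded for audit only
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def from_slack_event(event: dict, principal: Principal) -> AIRequest:
    """Hypothetical adapter: Slack-specific parsing stays in the experience layer."""
    return AIRequest(principal=principal, text=event["text"].strip(), channel="slack")


def audit_record(request: AIRequest) -> dict:
    """The minimal fields a downstream audit log could reference."""
    return {
        "request_id": request.request_id,
        "subject": request.principal.subject,
        "channel": request.channel,
        "received_at": request.received_at.isoformat(),
    }
```

Each channel gets its own small adapter, while everything below the experience layer only ever sees `AIRequest`.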

Infrastructure layer — two founding decisions

Before any other infrastructure choice, enterprise architects must answer two questions: where does inference run (cloud API, self-hosted, or hybrid), and who owns the model weights (a commercial vendor's proprietary model or an open-weight model you host yourself). For European enterprises especially, data residency requirements often mandate privately hosted models for sensitive workloads. Architect for both cloud and privately hosted inference from the start, even if you begin on cloud APIs.

🔀 The orchestration layer: the most consequential choice

If the experience layer is what users see and the intelligence layer is where reasoning happens, the orchestration layer is where the system's character is defined. It is also the layer most likely to become technical debt if chosen poorly. There are three dominant patterns.

Single-agent monolith

A single LLM session handles routing, reasoning, and tool use. Simple to deploy and reason about — appropriate for early proof-of-concept work. Does not scale well to multiple specialised use cases, and system prompts become unwieldy as capability grows. Avoid as a long-term architecture.

Multi-agent with a router

A routing LLM or intent classifier dispatches requests to specialised agents — an HR agent, a code agent, a data agent, a policy agent. Each agent operates with a scoped system prompt and a constrained tool set. Scales well to diverse enterprise use cases, adds modest latency at the routing step, and requires careful design of inter-agent handoffs. This is the recommended default for most enterprise deployments.
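The router pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: the agent names, prompts, and tool names are made up, and `keyword_classifier` is a deliberately crude stand-in for a routing LLM or trained intent classifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Agent:
    name: str
    system_prompt: str                # scoped prompt for this agent only
    allowed_tools: frozenset[str]     # constrained tool set


# Illustrative agents; names, prompts, and tools are assumptions.
AGENTS = {
    "hr": Agent("hr", "You answer HR policy questions.", frozenset({"search_policies"})),
    "code": Agent("code", "You help with internal code.", frozenset({"search_repos"})),
}
FALLBACK = Agent("general", "You are a general assistant.", frozenset())


def keyword_classifier(text: str) -> str:
    """Stand-in for a routing LLM or trained intent classifier."""
    lowered = text.lower()
    if any(word in lowered for word in ("leave", "payroll", "holiday")):
        return "hr"
    if any(word in lowered for word in ("bug", "stack trace", "repository")):
        return "code"
    return "unknown"


class Router:
    """Swappable router: the classifier is injected, not hard-coded."""

    def __init__(self, classify: Callable[[str], str]):
        self._classify = classify

    def route(self, text: str) -> Agent:
        # Unrecognised intents fall back to a general agent with no tools.
        return AGENTS.get(self._classify(text), FALLBACK)
```

Because the classifier is passed in, replacing the keyword heuristic with a model call should not require changes to the agents or to the callers of `Router.route`.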

Workflow orchestration

Deterministic pipelines (LangGraph, Temporal, Prefect) define explicit state machines for LLM-assisted workflows. Each step is logged, retriable, and auditable. Best suited for compliance-sensitive flows — contract review, financial approvals, regulated document generation — where full auditability of every step is non-negotiable. Higher implementation overhead; not the right default for conversational use cases.
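To make the "logged, retriable, and auditable" property concrete, here is a small framework-free sketch in plain Python. It does not use the LangGraph, Temporal, or Prefect APIs; the `Workflow` class and the two example steps are hypothetical, and the steps use simple string handling where a real flow would call a model.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = Callable[[dict], dict]


@dataclass
class Workflow:
    """Runs named steps in a fixed order and records every attempt."""
    steps: list[tuple[str, Step]]
    max_attempts: int = 2
    log: list[dict] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        for name, step in self.steps:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    # Pass a copy so a failed attempt cannot corrupt the state.
                    state = step(dict(state))
                    self.log.append({"step": name, "attempt": attempt, "status": "ok"})
                    break
                except Exception as exc:
                    self.log.append({"step": name, "attempt": attempt,
                                     "status": "error", "error": str(exc)})
            else:
                # Reached only when every attempt failed.
                raise RuntimeError(f"step {name!r} failed after {self.max_attempts} attempts")
        return state


def extract_clauses(state: dict) -> dict:
    # Placeholder for an LLM-assisted extraction call.
    state["clauses"] = [c.strip() for c in state["contract"].split(".") if c.strip()]
    return state


def flag_risky(state: dict) -> dict:
    # Placeholder rule; a real flow might combine rules with a model judgement.
    state["flagged"] = [c for c in state["clauses"] if "indemnif" in c.lower()]
    return state
```

After a run, `Workflow.log` holds one entry per attempt, which is the kind of record an auditor could use to reconstruct what happened at each step.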

"The orchestration layer is where most enterprise AI projects either earn or lose the trust of their compliance teams. If you cannot reconstruct what the agent decided and why, you cannot operate in a regulated environment." — Enterprise Architecture principle, adopted from LangChain and Temporal design guidance

For most organisations, the practical path is to start with a multi-agent router and evolve toward workflow orchestration for regulated flows as those use cases emerge. Build the router as a swappable component from the outset.

🧠 Memory architecture: four distinct memory types

Enterprise AI systems need memory that extends well beyond the LLM's context window. A useful taxonomy distinguishes four types, each with a different scope, latency profile, and appropriate backend.

Working memory: in-context session state, bounded by the token limit. Latency: on the order of microseconds (μs). Backend: LLM context window.

Episodic memory: past interactions and events, scoped to a user or team. Backend: vector store plus summaries.

Semantic memory: the enterprise knowledge corpus, scoped to an organisation or team. Latency: ~10 ms. Backend: vector store plus document store.

Procedural memory: skills, tools, and agent plans, system-wide. Backend: KV store or graph DB.
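One way to keep this taxonomy explicit in code is a small registry that maps each memory type to its scope and backend. The sketch below simply restates the four types above in Python; the class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


@dataclass(frozen=True)
class MemorySpec:
    scope: str
    backend: str


# Scopes and backends mirror the taxonomy described in the text.
MEMORY_SPECS = {
    MemoryType.WORKING: MemorySpec("session", "LLM context window"),
    MemoryType.EPISODIC: MemorySpec("user or team", "vector store + summaries"),
    MemoryType.SEMANTIC: MemorySpec("org or team", "vector + document store"),
    MemoryType.PROCEDURAL: MemorySpec("system-wide", "KV store / graph DB"),
}


def backend_for(memory_type: MemoryType) -> str:
    """Look up which backend serves a given memory type."""
    return MEMORY_SPECS[memory_type].backend
```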

Projects as the isolation unit

A "project" in your AI system should bundle together a vector store namespace, a set of allowed tools, a system prompt, and an access policy. This gives you the isolation unit you need for both compliance (data stays within the project boundary) and cost allocation (token consumption and storage are attributable to a team or department). Design projects as first-class architectural citizens, not as an afterthought.
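A project bundle of this kind might be modelled as follows. This is a sketch under simplifying assumptions: the access policy is reduced to group membership, and the `Project` fields and the `scoped_query` helper are hypothetical names, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    project_id: str
    vector_namespace: str             # retrieval never leaves this namespace
    allowed_tools: frozenset[str]
    system_prompt: str
    allowed_groups: frozenset[str]    # access policy, simplified to group membership
    cost_center: str                  # used to attribute tokens and storage

    def can_access(self, user_groups: set[str]) -> bool:
        return bool(self.allowed_groups & user_groups)

    def can_use_tool(self, tool_name: str) -> bool:
        return tool_name in self.allowed_tools


def scoped_query(project: Project, user_groups: set[str], query: str) -> dict:
    """Build a retrieval request pinned to the project's namespace."""
    if not project.can_access(user_groups):
        raise PermissionError(f"no access to project {project.project_id}")
    return {
        "namespace": project.vector_namespace,
        "query": query,
        "cost_center": project.cost_center,
    }
```

Because the namespace and cost centre come from the project rather than from the caller, a request cannot name a different namespace or charge a different team.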

Skills as versioned artifacts

Treat agent skills (tool definitions, prompt templates, few-shot examples) as code: versioned in Git, tested via eval harnesses, and promoted through dev, staging, and production environments. A common enterprise AI failure mode is skills that drift silently in production, producing subtly different outputs over time with no audit trail and no rollback path.
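One simple way to detect silent drift is to fingerprint each released skill and compare the deployed copy against the pinned value. The sketch below assumes a skill consists of a prompt template and a tool schema; the `Skill` class and `detect_drift` helper are illustrative, and a real pipeline would also cover few-shot examples and eval results.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    version: str                      # e.g. a Git tag
    prompt_template: str
    tool_schema: dict

    def fingerprint(self) -> str:
        """Stable hash of the skill content; changes whenever the content changes."""
        payload = json.dumps(
            {"prompt": self.prompt_template, "tools": self.tool_schema},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_drift(deployed: Skill, pinned_fingerprint: str) -> bool:
    """True if the skill running in production differs from the released one."""
    return deployed.fingerprint() != pinned_fingerprint
```

The pinned fingerprint would typically be recorded at release time, so an edit made directly in production shows up as a mismatch even when the version string is unchanged.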

🔒 Governance: the precondition, not the afterthought

Enterprise architects who treat governance as a layer to add once the system is working will find themselves in an expensive retrofit. Governance decisions — data classification, identity propagation, audit logging, cost controls — need to be wired into the architecture before any production traffic flows.

Data classification first

Define your data classification tiers (public, internal, confidential, restricted) before any model touches data. Map each tier to an allowed compute boundary — which tiers can flow to cloud APIs, which must stay on-premises, which require encryption at rest and in transit. Encode this as policy-as-code so every new integration is automatically validated against classification rules.
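A policy-as-code check along these lines could be as small as the Python below. The tier names come from the text; the two compute boundaries and the limits assigned to them are illustrative assumptions that each organisation would replace with its own policy.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Illustrative policy: the highest tier each compute boundary may process.
MAX_TIER = {
    "cloud_api": Tier.INTERNAL,
    "on_prem": Tier.RESTRICTED,
}


def is_allowed(tier: Tier, boundary: str) -> bool:
    """Deny by default: unknown boundaries are never allowed."""
    limit = MAX_TIER.get(boundary)
    return limit is not None and tier <= limit


def validate_integration(name: str, tier: Tier, boundary: str) -> None:
    """Intended to run automatically (for example in CI) for each new integration."""
    if not is_allowed(tier, boundary):
        raise ValueError(f"{name}: {tier.name} data may not flow to {boundary}")
```

Ordering the tiers as integers keeps the rule to a single comparison, and the deny-by-default lookup means a boundary nobody has classified yet is rejected rather than silently permitted.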

Model substitutability

Wrap every LLM call behind a ModelProvider abstraction so you can swap vendors, add privately hosted models, or route by classification tier without rewriting orchestration logic. This abstraction also enables A/B testing between models and cost-optimised routing — sending simple tasks to smaller, cheaper models and reserving large frontier models for complex reasoning tasks.
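The abstraction can be expressed as a small interface that orchestration code depends on instead of a vendor client. In this sketch, `ModelProvider` is written as a `typing.Protocol`, `EchoProvider` is a stand-in for a real vendor or self-hosted client, and `TieredRouter` uses prompt length as a deliberately crude proxy for task complexity; all three are assumptions for illustration.

```python
from typing import Protocol


class ModelProvider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in for a real vendor or self-hosted client."""

    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class TieredRouter:
    """Sends short prompts to a small model and longer ones to a large model."""

    def __init__(self, small: ModelProvider, large: ModelProvider, threshold: int = 200):
        self.small = small
        self.large = large
        self.threshold = threshold    # prompt length in characters; a crude heuristic

    def pick(self, prompt: str) -> ModelProvider:
        return self.small if len(prompt) < self.threshold else self.large

    def complete(self, prompt: str) -> str:
        return self.pick(prompt).complete(prompt)
```

Since `TieredRouter` itself exposes `complete`, callers can treat it like any other provider, which keeps routing decisions out of the orchestration logic.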

Prompt injection defence

Enterprise AI systems that ingest user-provided or external content are vulnerable to prompt injection — adversarial instructions embedded in documents, emails, or tool outputs that attempt to hijack the agent's behaviour. Build input sanitisation, output validation, and permission-scoped tool execution into the orchestration layer from the start. Do not rely on model-level safeguards alone.
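Permission-scoped tool execution is the part of this defence that is easiest to show in code. The sketch below checks every tool call against the caller's grants, regardless of what the model asked for. `ToolExecutor`, `ToolNotPermitted`, and `wrap_untrusted` are hypothetical names, and delimiting untrusted content reduces injection risk without eliminating it.

```python
from typing import Callable


class ToolNotPermitted(Exception):
    """Raised when a requested tool is unknown or not granted to the caller."""


class ToolExecutor:
    """Runs a tool only if it is registered and explicitly granted."""

    def __init__(self, tools: dict[str, Callable[..., str]]):
        self._tools = tools

    def execute(self, tool_name: str, granted: frozenset[str], **kwargs) -> str:
        if tool_name not in self._tools:
            raise ToolNotPermitted(f"unknown tool: {tool_name}")
        if tool_name not in granted:
            # The model may have been tricked into requesting this tool.
            raise ToolNotPermitted(f"tool not granted: {tool_name}")
        return self._tools[tool_name](**kwargs)


def wrap_untrusted(content: str) -> str:
    """Mark retrieved content as data rather than instructions."""
    return f"<untrusted_document>\n{content}\n</untrusted_document>"
```

The grant set would normally come from the project or agent configuration, so an injected instruction such as "email this file externally" fails at the executor even if the model complies.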

Observability and cost governance

You need distributed traces across the full LLM call chain: input prompt, retrieved context chunks, tool calls, model output, latency, token cost, and user feedback signal. Without this, you cannot diagnose quality regressions, attribute costs to business units, or demonstrate compliance to auditors. Build this from day one — retrofitting observability onto a running AI system is significantly harder than including it in the initial design.
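As an illustration of the trace fields listed above, the following sketch records one span per model call and rolls token cost up by business unit. The `LLMSpan` fields, the model names, and the prices in `PRICE_PER_1K` are all hypothetical; a real deployment would more likely emit these attributes through its existing tracing platform.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LLMSpan:
    trace_id: str
    business_unit: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    tool_calls: list[str] = field(default_factory=list)
    retrieved_chunk_ids: list[str] = field(default_factory=list)
    user_feedback: Optional[int] = None   # e.g. +1 or -1, if the user gave any


# Hypothetical (prompt, completion) prices per 1,000 tokens.
PRICE_PER_1K = {"small-model": (0.001, 0.002), "large-model": (0.01, 0.03)}


def span_cost(span: LLMSpan) -> float:
    prompt_price, completion_price = PRICE_PER_1K[span.model]
    return (span.prompt_tokens * prompt_price
            + span.completion_tokens * completion_price) / 1000


def cost_by_business_unit(spans: list[LLMSpan]) -> dict[str, float]:
    """Aggregate token cost so it can be attributed to each business unit."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        totals[span.business_unit] += span_cost(span)
    return dict(totals)
```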

✅ Readiness assessment for enterprise architects

Before committing to a production rollout, assess your organisation's readiness against the two lists below: foundations that signal you can move forward, and gaps to address before scaling.

✓ Strong foundations — move forward

Existing SSO and identity infrastructure that can propagate principals into AI requests.
A data classification framework already in use for other systems.
An observability platform (such as Elastic) that can be extended to cover LLM traces.
Engineering teams familiar with API abstraction patterns and infrastructure-as-code.

⚠ Address before scaling

No formal data classification, so models may process data they should not have access to.
Skills and prompts managed informally outside version control.
No cost attribution model for AI workloads, so usage will be invisible to finance.
Orchestration logic tightly coupled to a single LLM vendor with no abstraction layer.

📚 Further reading

LangChain: LangChain architecture concepts and agent patterns
Anthropic: Building effective agents (Anthropic research)
Microsoft: RAG solution design and evaluation guide (Azure Architecture Center)
Gartner: What is agentic AI (Gartner explainer)

AI Fluency Notice · Diligence Requirement

About This Content & Verification Obligations

This article was generated by Claude Sonnet 4.6, an AI assistant developed by Anthropic. It was produced by synthesising publicly available research, architectural guidance, and best-practice documentation from the sources listed above, retrieved in June 2026.

In the spirit of the AI Fluency model, readers are reminded of the following diligence obligations before relying on this content for business, investment, or technical decisions:

Architectural patterns and technology recommendations should be validated against your organisation's specific regulatory environment, existing infrastructure, and team capabilities before adoption.
AI-generated summaries can introduce paraphrasing errors, missed nuance, or context loss. The original sources represent the authoritative record.
The AI tooling landscape evolves rapidly. Technology recommendations (specific frameworks, vendors, model families) may be partially outdated at the time of reading.
This content does not constitute professional, legal, or regulatory advice. Organisations making architectural decisions based on this content should engage qualified enterprise architects and compliance specialists.

Responsible AI deployment requires human oversight, source verification, and contextual judgment — the very principles this article advocates for.


Hong Zhu