
What is {a}OS Explorer?

The interactive taxonomy and comparison engine for the agentic era.


{a}OS Explorer is a web application that lets you browse, compare, and understand AI products, frameworks, agents, workflows, and skills through a structured 7-layer reference model.

Think of it as the OSI model for the agentic era — but interactive.

New to the vocabulary?

Use this translation first:

  • Stratum = responsibility layer
  • Substrate = systems in that layer
  • Construct = thing they create or manage
  • Primitive = the smallest field, signal, or action inside it

The User Guide starts with a worked example so the vocabulary earns its keep instead of asking you to memorize jargon.

Key insight: Most AI tooling sites tell you what a product claims to do. {a}OS Explorer tells you where it actually belongs, what it overlaps with, and what gaps appear when you compose systems together.

Why It Exists

The agentic AI ecosystem is growing fast. Hundreds of products, frameworks, and tools compete across overlapping domains. Architects, engineers, and technical leaders need a way to:

  • Understand where a product sits in the stack
  • See how products overlap or leave gaps
  • Compare Azure-native vs aftermarket solutions
  • Identify hidden dependencies between systems
  • Defend adoption decisions with evidence

Who It's For

User Need
Architects Compare stacks, identify overlap, assess governance implications
Engineers Understand where tools sit and which layer is failing
CTOs / VPs Explain architecture and defend adoption decisions
Procurement Compare Azure-native vs aftermarket solutions
Security Map compliance and governance controls across the stack

Key Features

  • 🔍 Progressive Drilldown: Navigate Stratum → Substrate → Construct → Primitive. The UI unfolds — no page reloads.
  • Compare Basket: Add up to 4 products and see side-by-side coverage, overlaps, and gaps.
  • 📊 Evidence-Backed: Every classification includes confidence scores and human-readable rationale.
  • 🔀 Multi-Residency: Products can span multiple layers. The UI never forces false exclusivity.
  • 🧬 Ontology-First: Structured data model beneath every view — not ad-hoc categories.
  • Keyboard-First: Full keyboard navigation. Press / to search, 1–7 to jump, ? for shortcuts.

The {a}OS Reference Stack

7 strata. 2 axes. From user intent down to raw model infrastructure.

The 7 Strata

Each stratum answers a specific boundary question about the agentic system:

L7
Experience & Intent
What did the human actually want — and did they get it?
L6
Governance & Trust
Who approved this action and under which policy?
L5
Observability & Evaluation
How well did this work, and can we prove it?
L4
Orchestration & Decisioning
What happens next, in what order, with which agent?
L3
Execution & Interfaces
How does the agent act on the world?
L2
Knowledge & Memory
What does the system know, and how does it recall?
L1
Models & Infrastructure
What is the raw intelligence, and where does it run?

Cross-cutting Axes

Two axes cut across all strata, representing concerns that apply at every layer:

Axis Scope
Governance & Trust Policy enforcement, identity/access, compliance, audit, safety filters, content moderation
Observability & Evaluation Distributed tracing, structured logging, benchmarks, A/B testing, drift/anomaly detection, SLO alerting

Ontology Hierarchy

Every AI product or tool is classified through a 4-level hierarchy:

Level Definition Example
Stratum Top-level layer in the stack L4 Orchestration & Decisioning
Substrate Tools/frameworks operating in that stratum Workflow Engines, Agent Frameworks
Construct Meaningful artifact produced by a substrate execution_plan, agent_roster
Primitive Smallest atomic unit inside a construct task_id, retry_count, priority
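
To make the four levels concrete, here is a minimal sketch of the hierarchy as nested data, using the examples from the table above. This is an illustration, not the Explorer's actual schema; the `agent_roster` primitives (`agent_id`, `role`, `capabilities`) are invented for the sketch.

```python
# Illustrative only — not the Explorer's real data model.
ontology = {
    "L4 Orchestration & Decisioning": {                    # Stratum
        "Workflow Engines": {                              # Substrate
            "execution_plan":                              # Construct
                ["task_id", "retry_count", "priority"],    # Primitives
        },
        "Agent Frameworks": {
            # Hypothetical primitives, added for illustration:
            "agent_roster": ["agent_id", "role", "capabilities"],
        },
    },
}

def primitives_of(stratum: str, substrate: str, construct: str) -> list[str]:
    """Walk Stratum -> Substrate -> Construct and return its primitives."""
    return ontology[stratum][substrate][construct]
```

Drilling down is then just a three-key lookup, which mirrors how the Explorer's progressive drilldown navigates the same hierarchy.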

Core Agentic Concepts

These are the current first-class agentic concepts the explorer should understand semantically — not just as raw string matches. Each concept below includes its current top-3 mapped products or workflows in the catalog.

Concept Definition Current Top 3
Harness Governing scaffold around execution: sequencing, context control, checkpoints, and stop conditions. BMAD · GSD · Superpowers
loop-core Canonical control cycle for agent behavior: plan → act → evaluate → adjust. LangGraph · Eval Workflow · Human-in-the-Loop
loop-research Discovery loop for gathering evidence, validating sources, and refining the problem before implementation. Azure AI Search · Exa MCP · RAG Pipeline
loop-eval Quality loop for scoring outputs, diagnosing failures, and feeding improvements back into prompts or workflows. Eval & Improve Loop · Agent Skills · Testing Strategy
loop-implementation Delivery loop for spec → build → test → review → iterate. BMAD Method · Superpowers · Claude Code
loop-recovery Failure-handling loop for retry, rollback, checkpoint, and resume semantics. GSD · Shipyard · LangGraph
RAG Retrieval-augmented generation: grounding model outputs with retrieved evidence and citations. Azure AI Search · RAG Pipeline · Mem0
MCP Model Context Protocol — a standard interface for exposing tools, resources, and prompts to agents. GitHub MCP · Playwright MCP · Filesystem MCP
Memory Persistent or working context used across turns, sessions, and tasks. Mem0 · Memory MCP · Hermes Agent
Governance Rules, permissions, approvals, safety boundaries, and audit logic that constrain action. Open Policy Agent · Security Audit · Human-in-the-Loop
tool-calling The construct enabling a model to select, format arguments for, and invoke external APIs. MCP Filesystem · MCP Brave Search · Agent Skills
multi-agent Coordination where multiple specialized agents collaborate, delegate tasks, and share state. CrewAI · LangGraph · OpenHands
handoff Secure transfer of control, working context, and state from one agent to another. LangGraph · CrewAI · Human-in-the-Loop
checkpoint Persists an agent's exact state at a specific moment for time travel, durable execution, and fault recovery. LangGraph · Mem0 · MCP Memory
fallback Implements retry strategies, graceful degradation, or alternate routing when a primary step fails. LangGraph · Claude Code · RAG Pipeline
routing Dynamically dispatches requests to the most appropriate model, prompt, or specialized agent. Azure AI Foundry · LangGraph · CrewAI
guardrails Validates inputs and outputs to ensure safety, policy compliance, and defense against prompt injection. OPA · SK Security · Human-in-the-Loop
observability-loop Captures end-to-end traces and metrics to evaluate decisions, detect regressions, and drive improvement. Eval Workflow · LangGraph · CI/CD Workflow
approval-loop Pauses agent execution at critical junctures to require explicit human sign-off before proceeding. Human-in-the-Loop · Claude Code · Cursor
memory-loop Continuous cycle of storing, retrieving, and consolidating interactions into episodic and semantic stores. Mem0 · Azure AI Search · MCP Memory
context-engineering Dynamically managing, compressing, and structuring information fed into the context window for maximum reasoning. GitIngest · Cursor · Claude Code
planning Hierarchical task decomposition and controlled sequencing that keeps long-horizon execution on-policy. LangGraph · CrewAI · Hermes
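
The loop-core concept above (plan → act → evaluate → adjust, until a stop condition) can be sketched as a generic control cycle. The callables, state shape, and thresholds here are placeholders for illustration, not any framework's API.

```python
# Minimal loop-core sketch: plan -> act -> evaluate -> adjust,
# repeated until a quality threshold or iteration cap is hit.
def run_loop_core(plan, act, evaluate, adjust, goal, max_iters=10, threshold=0.9):
    state = {"plan": plan(goal), "score": 0.0, "iters": 0}
    while state["score"] < threshold and state["iters"] < max_iters:
        result = act(state["plan"])           # act on the current plan
        state["score"] = evaluate(result)     # score the outcome
        if state["score"] < threshold:
            state["plan"] = adjust(state["plan"], result)  # refine and retry
        state["iters"] += 1
    return state
```

The other loops (research, eval, implementation, recovery) are specializations of this same cycle with different act/evaluate semantics.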

Head-to-Head Matchups

These curated comparison sets mirror the in-app Head-to-Head tab and are intended for structured selection tradeoffs.

Matchup When to choose #1 When to choose #2 When to choose #3
LangGraph vs CrewAI vs AutoGen LangGraph: deterministic, stateful production workflows CrewAI: fast role-based multi-agent prototyping AutoGen: conversational research coordination
Mem0 vs Zep vs Custom Vector Stores Mem0: fast personalization + memory integrations Zep: temporal fact evolution and graph memory Custom: full control of retrieval stack and operations
Claude Code vs Devin vs Aider Claude Code: deep multi-file reasoning/refactors Devin: autonomous backlog execution in cloud sandbox Aider: git-native local pair-programming
Azure AI Search vs Pinecone vs Weaviate Azure AI Search: enterprise hybrid search in Azure Pinecone: simplest low-ops serverless vector backend Weaviate: flexible hybrid + multimodal/search modeling
OPA vs Cedar vs Custom Policy OPA: broad policy-as-code governance across stack Cedar: high-performance AWS-aligned authorization Custom: early prototype only, low governance maturity

Golden Path Architectures

Recommended starter stacks that combine product synergy with clear mission boundaries.

Build Mission Core Stack Not ideal for
Enterprise RAG Fortress Accurate, governed enterprise retrieval with strict access control and policy enforcement. Azure AI Foundry · OPA · LangGraph · Azure AI Search · Mem0 Lightweight chatbots without policy/memory requirements
Autonomous Code Forge Multi-file implementation + review + evaluation + deployment with quality gates. Claude Code · MCP Filesystem · SK Code Review · Eval WF · CI/CD WF Fast inline visual autocomplete-first pair programming
Collaborative Research Swarm Role-based research synthesis with retrieval and scored output quality control. CrewAI · MCP Brave Search · MCP Memory · Eval WF Deterministic transaction processing with strict rollback
Full-Stack App Factory Prompt-to-UI acceleration with backend wiring and automated deployment path. v0 · Cursor · MCP PostgreSQL · MCP GitHub · Shipyard Hands-off autonomous backlog clearing
Regulated Compliance Engine High-risk workflows with deterministic control flow, policy enforcement, and mandatory approval gates. LangGraph · OPA · SK Security · Human-in-Loop WF · Eval WF Open-ended exploratory or creative ideation workflows

Glossary

Term Meaning
Primary Stratum The single layer where a product delivers its core value
Secondary Stratum Additional layers a product touches with meaningful capability
Axis Role A product's contribution to a cross-cutting concern (governance or observability)
Confidence Score 0.0–1.0 rating of how certain a classification placement is
Rationale Human-readable explanation of why a product is placed at a given stratum
Multi-Residency A product spanning multiple strata simultaneously
Harness A structured wrapper that constrains how an AI agent or pipeline executes — enforcing sequencing, context budgets, checkpoints, and stop conditions. A harness sits at L4 (Orchestration) and defines how work is done, not just what to do. Examples: execution harnesses (GSD, BMAD), test harnesses (Superpowers), and CI/CD harnesses (Harness.io). Not to be confused with the agent or the tools it calls — the harness is the governing scaffold around them.
loop-core The canonical agentic control cycle: plan → act → evaluate → adjust, repeated until stop conditions are met.
loop-research Discovery/evidence loop: gather sources, test hypotheses, resolve contradictions, then refine the problem framing.
loop-eval Evaluation loop: score outputs against quality criteria, diagnose failures, and feed fixes back into prompts, routing, or policy.
loop-implementation Build loop: spec → implement → test → review → iterate. Focuses on production-safe delivery and regression control.
loop-recovery Failure-handling loop: backoff, retry, checkpoint, rollback, and resume — designed to preserve forward progress under partial failure.
Agent An LLM-backed system that perceives context, reasons about goals, selects tools or sub-agents, executes actions, and updates state — in a loop until a stop condition is met. Agents span L3–L7 depending on capability; the term alone does not imply a specific stratum.
Primitive The smallest named capability unit within a construct (e.g. goal_statement, tool_call). A product can support a primitive partially or fully; evidence quality determines confidence.
Construct A named cluster of related primitives within a stratum (e.g. Intent Object at L7). Constructs are the intermediate unit between stratum and primitive in the {a}OS ontology hierarchy.

Classification Methodology

How products are mapped to the {a}OS reference model.

The Classification Model

Each entity in the explorer receives a structured classification:

Field Description
Primary Stratum The single layer where the product delivers its core value
Secondary Strata Additional layers the product touches with meaningful capability
Axis Roles Contributions to Governance & Trust or Observability & Evaluation
Capabilities Named features mapped to specific strata
Constructs Artifacts or state objects the product creates or manages
Primitives Atomic units the product reads or writes
Confidence 0.0–1.0 certainty rating
Rationale Explanation of the classification
Evidence Optional links, quotes, or documentation

Confidence Levels

Level Score Badge Meaning
High ≥ 0.85 0.92 Strong evidence, clear primary layer, multiple corroborating sources
Medium 0.60 – 0.84 ~0.75 Reasonable placement, may have secondary interpretations
Low < 0.60 ~0.45 Disputed or emerging product, placement may change
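
As a sketch, the banding above maps directly to a threshold function; the cutoffs (≥ 0.85, 0.60–0.84, < 0.60) come straight from the table.

```python
# Confidence banding per the table above. Function name is illustrative.
def confidence_level(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence is a 0.0-1.0 rating")
    if score >= 0.85:
        return "High"
    if score >= 0.60:
        return "Medium"
    return "Low"
```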

Placement Rules

  1. One primary stratum — every product gets exactly one core layer
  2. Zero or more secondary strata — meaningful but non-primary capabilities
  3. No forced exclusivity — a product spanning 5 layers shows all 5
  4. Primary is visually stronger — full opacity vs dimmed for secondary
  5. Evidence-backed — every placement has a rationale
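
Rules 1, 3, and 5 are mechanical enough to check in code. This is an illustrative validator over an assumed dict shape, not the Explorer's real classification pipeline.

```python
# Illustrative placement-rule checks; the record shape is an assumption.
def validate_placement(c: dict) -> list[str]:
    errors = []
    # Rule 1: exactly one primary stratum, always present.
    if not isinstance(c.get("primary"), str):
        errors.append("exactly one primary stratum required")
    # Rule 3: secondaries are unrestricted, but never duplicate the primary.
    if c.get("primary") in c.get("secondary", []):
        errors.append("primary must not repeat in secondary strata")
    # Rule 5: every placement is evidence-backed.
    if not c.get("rationale"):
        errors.append("every placement needs a rationale")
    return errors
```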

Disputed Classifications

When a classification is uncertain:

  • Confidence score will be Medium or Low
  • Rationale will note the alternative interpretation
  • Future versions will support explicit dispute annotations
Example dispute: A product classified as L4 (Orchestration) primary might note: "Could be argued as L3 (Execution) if tool-calling is considered the core loop rather than multi-step planning."

Product Classifications

The MVP product set classified against the {a}OS reference model.

MVP Product Set

Product Vendor Primary Secondary Deployment Confidence
Azure AI Foundry Microsoft L1 L2, L3, L4 Cloud 0.92
Hermes Internal L4 L5, L6, L7 Hybrid 0.88
Paperclip Internal L7 L3, L4 Local ~0.75
CrewAI CrewAI Inc L4 L3 Cloud 0.90
Mem0 Mem0 AI L2 None Cloud 0.95
LangGraph LangChain L4 L3 Hybrid 0.88
Azure AI Search Microsoft L2 None Cloud 0.93
Open Policy Agent CNCF L6 None Hybrid 0.95

Azure AI Foundry

Primary: L1 Models & Infrastructure   0.92

Secondary: L2 Knowledge, L3 Execution, L4 Orchestration

Rationale: Built on Microsoft.CognitiveServices. Model hosting and inference is the core value prop; prompt flow, search, and agent capabilities are secondary strata.

Hermes

Primary: L4 Orchestration & Decisioning   0.88

Secondary: L5 Observability, L6 Governance, L7 Experience

Rationale: Hermes is an orchestration-first platform. Governance, observability, and UX layers are secondary capabilities.

Paperclip

Primary: L7 Experience & Intent   ~0.75

Secondary: L3 Execution, L4 Orchestration

Rationale: Desktop copilot with strong UX and intent parsing. Orchestration and execution are secondary.

CrewAI

Primary: L4 Orchestration & Decisioning   0.90

Secondary: L3 Execution

Rationale: Multi-agent crew framework focused on orchestrating agent collaboration and task delegation.

Mem0

Primary: L2 Knowledge & Memory   0.95

Secondary: None

Rationale: Purpose-built memory layer for AI. Manages context persistence, recall, and long-term agent memory.

LangGraph

Primary: L4 Orchestration & Decisioning   0.88

Secondary: L3 Execution

Rationale: Stateful graph-based agent orchestration framework from LangChain. Defines execution flows as directed graphs with state management.

Azure AI Search

Primary: L2 Knowledge & Memory   0.93

Secondary: None

Rationale: Cloud-native search-as-a-service with vector + hybrid retrieval. Core of many RAG pipelines.

Open Policy Agent

Primary: L6 Governance & Trust   0.95

Secondary: None

Rationale: General-purpose policy engine (Rego-based). Evaluates access control, resource constraints, and compliance policies.

User Guide

How to get the most out of {a}OS Explorer.

Terminology Primer

Five nouns power the entire {a}OS model. Lock these in and the Explorer feels intuitive instead of academic.

Plain-English shorthand — Every official term has a plain alias in parentheses. Use whichever clicks. Both appear throughout the Explorer UI.
You see… We call it What it means Explorer example
Pick a layer Stratum (layer) One of 7 horizontal tiers in the stack, each defined by a boundary question — “Who decides?”, “How does it learn?”, etc. L4 Orchestration & Decisioning
Notice a theme that cuts across layers Axis (cross-cut) A concern that spans multiple strata vertically — security, cost, latency, compliance, etc. Axes surface in every layer but are governed holistically. Security axis — touches L6 Governance, L3 Execution, and L1 Infrastructure
Expand a group inside a layer Substrate (group) A named group of related capabilities within a stratum. Each substrate bundles the frameworks, engines, or services that do work at that level. Workflow Engines or Agent Frameworks inside L4
Click a card Construct (concept / card) A named concept or data structure within a substrate. Constructs are the items you browse, compare, and classify in the Explorer. execution_plan, agent_roster
Read the fields on a card Primitive (field) The smallest atomic part inside a construct — a field, flag, token, parameter, or actionable unit. task_id, retry_count, priority
Optional mental image — If it helps, picture the stack as a vertical city: each stratum is a floor, substrates are the departments on that floor, constructs are the deliverables they produce, and primitives are the parts inside each deliverable.

Terminology Infographic

{a}OS Terminology Primer — Stratum, Substrate, Construct, Primitive

Each level zooms deeper into the one above — from the full 7-stratum stack down to atomic primitives.

Quick Start

  1. Open the Explorer — you'll see the 7-stratum reference stack
  2. The guided tour will introduce the key areas
  3. Click any stratum to expand its substrates
  4. Click a substrate to see its constructs
  5. Click product cards (right panel) to see classification details

Browsing the Stack

The central canvas shows the {a}OS stack — 7 layers stacked vertically from L7 (top, user-facing) to L1 (bottom, infrastructure).

Each layer card shows:

  • A color-coded bar on the left edge
  • A layer number (L1–L7)
  • A name and boundary question
  • A substrate count

Click any layer to expand it in-place. Inside you'll find substrates, each with their own constructs. The breadcrumb at the top tracks your depth: {a}OS Stack → L4 Orchestration → Workflow Engines

Comparing Products

Click the + button on any product card to add it to your compare basket (up to 4). The badge in the header shows your count.

The compare view provides:

  • Layer coverage heatmap
  • Axis role comparison
  • Construct/primitive overlap table
  • Gap and overlap summary
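
The basket behavior described above (cap of 4, no duplicates) can be sketched as a tiny class. Names here are illustrative, not the Explorer's actual code.

```python
# Sketch of the compare basket: at most 4 items, duplicates ignored.
class CompareBasket:
    MAX_ITEMS = 4

    def __init__(self):
        self.items: list[str] = []

    def add(self, product: str) -> bool:
        """Return True if added; False when full or already present."""
        if product in self.items or len(self.items) >= self.MAX_ITEMS:
            return False
        self.items.append(product)
        return True
```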

Filtering

Discoverability tip: The top mode pills and hero chips mirror the left rail. If you click a quick chip like Frameworks, MCPs, or Open Source, the matching sidebar filter updates so you can learn what the full filter system supports.

The left rail provides filter groups:

Filter Values
Type Products, Frameworks, Workflows, Agents, Skills, MCPs
Vendor Microsoft, CrewAI Inc, Mem0 AI, CNCF, etc.
Deployment Cloud, Local, Hybrid
License Open Source, Proprietary
Stratum L1–L7
Substrate Agent Frameworks, Workflow Engines, Policy Engines, etc.
Construct execution_plan, agent_roster, policy_bundle, memory_index
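
The filter groups above combine as AND across groups and OR within a group (a record matches a group if its value is in the allowed set). A toy sketch, with made-up product records:

```python
# Toy left-rail filter: AND across groups, OR within a group's value set.
def apply_filters(products, filters):
    def matches(p):
        return all(p.get(field) in allowed for field, allowed in filters.items())
    return [p for p in products if matches(p)]

catalog = [
    {"name": "Mem0", "type": "Products", "deployment": "Cloud", "stratum": "L2"},
    {"name": "LangGraph", "type": "Frameworks", "deployment": "Hybrid", "stratum": "L4"},
]
```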

Keyboard Shortcuts

Key Action
/ Focus search bar
1–7 Jump to stratum L1–L7
↑ / ↓ Navigate between strata
→ Expand / drill deeper
← Collapse / drill up
c Toggle compare mode
? Show all shortcuts
Esc Close / collapse all

Interaction Modes

Mode Description
Stack Mode Classic 7-stratum vertical view (default)
Accordion Mode Expand strata into substrate/construct/primitive rows
Compare Mode Side-by-side comparison of 2–4 selected items
Category Mode Toggle between Products, Frameworks, Agents, Workflows, Skills
Map Mode Multi-layer overlap visualization
Axis Mode Cross-cutting governance/observability analysis

Roadmap

Where {a}OS Explorer is today and where it's headed.

Current — MVP

  • ✅ 7-stratum interactive explorer with progressive drilldown
  • ✅ Stratum → Substrate → Construct expansion
  • ✅ 5 featured product classifications with rationale
  • ✅ Compare basket (add up to 4 products)
  • ✅ Left rail: strata nav, axis toggles, filter placeholders
  • ✅ Right rail: product cards, detail panel
  • ✅ Full keyboard navigation
  • ✅ Guided onboarding tour
  • ✅ Cinematic dark-theme UI with aurora effects

Next Phase

  • 🚧 Full compare screen with heatmap and gap analysis
  • 🚧 Dedicated product detail pages
  • 🚧 Dedicated stratum detail pages
  • 🚧 Connected fuzzy search across all entities
  • 🚧 Active filter system (vendor, deployment, license, maturity)
  • 🚧 Framework comparison with translation matrix
  • 🚧 Evidence panel with source links and quotes
  • 🚧 All 8 MVP products fully classified
  • 🚧 Mode pills connected to canvas views
  • 🚧 Primitive-level drill (Construct → Primitives)

Future

  • 💡 Saved comparison sets
  • 💡 AI-assisted classification suggestions
  • 💡 Team workspaces and collaboration
  • 💡 Public submission pipeline with moderation
  • 💡 Scenario-based evaluation mode
  • 💡 Framework scoring and weighting system
  • 💡 Alternate framework overlays

FAQ

Is this a product directory?

No. It's a classification and comparison engine. Products are mapped to a structured ontology, not just listed.

Can I add my own products?

Not in the MVP. Future phases will support submission pipelines with moderation.

How is {a}OS different from other AI frameworks?

{a}OS provides 7 distinct strata with explicit boundary questions, multi-residency classification, and cross-cutting axes. Most alternatives collapse these layers into fewer, coarser categories.

What if I disagree with a classification?

Every placement shows its confidence score and rationale. Disputed classifications are explicitly marked. Future versions will support formal dispute annotations.

Azure AI Foundry Setup

Step-by-step guides for standing up Azure AI Foundry, organized by {a}OS stratum. Each section shows what Foundry provides, what you must build, and how to set it up.

Glossary — Abbreviations Used Below
  • GA: Generally Available (production-ready)
  • Preview: Public preview (not SLA-backed)
  • PTU: Provisioned Throughput Units (reserved capacity)
  • CMK: Customer-Managed Keys (encryption)
  • SFT: Supervised Fine-Tuning
  • DPO: Direct Preference Optimization
  • RFT: Reinforcement Fine-Tuning
  • RAG: Retrieval-Augmented Generation
  • BYO: Bring Your Own (storage / resources)
  • MCP: Model Context Protocol
  • RBAC: Role-Based Access Control
  • APIM: Azure API Management
  • VNet: Virtual Network (Azure networking)
  • IaC: Infrastructure as Code
  • PII: Personally Identifiable Information
  • PaaS: Platform as a Service
  • FIPS: Federal Information Processing Standards
  • Entra ID: Microsoft's identity platform (formerly Azure AD)
  • Bicep: Azure's declarative IaC language
  • Rego: Policy language for Open Policy Agent

Getting Started

Azure AI Foundry (rebranded to "Microsoft Foundry" in 2026) is Microsoft's unified PaaS for AI — built on Microsoft.CognitiveServices. It bundles model hosting, agent frameworks, content safety, evaluation, and a management portal into a single control plane.

Two resource models coexist. The "classic" model (based on Microsoft.MachineLearningServices) and the "new" model (based on Microsoft.CognitiveServices) are both active. The new model is where Microsoft invests, but the classic model has more mature network isolation. Check which one you are on: az provider show -n Microsoft.CognitiveServices

Prerequisites:

  • Azure subscription (pay-as-you-go is fine for learning)
  • Azure CLI installed — az --version
  • Python 3.10+ and pip install azure-ai-projects azure-identity
  • Register the provider: az provider register -n Microsoft.CognitiveServices

Foundry coverage across the {a}OS stack:

Stratum Coverage Key Takeaway
L7 Experience Partial Portal + publishing, but new portal lacks VNet support
L6 Governance Partial Content Safety + RBAC. No approval gates, no cost governance
L5 Observability Partial Tracing + evaluators. Breaks with private App Insights
L4 Orchestration Partial Prompt agents GA, workflow agents preview
L3 Execution Good Function calling, MCP, Code Interpreter all GA
L2 Knowledge Partial AI Search GA, Foundry IQ preview. No cross-agent state
L1 Models Strong Model catalog, PTU, fine-tuning, CMK — all GA

7-Day Learning Path

A structured curriculum that builds bottom-up through the strata. Each day: ~2-4 hours. See the full curriculum for detailed instructions.

D1
L1 — Models
Create Foundry resource, deploy model, first inference. Posts 1-2.
D2
L2+L3 — Data & Tools
RAG pipeline, AI Search, tool calling, MCP. Posts 5, 8.
D3
L4 — Agents
Prompt agent, identity, versioning, maturity matrix. Post 7.
D4
L5 — Observability
App Insights, evaluators, CI/CD gates, private tracing gap. Post 10.
D5
L6+L7 — Governance
RBAC hardening, Content Safety, approval architecture. Posts 3, 4, 6.
D6
Cross-Cutting
Network isolation, VNet injection, APIM cost governance. Posts 5, 9.
D7
Integration
Bicep deployment, gap analysis, {a}OS classification. Posts 11, 12.

L1: Models & Infrastructure

L1
Models & Infrastructure
Is the model/runtime/compute/network foundation available and behaving correctly?

What Foundry Provides

Capability Feature Status
Model hosting Model catalog (11K+ models, ~50-100 that matter) GA
Deployment types Standard, PTU, Global, DataZone, Batch GA
Fine-tuning SFT/DPO for GPT-4o/4.1; RFT for o4-mini GA
Encryption CMK via Azure Key Vault (FIPS 140-2) GA
Edge inference Foundry Local GA

What You Build

  • Custom model training (requires Azure ML — separate service)
  • Cross-region disaster recovery (manual architecture)
  • Gov cloud model gap mitigation (model abstraction layer)
  • Hybrid cloud patterns for classified environments

Setup Walkthrough

  1. Create a Foundry resource: az cognitiveservices account create --kind AIServices --sku S0
  2. Create a project within the resource
  3. Deploy a model: az cognitiveservices account deployment create --model-name gpt-4.1 --sku-name GlobalStandard
  4. Test inference via Python SDK or REST API
  5. Enable CMK encryption via Key Vault if handling sensitive data
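
For step 4, it helps to see the shape of an inference request before wiring up the SDK. This sketch only builds the URL and payload; the endpoint, deployment name, API version, and the OpenAI-style chat-completions route are placeholders you should verify against your own resource in the portal.

```python
# Sketch only: construct (do not send) a chat-completions request.
# Endpoint, deployment, and api_version below are placeholder values.
def build_chat_request(endpoint: str, deployment: str, api_version: str, prompt: str):
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={api_version}")
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, payload
```

In practice you would send this with `requests` or use the azure-ai-projects SDK with DefaultAzureCredential instead of hand-building URLs.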

Key Decisions

Decision Option A Option B Recommendation
Deployment type Pay-as-you-go PTU (reserved) Start with pay-as-you-go; PTU when you have predictable load
Scope Global deployment Regional Regional for data residency; Global for cost optimization
Model portability Hardcode model names Abstraction layer Always abstract — build for the model you might lose access to

Related: LinkedIn Post 2 (What Foundry Is), Post 6 (Models & Money) — Learning Day 1

L2: Knowledge & Memory

L2
Knowledge & Memory
What context did the system know, retrieve, remember, or forget?

What Foundry Provides

Capability Feature Status
RAG engine Foundry IQ Preview
Vector search Azure AI Search GA
Agent memory Conversation memory per agent GA
File processing File Search tool GA (no VNet)
Document processing Document Intelligence GA

What You Build

  • Cross-agent shared state (Cosmos DB)
  • Memory compaction and lifecycle management
  • Production RAG governance and access controls
  • Context window management strategies
Critical: Basic vs. Standard storage. "Basic" agent storage is Microsoft-managed and multitenant. If you handle sensitive data, PII, financial records, or any controlled data, you need "Standard" with BYO storage — and you need to know this before your first deployment, not after.

Setup Walkthrough

  1. Create Azure AI Search: az search service create --sku standard
  2. Configure private endpoint for AI Search
  3. Upload documents and create a vector index
  4. Connect AI Search as an agent tool
  5. Configure BYO storage (Standard agent setup) — do not use Basic for production
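
To build intuition for what the vector index in step 3 is doing, here is a toy ranking by cosine similarity. Real AI Search uses learned embeddings and hybrid scoring; these hand-made vectors only demonstrate the mechanics.

```python
# Toy vector retrieval: rank documents by cosine similarity to a query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=1):
    """index: list of (doc_id, vector) pairs. Returns doc ids, best first."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```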

Related: LinkedIn Post 8 (RAG & Data) — Learning Day 2

L3: Execution & Interfaces

L3
Execution & Interfaces
What external action was invoked, against which interface, and what happened?

What Foundry Provides

Capability Feature Status VNet
Function calling Native function/tool calling GA Works
MCP support MCP server integration GA Works
Code execution Code Interpreter GA Works
Enterprise connectors 1,400+ via Logic Apps GA Partial
Web search Bing Grounding GA PUBLIC
Browser Browser Automation Preview Broken
Warning: Public internet tools. Bing Grounding and Web Search route over the public internet even in "isolated" environments. Block them via Azure Policy on day one: az policy assignment create --name block-bing --policy "deny Bing grounding"

Setup Walkthrough

  1. Add a function tool to your agent — test with a simple calculator function
  2. Set up MCP server integration for your own services
  3. Test Code Interpreter with a data analysis task
  4. Block Bing/Web Search via Azure Policy in production
  5. Build your personal VNet tool compatibility checklist
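
Step 1's calculator tool amounts to a name-to-function registry plus a dispatcher for model-emitted tool calls. The call shape below is illustrative, not Foundry's exact wire format.

```python
# Minimal tool-calling sketch: registry + dispatch for a toy calculator.
TOOLS = {
    "calculator": lambda args: args["a"] + args["b"],  # toy add-only tool
}

def dispatch_tool_call(call: dict):
    """call: {"name": ..., "arguments": {...}}, as a model might emit."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(call["arguments"])
```

An MCP server generalizes this same pattern: a typed registry of tools exposed over a standard protocol instead of an in-process dict.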

Related: LinkedIn Post 5 (Network Isolation), Post 7 (Agent Maturity) — Learning Day 2

L4: Orchestration & Decisioning

L4
Orchestration & Decisioning
What happens next, in what order, with which agent, under which stop conditions?

What Foundry Provides

Agent Type Status VNet Production Ready
Prompt Agents GA Yes Yes
Workflow Agents Preview Partial No
Hosted Agents Preview No No
Start with Prompt Agents. They are GA and production-safe. Workflow and hosted agents are preview — build your first value on what is stable, not what is exciting. Complex orchestration? Use Semantic Kernel or LangGraph in code.

Setup Walkthrough

  1. Create a prompt agent with a detailed system prompt
  2. Attach tools from L3 (function calling, MCP)
  3. Configure per-agent Entra managed identity
  4. Test agent in the playground
  5. Create a second version — test versioning and rollback
  6. Explore workflow agents (preview) — understand their limitations before committing

Related: LinkedIn Post 7 (Agent Maturity) — Learning Day 3

L5: Observability & Evaluation

L5
Observability & Evaluation
What actually happened, how well did it perform, and where did it fail?

What Foundry Provides

Capability Feature Status
Tracing OpenTelemetry via App Insights GA (prompt agents)
Evaluators Quality, safety, task adherence GA
Custom evaluation Configurable pipelines GA
Continuous eval Scheduled eval against live data GA
CI/CD GitHub Actions evaluation gates GA
Critical gap: Private App Insights. Tracing does NOT work with private Application Insights. If you need both network isolation and observability (as most serious deployments do), you must architect a workaround: a non-private App Insights instance in a peered network with restricted access, or exporting traces to Log Analytics.

Setup Walkthrough

  1. Create Application Insights: az monitor app-insights component create
  2. Connect to your Foundry project
  3. Run your agent — inspect traces in the portal
  4. Run a quality evaluator against agent output
  5. Run a safety evaluator — verify Content Safety scoring
  6. Set up a GitHub Actions workflow with an evaluation gate
  7. If using private networking: test tracing (it may break) and plan the workaround
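
The evaluation gate in step 6 reduces to a simple pass/fail rule in CI: block the deploy when scores drop below a threshold. This is a sketch of the gate logic only; where the scores come from (Foundry evaluators, custom pipelines) is up to your workflow.

```python
# CI evaluation gate sketch: pass only if mean evaluator score clears
# the threshold. Threshold value is an illustrative default.
def evaluation_gate(scores: list[float], threshold: float = 0.8) -> bool:
    """Return True (pass) only when the average score meets the threshold."""
    if not scores:
        return False  # no evidence, no pass
    return sum(scores) / len(scores) >= threshold
```

In a GitHub Actions job, a False result would exit non-zero and stop the pipeline.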

Related: LinkedIn Post 10 (Observability Paradox) — Learning Day 4

L6: Governance & Trust

L6
Governance & Trust
Who is allowed to do this, under what policy, with what risk and budget?

What Foundry Provides

Capability Feature Status
Content filtering Content Safety (always on, configurable) GA
Prompt protection Prompt Shields — jailbreak detection GA
RBAC Entra ID + 4 built-in roles + custom GA
Infra policy Azure Policy integration GA
Threat detection Microsoft Defender GA

What You Build (This Is the Big One)

Governance is not a feature — it's an architecture
Foundry has no approval workflows, no governance dashboard, no aggregated audit trail, no compliance reporting, no agent behavior policy engine, and no cost governance. Everything in this list is your responsibility.

RBAC Roles

Role Use For Risk Level
Azure AI User Developers (least privilege) Low — default for all devs
Azure AI Project Manager Team leads Medium
Azure AI Account Owner Resource creation only Medium — no data actions
Azure AI Owner Never without justification High — loaded gun in prod

Setup Walkthrough

  1. Switch to Entra ID authentication — remove all API keys
  2. Assign Azure AI User to developers (least privilege)
  3. Verify Azure AI Owner is NOT assigned to any developer accounts
  4. Configure Content Safety severity thresholds
  5. Test Prompt Shields — attempt a jailbreak, verify detection
  6. Create Azure Policy assignments to block dangerous tools in production
  7. Design your approval gate architecture (Logic Apps or custom workflow — Foundry does not provide this)
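Steps 1–3 above can be sketched in az CLI. Hedged sketch: the scope, subscription ID, and principal object IDs are placeholders; the role names come from the RBAC table above.

```shell
# Sketch only: scope and object IDs are placeholders.
SCOPE="/subscriptions/<sub-id>/resourceGroups/rg-foundry/providers/Microsoft.CognitiveServices/accounts/my-foundry"

# Step 1 — Entra-only: disable key-based auth on the account
az resource update --ids "$SCOPE" \
  --set properties.disableLocalAuth=true

# Step 2 — least-privilege role for a developer
az role assignment create \
  --assignee "<developer-object-id>" \
  --role "Azure AI User" \
  --scope "$SCOPE"

# Step 3 — verify nobody holds Azure AI Owner at this scope
az role assignment list --scope "$SCOPE" \
  --role "Azure AI Owner" -o table
```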

Related: LinkedIn Post 3 ({a}OS debut), Post 4 (Identity) — Learning Day 5

L7: Experience & Intent

What is the user asking, seeing, approving, or rejecting?

What Foundry Provides

Capability Feature Status Limitation
Portal Foundry Portal (classic + new) GA New portal lacks VNet support
Agent builder Visual agent configuration GA
Publishing Teams, M365, BizChat GA Requires public endpoints
API surface REST + Python/JS SDKs GA
Playground Agent testing playground GA
Most of L7 is yours to build
The Foundry Portal is a management surface, not an end-user experience. For most production deployments, the L7 surface — the thing your actual users see and interact with — will be entirely custom-built. Agent publishing to Teams requires public endpoints, which is often a non-starter for security-conscious environments.

Setup Walkthrough

  1. Navigate the Foundry Portal — toggle between classic and new portals
  2. Use the classic portal or CLI for network-isolated management (the new portal doesn't support it)
  3. Test agent publishing to Teams (note the public endpoint requirement)
  4. Design your L7 surface: what will end users actually see?
  5. Plan: approval surfaces, escalation flows, intent disambiguation — none of these come from Foundry

Related: LinkedIn Post 3 (Governance debut) — Learning Day 5

Network Isolation

Three isolation dimensions: inbound (private endpoints), outbound from resource (VNet injection), and outbound from agents (tool-level).
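The inbound dimension can be sketched in Bicep. Hedged sketch: the VNet, subnet, and account names are placeholders, the API version is illustrative, and the `account` groupId for AI Services accounts is an assumption to verify against current docs.

```bicep
// Sketch only: vnet/subnet and account names are placeholders.
resource pe 'Microsoft.Network/privateEndpoints@2023-11-01' = {
  name: 'pe-foundry'
  location: resourceGroup().location
  properties: {
    subnet: {
      id: resourceId('Microsoft.Network/virtualNetworks/subnets', 'vnet-ai', 'snet-pe')
    }
    privateLinkServiceConnections: [
      {
        name: 'foundry-account'
        properties: {
          privateLinkServiceId: resourceId('Microsoft.CognitiveServices/accounts', 'my-foundry')
          groupIds: [ 'account' ] // inbound: private access to the Foundry account
        }
      }
    ]
  }
}
```

The other two dimensions (VNet injection for agent compute, tool-level outbound) are configured on the Foundry resource and per tool, not on the private endpoint.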

What Works in VNet

Tool / Feature VNet Status
Private endpoints (inbound) Works
VNet injection (agent compute) Works (/27+ subnet)
Function calling Works
MCP (private) Works
Code Interpreter Works
AI Search (private endpoint) Works

What Breaks in VNet

Tool / Feature Issue
File Search Not supported in VNet isolation
OpenAPI Tool Not supported in VNet isolation
Azure Functions (tool) Not supported in VNet isolation
Browser Automation Not supported in VNet isolation
Hosted Agents No VNet support at all
Workflow Agents Inbound only
Tracing (private App Insights) Breaks entirely
New Foundry Portal Must use classic, SDK, or CLI

What Goes Over Public Internet

Tool Risk
Bing Grounding Public internet even in "isolated" environments
Web Search Public internet even in "isolated" environments
SharePoint Grounding Public internet even in "isolated" environments
Block these via Azure Policy in production
The marketing says "end-to-end network isolation." The reality is an exceptions list. Know it before your security team finds it for you.

Related: LinkedIn Post 5 (Network Isolation Exceptions Table) — Learning Day 6

Cost Governance

The platform itself is free to explore. Costs come from model consumption, storage, search, networking, and compute.

Native vs. Custom

Capability Source
Budget alerts Native — Azure Cost Management
Per-resource tracking Native — Azure Cost Management
Per-team chargeback Custom — requires APIM gateway
Per-user token budgets Custom — requires APIM + logic
Real-time spend enforcement Custom — requires APIM kill switch
Cross-service attribution Custom — Log Analytics queries
Deploy APIM as an AI Gateway from Day One
Azure API Management sits between your consumers and Foundry. It provides token metering, rate limiting, and per-team chargeback that Foundry lacks natively. Build it before you need it — not after an agent loops and spends your monthly budget overnight.
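A sketch of what that gateway enforcement can look like, assuming APIM's `azure-openai-token-limit` inbound policy; the counter key, limit, and header name are illustrative, not prescriptive.

```xml
<!-- Sketch only: limits and counter key are illustrative. -->
<policies>
  <inbound>
    <base />
    <!-- Meter tokens per APIM subscription (per-team key);
         requests are rejected once the per-minute cap is hit. -->
    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="5000"
        estimate-prompt-tokens="true"
        remaining-tokens-header-name="x-remaining-tokens" />
  </inbound>
</policies>
```

Per-team chargeback then falls out of APIM's request logs keyed by subscription, queried from Log Analytics.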

Related: LinkedIn Post 9 (Cost Governance) — Learning Day 6

IaC & CI/CD

Deploy via Bicep/CLI, not the portal. The new portal does not support network isolation workflows. IaC is not optional — it's the only reliable path for production.

Bicep Templates

  • 00-basic — Simple Foundry resource + project
  • 19-hybrid-private-resources-agent-setup — Network-isolated agent setup with BYO storage

Fork the azure-ai-foundry/foundry-samples Bicep templates. Customize for your policies and compliance. Maintain as an internal module.
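A minimal sketch of the `00-basic` shape in Bicep. Treat this as illustrative, not canonical: the API version, `allowProjectManagement` property, and names are assumptions to check against the foundry-samples templates you fork.

```bicep
// Sketch only: names and API version are illustrative.
resource foundry 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' = {
  name: 'my-foundry'
  location: resourceGroup().location
  kind: 'AIServices'
  sku: { name: 'S0' }
  identity: { type: 'SystemAssigned' }
  properties: {
    allowProjectManagement: true    // enables Foundry projects on the account
    disableLocalAuth: true          // Entra-only authentication, no API keys
    publicNetworkAccess: 'Disabled' // pair with private endpoints
  }
}

resource project 'Microsoft.CognitiveServices/accounts/projects@2025-04-01-preview' = {
  parent: foundry
  name: 'my-project'
  location: resourceGroup().location
  identity: { type: 'SystemAssigned' }
  properties: {}
}
```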

Phase 1 Architecture (Minimum Viable Platform)

  • Foundry resource + project
  • Private endpoints
  • One model deployment (pay-as-you-go)
  • One prompt agent (GA)
  • Application Insights (with tracing workaround if private)
  • APIM AI Gateway (cost metering from day one)
  • Azure Cost Management budget with alerts
  • Entra-only authentication (no API keys)

CI/CD Pattern

  1. Bicep deployment via GitHub Actions
  2. Agent evaluation gate (quality + safety evaluators)
  3. Cost check gate (budget threshold)
  4. Promote: dev → staging → prod (separate Foundry resources per environment)
Projects are NOT environments
Projects within a single Foundry resource share networking and cannot serve as environment boundaries. Use separate Foundry resources per environment (dev, staging, prod).
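The four-step pattern above, sketched as a GitHub Actions workflow. Hedged sketch: the evaluation and budget scripts are placeholders for your own code, not Foundry-provided actions; only `azure/login` and the az deployment call are standard pieces.

```yaml
# Sketch only: scripts/run_evals.py and scripts/check_budget.py are
# placeholders for your own evaluation and cost-gate logic.
name: deploy-agent
on:
  push:
    branches: [main]

jobs:
  deploy-infra:                       # Step 1 — Bicep via GitHub Actions
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy Bicep
        run: az deployment group create -g rg-foundry-dev -f infra/main.bicep

  eval-gate:                          # Step 2 — quality + safety evaluators
    needs: deploy-infra
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run evaluators (job fails below threshold)
        run: python scripts/run_evals.py --min-quality 4.0 --fail-on-safety

  cost-gate:                          # Step 3 — budget threshold check
    needs: eval-gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Budget threshold check
        run: python scripts/check_budget.py --max-percent 80
```

Step 4 (dev → staging → prod) is then a matter of repeating `deploy-infra` against a separate Foundry resource per environment.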

Related: LinkedIn Post 11 (Phase 1) — Learning Day 7

Standards Track

The path from open reference model to internationally recognized standard — IEEE SA → NIST → ISO/IEC JTC 1/SC 42.

v1.0 — Locked · Phase 0: Foundation ✓ · License: CC BY 4.0 + Apache 2.0

Roadmap

Eight phases from current foundation to ISO recognition.

# Phase Target Key Deliverable
0 Foundation ✓ Done v1.0 reference model locked, Explorer with 40+ products classified
1 Atlas Adoption Q2-Q3 2026 Pilot deployment under perpetual license; internal validation across real Atlas workflows
2 Community Review Q3-Q4 2026 5-10 external reviewers (industry + academic); objections and endorsements captured
3 Standards-Ready Package Q4 2026 Neutral spec, worked use cases, problem statement, evidence memo
4 NIST Alignment Q4 2026 - Q1 2027 Finalize AI RMF crosswalk; submit for NIST community alignment feedback
5 IEEE Incubation Q1-Q2 2027 IEEE SA pre-standardization outreach; draft PAR
6 Formal Submission Q2-Q3 2027 IEEE PAR submission via myProject; identify sponsoring committee
7 ISO Track (parallel) 2027+ ISO/IEC JTC 1/SC 42 WG 1 route via ANSI (if IEEE traction is strong)

Why Standardize?

{a}OS already solves a real problem — giving architects a shared ontology for the AI operations stack. Standardization turns that ontology into something procurement can cite, regulators can reference, and vendors can map against. The goal is not prestige; it's durability and interoperability.

Core principle — Be willing to lose the branded wording. If the concepts survive in a neutral spec, you win regardless of whether the final standard says "stratum" or "tier."

Phase 1 — Neutral Spec Extraction

Before any standards body will engage, the model must exist as a vendor-neutral, IP-clean specification — no product names, no marketing language, no assumptions about Azure or any other cloud.

# Deliverable Description
1 Problem Statement One-page framing: what gap this model fills, why existing frameworks don't cover it.
2 Scope & Non-Scope What the standard defines vs. what it explicitly does not. Includes boundary diagram.
3 Neutral Terminology Dual-label mapping: Stratum ↔ Tier, Substrate ↔ Capability Group, Construct ↔ Entity, Primitive ↔ Attribute, Axis ↔ Cross-cutting Concern.
4 Metamodel Diagram UML or SysML class diagram showing the 5-concept ontology and relationships.
5 Use Cases (6×) Six scenarios demonstrating model utility: vendor evaluation, compliance mapping, architecture review, gap analysis, procurement scoring, and interop testing.
6 Crosswalk Table How {a}OS maps to existing frameworks: NIST AI RMF, EU AI Act risk tiers, ISO/IEC 42001, OWASP AI Top 10.
7 Evidence of Need Market research, analyst citations, or adoption metrics demonstrating industry demand for such a classification.

Phase 2 — IEEE SA Incubation

IEEE Standards Association is the fastest on-ramp for a new standard. Two entry paths exist:

Path Timeline Best For
Industry Connections Activity (IC) 6–12 months Rapid incubation, building a coalition, publishing a white paper or technical report before committing to a full standard.
Standards Working Group (WG) 18–36 months Full IEEE standard (e.g., P-series number). Requires 5+ organizational participants, a sponsor, and a PAR (Project Authorization Request).
Recommendation — Start with an IC Activity to socialize the model, gather co-sponsors, and harden the spec through peer review. Graduate to a WG once you have 5+ organizational backers.

Phase 3 — NIST Crosswalk

NIST does not create classification standards for industry models, but the AI Risk Management Framework (AI RMF 1.0) is the US government's anchor document. Mapping {a}OS strata to AI RMF functions creates a compliance bridge that federal buyers can reference.

{a}OS Stratum AI RMF Function Alignment Notes
L7 Experience & Intent GOVERN 1 — Policies User intent mapping, transparency requirements
L6 Governance & Trust GOVERN 2–6, MAP 1 Risk tolerance, roles, compliance controls
L5 Observability & Evaluation MEASURE 1–4 Metrics, monitoring, evaluation gates
L4 Orchestration & Decisioning MANAGE 1–4 Workflow control, decision audit trail
L3 Execution & Interfaces MAP 2–3 Runtime boundaries, API contracts
L2 Knowledge & Memory MAP 4–5 Data provenance, knowledge governance
L1 Models & Infrastructure MANAGE 1 (infra) Compute, model lifecycle, deployment targets

Phase 4 — ISO / IEC JTC 1 / SC 42

The ultimate destination: submission to ISO/IEC JTC 1/SC 42 WG 1 (AI Foundational Standards). This is a 3–5 year process but produces a globally recognized standard that procurement, regulators, and auditors rely on.

Prerequisite — An established IEEE standard or technical report is the strongest vehicle for an ISO/IEC fast-track submission. Phase 2 feeds directly into Phase 4.
Stage ISO Process What Happens
NP New Work Item Proposal National body sponsors the proposal; 5+ participating P-members must vote yes.
WD Working Draft WG 1 iterates on the spec through 1–3 drafts.
CD Committee Draft Full SC 42 review and comment resolution.
DIS Draft International Standard Broader JTC 1 ballot; editorial polish.
IS International Standard Published as ISO/IEC xxxxx. Periodic review every 5 years.

90-Day Execution Plan

A realistic first-quarter roadmap to go from open model to submission-ready package.

Weeks Milestone Key Activities
1–2 Neutral Spec v0.1 Strip branding, write problem statement, produce dual-label terminology map, draft metamodel diagram.
3–4 Gap Analysis Complete crosswalk table (NIST AI RMF, EU AI Act, ISO 42001, OWASP). Identify gaps and overlaps.
5–6 External Review Recruit 3–5 domain reviewers (academia, industry, gov). Collect structured feedback. Iterate spec to v0.2.
7–8 Submission Package Compile complete IEEE IC Activity proposal or PAR draft. Include use cases, evidence of need, and letters of support.
9–12 Outreach & Coalition Present at 2+ industry events or webinars. Recruit organizational co-sponsors. Submit to IEEE SA NesCom.

Strategic Advice

The single most important principle — Be willing to lose the branded wording. If the model's concepts survive — the seven tiers, the cross-cutting axes, the stratum→substrate→construct→primitive hierarchy — you win, even if the final ISO standard uses entirely different terminology.
Do Don't Why
Ship dual-label docs from Day 1 Hard-code only branded terms Standards bodies reject vocabulary that implies vendor lock-in.
Map to existing frameworks early Claim the model is "entirely new" Reviewers need to see how it extends — not replaces — prior art.
Recruit co-sponsors before submitting Solo-author the submission IEEE and ISO require multi-org participation. Solo proposals stall.
Publish open-source tooling alongside the spec Gate the model behind a paywall Adoption drives standardization; scarcity kills it.
Focus on the ontology hierarchy Try to standardize implementation details Standards define what, not how. Keep architecture decisions out of the spec.

This roadmap will be updated as the standardization effort progresses. Track deliverable status in the Roadmap tab.