§1 Context: why we're doing this
Loquent's AI capabilities live in eight disconnected places. Each has its own prompt-building, model resolution, tool list, logging shape, notification path, and permission story. There is no shared identity an end user can name, configure, or watch evolve.
[Diagram: the disconnected AI surfaces, including AiUsageFeature variants, mods::plan, mods::agent (voice), and mods::text_agent]
What an org owner experiences today
They see a dashboard briefing, a daily report email, three reply suggestions on every inbound message, and (if they're using campaigns) a Plan timeline. None of those four AI surfaces are aware of each other. The dashboard briefing doesn't know the campaign agent has been pestering a contact about something specific. The text-reply suggestions don't know what the daily report told the owner this morning.
What we want them to experience
One mental model: "These are my agents. They have personalities, they follow rules I taught them, and they're learning from how I use them." Concretely:
- One identity per agent (org-scoped, kind-tagged, persona on file)
- One timeline showing every operation, with cost and latency per step
- One rule system where platform rules layer under org rules layer under per-agent rules
- One memory attribution: every contact note knows which agent and which run wrote it
- One learning loop: daily reflections → weekly skills → monthly distillation
§2 Research summary
2.1 Hermes Agent — the source of inspiration
You mentioned Hermes Agent as the model. The reference is Nous Research's Hermes Agent — a self-improving multi-agent framework where every agent is a profile: a fully isolated unit with its own configuration, persona (SOUL.md), memory database, tool list, and cron jobs. Profiles connect independently to Telegram, Discord, Slack, WhatsApp, Signal, Email. Each is a complete CLI entity with its own personality.
What we steal from Hermes
The SOUL.md pattern — codifying persona as a first-class, shareable record. The isolated-profile concept — each agent is a complete identity, not a function call. Loquent's adaptation: multi-tenant SaaS where org boundaries matter and agents share infrastructure.
2.2 Agent anatomy across modern frameworks
| Dimension | Hermes | CrewAI | LangGraph | Letta (MemGPT) |
|---|---|---|---|---|
| Persona | SOUL.md file | role, goal, backstory | System prompt in node | Agent identity + persistent memory |
| Memory | File DB + session history | Short-term + opt-in long-term | Explicit state object | 3-tier: Core / Recall / Archival |
| Goals | SOUL.md + cron | Tasks in a Crew | Graph nodes (implicit) | Memory-driven |
| Rules | SOUL.md + env | System prompt + tools | Node LLM instruction | Constitutional layer (WIP) |
| Tools | Function-calling + skill library | Per-agent tool list | Tools as graph nodes | Tool calls within memory |
| Access | Gateway channel + env | Task-scoped | Node-level (implicit) | Function signatures |
No framework has built-in multi-tenant access control or organization-scoped rules. Loquent has to design this layer itself.
2.3 Self-evolution — three mechanisms we'll blend
| Method | Trigger | Storage | How Loquent uses it |
|---|---|---|---|
| Generative Agents (Park et al. 2023) | Importance threshold (~2–3×/day) | Time-indexed text + embeddings | Daily/weekly/monthly ai_agent_reflection rows; meta-LLM digests the log |
| Reflexion (Shinn et al. 2023) | External feedback / failure | Episodic memory buffer | ai_agent_lesson rows: one-line verbal learnings from failed runs + user feedback |
| Voyager (Wang et al. 2023) | Task success | Indexed code/prompts in vector DB | ai_agent_skill rows: extracted prompt fragments after N successes; retrieved into future runs |
2.4 Observability — OpenTelemetry GenAI conventions
Since 2024, OpenTelemetry has defined semantic conventions for GenAI tracing. This is the emerging industry standard that unifies Langfuse, LangSmith, Helicone, Traceloop. We adopt it in ai_agent_log day one so a future export to any tracing backend is a config change, not a refactor.
| Loquent field | OTel attribute | Example |
|---|---|---|
| ai_agent_run.trace_id | trace.id | uuid |
| ai_agent_log.span_id | gen_ai.* span id | uuid |
| ai_agent_log.parent_span_id | span.parent_id | uuid |
| ai_agent.kind | custom: agent.kind | text_reply_agent |
| ai_usage_log.model | gen_ai.request.model | claude-opus-4-7 |
| ai_usage_log.input_tokens | gen_ai.usage.input_tokens | 1500 |
| ai_usage_log.cached_tokens | gen_ai.usage.cache_read_tokens | 200 |
| computed | custom: cost_cents | 0.18 |
| ai_agent_log.duration_ms | span duration | 850 |
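As a rough illustration of the mapping above, the sketch below flattens one usage row into OTel-style key/value attributes. The `UsageRow` struct and the `otel_attributes` function are hypothetical names invented for this example; only the attribute keys come from the table.

```rust
/// Hypothetical, trimmed-down view of one `ai_usage_log` row (assumption:
/// the real table has more columns than shown here).
#[derive(Debug, Clone)]
pub struct UsageRow {
    pub model: String,
    pub input_tokens: u64,
    pub cached_tokens: u64,
}

/// Flattens a row into (attribute key, value) pairs using the attribute
/// names listed in the mapping table.
pub fn otel_attributes(agent_kind: &str, row: &UsageRow) -> Vec<(&'static str, String)> {
    vec![
        ("agent.kind", agent_kind.to_string()),
        ("gen_ai.request.model", row.model.clone()),
        ("gen_ai.usage.input_tokens", row.input_tokens.to_string()),
        ("gen_ai.usage.cache_read_tokens", row.cached_tokens.to_string()),
    ]
}
```

Keeping the key names in one function like this would mean a future exporter only needs to walk the returned pairs.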
§3 Current state in Loquent
Three modules already do most of what we need. They just don't know about each other.
The Plan system — already 90% an agent runtime
src/mods/plan/ (71 file references) is a sophisticated autonomous-execution framework. It has a typed tool-call log, an 8-state JSONB state machine, a cron-polled executor, contact assignments, autopilot gating, approval gates, re-enrollment policy, per-template model override. The hard problems are solved here.
The 8 plan states (we reuse this exact shape for AiAgentRunState)
The 13 typed tool variants (we keep this typed-per-kind pattern)
SendEmail, SendSms, ListPlanContacts, GetContactDetails, GetContactNotes, WriteInteractionNote, UpdateSystemNote, GetConversationHistory, UpdateContact, AskUser, CompletePlan, FailPlan, ScheduleNextExecution
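A minimal sketch of what the typed-per-kind pattern could look like, using a handful of the variant names listed above. The payload fields, the `PlanToolCall` type name, and the idea that `CompletePlan` and `FailPlan` end a run are assumptions made for illustration, not the shape of the existing code.

```rust
/// A few of the plan tool variants; payload fields are illustrative
/// assumptions, not the real schema.
#[derive(Debug, Clone, PartialEq)]
pub enum PlanToolCall {
    SendSms { contact_id: u64, body: String },
    WriteInteractionNote { contact_id: u64, note: String },
    AskUser { question: String },
    CompletePlan { summary: String },
    FailPlan { reason: String },
}

impl PlanToolCall {
    /// Variant name as it might be stored alongside the typed payload.
    pub fn name(&self) -> &'static str {
        match self {
            PlanToolCall::SendSms { .. } => "SendSms",
            PlanToolCall::WriteInteractionNote { .. } => "WriteInteractionNote",
            PlanToolCall::AskUser { .. } => "AskUser",
            PlanToolCall::CompletePlan { .. } => "CompletePlan",
            PlanToolCall::FailPlan { .. } => "FailPlan",
        }
    }

    /// Assumption: completing or failing the plan ends the run.
    pub fn is_terminal(&self) -> bool {
        matches!(
            self,
            PlanToolCall::CompletePlan { .. } | PlanToolCall::FailPlan { .. }
        )
    }
}
```

Because the `match` is exhaustive, adding a new variant forces every call site to handle it at compile time, which is the main appeal of keeping tools typed.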
The Text Agent — already a separate module
src/mods/text_agent/ generates 3 high/med/low-confidence reply suggestions per inbound message. Has its own text_agent table (purpose, tier, model, escalation_instructions, restricted_topics, temperature, knowledge_base_ids) and a text_agent_suggestion table. Critical-path code: generate_text_agent_suggestions_service.rs:11.
The Voice Agent — naming collision risk
Critical: mods::agent is already taken
The voice/realtime agent module owns the Agent type, AgentResource permission, the phone_number.agent_id FK, and the agent table — 54 file references. Any new "Agent" abstraction that reuses the bare name will produce ambiguous imports, compile errors, and confused code reviews. We use AiAgent internally and rename voice → VoiceAgent as a late, optional phase once the new runtime is stable.
The AI infrastructure — already well-organized
src/mods/ai/ wraps aisdk + OpenRouter. The ai_usage_log table + spawn_log_ai_usage(AiUsageEntry) helper at log_ai_usage_service.rs:13 capture every AI call's tokens, cost, model, provider, latency. 26 AiUsageFeature variants already cover every call site (TextAgentSuggestions, DashboardBriefing, AnalyzeCall, SummarizeCall, UpdateContactMemory, GenerateReport, ExtractTasks, RealtimeTurn, …).
No new AiUsageFeature variants needed
Each AiAgentKind maps to an existing feature variant. Billing tier matrix stays stable. Admin dashboards keep working. The enum is the integration seam — don't grow it.
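One way to keep the enum as the integration seam is a single exhaustive mapping function. The sketch below is hypothetical: the variant names appear in this document, but which kind pairs with which feature is an assumption made for illustration, and both enums are reduced to a subset.

```rust
/// Subset of the agent kinds from the launch lineup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AiAgentKind {
    DashboardBriefingAgent,
    TextReplyAgent,
    CustomReportAgent,
    AnalyzerAgent,
    TaskExtractionAgent,
    ContactMemoryAgent,
}

/// Subset of the existing usage-feature variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AiUsageFeature {
    DashboardBriefing,
    TextAgentSuggestions,
    GenerateReport,
    AnalyzeCall,
    ExtractTasks,
    UpdateContactMemory,
}

impl AiAgentKind {
    /// Assumed pairing of each kind with an existing feature variant.
    pub fn usage_feature(self) -> AiUsageFeature {
        match self {
            AiAgentKind::DashboardBriefingAgent => AiUsageFeature::DashboardBriefing,
            AiAgentKind::TextReplyAgent => AiUsageFeature::TextAgentSuggestions,
            AiAgentKind::CustomReportAgent => AiUsageFeature::GenerateReport,
            AiAgentKind::AnalyzerAgent => AiUsageFeature::AnalyzeCall,
            AiAgentKind::TaskExtractionAgent => AiUsageFeature::ExtractTasks,
            AiAgentKind::ContactMemoryAgent => AiUsageFeature::UpdateContactMemory,
        }
    }
}
```

With this shape, a new kind cannot compile until it has been assigned an existing feature, so billing never sees an unmapped call.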
§4 Target architecture
The 6 dimensions of an AiAgent
Agent kinds — the launch lineup
| Kind | Phase | What it does | Replaces |
|---|---|---|---|
| DashboardBriefingAgent | P1 | Daily summary card on the dashboard | generate_dashboard_briefing_service |
| TextReplyAgent | P2 | 3 suggestions per inbound message | mods::text_agent |
| CustomReportAgent | P3 | Scheduled digest (leads, calls, messages, tasks) | mods::report |
| AutonomousCampaignAgent | P4 | Multi-step contact campaigns | mods::plan executor |
| AnalyzerAgent | P8 | Call analysis (custom analyzers) | mods::analyzer |
| TaskExtractionAgent | P8 | Extract todos from calls/messages | create_tasks_from_call_service |
| AutoTagAgent | P8 | Tag contacts after call | auto_tag_contact_from_call_service |
| ContactEnrichmentAgent | P8 | Enrich contact fields from convo | enrich_contact_service |
| ContactMemoryAgent | P8 | Maintain contact memory notes | update_contact_memory_service |
| AssistantAgent | P8 | In-app assistant chat | mods::assistant |
| ReflectionAgent | P9 | Meta-agent that distills other agents' logs | (new) |
Naming: internal vs. UI
- Internal (Rust code): AiAgent, mods::ai_agent, ai_agent table, AiAgentResource permission. Disambiguates from existing voice Agent.
- User-facing (UI copy): "Agents" everywhere. Voice agents become "Voice Agents". Owners see one mental model.
§5 Data model
10 new tables. 4 existing tables augmented with nullable FKs. No drops in the migration window.
ER diagram
The big four — what they store, why they exist
1. ai_agent — the identity (the SOUL.md row)
One row per agent. Org-scoped. Holds: kind, name, slug, is_system_default, is_active; JSONB fields for persona (tone/voice/signature/traits/soul_md), goals, rules_override, tools_allowlist, access_scope, budget; plus model_override, enable_reflection, enable_autopilot.
Indexes: (organization_id, kind, slug) UNIQUE; (kind) for dispatch fan-out; partial on active rows.
2. ai_agent_run — one execution
Has state (the AiAgentRunState enum JSONB, same shape as PlanState), parent_run_id for orchestration chains, trigger_source JSONB (Manual { user_id } | Cron { schedule_id } | Webhook { entity, entity_id } | Triggered { by_run_id }), input_payload, output_payload, total cost/tokens/tool-calls, and trace_id for OTel.
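The four trigger_source shapes map naturally onto a Rust enum. The sketch below is an assumption-laden illustration: it uses `u64` ids for brevity (the real columns may well be UUIDs), and the `parent_run_id` and `label` helpers are hypothetical.

```rust
/// The four trigger shapes named in the text. Id types are simplified
/// to u64 for this sketch (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TriggerSource {
    Manual { user_id: u64 },
    Cron { schedule_id: u64 },
    Webhook { entity: String, entity_id: u64 },
    Triggered { by_run_id: u64 },
}

impl TriggerSource {
    /// Only runs started by another run have a parent in the chain.
    pub fn parent_run_id(&self) -> Option<u64> {
        match self {
            TriggerSource::Triggered { by_run_id } => Some(*by_run_id),
            _ => None,
        }
    }

    /// Short human-readable label, e.g. for a timeline header.
    pub fn label(&self) -> String {
        match self {
            TriggerSource::Manual { user_id } => format!("manual:user:{}", user_id),
            TriggerSource::Cron { schedule_id } => format!("cron:schedule:{}", schedule_id),
            TriggerSource::Webhook { entity, entity_id } => {
                format!("webhook:{}:{}", entity, entity_id)
            }
            TriggerSource::Triggered { by_run_id } => format!("triggered:run:{}", by_run_id),
        }
    }
}
```

Deriving parent_run_id from the trigger is only one option; the text keeps it as its own column, which is simpler to index.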
3. ai_agent_log — unified, OTel-compatible timeline
The single most important new table. Every operation an agent performs becomes one row. Six kinds:
| kind | When written | Payload |
|---|---|---|
| system_event | State transition, schedule, retry, budget enforced | event, prev_state, new_state, reason |
| thought | LLM produced reasoning text | text, confidence |
| llm_generation | Every aisdk call | provider, model, finish_reason, summaries + FK to ai_usage_log |
| tool_call | Every typed tool invocation | ToolCallVariant<I,O> per kind (typed I/O) |
| reflection | A reflection cycle ran | scope, period, source_log_ids[] |
| failover | New path failed, legacy path took over | from_path, to_path, reason |
Indexes for fast analysis AND reflection pipeline: (ai_agent_run_id, created_at) primary for timeline rendering; (ai_agent_id, created_at DESC) for "recent activity"; (organization_id, created_at DESC) partial on llm_generation|tool_call for cost dashboards; GIN on entry for JSONB attribute search.
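To make the six kinds concrete, here is a hedged sketch of a kind-discriminated entry type. Field names follow the payload column of the table where possible; the field types, the `LogEntry` name, the simplified `ToolCall` payload, and the `in_cost_index` helper are assumptions for illustration.

```rust
/// One timeline row, discriminated by kind. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum LogEntry {
    SystemEvent { event: String, prev_state: String, new_state: String, reason: String },
    Thought { text: String, confidence: f32 },
    LlmGeneration { provider: String, model: String, finish_reason: String },
    /// Simplified: the text describes a typed ToolCallVariant<I,O> here.
    ToolCall { tool: String },
    Reflection { scope: String, period: String, source_log_ids: Vec<u64> },
    Failover { from_path: String, to_path: String, reason: String },
}

impl LogEntry {
    /// Value of the `kind` column for this entry.
    pub fn kind(&self) -> &'static str {
        match self {
            LogEntry::SystemEvent { .. } => "system_event",
            LogEntry::Thought { .. } => "thought",
            LogEntry::LlmGeneration { .. } => "llm_generation",
            LogEntry::ToolCall { .. } => "tool_call",
            LogEntry::Reflection { .. } => "reflection",
            LogEntry::Failover { .. } => "failover",
        }
    }

    /// Mirrors the partial index predicate used for cost dashboards.
    pub fn in_cost_index(&self) -> bool {
        matches!(self, LogEntry::LlmGeneration { .. } | LogEntry::ToolCall { .. })
    }
}
```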
4. ai_agent_schedule — cron-driven runs
One row per scheduled agent. cron_spec + timezone + next_run_at + is_enabled + config_payload JSONB (recipients, filters — kind-specific). A single run_due_ai_agents_job polls every minute with SELECT … FOR UPDATE SKIP LOCKED and dispatches.
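The polling step can be sketched as a claim query plus an in-memory equivalent of its WHERE clause for unit testing. Everything below is hypothetical: the exact SQL, the `Schedule` struct, and the use of Unix seconds for next_run_at are assumptions; only the column names and the FOR UPDATE SKIP LOCKED clause come from the text.

```rust
/// Assumed shape of the claim query. SKIP LOCKED lets concurrent pollers
/// skip rows another worker has already locked instead of waiting on them.
pub const CLAIM_DUE_SCHEDULES_SQL: &str = "\
    SELECT id FROM ai_agent_schedule \
    WHERE is_enabled AND next_run_at <= now() \
    ORDER BY next_run_at \
    FOR UPDATE SKIP LOCKED";

/// Minimal in-memory stand-in for a schedule row (timestamps as Unix seconds).
#[derive(Debug, Clone)]
pub struct Schedule {
    pub id: u64,
    pub is_enabled: bool,
    pub next_run_at: u64,
}

/// Same predicate and ordering as the query, without the locking.
pub fn due_schedule_ids(schedules: &[Schedule], now: u64) -> Vec<u64> {
    let mut due: Vec<&Schedule> = schedules
        .iter()
        .filter(|s| s.is_enabled && s.next_run_at <= now)
        .collect();
    due.sort_by_key(|s| s.next_run_at);
    due.into_iter().map(|s| s.id).collect()
}
```

The pure function covers only selection; the duplicate-dispatch protection still depends on the row locks taken by the real query.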
The evolution tables (Phase 9)
| Table | What it stores |
|---|---|
| ai_agent_memory | Short-term scratchpad + long-term facts. JSONB value; nullable embedding (pgvector deferred) |
| ai_agent_reflection | Daily/weekly/monthly digests. content_markdown, key_insights JSONB, source_run_ids, tokens consumed |
| ai_agent_skill | Voyager-style growing library. prompt_fragment, trigger_condition, success_count, failure_count, admin curation flags |
| ai_agent_lesson | Reflexion verbal lessons. One-line learning + provenance FK to failure run + optional user feedback |
System rules
platform_ai_rule — system-wide immutable rules (slug, applies_to_kinds JSONB, rule_text, priority). Edited only by code-reviewed migrations. The existing organization_ai_rule table becomes the org tier.
Why JSONB instead of separate persona/goal/rule tables
We don't proliferate side tables until a dimension proves it needs first-class queryability. Persona, goals, agent-rules, tools-allowlist, access-scope, and budget all start as JSONB columns on ai_agent. If a future requirement (e.g., "list every agent that has rule X") needs a structured query, we promote that one dimension to a table — not before.
§6 Migration phases
Twelve phases, numbered 0 through 11 (the last is optional). The first ships in a week with zero behavior change. The next three retarget existing modules. Phase 4 attaches Plans to the new runtime structurally without changing the executor.
- PHASE 0 · Foundations · M · 1 week · first PR
  Land ai_agent, ai_agent_run, ai_agent_log tables + Rust types + generic run_ai_agent executor + admin-only debug endpoint. Zero behavior change. Adds the AiAgent permission resource.
- PHASE 1 · Dashboard Briefing Agent (pilot) · M · lowest blast radius
  Retarget generate_dashboard_briefing_service to run via the new runtime. Failures are cosmetic (stale briefing): a perfect first pilot. Per-org flag, A/B parity test, hard token budget.
- PHASE 2 · Text Reply Agent · L · structured output
  Refactor mods::text_agent to back its persona/rules on ai_agent. Critical path (every inbound message). Auto-fallback to legacy on schema parse failure. 10% rollout watched for a week before 100%.
- PHASE 3 · Custom Report Agent + schedule dispatcher · L · cron consolidation
  Replace SendDailyReportJob with RunDueAiAgentsJob driven by ai_agent_schedule. Mutually-exclusive flags prevent duplicate sends. 7-day shadow-run before cutover.
- PHASE 4 · Autonomous Campaign Agent (attach plans) · XL · structural only
  Every plan + plan_template gets an ai_agent_id FK (kind=AutonomousCampaignAgent). Plan log dual-writes to ai_agent_log. Executor body unchanged. Plans-specific bookkeeping (contacts, actions, re-enrollment) stays where it is.
- PHASE 5 · Seed default agents on signup · S · 2–3 days
  New seed_default_agents_service called from finalize_signup_service:174. Creates DashboardBriefing + CustomReport agents. Non-blocking like existing notification-pref seeder.
- PHASE 6 · Layered rules (Platform > Org > Agent) · M
  3-tier rule injection via build_layered_rules. Platform rules immutable; ship via code-reviewed migration. Every run snapshots effective rules in its log for auditability.
- PHASE 7 · Memory unification · M
  All contact-memory writes stamped with ai_agent_run_id. New ai_agent_memory table for short-term scratchpad + long-term facts. pgvector deferred (JSONB + GIN until proven needed).
- PHASE 8 · Onboard remaining AI ops · XL · 6 small PRs · parallelizable
  One PR per kind: AnalyzerAgent, TaskExtractionAgent, AutoTagAgent, ContactEnrichmentAgent, ContactMemoryAgent (messages batch), AssistantAgent. Each is a drop-in replacement of the inline aisdk call.
- PHASE 9 · Self-evolution pipeline · XL · opt-in
  Daily/weekly/monthly reflection cron + skill extraction (Voyager) + lesson capture (Reflexion). ReflectionAgent kind. UI at /agents/{id}/evolution. Opt-in per org due to cost.
- PHASE 10 · Budgets, throttling, observability UI · L
  Per-agent + per-org budget enforcement before every LLM call. Unified timeline view at /agents/{id}/runs/{run_id}. Cost/latency/success-rate metrics dashboard.
- PHASE 11 · Voice agent rename (optional, late) · L · mechanical
  Resolve long-term collision: Agent → VoiceAgent. Only after the new runtime is stable for ≥6 months. Pure rename PR with cargo check guardrails.
Top 5 risks & mitigations
| # | Risk | Mitigation |
|---|---|---|
| 1 | Naming collision with existing voice Agent | Use AiAgent internally; defer voice rename to Phase 11 |
| 2 | Live plan execution disruption (paying customers) | Coexist with feature flags; Phase 4 is structural-only (no semantic change) |
| 3 | Billing/usage logging double-counting | Mutually-exclusive flags; duplicate-detection guard in spawn_log_ai_usage |
| 4 | Structured-output parse failure (Text Agent) | Auto-fallback to legacy path; failover recorded as ai_agent_log.kind=failover; 10%→100% rollout |
| 5 | Cron-job overlap during Custom Report transition | Mutually-exclusive flag; legacy cron short-circuits when new flag on; FOR UPDATE SKIP LOCKED |
§7 Observability: tracing every operation
Every thought, tool call, LLM generation, and system event becomes one indexed row in ai_agent_log. Optimized for fast timeline rendering AND for feeding the reflection pipeline.
The indexes that matter
| Index | Powers |
|---|---|
| (ai_agent_run_id, created_at) | Timeline UI |
| (ai_agent_id, created_at DESC) | Recent activity per agent · Reflection pipeline reads this |
| (organization_id, created_at DESC) partial | Cost dashboards (LLM gen + tool calls only) |
| (kind, created_at DESC) | Filter by kind in admin UI |
| GIN on entry JSONB | Attribute search ("find all tool_call where input.to_number = X") |
Day-one OTel-compatible
We don't ship to Langfuse/Datadog yet — but the field names, span/parent_span structure, and trace IDs are OTel-conformant. When we want to export, it's a collector config change, not a refactor.
§8 Self-evolution: the learning loop
The loop closes when build_ai_agent_prompt_service retrieves the top-K relevant skills + recent lessons and injects them into the next run's prompt. Voyager pattern: agents get better at the same task family over time, without retraining.
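A minimal sketch of the injection step, assuming each skill already carries a precomputed relevance score. The `Skill` struct, the `build_prompt` function, the `[skill]` / `[lesson]` markers, and the relevance field are all hypothetical; the text does not say how relevance is computed or how the fragments are formatted.

```rust
/// Hypothetical in-memory view of an `ai_agent_skill` row.
#[derive(Debug, Clone)]
pub struct Skill {
    pub prompt_fragment: String,
    /// Assumed to be scored upstream against the current task.
    pub relevance: f32,
    pub is_enabled: bool,
}

/// Appends the top-K enabled skills (most relevant first) and then the
/// recent lessons to the base prompt.
pub fn build_prompt(base: &str, skills: &[Skill], lessons: &[String], top_k: usize) -> String {
    let mut ranked: Vec<&Skill> = skills.iter().filter(|s| s.is_enabled).collect();
    // Sort descending by relevance; total_cmp gives a total order over f32.
    ranked.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));

    let mut prompt = String::from(base);
    for skill in ranked.into_iter().take(top_k) {
        prompt.push_str("\n[skill] ");
        prompt.push_str(&skill.prompt_fragment);
    }
    for lesson in lessons {
        prompt.push_str("\n[lesson] ");
        prompt.push_str(lesson);
    }
    prompt
}
```

Filtering on is_enabled before ranking means a suppressed or retired skill can never reach a prompt, whatever its score.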
Cost discipline
- Opt-in per org with a 14-day free trial. Default off.
- Cheapest available model for reflection (e.g. deepseek-v3.2).
- Hard per-agent monthly token budget enforced.
- Skills with failure rate > threshold are auto-disabled.
- Weekly distillation re-evaluates daily reflections (averaging effect against bad-day reinforcement).
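Two of the guards above reduce to small pure functions. This is a sketch under stated assumptions: the function names are hypothetical, the budget is counted in tokens per month, and the minimum-sample guard on auto-disable is an addition for illustration that the text does not mention.

```rust
/// True if the requested tokens still fit in the agent's monthly budget.
pub fn within_budget(used_tokens: u64, requested_tokens: u64, monthly_budget: u64) -> bool {
    used_tokens.saturating_add(requested_tokens) <= monthly_budget
}

/// True if a skill's failure rate exceeds the threshold.
/// `min_samples` (an assumption) avoids disabling a skill after one bad run.
pub fn should_auto_disable(
    success_count: u32,
    failure_count: u32,
    max_failure_rate: f64,
    min_samples: u32,
) -> bool {
    let total = u64::from(success_count) + u64::from(failure_count);
    if total < u64::from(min_samples) {
        return false;
    }
    (f64::from(failure_count) / total as f64) > max_failure_rate
}
```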
Admin curation
The owner can view /agents/{id}/evolution: every reflection, every extracted skill (with success/failure counts), every lesson. They can edit, suppress, or retire any of them. This keeps the loop transparent — agents that learn must also be agents you can teach.
§9 The three initial agents (Phase 1–3 deliverables)
Dashboard Briefing Agent
Trigger: Dashboard page load + scheduled (daily 5 AM org-tz)
Persona default: "Concise, data-driven analyst. Plain English, no jargon."
Tools: get_workspace_summary, get_workspace_needs_attention, get_engagement_stats
Output: Markdown briefing rendered on the dashboard
Maps to existing: generate_dashboard_briefing_service.rs
Text Reply Agent
Trigger: Webhook (inbound SMS)
Persona default: Inherits from existing text_agent (purpose, escalation, restricted topics)
Tools: query_knowledge
Output: Structured 3-suggestion JSON (high/med/low confidence)
Special: Auto-fallback to legacy on schema parse failure
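The auto-fallback can be sketched as: try to parse the new path's output, and on failure record a failover entry and call the legacy generator. All names below are hypothetical, and the line-based `parse_suggestions` is only a stand-in for real schema parsing.

```rust
/// Which code path produced the suggestions.
#[derive(Debug, PartialEq)]
pub enum Path {
    New,
    Legacy,
}

/// Stand-in for structured-output parsing (assumption): expects exactly
/// three non-empty lines.
pub fn parse_suggestions(raw: &str) -> Result<[String; 3], String> {
    let lines: Vec<&str> = raw.lines().map(str::trim).filter(|l| !l.is_empty()).collect();
    match lines.as_slice() {
        [a, b, c] => Ok([a.to_string(), b.to_string(), c.to_string()]),
        other => Err(format!("expected 3 suggestions, got {}", other.len())),
    }
}

/// Uses the new path when its output parses; otherwise records a failover
/// entry and falls back to the legacy generator.
pub fn suggest_with_fallback(
    new_output: &str,
    legacy: impl FnOnce() -> [String; 3],
    failover_log: &mut Vec<String>,
) -> (Path, [String; 3]) {
    match parse_suggestions(new_output) {
        Ok(suggestions) => (Path::New, suggestions),
        Err(reason) => {
            failover_log.push(format!("failover from=new to=legacy reason={}", reason));
            (Path::Legacy, legacy())
        }
    }
}
```

The legacy closure only runs on failure, so the happy path pays nothing for keeping the fallback around.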
Custom Report Agent
Trigger: Scheduled (ai_agent_schedule row, default daily at 5 AM org-tz)
Persona default: "Helpful daily-digest assistant. Highlight what changed and what needs attention."
Toggleable sections: new leads · handled calls · messages in · messages out · prioritized tasks
Delivery: notify_service → email + in-app + push (per user pref)
Maps to existing: generate_daily_report_service.rs
§10 Org onboarding: defaults on signup
Hook added to finalize_signup_service.rs:174, right beside the existing notification-preference seeding. The new seed_default_agents creates:
- 1× DashboardBriefingAgent per org — system default, active, autopilot
- 1× CustomReportAgent per org — system default, active, daily 5 AM schedule
- 1× TextReplyAgent per phone number provisioned — autopilot=false (suggestions, not auto-replies)
Each seeded agent carries is_system_default = true. The org admin can edit any of these. Setting is_system_default = false orphans the row from future platform updates — so a platform-level persona update doesn't overwrite a customer's edits.
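The is_system_default semantics can be illustrated with a small in-memory sketch: a platform persona update touches only rows that are still system defaults. The `AgentRow` struct and `apply_platform_persona_update` function are hypothetical and stand in for whatever migration or service performs the real update.

```rust
/// Hypothetical, trimmed-down view of an `ai_agent` row.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentRow {
    pub id: u64,
    pub is_system_default: bool,
    pub persona: String,
}

/// Applies a new platform persona to system-default rows only and returns
/// how many rows changed. Customer-edited rows are left untouched.
pub fn apply_platform_persona_update(rows: &mut [AgentRow], new_persona: &str) -> usize {
    let mut updated = 0;
    for row in rows.iter_mut().filter(|r| r.is_system_default) {
        row.persona = new_persona.to_string();
        updated += 1;
    }
    updated
}
```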
Non-blocking like notification preferences
If seeding fails, signup still completes. We log the error and the org can have agents seeded later via a backfill job. Signup never fails because the agent table had a transient issue.
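A minimal sketch of the non-blocking behavior, assuming the seeder is passed in as a closure so the example stays self-contained. The signature and the error sink are hypothetical; the real finalize_signup_service does more than shown here.

```rust
/// Finishes signup and attempts to seed default agents. A seeding error is
/// recorded but never propagated, so signup still succeeds.
pub fn finalize_signup(
    org_id: u64,
    seed_default_agents: impl FnOnce(u64) -> Result<usize, String>,
    error_log: &mut Vec<String>,
) -> Result<u64, String> {
    // ... the existing signup steps would run here (elided in this sketch) ...

    if let Err(e) = seed_default_agents(org_id) {
        // Log and continue; a backfill job can seed this org later.
        error_log.push(format!("seed_default_agents failed for org {}: {}", org_id, e));
    }
    Ok(org_id)
}
```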
§11 Layered rules: system > org > agent
Three layers concatenated into every agent's system prompt. Platform rules are immutable from any UI; they ship via code-reviewed migration.
Example platform rules (immutable)
- Never make legal, medical, or financial advice claims.
- Never share PII outside the org boundary.
- Always announce when a message is AI-generated if asked.
- Never autopilot a destructive action without explicit org-owner approval.
- Always log every tool call with full input/output.
Example org rules (editable)
- Sign messages as "Alex from Acme Co."
- Use first-name only when addressing contacts.
- Never discuss competitor X.
Example agent rules (per-agent overrides)
- This text agent only handles inbound questions about pricing.
- This report agent emails the founder, not the team.
Auditable enforcement
Each ai_agent_run snapshots the effective rules it used into its ai_agent_log. Platform rule changes ship via code-reviewed migration, not ad-hoc SQL. If an agent acted in a way that violates a rule, the log proves what rules were in force at the time.
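A hedged sketch of what build_layered_rules might do: concatenate the three tiers in a fixed order, tagging each line with its tier so the snapshot stored in the log is self-describing. The signature and the `[tier]` tag format are assumptions; only the function name and the three tiers come from this document.

```rust
/// Concatenates platform, org, and agent rules in that order, one rule per
/// line, each prefixed with its tier. The returned string is what would be
/// injected into the system prompt and snapshotted in the run's log.
pub fn build_layered_rules(platform: &[&str], org: &[&str], agent: &[&str]) -> String {
    let tiers = [("platform", platform), ("org", org), ("agent", agent)];
    let mut out = String::new();
    for (tier, rules) in tiers {
        for rule in rules {
            out.push_str(&format!("[{}] {}\n", tier, rule));
        }
    }
    out
}
```

Ordering alone does not stop an org rule from contradicting a platform rule; the unit test in the verification plan would need an explicit precedence check on top of this.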
§12 Verification plan
How we know each phase is safe to ship.
| Phase | Acceptance signal |
|---|---|
| 0 | cargo check green; migrations apply + roll back in CI; admin endpoint manually exercised; state-machine unit tests pass |
| 1 | Side-by-side parity test for 7 days; semantic content diff acceptable; cost parity within ±10%; no double-write to ai_usage_log |
| 2 | Nightly synthetic: 100 simulated inbounds → 3 suggestions each on new path, 0 fallbacks; 10% rollout watched 1 week before 100% |
| 3 | 7-day shadow-run dispatcher verified; single test org switched to new job; no duplicate reports |
| 4 | Every plan + template has non-null ai_agent_id after backfill; plan execution success rate stable 2 weeks |
| 5 | New-signup smoke: org gets 2+ ai_agent rows immediately |
| 6 | Unit test: org-rule cannot override platform-rule; effective rules snapshotted in log |
| 7 | Plan-driven memory writes show written_by_ai_agent_run_id set |
| 8 | Per-kind parity test against legacy code path |
| 9 | 30-day-old agent has ≥1 reflection + ≥1 skill, viewable in admin UI |
| 10 | Budget exceeded → next run rejected with user-visible message + system_event log entry |
End-to-end smoke (after Phase 5)
- Sign up a new org
- Provision a phone number
- Observe DashboardBriefing + CustomReport + TextReply agents created
- Send an SMS to the phone number → observe 3 suggestions generated via new path
- Wait for daily report cron tick → observe email delivered
- Open dashboard → observe briefing rendered
- Inspect /agents/{id}/runs/{run_id} timeline for each
- Confirm ai_usage_log rows have matching ai_agent_log rows
§13 Open questions for you
These are the decisions where multiple reasonable answers exist. The recommended option is listed first in each group. Please confirm or redirect.
Use AiAgent (defer voice rename) — or Agent (rename voice now) — or a fresh word entirely?
- AiAgent internally; UI says "Agents"; rename voice → VoiceAgent later (Phase 11). Internal disambiguation, clean UX, low day-one risk.
- Rename voice → VoiceAgent first, then use Agent. Cleaner code long-term but expensive coordination during the migration.
- Fresh word (Operator, Worker, AiAssistant). Avoids collision but introduces yet another concept.
Are "Plans" subsumed by Agents, or do they stay as a separate concept?
- Plans become one kind: AutonomousCampaignAgent. Other agent kinds are simpler — they don't need contacts, actions table, or re-enrollment policy.
- Force everything into the plan shape. Adds complexity to simple agents.
- Plans + Agents stay as fully separate concepts. Loses the unification you asked for.
How aggressive should the cutover be?
- Coexist with feature flags; retire legacy paths after 30 days at 100%. Safest for paying customers.
- Hard cutover per phase. Faster but each phase ships with rollback panic if something breaks.
Self-evolution: default-on, opt-in, or paid premium?
- Opt-in per org with a 14-day free trial. Demonstrates value without surprise costs.
- Default-on with hard token budgets. Aggressive learning but billing complaints likely.
- Premium-tier feature only. Monetizes evolution; lower-tier orgs miss out.
Add pgvector for semantic memory now or later?
- Defer to a focused spike after Phase 9. Initial implementation: JSONB + GIN + recency. Add when one kind proves the need.
- Add pgvector in Phase 0. Higher upfront cost; locks in retrieval strategy before we know what's needed.
Tool calls: typed per kind, one megaenum, or untyped JSONB?
- One typed enum per kind, JSONB column, kind-discriminated parse. Type safety per kind; no megaenum; shared storage.
- Single megaenum across all kinds. Type safety everywhere; gets unwieldy.
- Untyped — just JSONB. Loses the safety already proven valuable in plans.
Structured output (schema::<T>) vs prompt template?
- Keep structured output, auto-fallback to legacy on parse failure. Reuses existing approach; safe.
- Switch to a prompt template + post-hoc parsing. More fragile.
- Mixed: structured for capable models, prompt template for fallback model. Most complex; only if the first option proves unreliable.
When (if ever) does the voice agent join the AiAgent family?
- Phase 12+: voice joins, sharing only persona/rules/budget; realtime session machinery stays separate.
- Force voice agents into the same ai_agent_run shape now. Probably forces awkward modeling.
- Leave voice agents fully separate forever. Misses cross-cutting features (rules, budget, audit).
Custom report configuration: canned options, free-text prompt, or visual builder?
- Canned options + a single optional 500-char custom prompt. Predictable cost; predictable quality.
- Fully free-text custom report prompt. Maximum flexibility; harder to budget; quality varies.
- Visual builder (sections + filters + frequency) with no free-text. Most polished UX; biggest scope.
First PR scope?
- Phase 0 only. Schema + types + admin endpoint, no behavior change. Low review burden, sets the foundation.
- Phase 0 + Phase 1 together. Faster end-to-end demo; bigger review.
- Skip Phase 0; rename plan_template to agent in place. Riskiest; collides with voice agent.
§14 References
Primary sources used in this research.
- Hermes Agent — Nous Research · Profiles
- Generative Agents — Park et al., Stanford/Google (UIST '23)
- Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al. (NeurIPS 2023)
- Voyager: Open-Ended Embodied Agent with LLMs — Wang et al. (2023)
- Anthropic — Building Effective Agents (Dec 2024)
- CrewAI Documentation — Agents & Memory
- Letta (MemGPT) — Agent Memory
- OpenTelemetry GenAI Semantic Conventions
- Langfuse — Open-Source LLM Observability
- Policy-as-Prompt — arXiv:2509.23994