The Self-Improving Agent Stack
How CubiCube's 10 specialized agents are automatically discovered, certified, selected, and optimized — with no human in the loop. A complete walkthrough of the first self-improving agentic web development stack.
The Problem: You Have 10 Agents. Are They the Best?
CubiCube is a modular web development platform powered by Pegasus — an orchestrator that coordinates 10 specialized AI agents. There's a layout agent, a design-system agent, an accessibility agent, a performance agent, and six others. Each agent handles a specific slice of the web development workflow, and Pegasus routes tasks to the right agent at the right time.
But here's the question nobody had a good answer for: how do you know these are the best agents for the job?
Before our stack existed, the answer was: you don't. You pick agents based on benchmarks, vibes, and whatever the latest Hacker News thread recommends. You hard-code them into your orchestrator. When a better agent appears, someone has to notice, evaluate it manually, and swap it in. That's not engineering — it's hope.
We built something different. A five-layer system where agents are automatically discovered, cryptographically certified, intelligently selected using multi-armed bandit algorithms, economically incentivized, and — crucially — optimized by another agent that watches the entire system and makes it better.
This is the complete lifecycle. No human touches any of it after initial deployment.
Layer 1: Discovery & Registration
Every agent in the system starts life with an identity. When CubiCube's layout agent is deployed, it registers with two systems simultaneously:
KnowYourModel Trust Registry
The agent gets a registry entry in KYM — a trust credential authority that tracks capabilities, performance history, and trust scores. KYM manages over 20,800 entities across models, agents, and datasets.
NANDA Index
The agent publishes an AgentFact — a W3C Verifiable Credential containing its capabilities, endpoint URL, supported protocols, and Ed25519 cryptographic identity. Any NANDA node can now discover it.
KYM handles trust and selection. NANDA handles discovery and verification. Together, they give the agent a globally discoverable, cryptographically verifiable identity that any orchestrator in the world can find and validate.
When Pegasus needs a layout agent, it doesn't have one hard-coded. It queries the KYM registry for agents with the web-layout capability, and KYM returns a ranked list: scored, certified, and ready to use.
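As a rough illustration of that lookup, here is a minimal in-memory sketch. The RegistryEntry fields, the agent names, and the rank_agents helper are all hypothetical; the real KYM API and its scoring are not shown in this post, so treat this only as the shape of the idea: filter by capability, keep certified agents, sort by score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record; the field names are assumptions."""
    agent_id: str
    capabilities: frozenset
    trust_score: float  # assumed to be normalized to the range 0..1
    certified: bool


def rank_agents(entries, capability):
    """Return certified agents that declare `capability`, best score first."""
    matches = [e for e in entries if e.certified and capability in e.capabilities]
    return sorted(matches, key=lambda e: e.trust_score, reverse=True)


# Made-up registry contents for the example.
REGISTRY = [
    RegistryEntry("layout-a", frozenset({"web-layout"}), 0.91, True),
    RegistryEntry("layout-b", frozenset({"web-layout", "css-grid"}), 0.84, True),
    RegistryEntry("layout-c", frozenset({"web-layout"}), 0.97, False),  # uncertified
    RegistryEntry("a11y-a", frozenset({"accessibility"}), 0.88, True),
]

ranked = rank_agents(REGISTRY, "web-layout")
print([e.agent_id for e in ranked])  # ['layout-a', 'layout-b']
```

Note that layout-c has the highest score but is excluded because it holds no certification, which is the point of separating listing from trust.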
Layer 2: Certification & Trust
Registration gets you a listing. Certification proves you deserve it.
NANDA's Capability Certifier runs automated test suites against every registered agent. For a layout agent, that means generating test layouts across viewport sizes, validating semantic HTML output, checking responsive breakpoints, and measuring render performance. Results are scored using Wilson confidence intervals (the same statistical method commonly used to rank rated products on e-commerce platforms), producing grades from A+ to F.
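To make the Wilson scoring concrete, here is a minimal sketch of the lower bound of the Wilson score interval for a pass/fail test suite. The formula is the standard one; the grade cutoffs and the reduced grade ladder are hypothetical, since the post does not publish the certifier's actual thresholds.

```python
import math


def wilson_lower_bound(passed: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval (z = 1.96 is roughly 95%)."""
    if total == 0:
        return 0.0
    p = passed / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# Hypothetical cutoffs, chosen only for illustration.
GRADE_CUTOFFS = [(0.95, "A+"), (0.90, "A"), (0.80, "B"), (0.70, "C"), (0.60, "D")]


def grade(passed: int, total: int) -> str:
    """Map the Wilson lower bound onto a letter grade."""
    lower = wilson_lower_bound(passed, total)
    for cutoff, letter in GRADE_CUTOFFS:
        if lower >= cutoff:
            return letter
    return "F"


print(round(wilson_lower_bound(10, 10), 4), grade(10, 10))
print(round(wilson_lower_bound(500, 500), 4), grade(500, 500))
```

The reason to grade on the lower bound rather than the raw pass rate is visible in the two calls: a perfect 10 out of 10 earns a much lower bound than a perfect 500 out of 500, so an agent cannot reach a top grade on a handful of lucky runs.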
Each grade is issued as a W3C Verifiable Credential v2 signed with Ed25519. This isn't a number in a database — it's a cryptographic attestation that any participant in the NANDA network can independently verify without trusting our infrastructure. The credential includes:
- The specific capabilities tested and their individual scores
- The test suite version and methodology
- A Bitstring Status List revocation index for real-time invalidation
- Evidence stored in R2 (Cloudflare's object storage): the actual test inputs and outputs
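The revocation index in that list can be pictured as a position in a long bit array: one bit per credential, set to 1 when the credential is revoked. The sketch below follows our reading of the W3C Bitstring Status List encoding (GZIP-compressed bits, base64url without padding, a multibase "u" prefix, index 0 at the left-most bit). The helper names and the 16 KB list size used here are illustrative assumptions, not the certifier's code.

```python
import base64
import gzip


def set_revoked(bits: bytearray, index: int) -> None:
    """Mark credential `index` as revoked (bit 0 is the left-most bit)."""
    bits[index // 8] |= 0x80 >> (index % 8)


def is_revoked(bits: bytes, index: int) -> bool:
    """Check whether the bit at `index` is set."""
    return bool(bits[index // 8] & (0x80 >> (index % 8)))


def encode_status_list(bits: bytes) -> str:
    """GZIP, then base64url without padding, with a multibase 'u' prefix."""
    compressed = gzip.compress(bytes(bits))
    return "u" + base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def decode_status_list(encoded: str) -> bytes:
    """Reverse of encode_status_list."""
    if not encoded.startswith("u"):
        raise ValueError("expected multibase base64url prefix 'u'")
    b64 = encoded[1:]
    b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
    return gzip.decompress(base64.urlsafe_b64decode(b64))


status_bits = bytearray(16 * 1024)  # 131,072 credential slots, all valid
set_revoked(status_bits, 4242)      # revoke one hypothetical credential
published = encode_status_list(status_bits)

fetched = decode_status_list(published)
print(is_revoked(fetched, 4242), is_revoked(fetched, 4243))  # True False
```

Because the list is mostly zeros it compresses to a very small payload, which is what makes fetching it on every verification practical.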
Meanwhile, NANDA's Observer Evaluator runs hourly liveness probes. It's not enough to pass certification once; agents must prove they're consistently available. The observer computes a reputation score using a weighted formula.
All of this happens without any human involvement. Agents are tested, scored, credentialed, and monitored on a continuous automated cycle.
Layer 3: Intelligent Selection
This is where it gets interesting. KYM doesn't just rank agents by a static score — it uses multi-armed bandit algorithms to learn which agent performs best in practice, balancing exploitation of known good agents with exploration of new ones.
Registry owners choose from four selection strategies, configured per registry.
[Interactive figure: Selection Algorithms]
Here's the key insight: every time Pegasus uses an agent, the outcome feeds back into the bandit. If the layout agent produces clean, accessible HTML, that's a success signal. If it fails validation or times out, that's a failure. Over hundreds of invocations, Thompson Sampling converges on the genuinely best agent — not the one with the best marketing, but the one that actually performs.
The feedback mechanism is cryptographic. Orchestrators submit Ed25519-signed usage receipts that attest to the outcome of each invocation. These receipts are tamper-proof — you can't inflate an agent's success rate without a valid orchestrator signature. KYM verifies each receipt before updating the bandit's posterior distribution.
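The feedback loop described above can be sketched as a small Beta-Bernoulli Thompson Sampling selector. This is a generic textbook version, not KYM's implementation: the class name, the uniform Beta(1, 1) prior, and the simulated success rates are assumptions, and Ed25519 receipt verification is reduced to a boolean flag because it needs a cryptography library outside the standard library.

```python
import random


class ThompsonSelector:
    """Beta-Bernoulli Thompson Sampling over a fixed set of agents."""

    def __init__(self, agent_ids, seed=None):
        self._rng = random.Random(seed)
        # [alpha, beta] per agent, starting from a uniform Beta(1, 1) prior.
        self.posteriors = {agent: [1.0, 1.0] for agent in agent_ids}

    def select(self):
        """Draw one sample per posterior and pick the agent with the largest draw."""
        draws = {
            agent: self._rng.betavariate(alpha, beta)
            for agent, (alpha, beta) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def record(self, agent_id, success, receipt_verified):
        """Update the posterior, but only for receipts that passed verification."""
        if not receipt_verified:
            return False
        self.posteriors[agent_id][0 if success else 1] += 1.0
        return True


# Made-up ground truth for the simulation; real rates are unknown to the bandit.
TRUE_SUCCESS_RATE = {"layout-a": 0.9, "layout-b": 0.6}
selector = ThompsonSelector(TRUE_SUCCESS_RATE, seed=7)
outcome_rng = random.Random(11)
picks = {agent: 0 for agent in TRUE_SUCCESS_RATE}

for _ in range(2000):
    agent = selector.select()
    picks[agent] += 1
    success = outcome_rng.random() < TRUE_SUCCESS_RATE[agent]
    selector.record(agent, success, receipt_verified=True)

print(picks)
```

In a run like this the stronger agent ends up with the large majority of selections, while the weaker one still receives occasional exploratory traffic early on. Unverified receipts never touch the posterior, which is how a signature check keeps fabricated outcomes out of the statistics.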
Layer 4: The Economic Layer
Trust and selection need an economic substrate. Agents aren't free — they consume compute, API calls, and inference tokens. The stack uses the x402 protocol to gate access with real payments, creating genuine economic incentives for quality.
x402 revives the HTTP 402 Payment Required status code with a three-step pattern:
- Challenge: the agent endpoint returns 402 with payment requirements (amount, accepted currencies, payment address)
- Pay: the orchestrator completes payment and receives a receipt
- Prove: the orchestrator re-sends the request with the payment proof in an X-PAYMENT header
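The three steps can be sketched without any networking by modelling the endpoint as a function from request headers to a status code and body. Everything here is a simplified stand-in: the field names in the challenge, the in-memory set of settled proofs, and the plain-string proof are assumptions for illustration, and a real x402 payment payload is more structured than this.

```python
PRICE_NP = 1000         # hypothetical price for one call
SETTLED_PROOFS = set()  # stand-in for a payment ledger


def agent_endpoint(headers: dict):
    """Simulated paid endpoint: returns (status_code, body)."""
    requirements = {"amount": PRICE_NP, "currency": "NP", "pay_to": "agent-wallet-demo"}
    proof = headers.get("X-PAYMENT")
    if proof is None:
        return 402, requirements  # step 1: challenge
    if proof not in SETTLED_PROOFS:
        return 402, {**requirements, "error": "unknown payment proof"}
    return 200, {"result": "layout generated"}


def pay(requirements: dict) -> str:
    """Step 2: settle the payment and return a receipt (simulated)."""
    proof = f"receipt-{len(SETTLED_PROOFS) + 1}"
    SETTLED_PROOFS.add(proof)
    return proof


def call_with_payment(endpoint):
    """Run challenge, pay, prove against `endpoint`."""
    status, body = endpoint({})
    if status != 402:
        return status, body  # no payment required
    proof = pay(body)
    return endpoint({"X-PAYMENT": proof})  # step 3: prove


print(agent_endpoint({})[0])                          # 402
print(call_with_payment(agent_endpoint))              # (200, {'result': 'layout generated'})
print(agent_endpoint({"X-PAYMENT": "forged"})[0])     # 402
```

The useful property of the pattern is that the client needs no prior account with the endpoint: the first response carries everything required to pay, and the retry carries everything required to verify.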
The system supports dual payment rails:
USDC (On-Chain)
Stablecoin payments on Base L2 with ~2-second settlement. Fully decentralized, no intermediary. Best for high-value transactions and cross-organization payments where neither party trusts the other.
Nanda Points (Off-Chain)
Internal ledger with HMAC-SHA256 signed transactions and instant settlement. Zero gas fees, sub-millisecond verification. Pegged at 1,000 NP = $1 USD — making a registry listing cost 1,000 NP and a registry creation cost 5,000 NP. Every new agent is seeded with 1,000 NP (worth $1), enough for one free listing. NP verification is delegated to the NANDA Node's Points Auditor, keeping the auditor as the single source of truth for balances.
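A minimal sketch of what an HMAC-SHA256 signed ledger transaction could look like, using only the Python standard library. The transaction fields, the canonical JSON serialization, and the demo key are assumptions; the post does not describe the Points Auditor's actual message format.

```python
import hashlib
import hmac
import json

NP_PER_USD = 1000  # peg stated above: 1,000 NP = $1


def np_to_usd(points: int) -> float:
    """Convert Nanda Points to US dollars at the fixed peg."""
    return points / NP_PER_USD


def sign_transaction(tx: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the transaction."""
    payload = json.dumps(tx, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_transaction(tx: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_transaction(tx, key), signature)


DEMO_KEY = b"demo-shared-secret"  # hypothetical; never hard-code real keys
tx = {"from": "agent-123", "to": "kym-registry", "amount_np": 1000, "memo": "listing"}
signature = sign_transaction(tx, DEMO_KEY)

tampered = dict(tx, amount_np=1)
print(verify_transaction(tx, signature, DEMO_KEY))        # True
print(verify_transaction(tampered, signature, DEMO_KEY))  # False
print(np_to_usd(5000))                                    # 5.0
```

Sorting keys before signing matters: without a canonical encoding, two parties could serialize the same transaction differently and disagree about a valid signature. HMAC is symmetric, so only holders of the shared key can verify, which fits the design above where verification is delegated to a single auditor.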
When an agent calls a KYM endpoint, the 402 Payment Required response advertises both payment options: the USDC challenge and the NP alternative with exact header instructions. The agent (or its orchestrator) picks whichever rail it has funds on. KYM verifies USDC on-chain directly; for NP, it delegates verification to the NANDA Node, which checks the HMAC signature and settles the transaction on its internal ledger.
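The rail choice itself is simple enough to sketch. The shape of the challenge (an "accepts" list with a rail name and an amount) and the wallet dictionary are hypothetical; the only behavior taken from the text is that the caller pays on whichever advertised rail it can afford.

```python
from typing import Optional


def choose_rail(challenge: dict, balances: dict) -> Optional[str]:
    """Return the first advertised rail the caller has enough funds for."""
    for option in challenge["accepts"]:
        if balances.get(option["rail"], 0) >= option["amount"]:
            return option["rail"]
    return None  # cannot pay on any advertised rail


# Hypothetical 402 body advertising both rails for a $1 listing.
challenge = {
    "accepts": [
        {"rail": "USDC", "amount": 1.0},
        {"rail": "NP", "amount": 1000},
    ]
}

print(choose_rail(challenge, {"USDC": 0.0, "NP": 1000}))  # NP
print(choose_rail(challenge, {"USDC": 5.0, "NP": 0}))     # USDC
print(choose_rail(challenge, {"USDC": 0.2, "NP": 10}))    # None
```

A freshly seeded agent with 1,000 NP and no USDC falls through to the NP rail, which matches the free first listing described earlier.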
The economic layer creates a critical feedback loop: agents that perform well get selected more often, earn more revenue, and can invest in their own optimization. Agents that underperform lose selection probability, earn less, and eventually get displaced. It's natural selection with a financial substrate.
Layer 5: The Optimization Agent
Here's the part that makes this architecture genuinely novel: a NANDA agent whose sole purpose is making other agents better.
The Optimization Agent is itself registered in the NANDA Index and certified through KYM — it's an agent like any other, subject to the same trust infrastructure. But its capability is unique: it watches the entire system and acts on what it sees.
It has read access to:
- KYM bandit posteriors — the full Beta distribution parameters for every agent in every registry, showing which agents are converging toward success and which are declining
- NANDA observer telemetry — availability windows, probe success rates, latency distributions, and health trends over 24-hour sliding windows
- Certification evidence — the actual test inputs, outputs, and Wilson interval calculations stored in R2
- Usage receipt streams — the Ed25519-signed outcome data from every orchestrator invocation
- Payment transaction history — settlement patterns, revenue flow, and economic activity across both USDC and NP rails
With this data, the Optimization Agent can take concrete actions:
Trigger Re-Certification
When an agent's observer data shows degradation but its certification grade is stale, the Optimization Agent queues a re-certification run. If the agent fails, the credential is revoked via Bitstring Status List and the bandit prior is reset — immediately removing it from selection.
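One way to express that trigger is a rule over two signals: recent probe success and the age of the last certification. The AgentHealth record and both thresholds below are hypothetical; the post does not state what counts as degraded or stale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AgentHealth:
    """Hypothetical snapshot combining observer and certification data."""
    agent_id: str
    probe_success_rate: float  # share of successful probes in the recent window
    certified_at: datetime


def needs_recertification(
    health: AgentHealth,
    now: datetime,
    min_success_rate: float = 0.95,           # assumed threshold
    max_cert_age: timedelta = timedelta(days=7),  # assumed threshold
) -> bool:
    """Queue a re-certification only when the agent is degraded AND its grade is stale."""
    degraded = health.probe_success_rate < min_success_rate
    stale = now - health.certified_at > max_cert_age
    return degraded and stale


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
old_cert = datetime(2025, 1, 1, tzinfo=timezone.utc)
new_cert = datetime(2025, 1, 9, tzinfo=timezone.utc)

print(needs_recertification(AgentHealth("layout-a", 0.80, old_cert), now))  # True
print(needs_recertification(AgentHealth("layout-a", 0.80, new_cert), now))  # False
print(needs_recertification(AgentHealth("layout-a", 0.99, old_cert), now))  # False
```

Requiring both conditions avoids re-running an expensive test suite for an agent that was certified yesterday, or for one that is old but still healthy.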
Scout New Agents
The Optimization Agent queries the NANDA Index for agents with matching capabilities that aren't yet in the KYM registry. When it finds promising candidates, it registers them — giving the bandit new arms to explore. It uses certification grades and observer data from other NANDA nodes to pre-filter before registration.
Tune Selection Parameters
If a registry's Thompson Sampling posterior shows all agents clustering near the same score (the bandit has converged but exploration is still burning budget), the Optimization Agent can recommend switching to Static selection. If a new batch of agents arrives, it can switch back to Thompson Sampling to explore them.
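A convergence check of this kind might compare posterior means once every agent has enough observations. The sketch assumes Beta(alpha, beta) posteriors that started from a Beta(1, 1) prior; the spread and sample-size thresholds and the strategy labels returned are illustrative assumptions.

```python
def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def recommend_strategy(
    posteriors: dict,
    max_spread: float = 0.02,      # assumed: means this close count as clustered
    min_observations: int = 200,   # assumed: minimum outcomes per agent
) -> str:
    """Recommend 'static' once all agents are well observed and nearly tied."""
    if not posteriors:
        return "thompson"
    means = [posterior_mean(a, b) for a, b in posteriors.values()]
    # Subtract the two pseudo-counts of the assumed Beta(1, 1) prior.
    well_observed = all(a + b - 2 >= min_observations for a, b in posteriors.values())
    if well_observed and max(means) - min(means) <= max_spread:
        return "static"
    return "thompson"


converged = {"layout-a": (451, 51), "layout-b": (446, 56)}
with_newcomer = {**converged, "layout-c": (1, 1)}  # new arm, no data yet

print(recommend_strategy(converged))      # static
print(recommend_strategy(with_newcomer))  # thompson
```

The second call shows the switch back: a newly registered agent has no observations, so the check fails and exploration resumes.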
Report & Recommend
The Optimization Agent generates periodic reports: which agents improved, which degraded, what the expected regret is for the current selection strategy, and whether different algorithm parameters would reduce it. These reports feed into dashboards visible in both KYM and NANDA admin panels.
The critical point: the Optimization Agent itself is subject to the same trust infrastructure. It has a certification grade, observer reputation, and bandit selection probability. If it makes bad recommendations, its trust score drops. If a better optimization agent appears, it gets displaced. The system optimizes its own optimizer.
The Complete Loop
Let's trace one full cycle through the system to see how all five layers work together:
The Complete Self-Improving Lifecycle (10 steps, from discovery to autonomous promotion)
Step 1: New Agent Appears (Discovery)
A new CSS-in-JS layout agent appears in the NANDA Index, published by a third-party developer. It has an AgentFact with Ed25519 identity and declared capabilities.
This isn't a theoretical architecture — it's a running system. Every component described in this post is built on Cloudflare Workers (V8 isolates at the edge), with D1 for operational state, R2 for certification evidence, KV for hot caches, and cron-triggered inline processing for background tasks. The entire stack runs serverless with zero cold-start infrastructure management.
Further Reading
This work sits at the intersection of several active research areas. The following papers and resources explore the foundational ideas behind each layer of the stack:
Self-Improving Agent Systems
- DARWIN: Dynamic Agentically Rewriting Self-Improving Network (Jiang, Feb 2026) — LLM agents that autonomously rewrite their own network architecture through evolutionary optimization.
- AgentBreeder: Mitigating AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement (Rosser et al., NeurIPS 2025 Spotlight) — Multi-objective evolutionary search over agent scaffolds, demonstrating that self-improving systems can be aligned with safety constraints.
- ADAS: Automated Design of Agentic Systems (Hu et al., 2025) — A meta-agent that programs new agents in code, searching over the space of possible agent designs.
- A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions (Yuksel et al., REALM @ ACL 2025) — Autonomous refinement of agent roles and workflows through LLM-driven feedback loops.
- Evolver: aiXplain's Meta-Agent for Self-Improving AI (Oct 2025) — Industry implementation of a meta-agent that treats agent optimization as an evolutionary search problem.
- Stanford CS329A: Self-Improving AI Agents (Mirhoseini & Chowdhery, Fall 2025) — Graduate seminar covering the latest techniques in agents that continuously improve through experience.
Multi-Armed Bandits for Agent Selection
- A Pragmatic Approach Towards Self Evolving Agent (Jan 2026) — Directly models LLM agent selection as a multi-armed bandit problem, enabling dynamic selection among competing agents.
- KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination (Zhang et al., ICML 2025) — Thompson Sampling strategy for dynamically routing tasks to the right expert in multi-agent systems.
- Multi-Armed Bandits Meet Large Language Models (May 2025) — Surveys the intersection of MAB algorithms and LLM-based systems, including agent selection and model routing.
Agent Discovery & Trust Infrastructure
- Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index (Jul 2025) — The foundational NANDA paper describing the AgentFacts schema and decentralized peer-to-peer agent index.
- AgentHub: A Registry for Discoverable, Verifiable, and Reproducible AI Agents (Jiang et al., Oct 2025) — Vision for an agent registry supporting production and consumption of software agents.
- Agent Discovery in Internet of Agents: Challenges and Solutions (Guo et al., Nov 2025) — Survey of agent discovery challenges including capability matching and decentralized resolution.
- Inter-Agent Trust Models: A Comparative Study (Nov 2025) — Comparative analysis of trust mechanisms across A2A, NANDA, and blockchain-based agent protocols.
- W3C Verifiable Credentials Data Model v2.0 (May 2025) — The standard underlying our certification grades and AgentFacts.
Agent Payments & Economic Incentives
- x402: Internet-Native Payments Standard — The open protocol for payment-gated agent access, reviving HTTP 402 with zero protocol fees.
- Secure Use of the Agent Payments Protocol (AP2) (CSA, Oct 2025) — Cloud Security Alliance framework for trustworthy AI-driven payment transactions.
Trust Governance & Standards
- ToIP & DIF: Trust in the Age of AI Working Groups (Sep 2025) — Three working groups addressing decentralized trust graphs, human trust in AI, and trusted AI agents under the Linux Foundation.
- SingularityNET + Privado ID: First Decentralized AI Agent Trust Registry (Mar 2025) — Partnership launching the first decentralized trust registry in the ASI ecosystem.