Published February 2026 · 14 min read

The Self-Improving Agent Stack

How CubiCube's 10 specialized agents are automatically discovered, certified, selected, and optimized — with no human in the loop. A complete walkthrough of the first self-improving agentic web development stack.


The Problem: You Have 10 Agents. Are They the Best?

CubiCube is a modular web development platform powered by Pegasus — an orchestrator that coordinates 10 specialized AI agents. There's a layout agent, a design-system agent, an accessibility agent, a performance agent, and six others. Each agent handles a specific slice of the web development workflow, and Pegasus routes tasks to the right agent at the right time.

But here's the question nobody had a good answer for: how do you know these are the best agents for the job?

Before our stack existed, the answer was: you don't. You pick agents based on benchmarks, vibes, and whatever the latest Hacker News thread recommends. You hard-code them into your orchestrator. When a better agent appears, someone has to notice, evaluate it manually, and swap it in. That's not engineering — it's hope.

We built something different. A five-layer system where agents are automatically discovered, cryptographically certified, intelligently selected using multi-armed bandit algorithms, economically incentivized, and — crucially — optimized by another agent that watches the entire system and makes it better.

This is the complete lifecycle. No human touches any of it after initial deployment.

Layer 1: Discovery & Registration

Every agent in the system starts life with an identity. When CubiCube's layout agent is deployed, it registers with two systems simultaneously:

KnowYourModel Trust Registry

The agent gets a registry entry in KYM — a trust credential authority that tracks capabilities, performance history, and trust scores. KYM manages over 20,800 entities across models, agents, and datasets.

NANDA Index

The agent publishes an AgentFact — a W3C Verifiable Credential containing its capabilities, endpoint URL, supported protocols, and Ed25519 cryptographic identity. Any NANDA node can now discover it.

KYM handles trust and selection. NANDA handles discovery and verification. Together, they give the agent a globally discoverable, cryptographically verifiable identity that any orchestrator in the world can find and validate.

When Pegasus needs a layout agent, it doesn't have one hard-coded. It queries the KYM registry for agents with the web-layout capability, and KYM returns a ranked list — scored, certified, and ready to use.

Layer 2: Certification & Trust

Registration gets you a listing. Certification proves you deserve it.

NANDA's Capability Certifier runs automated test suites against every registered agent. For a layout agent, that means generating test layouts across viewport sizes, validating semantic HTML output, checking responsive breakpoints, and measuring render performance. Results are scored using Wilson confidence intervals — the same statistical method used for grading products on e-commerce platforms — producing grades from A+ to F.
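
Concretely, the Wilson lower bound for a capability that passed `successes` of `trials` test cases can be computed as below. This is a sketch in TypeScript: the formula is the standard Wilson score interval at 95% confidence, but the grade cut-offs are illustrative, not the certifier's published thresholds.

```typescript
// Wilson score lower bound: a pessimistic estimate of the true pass rate,
// which penalizes small sample sizes (95/100 scores lower than 950/1000).
function wilsonLowerBound(successes: number, trials: number, z = 1.96): number {
  if (trials === 0) return 0;
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = p + z2 / (2 * trials);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * trials)) / trials);
  return (center - margin) / denom;
}

// Illustrative grade mapping (assumed thresholds, not the real ones).
function grade(score: number): string {
  if (score >= 0.97) return "A+";
  if (score >= 0.90) return "A";
  if (score >= 0.80) return "B";
  if (score >= 0.65) return "C";
  if (score >= 0.50) return "D";
  return "F";
}
```

Passing 95 of 100 test cases yields a lower bound around 0.89 rather than 0.95 — the interval's width is exactly what keeps thinly-tested agents from outranking well-tested ones.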

Each grade is issued as a W3C Verifiable Credential v2 signed with Ed25519. This isn't a number in a database — it's a cryptographic attestation that any participant in the NANDA network can independently verify without trusting our infrastructure. The credential includes:

  • The specific capabilities tested and their individual scores
  • The test suite version and methodology
  • A Bitstring Status List revocation index for real-time invalidation
  • Evidence stored in R2 (Cloudflare's object storage) — the actual test inputs and outputs
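
A minimal sketch of what such a credential could look like, following the W3C Verifiable Credentials Data Model v2 and Bitstring Status List conventions. The DIDs, URLs, and NANDA-specific subject fields here are illustrative assumptions, not the actual schema.

```typescript
// Illustrative certification credential (VC Data Model v2 shape).
const certification = {
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  type: ["VerifiableCredential", "CapabilityCertification"],
  issuer: "did:web:certifier.example",               // assumed issuer DID
  validFrom: "2026-02-01T00:00:00Z",
  credentialSubject: {
    id: "did:web:layout-agent.example",              // the certified agent
    capability: "web-layout",
    grade: "A",
    wilsonScore: 0.963,
    testSuiteVersion: "1.4.0",
    evidenceUrl: "https://evidence.example/r2/run-123", // R2-hosted inputs/outputs
  },
  credentialStatus: {
    type: "BitstringStatusListEntry",                // real-time revocation
    statusPurpose: "revocation",
    statusListIndex: "94567",
    statusListCredential: "https://certifier.example/status/3",
  },
  proof: {
    type: "DataIntegrityProof",
    cryptosuite: "eddsa-rdfc-2022",                  // Ed25519 signature suite
    proofValue: "z58DAdFfa9...",                     // elided
  },
};
```

Any verifier can check the Ed25519 proof against the issuer's public key and the revocation bit against the status list — no call to the certifier's database required.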

Meanwhile, NANDA's Observer Evaluator runs hourly liveness probes. It's not enough to pass certification once — agents must prove they're consistently available. The observer computes a reputation score using a weighted formula:

reputation = availability × 0.4 + probe_success × 0.4 + cert_score × 0.2 − fraud_rate × 0.1
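
As code, the formula above is a one-liner. All inputs are rates in [0, 1]; the weights are taken directly from the formula.

```typescript
// Observer reputation score, exactly as weighted above: availability and
// probe success dominate (0.4 each), certification contributes 0.2, and
// detected fraud subtracts up to 0.1.
function reputation(
  availability: number,
  probeSuccess: number,
  certScore: number,
  fraudRate: number,
): number {
  return availability * 0.4 + probeSuccess * 0.4 + certScore * 0.2 - fraudRate * 0.1;
}
```

A perfect agent scores 1.0; an agent at 90% availability, 95% probe success, a 0.88 certification score, and a 2% fraud rate lands at 0.914.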

All of this happens without any human involvement. Agents are tested, scored, credentialed, and monitored on a continuous automated cycle.

Layer 3: Intelligent Selection

This is where it gets interesting. KYM doesn't just rank agents by a static score — it uses multi-armed bandit algorithms to learn which agent performs best in practice, balancing exploitation of known good agents with exploration of new ones.

Registry owners choose from four selection strategies:

[Interactive widget: Selection Algorithms — compare the four strategies; registry owners choose per-registry]

Here's the key insight: every time Pegasus uses an agent, the outcome feeds back into the bandit. If the layout agent produces clean, accessible HTML, that's a success signal. If it fails validation or times out, that's a failure. Over hundreds of invocations, Thompson Sampling converges on the genuinely best agent — not the one with the best marketing, but the one that actually performs.

The feedback mechanism is cryptographic. Orchestrators submit Ed25519-signed usage receipts that attest to the outcome of each invocation. These receipts are tamper-proof — you can't inflate an agent's success rate without a valid orchestrator signature. KYM verifies each receipt before updating the bandit's posterior distribution.
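
A minimal sketch of that sign-and-verify step using Node's built-in Ed25519 support. The receipt fields here are illustrative; only the Ed25519 scheme itself comes from the post.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// The orchestrator holds an Ed25519 keypair; KYM knows the public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Illustrative receipt payload attesting to one invocation's outcome.
const receipt = Buffer.from(
  JSON.stringify({ agentId: "layout-agent-7", outcome: "success", ts: 1760000000 }),
);

// Ed25519 signs the raw message (no digest algorithm argument needed).
const signature = sign(null, receipt, privateKey);
const valid = verify(null, receipt, publicKey, signature);

// Tampering with the outcome invalidates the receipt.
const tampered = Buffer.from(receipt.toString().replace("success", "failure"));
const stillValid = verify(null, tampered, publicKey, signature);
```

This is why success rates can't be inflated: flipping a single byte of the outcome breaks the signature, and only the orchestrator holds the private key.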

Why bandits, not leaderboards? Static leaderboards optimize for a single moment in time. Bandits continuously adapt. When a new agent enters the registry, Thompson Sampling automatically explores it without degrading the experience for existing users. When an agent's quality degrades, the bandit's posterior shifts and selection probability drops — no human intervention required.
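
A toy version of that loop, assuming Beta(successes + 1, failures + 1) posteriors per agent (a standard Thompson Sampling setup; the production implementation is not shown in this post). With integer shape parameters, a Beta draw is just an order statistic of uniforms, which keeps the sketch dependency-free.

```typescript
interface Arm { id: string; successes: number; failures: number; }

// Beta(a, b) with integer shapes: the a-th smallest of (a + b - 1) uniforms.
function sampleBeta(a: number, b: number): number {
  const u = Array.from({ length: a + b - 1 }, () => Math.random()).sort((x, y) => x - y);
  return u[a - 1];
}

// Thompson Sampling: draw once from each posterior, pick the highest draw.
// Uncertain arms occasionally win a draw, which is the exploration.
function selectAgent(arms: Arm[]): Arm {
  let best = arms[0];
  let bestDraw = -1;
  for (const arm of arms) {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1);
    if (draw > bestDraw) { bestDraw = draw; best = arm; }
  }
  return best;
}

// A verified usage receipt updates the chosen arm's posterior.
function recordOutcome(arm: Arm, success: boolean): void {
  if (success) arm.successes++;
  else arm.failures++;
}
```

An agent with a 90% observed success rate will win almost every draw against one at 10%, but a brand-new agent (wide posterior) still gets sampled often enough to prove itself.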

Layer 4: The Economic Layer

Trust and selection need an economic substrate. Agents aren't free — they consume compute, API calls, and inference tokens. The stack uses the x402 protocol to gate access with real payments, creating genuine economic incentives for quality.

x402 revives the HTTP 402 Payment Required status code with a three-step pattern:

  1. Challenge — the agent endpoint returns 402 with payment requirements (amount, accepted currencies, payment address)
  2. Pay — the orchestrator completes payment and receives a receipt
  3. Prove — the orchestrator re-sends the request with the payment proof in an X-PAYMENT header

The system supports dual payment rails:

USDC (On-Chain)

Stablecoin payments on Base L2 with ~24-second settlement. Fully decentralized, no intermediary. Best for high-value transactions and cross-organization payments where neither party trusts the other.

Nanda Points (Off-Chain)

Internal ledger with HMAC-SHA256 signed transactions and instant settlement. Zero gas fees, sub-millisecond verification. NP is pegged at 1,000 NP = $1 USD: a registry listing costs 1,000 NP ($1) and creating a registry costs 5,000 NP ($5). Every new agent is seeded with 1,000 NP — enough for one free listing. NP verification is delegated to the NANDA Node's Points Auditor, which serves as the single source of truth for balances.

When an agent calls a KYM endpoint, the 402 Payment Required response advertises both payment options — the USDC challenge and the NP alternative with exact header instructions. The agent (or its orchestrator) picks whichever rail it has funds on. KYM verifies USDC on-chain directly; for NP, it delegates verification to the NANDA Node, which checks the HMAC signature and settles the transaction on its internal ledger.
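
A sketch of what an HMAC-signed NP transfer and its verification could look like. The field names and canonical serialization are assumptions; only HMAC-SHA256 and the shared-secret model come from the description above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative NP transfer. At the 1,000 NP = $1 peg, this moves $1.
interface NpTransfer { from: string; to: string; amountNp: number; nonce: string; }

// Canonical serialization + HMAC-SHA256 over a secret shared with the auditor.
function signTransfer(t: NpTransfer, secret: string): string {
  const payload = `${t.from}|${t.to}|${t.amountNp}|${t.nonce}`;
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Constant-time comparison so verification leaks no timing information.
function verifyTransfer(t: NpTransfer, sig: string, secret: string): boolean {
  const expected = signTransfer(t, secret);
  return sig.length === expected.length &&
    timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}
```

The nonce prevents replay: the auditor records each nonce it has settled, so re-submitting an old signed transfer fails even though the signature is valid.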

The economic layer creates a critical feedback loop: agents that perform well get selected more often, earn more revenue, and can invest in their own optimization. Agents that underperform lose selection probability, earn less, and eventually get displaced. It's natural selection with a financial substrate.

Layer 5: The Optimization Agent

Here's the part that makes this architecture genuinely novel: a NANDA agent whose sole purpose is making other agents better.

The Optimization Agent is itself registered in the NANDA Index and certified through KYM — it's an agent like any other, subject to the same trust infrastructure. But its capability is unique: it watches the entire system and acts on what it sees.

It has read access to:

  • KYM bandit posteriors — the full Beta distribution parameters for every agent in every registry, showing which agents are converging toward success and which are declining
  • NANDA observer telemetry — availability windows, probe success rates, latency distributions, and health trends over 24-hour sliding windows
  • Certification evidence — the actual test inputs, outputs, and Wilson interval calculations stored in R2
  • Usage receipt streams — the Ed25519-signed outcome data from every orchestrator invocation
  • Payment transaction history — settlement patterns, revenue flow, and economic activity across both USDC and NP rails

With this data, the Optimization Agent can take concrete actions:

Trigger Re-Certification

When an agent's observer data shows degradation but its certification grade is stale, the Optimization Agent queues a re-certification run. If the agent fails, the credential is revoked via Bitstring Status List and the bandit prior is reset — immediately removing it from selection.

Scout New Agents

The Optimization Agent queries the NANDA Index for agents with matching capabilities that aren't yet in the KYM registry. When it finds promising candidates, it registers them — giving the bandit new arms to explore. It uses certification grades and observer data from other NANDA nodes to pre-filter before registration.

Tune Selection Parameters

If a registry's Thompson Sampling posterior shows all agents clustering near the same score (the bandit has converged but exploration is still burning budget), the Optimization Agent can recommend switching to Static selection. If a new batch of agents arrives, it can switch back to Thompson Sampling to explore them.
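
One plausible heuristic for that convergence check — illustrative, not the agent's actual logic. If every arm's posterior mean sits within a small band of the best after enough pulls, exploration is no longer buying information.

```typescript
interface Posterior { successes: number; failures: number; }

// Recommend Static selection when the Beta(successes+1, failures+1)
// posterior means have clustered within `epsilon` of each other and
// the registry has accumulated at least `minPulls` total invocations.
function shouldSwitchToStatic(
  arms: Posterior[],
  epsilon = 0.02,
  minPulls = 100,
): boolean {
  const means = arms.map((a) => (a.successes + 1) / (a.successes + a.failures + 2));
  const pulls = arms.reduce((n, a) => n + a.successes + a.failures, 0);
  return pulls >= minPulls && Math.max(...means) - Math.min(...means) < epsilon;
}
```

The inverse trigger is simpler: any newly registered arm has a wide posterior far from the cluster, which flips the check back to false and justifies re-enabling Thompson Sampling.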

Report & Recommend

The Optimization Agent generates periodic reports: which agents improved, which degraded, what the expected regret is for the current selection strategy, and whether different algorithm parameters would reduce it. These reports feed into dashboards visible in both KYM and NANDA admin panels.

The critical point: the Optimization Agent itself is subject to the same trust infrastructure. It has a certification grade, observer reputation, and bandit selection probability. If it makes bad recommendations, its trust score drops. If a better optimization agent appears, it gets displaced. The system optimizes its own optimizer.

The Complete Loop

Let's trace one full cycle through the system to see how all five layers work together:

[Interactive widget: The Complete Self-Improving Lifecycle — 10 steps from discovery to autonomous promotion]

Step 1 · Discovery — New Agent Appears: A new CSS-in-JS layout agent appears in the NANDA Index, published by a third-party developer. It has an AgentFact with Ed25519 identity and declared capabilities.

This isn't a theoretical architecture — it's a running system. Every component described in this post is built on Cloudflare Workers (V8 isolates at the edge), with D1 for operational state, R2 for certification evidence, KV for hot caches, and cron-triggered inline processing for background tasks. The entire stack runs serverless with zero cold-start infrastructure management.

The deeper implication. This architecture doesn't just optimize agent selection — it creates evolutionary pressure across the entire agentic ecosystem. Agent developers who want their agents selected by CubiCube (and the revenue that comes with it) must build agents that pass certification, maintain availability, and actually perform well on real tasks. The system doesn't just pick winners — it makes the whole ecosystem better.

