Research note

What Agent-Mediated Deals Need to Be Verifiable

Date: 2026-04-27

Agent-mediated commerce is no longer only a thought experiment. Once an agent can negotiate, agree to terms, and spend money on behalf of a person or company, the hard question is not whether the agent was useful. The hard question is what evidence remains after the deal.

On 24 April 2026, Anthropic published Project Deal, a one-week experiment run in December 2025. Sixty-nine Anthropic employees gave Claude agents a $100 budget and let them negotiate trades in four Slack marketplaces. After the markets opened, humans did not intervene. In the run designated as the real exchange, agents closed 186 deals with more than $4,000 in total transaction value.

The experiment is useful because it is concrete. Agents did not merely simulate commerce. They negotiated with other agents and produced real outcomes. Some outcomes were awkward: an agent bought a snowboard its owner already had, another introduced a fabricated personal anecdote, and another spent budget on 19 ping-pong balls as a gift for itself.

Project Deal was a research experiment, not an accountability system. That distinction matters. The experiment shows that agents can transact. It also makes visible the evidence problem that every serious agent-commerce system will eventually face.

The key questions are simple:

A useful answer needs more than logs. Logs are usually local to one system. They are often readable only by the operator. They may be enough for debugging, but they are weak evidence for counterparties, auditors, compliance teams, or dispute processes.

A better model is to treat agent commerce as a chain of verifiable artifacts.

1. Delegation before action

Before an agent negotiates or spends, there should be a signed record of authority. The record should say who delegated authority, which agent received it, what the agent was allowed to do, and when that authority expires.

In Project Deal, this role was played by an intake interview that became part of the agent setup. That is operationally useful, but it is not a portable artifact. A counterparty cannot verify it independently. An auditor cannot check it without access to the original system.

For commerce, delegation should be explicit. A receipt should bind a principal identity to an agent identity and a machine-checkable scope. If an agent buys a snowboard, the question “was that in scope?” should not require reading the original prompt.
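
As a minimal sketch of what a machine-checkable scope could look like, assume a delegation that carries a spending cap, a list of allowed categories, and an expiry. The field names and the in_scope helper below are hypothetical illustrations; they are not taken from the AVP DelegationReceipt format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DelegationScope:
    """Hypothetical scope fields for illustration; not the AVP schema."""
    principal: str                 # who delegated authority
    agent: str                     # which agent received it
    max_spend_usd: float           # cap for a single purchase
    allowed_categories: frozenset  # what the agent may buy
    valid_until: datetime          # when the authority expires


def in_scope(scope, category, price_usd, at):
    """Return (allowed, reason) for a proposed purchase at time `at`."""
    if at >= scope.valid_until:
        return False, "delegation expired"
    if category not in scope.allowed_categories:
        return False, f"category '{category}' not delegated"
    if price_usd > scope.max_spend_usd:
        return False, "price exceeds spending cap"
    return True, "ok"


scope = DelegationScope(
    principal="did:example:alice",
    agent="did:example:alice-agent",
    max_spend_usd=100.0,
    allowed_categories=frozenset({"books", "kitchen"}),
    valid_until=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
when = datetime(2025, 12, 10, tzinfo=timezone.utc)
snowboard = in_scope(scope, "sports", 80.0, when)
cookbook = in_scope(scope, "books", 20.0, when)
```

With a record like this, the snowboard question is answered by a function call against the delegation, not by rereading a prompt.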

2. Agreement evidence

Negotiation is not the same thing as agreement. A chat transcript may contain offers, counteroffers, jokes, mistakes, and irrelevant context. The agreement is the final object: who agreed, what they agreed to, at what price, under which authority.

Agent-mediated deals need a compact agreement artifact that can stand apart from the conversation. It should reference the delegations that authorized each side. It should capture the terms that matter. It should be inspectable by someone who was not present in the original Slack channel or marketplace.
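
One way to picture such an artifact is a small record with a content-derived identifier. The sketch below is an assumption-laden illustration: the Agreement fields and the agreement_id helper are hypothetical, and sorted-key JSON is used only as a simple stand-in for a real canonicalization scheme such as RFC 8785 JCS.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Agreement:
    """Hypothetical agreement fields for illustration only."""
    buyer_agent: str
    seller_agent: str
    item: str
    price_usd: str             # decimal string, so encoding is stable
    buyer_delegation_id: str   # delegation that authorized the buyer side
    seller_delegation_id: str  # delegation that authorized the seller side


def agreement_id(agreement):
    """SHA-256 over a sorted-key, compact JSON encoding of the terms."""
    encoded = json.dumps(
        asdict(agreement),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


deal = Agreement(
    buyer_agent="did:example:alice-agent",
    seller_agent="did:example:bob-agent",
    item="snowboard",
    price_usd="80.00",
    buyer_delegation_id="delegation-a",
    seller_delegation_id="delegation-b",
)
deal_id = agreement_id(deal)
```

Because the identifier depends only on the terms, anyone holding the same record can recompute it without access to the conversation that produced it.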

Without this layer, every dispute starts by replaying a transcript.

3. Execution records

An agreement says what should happen. Execution records say what did happen.

A payment may clear. A transfer may fail. Delivery may happen late. A service may be partially completed. These facts belong in a separate record linked back to the agreement.

Most current systems already have execution logs somewhere: payment providers, order systems, ticketing tools, cloud APIs, GitHub events, internal admin panels. The missing piece is a common way to connect those records to the agent decision that caused them.
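
A rough sketch of that connection, assuming each execution record carries the identifier of the agreement it belongs to. The record fields, source names, and helper functions here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical execution record linked back to an agreement."""
    agreement_id: str  # link to the agreement artifact
    source: str        # system that observed the event, e.g. "payments"
    event: str         # e.g. "payment", "delivery"
    status: str        # "succeeded" or "failed"


def records_for(agreement_id, records):
    """Select the records that reference one agreement."""
    return [r for r in records if r.agreement_id == agreement_id]


def is_fully_executed(agreement_id, records, required_events):
    """True when every required event has a succeeded record."""
    done = {
        r.event
        for r in records_for(agreement_id, records)
        if r.status == "succeeded"
    }
    return required_events <= done


records = [
    ExecutionRecord("agr-1", "payments", "payment", "succeeded"),
    ExecutionRecord("agr-1", "shipping", "delivery", "failed"),
    ExecutionRecord("agr-2", "payments", "payment", "succeeded"),
    ExecutionRecord("agr-2", "shipping", "delivery", "succeeded"),
]
required = {"payment", "delivery"}
first_done = is_fully_executed("agr-1", records, required)
second_done = is_fully_executed("agr-2", records, required)
```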

4. Post-trade reputation

After execution, counterparties need a way to record how the interaction went. This should not be only a survey or a private rating inside one marketplace. It should be an attestation tied to a specific interaction.

Project Deal reported an important behavioral detail: participants did not reliably perceive when their agent was weaker in negotiation. That is a warning for reputation systems. Subjective impressions are useful, but they are not enough.

A stronger reputation layer should be based on signed attestations about actual interactions. The signal should be portable enough to affect future discovery without requiring every marketplace to share the same database.
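
To illustrate the difference between a free-floating rating and an attestation tied to an interaction, the sketch below counts only attestations that reference a known interaction, and counts each attester once per interaction. It assumes signatures were already verified upstream; the Attestation fields and the summarize helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Hypothetical attestation; signature assumed verified upstream."""
    attester: str
    subject: str
    interaction_id: str
    outcome: str  # "fulfilled" or "not_fulfilled"


def summarize(subject, attestations, known_interactions):
    """Count outcomes for `subject`, one vote per attester per interaction."""
    seen = set()
    counts = Counter()
    for a in attestations:
        key = (a.attester, a.interaction_id)
        if a.subject != subject:
            continue
        if a.interaction_id not in known_interactions:
            continue  # not tied to a real interaction
        if key in seen:
            continue  # duplicate from the same attester
        seen.add(key)
        counts[a.outcome] += 1
    return dict(counts)


attestations = [
    Attestation("alice", "bob-agent", "agr-1", "fulfilled"),
    Attestation("alice", "bob-agent", "agr-1", "fulfilled"),      # duplicate
    Attestation("carol", "bob-agent", "agr-2", "not_fulfilled"),
    Attestation("mallory", "bob-agent", "agr-99", "fulfilled"),   # unknown deal
]
summary = summarize("bob-agent", attestations, {"agr-1", "agr-2"})
```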

5. Dispute evidence

When something goes wrong, the system needs a comparison point.

Without signed artifacts, this becomes reconstruction from logs. With signed artifacts, it becomes a comparison between records.
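
A comparison between records can be as simple as a field-by-field diff. The sketch below assumes both the agreement and the execution record have been reduced to flat dictionaries with shared field names, which is an illustrative simplification; the compare helper is hypothetical.

```python
def compare(agreed, executed):
    """Return {field: (agreed_value, executed_value)} for each mismatch."""
    fields = sorted(set(agreed) | set(executed))
    return {
        f: (agreed.get(f), executed.get(f))
        for f in fields
        if agreed.get(f) != executed.get(f)
    }


agreed = {
    "item": "snowboard",
    "price_usd": "80.00",
    "deliver_by": "2025-12-15",
}
executed = {
    "item": "snowboard",
    "price_usd": "95.00",
    "deliver_by": "2025-12-15",
}
mismatches = compare(agreed, executed)
```

The output names the disputed field and both values, which is a narrower starting point than a full transcript.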

The snowboard, the fabricated anecdote, and the ping-pong balls are memorable because they are small failures. Larger failures will look similar, only with higher stakes.

Where AVP is today

AgentVeil Protocol’s current public surface includes signed attestations, W3C Verifiable Credential reputation outputs, did:key identities, and offline-verifiable proof formats. Of the five evidence layers above, post-trade reputation is closest to what AVP already exposes publicly.

Delegation now has a first small artifact. Agreement objects, execution records with deal-specific semantics, and dispute formats are still open design work.

This is not a claim that AVP solves agent commerce. It does not. The claim is narrower: the evidence layers can be named, separated, and tested one artifact at a time.

A first artifact: DelegationReceipt

The first layer to make public is delegation.

AVP now publishes a minimal DelegationReceipt format and a standalone verifier. The format is a W3C Verifiable Credential (Data Model v2.0) secured with a Data Integrity proof using the eddsa-jcs-2022 cryptosuite. The issuer is the principal’s did:key. The subject is the agent’s did:key.

The scope is intentionally small in version one. It supports:

Validity is bounded by validFrom and validUntil. The credential is canonicalized with RFC 8785 JCS before signing. Verification is therefore straightforward: resolve the issuer key from did:key, canonicalize the payload without the proof, verify the Ed25519 signature, and check the validity window.
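
Two of these steps can be sketched with the Python standard library alone: extracting the raw Ed25519 public key from a did:key identifier and checking the validity window. This is not the AVP reference verifier. The function names are hypothetical, and the signature check itself is omitted because it needs an Ed25519 library such as PyNaCl.

```python
from datetime import datetime, timezone

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_MULTICODEC = b"\xed\x01"  # multicodec prefix for ed25519-pub


def b58decode(text):
    """Decode a base58btc string into bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on bad char
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))  # leading '1' = zero byte
    return b"\x00" * pad + body


def ed25519_key_from_did_key(did):
    """Return the 32-byte public key encoded in an Ed25519 did:key."""
    did = did.split("#", 1)[0]  # drop a verification-method fragment
    prefix = "did:key:z"        # 'z' marks multibase base58btc
    if not did.startswith(prefix):
        raise ValueError("expected a base58btc did:key")
    raw = b58decode(did[len(prefix):])
    if len(raw) != 34 or raw[:2] != ED25519_MULTICODEC:
        raise ValueError("not an Ed25519 did:key")
    return raw[2:]


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z'."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def within_validity(credential, now):
    """Check validFrom <= now <= validUntil."""
    start = parse_timestamp(credential["validFrom"])
    end = parse_timestamp(credential["validUntil"])
    return start <= now <= end


credential = {
    "validFrom": "2025-12-01T00:00:00Z",
    "validUntil": "2025-12-08T00:00:00Z",
}
during = within_validity(credential, datetime(2025, 12, 3, tzinfo=timezone.utc))
after = within_validity(credential, datetime(2025, 12, 9, tzinfo=timezone.utc))
```

A full verifier would pass the extracted key and the canonicalized bytes to an Ed25519 verify call, and reject the receipt if either the signature or the window check fails.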

The reference verifier is a single Python file under 200 lines. It depends on pynacl, base58, and jcs. It does not import the AVP SDK. It does not call an AVP backend. An auditor can read it, run it, and verify a receipt offline.

DelegationReceipt is not a standard. It is an early public artifact. The point is to make one layer of agent authority inspectable now, without waiting for a complete agent-commerce stack.

Closing

Project Deal made agent-mediated transactions easier to discuss because it produced real deals, real prices, and real failure cases. The next question is evidence.

A durable design probably will not be one large “agent commerce platform.” It will be a set of small artifacts that different systems can produce and verify: delegation, agreement, execution, reputation, and dispute records.

Delegation is the first piece AVP is publishing. The rest should be built only where the need becomes concrete.