Full Walkthrough

A complete, step-by-step guide to testing your AI agent with Veris — from installation to your first evaluation report.

If you just want to get running quickly, see the Quickstart.

Prerequisites

  • An AI agent with an HTTP, WebSocket, or email interface. This can be a chatbot, a customer support agent, an automation tool, or any agent that responds to user messages.
  • Docker installed and running on your machine.
  • Python 3.11+ (for the Veris CLI).

Step 1: Install the Veris CLI

The Veris CLI is the primary interface for packaging your agent, pushing it to the Veris cloud, and managing your test workflow.

pip install veris-cli

# or using uv (recommended for isolated tool installs)
uv tool install veris-cli

Verify the installation:

veris --version

Step 2: Authenticate

Before you can interact with the Veris platform, you need to authenticate. The CLI supports two methods.

Browser login (interactive)

veris login

This opens your browser for Google OAuth. After authorizing, the CLI stores your credentials locally at ~/.veris/config.yaml.

API key login (for CI/CD)

veris login YOUR_API_KEY

Use this method for headless environments like CI pipelines.

The CLI supports profiles for managing multiple environments (dev, staging, prod). Use veris --profile staging login to configure a separate profile. See CLI Reference for details.

Step 3: Initialize Your Project

Navigate to your agent’s project directory and initialize it for Veris:

cd ~/my-agent
veris init --name "my-support-agent"

This command does two things:

  1. Creates a .veris/ directory with three template files: Dockerfile.sandbox, veris.yaml, and .dockerignore.
  2. Creates an environment on the Veris platform and saves its ID to .veris/config.yaml. An environment is a named container that holds versioned images of your agent.

An environment in Veris is like a repository for your agent. It stores Docker image tags (versions) and associated configuration. You’ll see it listed on the Environments page in the Console.

Step 4: Configure veris.yaml

The .veris/veris.yaml file is the heart of your Veris configuration. It tells the sandbox three things:

  1. Which mock services your agent needs (Salesforce, Calendar, Stripe, etc.)
  2. How the simulated user communicates with your agent (HTTP, WebSocket, or email)
  3. How to start your agent inside the container

Here’s an example for an agent that uses Google Calendar and accepts HTTP requests:

.veris/veris.yaml
services:
  - name: calendar
    dns_aliases:
      - www.googleapis.com
      - calendar.google.com

actor:
  channels:
    - type: http
      url: http://localhost:8008
      method: POST
      headers:
        Content-Type: application/json
      request:
        message_field: message
        session_field: session_id
      response:
        type: json
        message_field: response

agent:
  code_path: /agent
  entry_point: python -m app.main
  port: 8008
  environment:
    GOOGLE_APPLICATION_CREDENTIALS: /certs/mock-service-account.json

Understanding services

Each entry in services enables a mock API inside the sandbox. The dns_aliases field lists the domains your agent calls — these are intercepted via DNS and routed to the mock service instead of the real API. Your agent doesn’t need any code changes.

For example, if your agent calls https://www.googleapis.com/calendar/v3/..., the sandbox intercepts that request and routes it to the Calendar mock service, which uses an LLM to generate a contextually appropriate response based on the scenario.
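To make "no code changes" concrete, here is a hedged sketch (standard library only; the URL matches the example above) of agent code that the sandbox would transparently redirect:

```python
import urllib.request

def build_list_events_request(token: str) -> urllib.request.Request:
    """Build the request the agent would send to the real Calendar API.

    Inside the sandbox, DNS for www.googleapis.com resolves to the Calendar
    mock service, so this exact code runs unmodified under Veris.
    """
    return urllib.request.Request(
        "https://www.googleapis.com/calendar/v3/calendars/primary/events",
        headers={"Authorization": f"Bearer {token}"},
    )
```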

Understanding the actor

The actor section defines how the simulated user (persona) communicates with your agent. The actor is an LLM-powered agent that follows the objectives defined in a scenario. It supports five communication channels:

  • Chat (HTTP) — sends POST/GET requests to your agent’s chat API endpoint (JSON or SSE streaming)
  • Email — sends emails that your agent processes asynchronously
  • Voice — simulates phone/voice interactions with your agent
  • Browser-use — drives a headless browser to interact with your agent’s web UI
  • WebSocket — maintains a persistent connection for real-time messaging

The request and response fields tell the actor how to format messages and parse your agent’s replies. For SSE streaming responses, set response.type: sse and configure the event fields.
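For instance, under the example configuration above (request.message_field: message, request.session_field: session_id, response.message_field: response), the core of a compatible JSON handler might look like this sketch — the function name is illustrative, not part of any Veris SDK:

```python
import json

def handle_chat(request_body: bytes) -> bytes:
    """Parse the actor's POST body and return a JSON reply.

    Field names mirror the example veris.yaml above: the actor sends
    {"message": ..., "session_id": ...} and expects {"response": ...}.
    """
    payload = json.loads(request_body)
    user_message = payload["message"]
    session_id = payload["session_id"]
    # A real agent would produce the reply here (LLM call, tool use, etc.).
    reply = f"[session {session_id}] You said: {user_message}"
    return json.dumps({"response": reply}).encode()
```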

Understanding the agent block

The agent section tells the sandbox where your code lives and how to start it:

  • code_path — directory where your code is copied (default: /agent)
  • entry_point — the command to start your agent (e.g., python -m app.main or uvicorn app:app)
  • port — the port your agent listens on (default: 8008)
  • environment — environment variables injected at runtime

See the full veris.yaml Reference for all options.

Step 5: Configure the Dockerfile

The .veris/Dockerfile.sandbox extends the Veris base image and adds your agent’s code and dependencies. The base image already contains all mock services, the simulation engine, and supporting infrastructure.

.veris/Dockerfile.sandbox
FROM us-central1-docker.pkg.dev/veris-ai-prod/veris-sandbox/veris-gvisor:latest

# Copy and install dependencies
COPY requirements.txt /agent/
RUN pip install -r /agent/requirements.txt

# Copy your agent code
COPY app /agent/app

# Return to the Veris app directory (required)
WORKDIR /app

The final WORKDIR /app is required. The Veris entrypoint script lives at /app and must be the working directory when the container starts. Your agent code should be placed at /agent (or the path specified in agent.code_path).

The base image uses Python 3.12 with uv pre-installed. If you use uv for dependency management:

.veris/Dockerfile.sandbox (with uv)
FROM us-central1-docker.pkg.dev/veris-ai-prod/veris-sandbox/veris-gvisor:latest

COPY pyproject.toml uv.lock /agent/
WORKDIR /agent
RUN uv sync --frozen --no-dev

COPY app /agent/app
WORKDIR /app

See the Dockerfile Reference for more examples including Node.js agents and database schemas.

Step 6: Set Environment Variables

If your agent needs API keys or other secrets at runtime, set them as environment variables on the Veris platform. These are securely injected into the container at runtime and never embedded in your Docker image.

# Set secrets (encrypted at rest)
veris env set OPENAI_API_KEY=sk-... --secret
veris env set ANTHROPIC_API_KEY=sk-ant-... --secret

# Set non-secret config
veris env set LOG_LEVEL=info

You can also define environment variables in the agent.environment section of veris.yaml for non-sensitive values. Runtime variables set with veris env set take precedence.

For local development with veris run local, create a .env file in your project root. The local runner loads it automatically.
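As an illustration, a minimal .env for local runs might look like this (both values are placeholders):

```
OPENAI_API_KEY=sk-...
LOG_LEVEL=debug
```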

Step 7: Push Your Agent

Build and push your agent image to the Veris registry:

veris env push

This command:

  1. Creates a new image tag in your environment
  2. Builds the Docker image locally using your Dockerfile.sandbox
  3. Pushes the image to the Veris container registry

By default, images are tagged latest. Use --tag v1.0 to create named versions:

veris env push --tag v1.0

If you prefer to build in the cloud (useful for CI or Apple Silicon Macs):

veris env push --remote

After pushing, your environment will show the new tag on the Environments page and the image will appear on the Images page.

Step 8: Generate Scenarios & Graders

Scenarios are test cases that define:

  • Persona — who is the simulated user? (name, background, personality)
  • Objectives — what does the user want to accomplish?
  • Context — what data should mock services have? (e.g., “User has 3 meetings tomorrow”)
  • Success criteria — how do we know the agent succeeded?

Graders are evaluation criteria that analyze simulation transcripts for specific failure modes (hallucination, incorrect tool usage, poor communication, etc.).

Veris can auto-generate both scenarios and graders by analyzing your agent’s code:

veris scenarios generate

This launches an async job that explores your agent’s codebase and produces a scenario set (a collection of scenarios) along with matching graders. You can monitor progress with:

veris scenarios list

You can also generate scenarios from the Console by navigating to Scenarios → Generate.

Auto-generation works best when your agent’s code is well-structured with clear endpoint handlers and service integration patterns. The generator uses Claude to analyze your code and produce realistic test scenarios.

Step 9: Run Simulations

Create a run to execute your scenarios:

veris run create

The interactive prompt asks you to select a scenario set and a concurrency level (how many simulations run in parallel). You can also provide these as flags:

veris run create \
  --scenario-set-id ss_abc123 \
  --concurrency 10

Each scenario in the set becomes one simulation. During a simulation:

  1. A fresh container starts with your agent and mock services
  2. The mock services seed their data based on the scenario context
  3. The actor (simulated user) sends the first message based on the persona’s objectives
  4. Your agent processes the message, potentially calling mock services
  5. The actor evaluates the response, decides on the next action, and continues
  6. The simulation ends when the actor completes all objectives or max_turns is reached
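The turn loop above can be sketched in a few lines of Python — a toy illustration of the control flow, not the actual simulation engine:

```python
def run_simulation(actor, agent, scenario, max_turns: int = 20) -> list:
    """Toy version of the loop described above: the actor drives the
    conversation until its objectives are met or max_turns is reached."""
    transcript = []
    message = actor.first_message(scenario)      # actor opens the conversation
    for _ in range(max_turns):
        reply = agent.respond(message)           # agent may call mock services
        transcript.append((message, reply))
        message = actor.next_message(reply)      # actor decides what to do next
        if message is None:                      # all objectives complete
            break
    return transcript
```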

Monitor progress:

# CLI - watch mode polls every 3 seconds
veris run status RUN_ID --watch

# Or view real-time progress in the Console:
# Navigate to Simulations → click on your run

Running locally

For faster iteration during development, you can run simulations locally without pushing to the cloud:

# Run all scenarios in the scenarios/ directory
veris run local

# Run a specific scenario
veris run local schedule_meeting

# Skip rebuilding the image
veris run local --skip-build

Local runs use Docker on your machine and load environment variables from a .env file in your project root.

Step 10: Evaluate Results

After simulations complete, run graders against the transcripts:

veris evaluation-runs create

The interactive prompt asks which completed run and which grader to use. Graders analyze each simulation transcript and produce scores and findings for categories like:

  • Hallucination detection (did the agent fabricate information?)
  • Tool execution verification (did the agent actually call the right APIs?)
  • Communication quality (was the agent clear and helpful?)
  • Procedural correctness (did the agent follow the right steps?)
  • Success criteria fulfillment (were the user’s goals achieved?)

Watch the evaluation progress:

veris evaluation-runs status EVAL_RUN_ID --run-id RUN_ID --watch

You can also trigger evaluations from the Console on the Evaluations page.

Step 11: Generate a Report

Finally, generate a report that aggregates your evaluation results:

veris reports create

Reports analyze patterns across all simulations in an evaluation run, identifying systemic issues and providing actionable insights. Download the report as HTML:

veris reports get REPORT_ID -o my-report.html

Or view it directly in the Console under Reports.

The Iteration Loop

Once you’ve completed your first pass, the workflow becomes a tight iteration loop:

  1. Review the report to identify agent weaknesses
  2. Fix the issues in your agent’s code
  3. Push a new image version: veris env push --tag v1.1
  4. Re-run simulations: veris run create
  5. Re-evaluate: veris evaluation-runs create
  6. Generate a new report and compare

Use veris run local during active development for faster feedback, then run cloud simulations at scale once you’re confident in the changes.

Next Steps