Agent-JAE/default-skills/venice-chat-benchmark/README.md
jae 19b25341bd
Some checks are pending
CI / build-check-test (push) Waiting to run
feat: add 11 Venice AI skills as bundled defaults
Skills included:
- venice-chat: Chat with Venice LLM models, vision, reasoning
- venice-chat-benchmark: Benchmark chat models with infographics
- venice-image-gen: Generate images via Venice API
- venice-list-image-models: List available image models
- venice-list-text-models: List available text models
- venice-list-video-models: List available video models
- venice-tts: Text-to-speech via Venice API
- venice-video-generate: Generate videos from text/images
- venice-video-queue: Queue video generation jobs
- venice-video-quote: Get video generation cost quotes
- venice-video-retrieve: Retrieve completed videos

All rebranded from Agent Zero paths to Agent JAE (~/.jae/agent/skills/).
Requires VENICE_API_KEY environment variable.
2026-03-23 18:46:23 +01:00

4 KiB

Venice Chat Benchmark

Benchmark Venice.ai chat completion models with complex tool_choice payloads. Runs N iterations, captures detailed timing and reliability metrics, and optionally generates a 4K infographic summary.

Features

  • Stress testing -- run configurable iterations against any Venice chat model
  • Tool choice analysis -- measures tool call rate, distribution across 7 defined tools, and JSON argument validity
  • Timing statistics -- average, median, min, max, standard deviation, P90, P95, and P99
  • Error categorization -- groups failures by type (HTTP, timeout, connection, JSON decode)
  • Token tracking -- per-run and aggregate prompt, completion, and total token usage
  • Finish reason tracking -- counts of tool_calls, stop, and other finish reasons
  • 4K infographic -- optional visual summary generated via the venice-image-gen skill
  • Intermediate saves -- results are written to disk after every run so data is preserved if interrupted

Prerequisites

pip install requests
export VENICE_API_KEY="your_venice_api_key"

For infographic generation, the venice-image-gen skill must be available.

Usage

Basic benchmark (50 runs, default model)

python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark

Custom run count and timeout

python scripts/benchmark.py --model minimax-m27 --runs 100 --timeout 60 --output ./chat_benchmark

With infographic generation

python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark --infographic

Options

Option Short Default Description
--model -- minimax-m27 Model ID to benchmark
--runs -- 50 Number of test iterations
--timeout -- 120 Request timeout in seconds
--output -- /a0/usr/workdir/chat_benchmark Output directory for results
--infographic -- off Generate a 4K infographic summary when done

Test Payload

The benchmark sends a fixed travel planning scenario to every run:

  • System prompt enforces tool-only responses (no plain text)
  • 7 function tools defined: set_travel_dates, set_secondary_destinations, set_traveler_info, set_travel_priorities, set_budget, present_choices, suggest_primary_destinations
  • User message contains multiple extractable data points (dates, destinations, interests, budget)
  • tool_choice: auto lets the model decide which tool(s) to call

Python Import

from benchmark import run_benchmark

results = run_benchmark(
    api_key="your_key",
    model="minimax-m27",
    num_runs=10,
    output_dir="./benchmark_output",
    timeout=120
)
print(results["stats"]["success_rate"])

Response Format

The benchmark writes benchmark_results.json to the output directory:

{
  "metadata": {
    "model": "minimax-m27",
    "num_runs": 50,
    "timeout": 120,
    "num_tools": 7,
    "tool_names": ["set_travel_dates", "..."],
    "tool_choice": "auto",
    "start_time": "2026-03-20T12:00:00",
    "end_time": "2026-03-20T12:15:00"
  },
  "runs": [
    {
      "run": 1,
      "success": true,
      "duration_seconds": 2.451,
      "finish_reason": "tool_calls",
      "has_tool_calls": true,
      "tool_calls": [{"name": "set_travel_dates", "args_valid_json": true}],
      "usage": {"prompt_tokens": 850, "completion_tokens": 120, "total_tokens": 970}
    }
  ],
  "stats": {
    "total_runs": 50,
    "success_rate": 98.0,
    "tool_call_rate": 95.0,
    "json_validity_rate": 100.0,
    "timing": {"avg": 2.5, "median": 2.3, "min": 1.1, "max": 5.2, "stdev": 0.8},
    "tool_call_distribution": {"set_travel_dates": 40, "set_budget": 8},
    "token_usage": {"avg_total_tokens": 970, "total_all_tokens": 48500}
  }
}

With the --infographic flag, a benchmark_infographic.png file is also generated.

Environment Variables

Variable Required Description
VENICE_API_KEY Yes Venice.ai API key