# Venice Chat Benchmark
Benchmark Venice.ai chat completion models with complex tool_choice payloads. Runs N iterations, captures detailed timing and reliability metrics, and optionally generates a 4K infographic summary.
## Features
- Stress testing -- run configurable iterations against any Venice chat model
- Tool choice analysis -- measures tool call rate, distribution across 7 defined tools, and JSON argument validity
- Timing statistics -- average, median, min, max, standard deviation, P90, P95, and P99
- Error categorization -- groups failures by type (HTTP, timeout, connection, JSON decode)
- Token tracking -- per-run and aggregate prompt, completion, and total token usage
- Finish reason tracking -- counts of `tool_calls`, `stop`, and other finish reasons
- 4K infographic -- optional visual summary generated via the `venice-image-gen` skill
- Intermediate saves -- results are written to disk after every run so data is preserved if interrupted
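The error-categorization feature groups failures into the four buckets above. A minimal sketch of how such a mapping might look, assuming the `requests` library (the benchmark script's actual internals may differ, and `categorize_error` is a hypothetical helper name):

```python
import json
import requests

def categorize_error(exc):
    """Map an exception from a chat-completion request to a coarse category.

    Illustrative only: mirrors the four error groups the benchmark reports
    (timeout, connection, HTTP, JSON decode), with a catch-all for the rest.
    """
    if isinstance(exc, requests.exceptions.Timeout):
        return "timeout"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "connection"
    if isinstance(exc, requests.exceptions.HTTPError):
        return "http"
    if isinstance(exc, json.JSONDecodeError):
        return "json_decode"
    return "other"
```

Note that `requests.exceptions.ConnectTimeout` inherits from both `Timeout` and `ConnectionError`, so the timeout check comes first to classify it consistently.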
## Prerequisites

```bash
pip install requests
export VENICE_API_KEY="your_venice_api_key"
```
For infographic generation, the `venice-image-gen` skill must be available.
## Usage

Basic benchmark (50 runs, default model):

```bash
python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark
```

Custom run count and timeout:

```bash
python scripts/benchmark.py --model minimax-m27 --runs 100 --timeout 60 --output ./chat_benchmark
```

With infographic generation:

```bash
python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark --infographic
```
## Options

| Option | Short | Default | Description |
|---|---|---|---|
| `--model` | -- | `minimax-m27` | Model ID to benchmark |
| `--runs` | -- | `50` | Number of test iterations |
| `--timeout` | -- | `120` | Request timeout in seconds |
| `--output` | -- | `~/chat_benchmark` | Output directory for results |
| `--infographic` | -- | off | Generate a 4K infographic summary when done |
## Test Payload

The benchmark sends a fixed travel planning scenario to every run:
- System prompt enforces tool-only responses (no plain text)
- 7 function tools defined: `set_travel_dates`, `set_secondary_destinations`, `set_traveler_info`, `set_travel_priorities`, `set_budget`, `present_choices`, `suggest_primary_destinations`
- User message contains multiple extractable data points (dates, destinations, interests, budget)
- `tool_choice: "auto"` lets the model decide which tool(s) to call
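The payload described above can be sketched as follows. The tool names match the skill's documentation, but the parameter schemas and message wording here are illustrative placeholders, not the script's exact definitions:

```python
def build_payload(model):
    """Build a chat-completions payload like the one the benchmark sends.

    Hypothetical sketch: tool names are real, schemas are placeholders.
    """
    tool_names = [
        "set_travel_dates", "set_secondary_destinations", "set_traveler_info",
        "set_travel_priorities", "set_budget", "present_choices",
        "suggest_primary_destinations",
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": f"Placeholder description for {name}",
                "parameters": {"type": "object", "properties": {}},
            },
        }
        for name in tool_names
    ]
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Respond only with tool calls, never plain text."},
            {"role": "user",
             "content": "Plan a trip with dates, destinations, interests, and a budget."},
        ],
        "tools": tools,
        # "auto" lets the model pick which of the 7 tools to call, if any.
        "tool_choice": "auto",
    }
```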
## Python Import

```python
from benchmark import run_benchmark

results = run_benchmark(
    api_key="your_key",
    model="minimax-m27",
    num_runs=10,
    output_dir="./benchmark_output",
    timeout=120,
)
print(results["stats"]["success_rate"])
```
## Response Format

The benchmark writes `benchmark_results.json` to the output directory:

```json
{
  "metadata": {
    "model": "minimax-m27",
    "num_runs": 50,
    "timeout": 120,
    "num_tools": 7,
    "tool_names": ["set_travel_dates", "..."],
    "tool_choice": "auto",
    "start_time": "2026-03-20T12:00:00",
    "end_time": "2026-03-20T12:15:00"
  },
  "runs": [
    {
      "run": 1,
      "success": true,
      "duration_seconds": 2.451,
      "finish_reason": "tool_calls",
      "has_tool_calls": true,
      "tool_calls": [{"name": "set_travel_dates", "args_valid_json": true}],
      "usage": {"prompt_tokens": 850, "completion_tokens": 120, "total_tokens": 970}
    }
  ],
  "stats": {
    "total_runs": 50,
    "success_rate": 98.0,
    "tool_call_rate": 95.0,
    "json_validity_rate": 100.0,
    "timing": {"avg": 2.5, "median": 2.3, "min": 1.1, "max": 5.2, "stdev": 0.8},
    "tool_call_distribution": {"set_travel_dates": 40, "set_budget": 8},
    "token_usage": {"avg_total_tokens": 970, "total_all_tokens": 48500}
  }
}
```
With the `--infographic` flag, a `benchmark_infographic.png` file is also generated.
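The script already reports aggregate timing stats, but the per-run records make it easy to re-derive them yourself. A sketch, assuming the `runs`/`duration_seconds` layout shown above and using a simple nearest-rank percentile (the script's own percentile method may differ):

```python
import statistics

def summarize_durations(results):
    """Recompute timing stats from a loaded benchmark_results.json dict."""
    durations = sorted(
        r["duration_seconds"] for r in results["runs"] if r.get("success")
    )

    def pct(p):
        # Nearest-rank percentile over the sorted successful durations.
        idx = max(0, min(len(durations) - 1,
                         round(p / 100 * len(durations)) - 1))
        return durations[idx]

    return {
        "avg": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": durations[0],
        "max": durations[-1],
        "p90": pct(90), "p95": pct(95), "p99": pct(99),
    }
```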
## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `VENICE_API_KEY` | Yes | Venice.ai API key |