Skills included: - venice-chat: Chat with Venice LLM models, vision, reasoning - venice-chat-benchmark: Benchmark chat models with infographics - venice-image-gen: Generate images via Venice API - venice-list-image-models: List available image models - venice-list-text-models: List available text models - venice-list-video-models: List available video models - venice-tts: Text-to-speech via Venice API - venice-video-generate: Generate videos from text/images - venice-video-queue: Queue video generation jobs - venice-video-quote: Get video generation cost quotes - venice-video-retrieve: Retrieve completed videos All rebranded from Agent Zero paths to Agent JAE (~/.jae/agent/skills/). Requires VENICE_API_KEY environment variable.
124 lines
4 KiB
Markdown
124 lines
4 KiB
Markdown
# Venice Chat Benchmark
|
|
|
|
Benchmark [Venice.ai](https://venice.ai/) chat completion models with complex `tool_choice` payloads. Runs N iterations, captures detailed timing and reliability metrics, and optionally generates a 4K infographic summary.
|
|
|
|
## Features
|
|
|
|
- **Stress testing** -- run configurable iterations against any Venice chat model
|
|
- **Tool choice analysis** -- measures tool call rate, distribution across 7 defined tools, and JSON argument validity
|
|
- **Timing statistics** -- average, median, min, max, standard deviation, P90, P95, and P99
|
|
- **Error categorization** -- groups failures by type (HTTP, timeout, connection, JSON decode)
|
|
- **Token tracking** -- per-run and aggregate prompt, completion, and total token usage
|
|
- **Finish reason tracking** -- counts of `tool_calls`, `stop`, and other finish reasons
|
|
- **4K infographic** -- optional visual summary generated via the `venice-image-gen` skill
|
|
- **Intermediate saves** -- results are written to disk after every run so data is preserved if interrupted
|
|
|
|
## Prerequisites
|
|
|
|
```bash
|
|
pip install requests
|
|
export VENICE_API_KEY="your_venice_api_key"
|
|
```
|
|
|
|
For infographic generation, the `venice-image-gen` skill must be available.
|
|
|
|
## Usage
|
|
|
|
### Basic benchmark (50 runs, default model)
|
|
|
|
```bash
|
|
python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark
|
|
```
|
|
|
|
### Custom run count and timeout
|
|
|
|
```bash
|
|
python scripts/benchmark.py --model minimax-m27 --runs 100 --timeout 60 --output ./chat_benchmark
|
|
```
|
|
|
|
### With infographic generation
|
|
|
|
```bash
|
|
python scripts/benchmark.py --model minimax-m27 --runs 50 --output ./chat_benchmark --infographic
|
|
```
|
|
|
|
## Options
|
|
|
|
| Option | Short | Default | Description |
|
|
|--------|-------|---------|-------------|
|
|
| `--model` | -- | `minimax-m27` | Model ID to benchmark |
|
|
| `--runs` | -- | `50` | Number of test iterations |
|
|
| `--timeout` | -- | `120` | Request timeout in seconds |
|
|
| `--output` | -- | `~/chat_benchmark` | Output directory for results |
|
|
| `--infographic` | -- | off | Generate a 4K infographic summary when done |
|
|
|
|
## Test Payload
|
|
|
|
The benchmark sends a fixed travel planning scenario to every run:
|
|
|
|
- **System prompt** enforces tool-only responses (no plain text)
|
|
- **7 function tools** defined: `set_travel_dates`, `set_secondary_destinations`, `set_traveler_info`, `set_travel_priorities`, `set_budget`, `present_choices`, `suggest_primary_destinations`
|
|
- **User message** contains multiple extractable data points (dates, destinations, interests, budget)
|
|
- **`tool_choice: auto`** lets the model decide which tool(s) to call
|
|
|
|
## Python Import
|
|
|
|
```python
|
|
from benchmark import run_benchmark
|
|
|
|
results = run_benchmark(
|
|
api_key="your_key",
|
|
model="minimax-m27",
|
|
num_runs=10,
|
|
output_dir="./benchmark_output",
|
|
timeout=120
|
|
)
|
|
print(results["stats"]["success_rate"])
|
|
```
|
|
|
|
## Response Format
|
|
|
|
The benchmark writes `benchmark_results.json` to the output directory:
|
|
|
|
```json
|
|
{
|
|
"metadata": {
|
|
"model": "minimax-m27",
|
|
"num_runs": 50,
|
|
"timeout": 120,
|
|
"num_tools": 7,
|
|
"tool_names": ["set_travel_dates", "..."],
|
|
"tool_choice": "auto",
|
|
"start_time": "2026-03-20T12:00:00",
|
|
"end_time": "2026-03-20T12:15:00"
|
|
},
|
|
"runs": [
|
|
{
|
|
"run": 1,
|
|
"success": true,
|
|
"duration_seconds": 2.451,
|
|
"finish_reason": "tool_calls",
|
|
"has_tool_calls": true,
|
|
"tool_calls": [{"name": "set_travel_dates", "args_valid_json": true}],
|
|
"usage": {"prompt_tokens": 850, "completion_tokens": 120, "total_tokens": 970}
|
|
}
|
|
],
|
|
"stats": {
|
|
"total_runs": 50,
|
|
"success_rate": 98.0,
|
|
"tool_call_rate": 95.0,
|
|
"json_validity_rate": 100.0,
|
|
"timing": {"avg": 2.5, "median": 2.3, "min": 1.1, "max": 5.2, "stdev": 0.8},
|
|
"tool_call_distribution": {"set_travel_dates": 40, "set_budget": 8},
|
|
"token_usage": {"avg_total_tokens": 970, "total_all_tokens": 48500}
|
|
}
|
|
}
|
|
```
|
|
|
|
With the `--infographic` flag, a `benchmark_infographic.png` file is also generated.
|
|
|
|
## Environment Variables
|
|
|
|
| Variable | Required | Description |
|
|
|----------|----------|-------------|
|
|
| `VENICE_API_KEY` | Yes | Venice.ai API key |
|