Skills included:

- venice-chat: Chat with Venice LLM models, vision, reasoning
- venice-chat-benchmark: Benchmark chat models with infographics
- venice-image-gen: Generate images via Venice API
- venice-list-image-models: List available image models
- venice-list-text-models: List available text models
- venice-list-video-models: List available video models
- venice-tts: Text-to-speech via Venice API
- venice-video-generate: Generate videos from text/images
- venice-video-queue: Queue video generation jobs
- venice-video-quote: Get video generation cost quotes
- venice-video-retrieve: Retrieve completed videos

All rebranded from Agent Zero paths to Agent JAE (`~/.jae/agent/skills/`). Requires the `VENICE_API_KEY` environment variable.
| name | description | version | author | tags | trigger_patterns |
|---|---|---|---|---|---|
| venice-chat-benchmark | Benchmark Venice.ai chat models with complex `tool_choice` payloads. Runs N iterations; captures timing, tool call distribution, JSON validity, errors, and token usage; and generates a 4K infographic. | 1.0.0 | Agent JAE | | |
# Venice Chat Model Benchmark

Benchmark Venice.ai chat completion models with complex `tool_choice` payloads.

## When to Use
Use this skill when you need to:
- Stress test a Venice chat model with tool calling
- Measure response time, reliability, and tool call accuracy
- Compare model behavior across many runs
- Generate visual benchmark reports
## Usage

### Basic (50 runs, minimax-m27)

```bash
export VENICE_API_KEY="your-key"
python ~/.jae/agent/skills/venice-chat-benchmark/scripts/benchmark.py --model minimax-m27 --runs 50 --output ~/chat_benchmark
```

### With Infographic

```bash
python ~/.jae/agent/skills/venice-chat-benchmark/scripts/benchmark.py --model minimax-m27 --runs 50 --output ~/chat_benchmark --infographic
```
## Options

| Option | Default | Description |
|---|---|---|
| `--model` | `minimax-m27` | Model ID to benchmark |
| `--runs` | 50 | Number of test iterations |
| `--timeout` | 120 | Request timeout in seconds |
| `--output` | `~/chat_benchmark` | Output directory |
| `--infographic` | off | Generate a 4K infographic when done |
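The options above map naturally onto a standard `argparse` setup. The sketch below mirrors the defaults in the table; the actual `benchmark.py` parser may be structured differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the options table; defaults match the documented values.
    p = argparse.ArgumentParser(description="Benchmark a Venice chat model")
    p.add_argument("--model", default="minimax-m27", help="Model ID to benchmark")
    p.add_argument("--runs", type=int, default=50, help="Number of test iterations")
    p.add_argument("--timeout", type=int, default=120, help="Request timeout in seconds")
    p.add_argument("--output", default="~/chat_benchmark", help="Output directory")
    p.add_argument("--infographic", action="store_true",
                   help="Generate a 4K infographic when done")
    return p

# Example: override only the run count; everything else keeps its default.
args = build_parser().parse_args(["--runs", "10", "--infographic"])
# args.model == "minimax-m27", args.runs == 10, args.infographic is True
```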
## What It Measures
- Response time (avg, median, min, max, stdev, P90, P95)
- Success rate (HTTP errors, timeouts, connection errors)
- Tool call rate (% of responses that include tool calls)
- Tool call distribution (which tools get selected)
- JSON validity (whether tool call arguments parse correctly)
- Token usage (prompt, completion, total)
- Finish reasons (tool_calls vs stop vs other)
- Error categorization (by type, with details)
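The timing statistics listed above can be computed from the per-run latencies with the standard library alone. This is a sketch using a nearest-rank percentile; the real script may use a different percentile method or field names.

```python
import statistics

def summarize_latencies(latencies_s):
    """Compute the timing stats reported by the benchmark (a sketch)."""
    ordered = sorted(latencies_s)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    return {
        "avg": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p90": pct(90),
        "p95": pct(95),
    }
```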
## Test Payload

The benchmark uses a complex travel-planning scenario with:
- A detailed system prompt enforcing tool-only responses
- 7 function tools defined (dates, destinations, traveler info, priorities, budget, choices, suggestions)
- A rich user message with multiple extractable data points
- `tool_choice: auto`
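For orientation, a payload of this shape looks like the sketch below. It follows the OpenAI-compatible chat completions format; the tool name, schema, and message text here are illustrative, not the script's exact definitions.

```python
import json

def make_tool(name, description, properties, required):
    # Standard OpenAI-style function-tool definition.
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

payload = {
    "model": "minimax-m27",
    "messages": [
        {"role": "system", "content": "Respond only with tool calls."},
        {"role": "user", "content": "Plan a trip ... (rich, multi-fact message)"},
    ],
    "tools": [
        make_tool(
            "set_travel_dates",  # hypothetical name for illustration
            "Record trip start and end dates",
            {"start": {"type": "string"}, "end": {"type": "string"}},
            ["start", "end"],
        ),
        # ...six more tools (destinations, traveler info, priorities,
        # budget, choices, suggestions) in the real payload
    ],
    "tool_choice": "auto",
}
```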
## Output

- `benchmark_results.json` — Full results with all run data and computed stats
- `benchmark_infographic.png` — 4K visual summary (with the `--infographic` flag)
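A quick way to inspect the results file afterwards is to load it and recompute the success rate. The field names used here (`runs`, `ok`) are assumptions about the JSON schema; adjust them to match the actual `benchmark_results.json`.

```python
import json
from pathlib import Path

def summarize(results_path):
    """Summarize a results file (field names are assumed, not guaranteed)."""
    data = json.loads(Path(results_path).read_text())
    runs = data["runs"]
    ok = [r for r in runs if r.get("ok")]
    return {
        "total": len(runs),
        "success_rate": len(ok) / len(runs) if runs else 0.0,
    }
```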
## Requirements

- `VENICE_API_KEY` environment variable
- `requests` Python package
- `venice-image-gen` skill (optional, for infographic generation)
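A pre-flight check for the first two requirements can be done in a few lines of standard-library Python. This is a convenience sketch, not something the benchmark script is documented to do itself.

```python
import importlib.util
import os

def check_requirements():
    """Return a list of missing requirements (empty list means ready)."""
    missing = []
    if not os.environ.get("VENICE_API_KEY"):
        missing.append("VENICE_API_KEY environment variable")
    if importlib.util.find_spec("requests") is None:
        missing.append("requests Python package")
    return missing
```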