
---
name: venice-chat-benchmark
description: Benchmark Venice.ai chat models with complex tool_choice payloads. Runs N iterations, captures timing, tool call distribution, JSON validity, errors, token usage, and generates a 4K infographic.
version: 1.0.0
author: Agent JAE
tags:
  - venice
  - api
  - benchmark
  - chat
  - tool_choice
  - testing
trigger_patterns:
  - benchmark chat
  - test model
  - venice benchmark
  - tool choice test
---

# Venice Chat Model Benchmark

Benchmark Venice.ai chat completion models with complex tool_choice payloads.

## When to Use

Use this skill when you need to:

- Stress-test a Venice chat model with tool calling
- Measure response time, reliability, and tool call accuracy
- Compare model behavior across many runs
- Generate visual benchmark reports

## Usage

### Basic (50 runs, minimax-m27)

```bash
export VENICE_API_KEY="your-key"
python ~/.jae/agent/skills/venice-chat-benchmark/scripts/benchmark.py --model minimax-m27 --runs 50 --output ~/chat_benchmark
```

### With Infographic

```bash
python ~/.jae/agent/skills/venice-chat-benchmark/scripts/benchmark.py --model minimax-m27 --runs 50 --output ~/chat_benchmark --infographic
```

## Options

| Option | Default | Description |
| --- | --- | --- |
| `--model` | `minimax-m27` | Model ID to benchmark |
| `--runs` | `50` | Number of test iterations |
| `--timeout` | `120` | Request timeout in seconds |
| `--output` | `~/chat_benchmark` | Output directory |
| `--infographic` | off | Generate a 4K infographic when done |
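The flags above map to a conventional argparse setup. This is a sketch of how such a CLI might be wired up, with the defaults from the table; the actual `benchmark.py` may differ:

```python
import argparse
import os


def parse_args(argv=None):
    """Sketch of the benchmark CLI described above (option names from the table)."""
    parser = argparse.ArgumentParser(description="Benchmark a Venice chat model")
    parser.add_argument("--model", default="minimax-m27",
                        help="Model ID to benchmark")
    parser.add_argument("--runs", type=int, default=50,
                        help="Number of test iterations")
    parser.add_argument("--timeout", type=int, default=120,
                        help="Request timeout in seconds")
    parser.add_argument("--output", default=os.path.expanduser("~/chat_benchmark"),
                        help="Output directory")
    parser.add_argument("--infographic", action="store_true",
                        help="Generate a 4K infographic when done")
    return parser.parse_args(argv)
```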

## What It Measures

- Response time (avg, median, min, max, stdev, P90, P95)
- Success rate (HTTP errors, timeouts, connection errors)
- Tool call rate (% of responses that include tool calls)
- Tool call distribution (which tools get selected)
- JSON validity (whether tool call arguments parse correctly)
- Token usage (prompt, completion, total)
- Finish reasons (`tool_calls` vs `stop` vs other)
- Error categorization (by type, with details)
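The response-time summary in the first bullet can be computed from the per-run latencies with the standard library alone. A minimal sketch (the helper name and nearest-rank percentile choice are illustrative, not necessarily what `benchmark.py` does):

```python
import statistics


def timing_stats(latencies):
    """Summarize per-run response times (seconds) as described above."""
    ordered = sorted(latencies)

    def percentile(p):
        # Nearest-rank percentile: pick the value at rank round(p% * n), 1-indexed.
        k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[k]

    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p90": percentile(90),
        "p95": percentile(95),
    }
```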

## Test Payload

The benchmark uses a complex travel planning scenario with:

- A detailed system prompt enforcing tool-only responses
- 7 function tools defined (dates, destinations, traveler info, priorities, budget, choices, suggestions)
- A rich user message with multiple extractable data points
- `tool_choice: auto`
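Venice's chat completions endpoint follows the OpenAI-compatible schema, so the test payload has roughly this shape. The prompts and the one tool schema shown here are placeholders, not the skill's actual definitions:

```python
def build_payload(model):
    """Illustrative skeleton of the tool_choice test payload (contents assumed)."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Respond only by calling the provided tools."},
            {"role": "user",
             "content": "Plan a 10-day trip for two adults in April with a "
                        "$6000 budget, prioritizing food and temples."},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "set_travel_dates",  # tool name assumed for illustration
                    "description": "Record the trip's start and end dates.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "start_date": {"type": "string"},
                            "end_date": {"type": "string"},
                        },
                        "required": ["start_date", "end_date"],
                    },
                },
            },
            # ...six more tool definitions (destinations, traveler info,
            # priorities, budget, choices, suggestions)
        ],
        "tool_choice": "auto",
    }
```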

## Output

- `benchmark_results.json` — Full results with all run data and computed stats
- `benchmark_infographic.png` — 4K visual summary (with the `--infographic` flag)

## Requirements

- `VENICE_API_KEY` environment variable
- `requests` Python package
- `venice-image-gen` skill (optional, for infographic generation)
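Putting the requirements together, a single benchmark iteration reduces to one authenticated POST with `requests`. A sketch, not the skill's actual script; the endpoint URL reflects Venice's OpenAI-compatible API and should be verified against current Venice docs:

```python
import os

# Venice's OpenAI-compatible chat endpoint (verify against current Venice docs).
VENICE_CHAT_URL = "https://api.venice.ai/api/v1/chat/completions"


def auth_headers(api_key=None):
    """Build the bearer-token header from an explicit key or VENICE_API_KEY."""
    key = api_key or os.environ.get("VENICE_API_KEY")
    if not key:
        raise RuntimeError("VENICE_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}


def run_once(payload, timeout=120):
    """Send one chat completion request and return the parsed JSON body."""
    import requests  # the 'requests' package listed under Requirements
    resp = requests.post(VENICE_CHAT_URL, headers=auth_headers(),
                         json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```

The benchmark would call `run_once` in a loop, timing each call and categorizing any `requests` exceptions (timeouts, connection errors) as failures.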