{% extends "base.html" %}
{% from "components/custom_dropdown.html" import render_dropdown %}
{% set active_page = 'benchmark' %}

{% block title %}Benchmark Configuration - Deep Research System{% endblock %}

{% block extra_head %}
{% endblock %}

{% block content %}
Purpose: Benchmarks are designed to help you evaluate whether your configuration works well; they are not intended for research papers or production use.
Responsible Usage: Please use a reasonable example count to avoid overwhelming search engines. The default of 75 examples provides a good balance for configuration testing.
Requirements: Benchmarks require an evaluation model for grading results. You can configure your preferred provider and model in the Evaluation Settings below. The default uses OpenRouter with Claude 3.7 Sonnet, but you can choose from various providers including OpenAI, Anthropic, or local models.
🔧 For Shared Resources: When using SearXNG or other shared engines, reduce the number of iterations and questions per iteration in Settings to minimize the load on shared infrastructure.