BLXBench Docs

Configuration

Configure blxbench via files, environment variables, and flags.

The CLI is distributed as the npm package @bitslix/blxbench; the configuration described below applies to the blxbench command in your shell.

Configuration Files

.env File

BLXBench loads configuration from a .env file in the current directory or the path specified with --dotenv-path:

# Example keys (use the env vars required by your chosen adapter)
OPENROUTER_API_KEY=sk-or-...
OPENAI_API_KEY=sk-...
BLXBENCH_API_KEY=your-blxbench-key
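If your credentials live outside the working directory, the --dotenv-path flag mentioned above can point at them. A minimal sketch (the path ./configs/.env.staging is a hypothetical example):

```shell
# Load credentials from a non-default .env file (path is illustrative)
blxbench --headless --dotenv-path ./configs/.env.staging
```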

Results Directory

By default, generated reports are saved to the repo/workspace results/ directory. In headless mode, --save-json writes an additional JSON export to a custom path:

blxbench --headless --save-json ./custom-results.json

In the TUI, use /set output-dir PATH to change the report directory for the interactive run.

Provider Configuration

Providers are loaded from adapter folders under packages/benchmark-core/adapters/. Each adapter exposes a provider alias (the argument value in its meta.json):

| Alias | Adapter | Typical env var |
|---|---|---|
| opr | OpenRouter | OPENROUTER_API_KEY |
| oai | OpenAI | OPENAI_API_KEY |
| hgf | Hugging Face | HF_TOKEN |
| tgr | Together | TOGETHER_API_KEY |
| ptk | Portkey | PORTKEY_API_KEY |
| cfr | Cloudflare | CLOUDFLARE_API_TOKEN |

The default --provider is opr. Model IDs are whatever the selected endpoint accepts (e.g. OpenRouter-style vendor/model-name).
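Putting the alias table to use, a typical invocation selects a provider and passes a model id in that endpoint's format (the model id here mirrors the example used later on this page):

```shell
# Run against OpenRouter with a vendor/model-name style id
blxbench --headless --provider opr --models openai/gpt-5.4-mini
```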

Test Filters

Categories

Filter tests by category (folder names under packages/benchmark-core/tests/):

| Category | Role |
|---|---|
| speed | Latency-sensitive correctness |
| security | Safe outputs and vulnerability awareness |
| reasoning | Structured / numeric reasoning |
| debugging | Small patches and bug fixes |
| refactoring | Behavior-preserving edits |
| hallucination | Grounding under tricky prompts |
| coding_ui | HTML artifacts + optional Playwright render |

Difficulty Levels

Filter by difficulty:

| Level | Description |
|---|---|
| easy | Lighter fixtures |
| medium | Representative difficulty |
| hard | Stricter / longer tasks |

Legacy German labels (leicht, …) are normalized to these ids.
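This page does not name the exact filter flags, so the following is a sketch under assumed flag names --categories and --difficulty (hypothetical; confirm the real names with blxbench --help):

```shell
# Hypothetical filter flags -- verify actual names via `blxbench --help`
blxbench --headless --categories reasoning,debugging --difficulty hard
```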

Advanced Options

Playwright Configuration

For coding_ui (and other HTML render checks), Playwright Chromium should be installed:

blxbench --headless --install-chromium

Skip render validation if Chromium is missing:

blxbench --headless --skip-render-validation

Rate Limiting

| Value | Behavior |
|---|---|
| Unset | No rate limiting |
| --ratelimit | Default RPM |
| --ratelimit 30 | Custom RPM |
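Rate limiting composes with the other headless flags documented on this page; a sketch with illustrative values:

```shell
# Cap requests at 30 per minute and stop at the first failing test
blxbench --headless --ratelimit 30 --fail-fast
```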

Fail Fast

Stop on first test failure:

blxbench --headless --fail-fast

Configuration Priority

Roughly: built-in defaults → values from .env → environment variables → flags (flags win when both apply).
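As a concrete sketch of that chain: a value from .env is overridden by an exported shell variable, and a flag wins over both where a flag exists for the setting (key values below are placeholders):

```shell
# .env contains: OPENROUTER_API_KEY=sk-or-from-file
export OPENROUTER_API_KEY=sk-or-from-shell  # environment overrides .env
blxbench --headless --ratelimit 30          # the flag wins for rate limiting
```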

Custom Tests Directory

Use a custom test tree:

blxbench --headless --tests-dir ./my-tests --provider opr --models openai/gpt-5.4-mini

Your directory should mirror the fixture layout expected by benchmark-core. See Our Tests for how catalog entries map to files.
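Since categories are folder names under the tests tree, a custom directory presumably mirrors that layout. A hypothetical skeleton using category names from the table above (file contents and names are not prescribed here):

```
my-tests/
  speed/
  reasoning/
  debugging/
  coding_ui/
```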

