Reporting¶
pytest-agent-eval adds score information to pytest's standard terminal output and can optionally write a full markdown report.
Verbosity levels¶
Default (no -v)¶
Tests appear with the standard PASSED/FAILED status. No extra LLM eval detail is shown inline.
-v (one verbose flag)¶
A score summary and per-run results are appended to each eval test's output section:
```
tests/test_booking.py::test_booking_flow PASSED
---- LLM Eval ----
[3/3 runs, score=1.00 >= 0.80]
Run 1 ✅
Run 2 ✅
Run 3 ✅
```
-vv (two verbose flags)¶
Per-turn evaluator reasoning is included:
```
tests/test_booking.py::test_booking_flow PASSED
---- LLM Eval ----
[3/3 runs, score=1.00 >= 0.80]
Run 1 ✅
  All substring checks passed
  All tool call checks passed
Run 2 ✅
  All substring checks passed
  All tool call checks passed
```
The --agent-eval-report flag¶
Pass a file path to write a full markdown report after the session:
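For example (the flag name comes from this page; the report path is just an illustration):

```shell
# Run the suite and write the full markdown report to report.md
pytest --agent-eval-report report.md
```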
Markdown report format¶
The generated report has two sections: a summary table and per-transcript details.
The summary table lists each transcript with its run count, pass count, score, threshold, and pass/fail status. The details section shows every run with turn-level evaluator reasoning.
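A sketch of what a generated report might look like, based on the fields described above. The exact headings and column order are assumptions, not a guaranteed format:

```markdown
# Agent Eval Report

## Summary

| Transcript        | Runs | Passed | Score | Threshold | Status |
|-------------------|------|--------|-------|-----------|--------|
| test_booking_flow | 3    | 3      | 1.00  | 0.80      | PASS   |

## test_booking_flow

### Run 1 ✅
- All substring checks passed
- All tool call checks passed
```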
Configuring the report path¶
You can set a default report path in pyproject.toml so you do not need to pass the flag every time:
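A minimal sketch, assuming the plugin reads its default from pytest's ini options in pyproject.toml; the exact option name (`agent_eval_report`) and path are assumptions:

```toml
[tool.pytest.ini_options]
# Hypothetical option name; default report path written after each session
agent_eval_report = "reports/agent-eval.md"
```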
The command-line flag always takes precedence over the config value.