Models API Reference¶
A multi-turn evaluation transcript.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
id
|
str
|
Unique identifier used as the pytest test name. |
required |
turns
|
list[Turn]
|
Ordered list of turns. |
required |
threshold
|
float
|
Fraction of runs that must pass (0.0-1.0). |
0.8
|
runs
|
int
|
Number of times to execute this transcript. |
1
|
tags
|
list[str]
|
Optional quality-gate tags (e.g. ["gate:booking"]). |
list()
|
Source code in src/pytest_agent_eval/models.py
A single turn in a transcript.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
user
|
str
|
The user message (also used as the transcript when |
required |
audio
|
_PathLike | None
|
Optional path to a WAV file for voice adapters. Resolved relative to the YAML file's directory when loaded from YAML. |
None
|
expect
|
Expect
|
Expectations for the agent's reply. |
Expect()
|
Source code in src/pytest_agent_eval/models.py
Expectations for a single transcript turn.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
evaluators
|
list[Any]
|
Programmatic evaluators (Python API). |
list()
|
judge
|
JudgeConfig | None
|
YAML-defined judge config. |
None
|
tool_calls_include
|
list[str]
|
Tool names that must appear in tool_calls. |
list()
|
tool_calls_exclude
|
list[str]
|
Tool names that must NOT appear in tool_calls. |
list()
|
reply_contains_any
|
list[str]
|
Reply must contain at least one of these strings. |
list()
|
reply_contains_all
|
list[str]
|
Reply must contain all of these strings. |
list()
|
Source code in src/pytest_agent_eval/models.py
Context passed to every evaluator for a turn.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
user
|
str
|
The user message for this turn. |
required |
reply
|
str
|
The agent's reply. |
required |
tool_calls
|
list[str]
|
Names of tools called during the turn. |
required |
history
|
list[dict[str, Any]]
|
Full conversation history in OpenAI message format, up to but not including the assistant reply for this turn. |
required |
Source code in src/pytest_agent_eval/models.py
Aggregated result across all runs of a transcript.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
passed
|
bool
|
True if score >= threshold. |
required |
score
|
float
|
Fraction of runs that passed (0.0-1.0). |
required |
threshold
|
float
|
Required pass fraction. |
required |
runs
|
list[RunResult]
|
Individual run results. |
required |