Models API Reference

Transcript

A multi-turn evaluation transcript.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| id | str | Unique identifier used as the pytest test name. | required |
| turns | list[Turn] | Ordered list of turns. | required |
| threshold | float | Fraction of runs that must pass (0.0-1.0). | 0.8 |
| runs | int | Number of times to execute this transcript. | 1 |
| tags | list[str] | Optional quality-gate tags (e.g. ["gate:booking"]). | list() |
Source code in src/pytest_agent_eval/models.py
@dataclass
class Transcript:
    """A multi-turn evaluation transcript.

    Args:
        id: Unique identifier used as the pytest test name.
        turns: Ordered list of turns.
        threshold: Fraction of runs that must pass (0.0-1.0).
        runs: Number of times to execute this transcript.
        tags: Optional quality-gate tags (e.g. ["gate:booking"]).
    """

    id: str
    turns: list[Turn]
    threshold: float = 0.8
    runs: int = 1
    tags: list[str] = field(default_factory=list)
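
A minimal construction sketch, assuming the dataclasses are imported from pytest_agent_eval.models (the module shown above). The id, tag, and message values are hypothetical; Turn and Expect are documented in the sections that follow.

from pytest_agent_eval.models import Expect, Transcript, Turn

transcript = Transcript(
    id="greeting_smoke",      # becomes the pytest test name
    turns=[
        Turn(
            user="Hi there!",
            expect=Expect(reply_contains_any=["hello", "hi"]),
        ),
    ],
    threshold=1.0,            # every run must pass
    runs=3,                   # execute the transcript three times
    tags=["gate:smoke"],      # hypothetical quality-gate tag
)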

Turn

A single turn in a transcript.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| user | str | The user message (also used as the transcript when audio is set). | required |
| audio | _PathLike \| None | Optional path to a WAV file for voice adapters. Resolved relative to the YAML file's directory when loaded from YAML. | None |
| expect | Expect | Expectations for the agent's reply. | Expect() |
Source code in src/pytest_agent_eval/models.py
@dataclass
class Turn:
    """A single turn in a transcript.

    Args:
        user: The user message (also used as the transcript when ``audio`` is set).
        audio: Optional path to a WAV file for voice adapters. Resolved relative to
            the YAML file's directory when loaded from YAML.
        expect: Expectations for the agent's reply.
    """

    user: str
    audio: _PathLike | None = None
    expect: Expect = field(default_factory=Expect)
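
Two illustrative turns, one text-only and one voice, assuming the same module import. The WAV path is hypothetical and is shown as a plain string; when the turn is loaded from YAML, that path would be resolved relative to the YAML file's directory.

from pytest_agent_eval.models import Turn

# Text-only turn with the default (empty) expectations.
text_turn = Turn(user="What time do you open on Sundays?")

# Voice turn: the WAV file is handed to the voice adapter and `user`
# doubles as the transcript of the audio.
voice_turn = Turn(user="Book a table for two at 7pm", audio="audio/book_table.wav")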

Expect

Expectations for a single transcript turn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| evaluators | list[Any] | Programmatic evaluators (Python API). | list() |
| judge | JudgeConfig \| None | YAML-defined judge config. | None |
| tool_calls_include | list[str] | Tool names that must appear in tool_calls. | list() |
| tool_calls_exclude | list[str] | Tool names that must NOT appear in tool_calls. | list() |
| reply_contains_any | list[str] | Reply must contain at least one of these strings. | list() |
| reply_contains_all | list[str] | Reply must contain all of these strings. | list() |
Source code in src/pytest_agent_eval/models.py
@dataclass
class Expect:
    """Expectations for a single transcript turn.

    Args:
        evaluators: Programmatic evaluators (Python API).
        judge: YAML-defined judge config.
        tool_calls_include: Tool names that must appear in tool_calls.
        tool_calls_exclude: Tool names that must NOT appear in tool_calls.
        reply_contains_any: Reply must contain at least one of these strings.
        reply_contains_all: Reply must contain all of these strings.
    """

    evaluators: list[Any] = field(default_factory=list)
    judge: JudgeConfig | None = None
    tool_calls_include: list[str] = field(default_factory=list)
    tool_calls_exclude: list[str] = field(default_factory=list)
    reply_contains_any: list[str] = field(default_factory=list)
    reply_contains_all: list[str] = field(default_factory=list)
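
A sketch of the declarative checks, assuming the same module import; the tool and phrase names are hypothetical.

from pytest_agent_eval.models import Expect

expect = Expect(
    tool_calls_include=["book_table"],          # must appear in tool_calls
    tool_calls_exclude=["cancel_booking"],      # must not appear in tool_calls
    reply_contains_any=["booked", "reserved"],  # at least one of these strings
    reply_contains_all=["7pm"],                 # every one of these strings
)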

TurnContext

Context passed to every evaluator for a turn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| user | str | The user message for this turn. | required |
| reply | str | The agent's reply. | required |
| tool_calls | list[str] | Names of tools called during the turn. | required |
| history | list[dict[str, Any]] | Full conversation history in OpenAI message format, up to but not including the assistant reply for this turn. | required |
Source code in src/pytest_agent_eval/models.py
@dataclass
class TurnContext:
    """Context passed to every evaluator for a turn.

    Args:
        user: The user message for this turn.
        reply: The agent's reply.
        tool_calls: Names of tools called during the turn.
        history: Full conversation history in OpenAI message format, up to but not including
            the assistant reply for this turn.
    """

    user: str
    reply: str
    tool_calls: list[str]
    history: list[dict[str, Any]]
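
The history field follows the OpenAI chat message format. An illustrative shape for the second user turn of a conversation (only role and content keys are shown; the assistant reply being evaluated is not yet present):

history = [
    {"role": "user", "content": "Do you have a table for two tonight?"},
    {"role": "assistant", "content": "Yes, what time works for you?"},
    {"role": "user", "content": "7pm, please."},
]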

EvalResult

Result from a single evaluator on a single turn.

Source code in src/pytest_agent_eval/models.py
@dataclass
class EvalResult:
    """Result from a single evaluator on a single turn."""

    passed: bool
    reasoning: str = ""
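
Programmatic evaluators (the evaluators list on Expect) receive a TurnContext and report an EvalResult. The exact calling convention is not documented in this section, so the sketch below assumes a plain callable; the tool and phrase checks are hypothetical.

from pytest_agent_eval.models import EvalResult, TurnContext

def confirms_booking(ctx: TurnContext) -> EvalResult:
    # Hypothetical check: the booking tool must run and the reply must confirm it.
    if "book_table" not in ctx.tool_calls:
        return EvalResult(passed=False, reasoning="book_table was never called")
    if "confirm" not in ctx.reply.lower():
        return EvalResult(passed=False, reasoning="reply does not confirm the booking")
    return EvalResult(passed=True, reasoning="tool called and booking confirmed")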

TranscriptResult

Aggregated result across all runs of a transcript.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| passed | bool | True if score >= threshold. | required |
| score | float | Fraction of runs that passed (0.0-1.0). | required |
| threshold | float | Required pass fraction. | required |
| runs | list[RunResult] | Individual run results. | required |
Source code in src/pytest_agent_eval/models.py
@dataclass
class TranscriptResult:
    """Aggregated result across all runs of a transcript.

    Args:
        passed: True if score >= threshold.
        score: Fraction of runs that passed (0.0-1.0).
        threshold: Required pass fraction.
        runs: Individual run results.
    """

    passed: bool
    score: float
    threshold: float
    runs: list[RunResult]

    @property
    def passed_run_count(self) -> int:
        """Number of runs that passed."""
        return sum(r.passed for r in self.runs)

    def assert_threshold(self) -> None:
        """Raise AssertionError if score is below threshold."""
        if not self.passed:
            raise AssertionError(
                f"LLM eval failed: score={self.score:.2f} < threshold={self.threshold:.2f} "
                f"({self.passed_run_count}/{len(self.runs)} runs passed)"
            )

passed_run_count: int property

Number of runs that passed.

assert_threshold() -> None

Raise AssertionError if score is below threshold.

Source code in src/pytest_agent_eval/models.py
def assert_threshold(self) -> None:
    """Raise AssertionError if score is below threshold."""
    if not self.passed:
        raise AssertionError(
            f"LLM eval failed: score={self.score:.2f} < threshold={self.threshold:.2f} "
            f"({self.passed_run_count}/{len(self.runs)} runs passed)"
        )
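
Worked example of the threshold check: if 2 of 3 runs pass, score = 2/3 (about 0.67), which is below the default threshold of 0.8, so assert_threshold raises. The sketch below omits the RunResult entries for brevity (they are documented elsewhere), so passed_run_count reports 0 here.

import pytest

from pytest_agent_eval.models import TranscriptResult

result = TranscriptResult(passed=False, score=2 / 3, threshold=0.8, runs=[])

with pytest.raises(AssertionError):
    result.assert_threshold()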