Smolagents Adapter Implementation Plan¶
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add a SmolagentsAdapter so users can drop a smolagents agent into pytest-llm-eval the same way they would a pydantic-ai, LangChain, or OpenAI agent.
Architecture: A duck-typed adapter wraps any object exposing `.run(task, reset=...)` and `.memory.steps`. It maps our async `(history) -> (reply, tool_calls)` contract onto smolagents' sync API by running `agent.run` in a worker thread. First-turn detection (`len(history) == 1`) drives `reset=True`/`reset=False` so multi-turn transcripts work. Tool calls are gathered from new entries in `agent.memory.steps`; smolagents internals (`python_interpreter`, `final_answer`) are filtered out by default.
Tech Stack: Python 3.11+, `asyncio.to_thread` for thread offloading, pytest/pytest-asyncio for tests using a hand-rolled fake agent (no real smolagents dependency in the test path).
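For orientation, here is a minimal sketch of the agent-callable contract the adapter must satisfy. The `history` shape and the `(reply, tool_calls)` return come from the architecture note above; the echo behaviour is purely illustrative:

```python
import asyncio
from typing import Any


async def echo_adapter(history: list[dict[str, Any]]) -> tuple[str, list[str]]:
    """Minimal stand-in for the contract: async, takes the full message
    history, returns (reply, tool_calls)."""
    last_user = history[-1]["content"]
    return f"echo: {last_user}", []


reply, tool_calls = asyncio.run(echo_adapter([{"role": "user", "content": "hi"}]))
print(reply, tool_calls)  # echo: hi []
```

`SmolagentsAdapter` is one implementation of this shape; the evaluator side only ever sees the `(reply, tool_calls)` tuple.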
File Structure¶
- Create: `src/pytest_llm_eval/adapters/smolagents.py` — `SmolagentsAdapter` class with reset logic, tool-call extraction, and internal-tool filtering.
- Create: `tests/test_smolagents_adapter.py` — unit tests against a duck-typed fake agent built from `types.SimpleNamespace`.
- Modify: `pyproject.toml` — add `smolagents = ["smolagents>=1.0"]` optional extra.
- Modify: `docs/adapters.md` — add the `SmolagentsAdapter` section with install tabs and constructor table.
- Modify: `docs/index.md` — add a "Smolagents" tab to the "Supported frameworks" tabbed block.
- Modify: `README.md` — add smolagents to the framework table and add a `[smolagents]` install line.
Task 1: Adapter shell with reset behaviour¶
Files:
- Create: tests/test_smolagents_adapter.py
- Create: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Create `tests/test_smolagents_adapter.py`:

```python
import types
from typing import Any

import pytest

from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter


def _make_fake_agent(reply: Any = "ok", new_steps: list[Any] | None = None):
    """Build a duck-typed fake smolagents agent that records `run` calls."""
    fake = types.SimpleNamespace()
    fake.memory = types.SimpleNamespace(steps=[])
    fake.calls: list[tuple[str, bool]] = []

    def run(task: str, reset: bool = True) -> Any:
        fake.calls.append((task, reset))
        if reset:
            fake.memory.steps = []
        for step in new_steps or []:
            fake.memory.steps.append(step)
        return reply

    fake.run = run
    return fake


async def test_first_turn_passes_reset_true():
    fake = _make_fake_agent()
    adapter = SmolagentsAdapter(fake)
    history = [{"role": "user", "content": "hello"}]
    await adapter(history)
    assert fake.calls == [("hello", True)]


async def test_subsequent_turn_passes_reset_false():
    fake = _make_fake_agent()
    adapter = SmolagentsAdapter(fake)
    history = [
        {"role": "user", "content": "hello"},
        {"role": "assistant", "content": "hi there"},
        {"role": "user", "content": "follow up"},
    ]
    await adapter(history)
    assert fake.calls == [("follow up", False)]


async def test_returns_reply_string():
    fake = _make_fake_agent(reply=42)
    adapter = SmolagentsAdapter(fake)
    reply, _ = await adapter([{"role": "user", "content": "hi"}])
    assert reply == "42"
```
- Step 2: Run tests to verify they fail
Expected: `ModuleNotFoundError: No module named 'pytest_llm_eval.adapters.smolagents'`
- Step 3: Implement the adapter shell
Create `src/pytest_llm_eval/adapters/smolagents.py`:

````python
"""Adapter for smolagents agents (ToolCallingAgent, CodeAgent, ...)."""

from __future__ import annotations

import asyncio
from typing import Any


class SmolagentsAdapter:
    """Wrap a smolagents agent to conform to the agent callable contract.

    Duck-typed: works with any object exposing ``.run(task, reset=...)`` and
    ``.memory.steps``. Smolagents' sync ``run`` is offloaded with
    ``asyncio.to_thread`` so the event loop stays responsive.

    Args:
        agent: A smolagents agent (e.g. ``ToolCallingAgent``, ``CodeAgent``).
        include_internal_tools: When ``True``, smolagents-internal pseudo-tools
            (``python_interpreter``, ``final_answer``) are included in the
            returned tool-call list. Defaults to ``False``.

    Example:
        ```python
        from smolagents import ToolCallingAgent, InferenceClientModel

        from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

        model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
        agent = ToolCallingAgent(tools=[...], model=model)

        @pytest.fixture
        def llm_eval_agent():
            return SmolagentsAdapter(agent)
        ```
    """

    def __init__(self, agent: Any, *, include_internal_tools: bool = False) -> None:
        """Store the smolagents agent and the internal-tool filter setting."""
        self._agent = agent
        self._include_internal_tools = include_internal_tools

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        return str(result), []
````
- Step 4: Run tests to verify they pass
Expected: 3 PASS
- Step 5: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: add SmolagentsAdapter shell with reset detection"
```
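Outside pytest, the reset detection can also be sanity-checked by hand with the same duck-typed fake. This sketch inlines a copy of the Task 1 adapter shell (named `_AdapterShell` here, a hypothetical stand-in) so it runs standalone, without `pytest_llm_eval` installed:

```python
import asyncio
import types
from typing import Any


class _AdapterShell:
    """Inlined copy of the Task 1 adapter shell, for a standalone demo."""

    def __init__(self, agent: Any) -> None:
        self._agent = agent

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        reset = len(history) == 1  # first turn of a transcript resets memory
        result = await asyncio.to_thread(
            self._agent.run, history[-1]["content"], reset=reset
        )
        return str(result), []


fake = types.SimpleNamespace(memory=types.SimpleNamespace(steps=[]), calls=[])
fake.run = lambda task, reset=True: fake.calls.append((task, reset)) or "ok"

adapter = _AdapterShell(fake)
asyncio.run(adapter([{"role": "user", "content": "hello"}]))
asyncio.run(adapter([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "ok"},
    {"role": "user", "content": "follow up"},
]))
print(fake.calls)  # [('hello', True), ('follow up', False)]
```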
Task 2: Tool-call extraction from memory.steps¶
Files:
- Modify: tests/test_smolagents_adapter.py
- Modify: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Append to `tests/test_smolagents_adapter.py`:

```python
def _step(*tool_call_names: str) -> Any:
    """Build a fake step with a `.tool_calls` list of objects exposing `.name`."""
    return types.SimpleNamespace(
        tool_calls=[types.SimpleNamespace(name=n) for n in tool_call_names]
    )


def _step_no_tool_calls() -> Any:
    """Build a fake step that has no `tool_calls` attribute (e.g. a planning step)."""
    return types.SimpleNamespace()


async def test_extracts_new_tool_calls_only():
    fake = _make_fake_agent(new_steps=[_step("web_search"), _step("create_booking")])
    fake.memory.steps.append(_step("ignored_prior_step"))
    adapter = SmolagentsAdapter(fake)
    history = [
        {"role": "user", "content": "first"},
        {"role": "assistant", "content": "ok"},
        {"role": "user", "content": "second"},
    ]
    _, tool_calls = await adapter(history)
    assert tool_calls == ["web_search", "create_booking"]


async def test_handles_steps_without_tool_calls():
    fake = _make_fake_agent(
        new_steps=[_step_no_tool_calls(), _step("create_booking"), _step_no_tool_calls()]
    )
    adapter = SmolagentsAdapter(fake)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["create_booking"]
```
- Step 2: Run tests to verify they fail
```bash
uv run pytest tests/test_smolagents_adapter.py::test_extracts_new_tool_calls_only tests/test_smolagents_adapter.py::test_handles_steps_without_tool_calls -v
```
Expected: both FAIL with AssertionError (current implementation returns [])
- Step 3: Implement extraction
Replace the `__call__` method in `src/pytest_llm_eval/adapters/smolagents.py` with:

```python
    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        prev = len(self._agent.memory.steps)
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        new_steps = self._agent.memory.steps[prev:] if not reset else self._agent.memory.steps
        tool_calls = [
            tc.name
            for step in new_steps
            for tc in getattr(step, "tool_calls", None) or []
        ]
        return str(result), tool_calls
```
Note the `if not reset else self._agent.memory.steps` branch: when `reset=True` the agent clears its own memory before adding new steps, so the pre-run snapshot index would be wrong. Every step in `memory.steps` after the run is new.
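The effect of that branch can be seen with plain lists standing in for `memory.steps` (illustrative values only):

```python
def new_steps_after_run(steps_before, steps_after, reset):
    """Mirror the adapter's slicing rule: on reset the whole post-run
    memory is new; otherwise only the entries past the snapshot are."""
    prev = len(steps_before)
    return steps_after if reset else steps_after[prev:]


# reset=False: the snapshot index is valid, so slice off the old prefix.
assert new_steps_after_run(["old"], ["old", "a", "b"], reset=False) == ["a", "b"]

# reset=True: the agent cleared memory first, so steps_after[prev:] would
# wrongly drop the first new step; the branch returns everything instead.
assert new_steps_after_run(["old"], ["a", "b"], reset=True) == ["a", "b"]
```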
- Step 4: Run all adapter tests to verify
Expected: 5 PASS
- Step 5: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: extract tool calls from smolagents memory.steps"
```
Task 3: Filter smolagents-internal tool calls¶
Files:
- Modify: tests/test_smolagents_adapter.py
- Modify: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Append to `tests/test_smolagents_adapter.py`:

```python
async def test_filters_python_interpreter_and_final_answer_by_default():
    fake = _make_fake_agent(
        new_steps=[
            _step("python_interpreter"),
            _step("create_booking"),
            _step("final_answer"),
        ]
    )
    adapter = SmolagentsAdapter(fake)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["create_booking"]


async def test_include_internal_tools_returns_them():
    fake = _make_fake_agent(
        new_steps=[
            _step("python_interpreter"),
            _step("create_booking"),
            _step("final_answer"),
        ]
    )
    adapter = SmolagentsAdapter(fake, include_internal_tools=True)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["python_interpreter", "create_booking", "final_answer"]
```
- Step 2: Run tests to verify they fail
```bash
uv run pytest tests/test_smolagents_adapter.py::test_filters_python_interpreter_and_final_answer_by_default tests/test_smolagents_adapter.py::test_include_internal_tools_returns_them -v
```
Expected: `test_filters_python_interpreter_and_final_answer_by_default` fails (returns the unfiltered list); `test_include_internal_tools_returns_them` passes (no filter applied yet)
- Step 3: Add filter constants and apply them
Edit `src/pytest_llm_eval/adapters/smolagents.py` so the file reads:

````python
"""Adapter for smolagents agents (ToolCallingAgent, CodeAgent, ...)."""

from __future__ import annotations

import asyncio
from typing import Any

_INTERNAL_TOOLS = frozenset({"python_interpreter", "final_answer"})


class SmolagentsAdapter:
    """Wrap a smolagents agent to conform to the agent callable contract.

    Duck-typed: works with any object exposing ``.run(task, reset=...)`` and
    ``.memory.steps``. Smolagents' sync ``run`` is offloaded with
    ``asyncio.to_thread`` so the event loop stays responsive.

    Args:
        agent: A smolagents agent (e.g. ``ToolCallingAgent``, ``CodeAgent``).
        include_internal_tools: When ``True``, smolagents-internal pseudo-tools
            (``python_interpreter``, ``final_answer``) are included in the
            returned tool-call list. Defaults to ``False``.

    Example:
        ```python
        from smolagents import ToolCallingAgent, InferenceClientModel

        from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

        model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
        agent = ToolCallingAgent(tools=[...], model=model)

        @pytest.fixture
        def llm_eval_agent():
            return SmolagentsAdapter(agent)
        ```
    """

    def __init__(self, agent: Any, *, include_internal_tools: bool = False) -> None:
        """Store the smolagents agent and the internal-tool filter setting."""
        self._agent = agent
        self._include_internal_tools = include_internal_tools

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        prev = len(self._agent.memory.steps)
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        new_steps = self._agent.memory.steps[prev:] if not reset else self._agent.memory.steps
        names = [
            tc.name
            for step in new_steps
            for tc in getattr(step, "tool_calls", None) or []
        ]
        if not self._include_internal_tools:
            names = [n for n in names if n not in _INTERNAL_TOOLS]
        return str(result), names
````
- Step 4: Run all adapter tests to verify
Expected: 7 PASS
- Step 5: Run the full suite to confirm no regressions
Expected: 73 passed (66 at the start of this plan, plus the 7 new adapter tests)
- Step 6: Run pre-commit
Expected: all hooks pass
- Step 7: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: filter smolagents-internal tool calls with opt-in flag"
```
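The filtering rule itself is a plain set-membership check. A standalone sketch of just that rule (the tool names are the plan's examples, the helper name is illustrative):

```python
_INTERNAL_TOOLS = frozenset({"python_interpreter", "final_answer"})


def visible_tool_calls(names, include_internal_tools=False):
    """Drop smolagents-internal pseudo-tools unless explicitly requested."""
    if include_internal_tools:
        return list(names)
    return [n for n in names if n not in _INTERNAL_TOOLS]


calls = ["python_interpreter", "create_booking", "final_answer"]
assert visible_tool_calls(calls) == ["create_booking"]
assert visible_tool_calls(calls, include_internal_tools=True) == calls
```

A `frozenset` keeps the constant immutable and makes the membership test O(1).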
Task 4: Register the smolagents optional extra¶
Files:
- Modify: pyproject.toml
- Step 1: Add the optional extra
Edit `pyproject.toml` so `[project.optional-dependencies]` reads:

```toml
[project.optional-dependencies]
langchain = ["langchain-core>=0.3"]
openai = ["openai>=1.0"]
smolagents = ["smolagents>=1.0"]
xdist = ["pytest-xdist>=3.0"]
```
- Step 2: Confirm the extra resolves
Expected: dry-run output lists smolagents (version ≥ 1.0) among the would-install packages, no resolution errors.
- Step 3: Run pre-commit (catches TOML formatting issues)
Expected: all hooks pass
- Step 4: Commit
Task 5: Document the adapter¶
Files:
- Modify: docs/adapters.md
- Modify: docs/index.md
- Modify: README.md
- Step 1: Add the `SmolagentsAdapter` section to `docs/adapters.md`
Insert this block immediately after the existing `## OpenAIAdapter` section (and before `## Writing a custom adapter`):
## `SmolagentsAdapter`
Wraps a [smolagents](https://github.com/huggingface/smolagents) agent — `ToolCallingAgent`, `CodeAgent`, or any duck-typed agent exposing `.run()` and `.memory.steps`.
```python
import pytest
from smolagents import ToolCallingAgent, InferenceClientModel

from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
agent = ToolCallingAgent(tools=[...], model=model)

@pytest.fixture
def llm_eval_agent():
    return SmolagentsAdapter(agent)
```
Install the optional extra for smolagents support:
=== "pip"

    ```bash
    pip install "pytest-llm-eval[smolagents]"
    ```

=== "uv"

    ```bash
    uv add "pytest-llm-eval[smolagents]"
    ```
The adapter offloads the sync `agent.run` to a worker thread with `asyncio.to_thread`. It detects the first turn of a transcript via `len(history) == 1` and passes `reset=True` so each transcript starts with fresh agent memory; subsequent turns pass `reset=False` to continue the conversation.
Tool-call names are collected from new entries in `agent.memory.steps`. Smolagents-internal pseudo-tools (`python_interpreter`, used by `CodeAgent`, and `final_answer`, the termination tool) are filtered by default — pass `include_internal_tools=True` to see them.
!!! note "CodeAgent and tool-call assertions"

    `CodeAgent` runs tools by executing generated Python; smolagents records only the `python_interpreter` step, not the inner tool calls. If you need fine-grained tool-call assertions with `ToolCallEvaluator`, use `ToolCallingAgent`.
**Constructor:**
| Parameter | Type | Default | Description |
|--------------------------|--------|---------|------------------------------------------------------------------------------|
| `agent` | `Any` | required | A smolagents agent (`ToolCallingAgent`, `CodeAgent`, or duck-typed equivalent) |
| `include_internal_tools` | `bool` | `False` | When `True`, return `python_interpreter` and `final_answer` in tool calls |
- Step 2: Add a Smolagents tab to `docs/index.md`
In `docs/index.md`, find the `=== "OpenAI SDK"` block inside the "Supported frameworks" section. Immediately after the closing of that block (and before `=== "Custom"`), insert:
=== "Smolagents"

    === "pip"

        ```bash
        pip install "pytest-llm-eval[smolagents]"
        ```

    === "uv"

        ```bash
        uv add "pytest-llm-eval[smolagents]"
        ```

    ```python
    from smolagents import ToolCallingAgent, InferenceClientModel

    from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

    agent = ToolCallingAgent(tools=[...], model=InferenceClientModel(model_id="..."))

    @pytest.fixture
    def llm_eval_agent():
        return SmolagentsAdapter(agent)
    ```
- Step 3: Update the framework table in `README.md`
Find the existing table:
| Framework | Extra | Adapter |
|---|---|---|
| [pydantic-ai](https://ai.pydantic.dev/) | _(default)_ | `pytest_llm_eval.adapters.pydantic_ai.PydanticAIAdapter` |
| [LangChain / LangGraph](https://python.langchain.com/) | `langchain` | `pytest_llm_eval.adapters.langchain.LangChainAdapter` |
| [OpenAI SDK](https://github.com/openai/openai-python) | `openai` | `pytest_llm_eval.adapters.openai.OpenAIAdapter` |
Replace it with:
| Framework | Extra | Adapter |
|---|---|---|
| [pydantic-ai](https://ai.pydantic.dev/) | _(default)_ | `pytest_llm_eval.adapters.pydantic_ai.PydanticAIAdapter` |
| [LangChain / LangGraph](https://python.langchain.com/) | `langchain` | `pytest_llm_eval.adapters.langchain.LangChainAdapter` |
| [OpenAI SDK](https://github.com/openai/openai-python) | `openai` | `pytest_llm_eval.adapters.openai.OpenAIAdapter` |
| [smolagents](https://github.com/huggingface/smolagents) | `smolagents` | `pytest_llm_eval.adapters.smolagents.SmolagentsAdapter` |
Then find the install snippet right under that table:
```bash
pip install "pytest-llm-eval[langchain]"
pip install "pytest-llm-eval[openai]"
# or with uv:
uv add "pytest-llm-eval[langchain]"
```
Replace it with:
```bash
pip install "pytest-llm-eval[langchain]"
pip install "pytest-llm-eval[openai]"
pip install "pytest-llm-eval[smolagents]"
# or with uv:
uv add "pytest-llm-eval[langchain]"
uv add "pytest-llm-eval[smolagents]"
```
- Step 4: Build the docs to verify rendering
Expected: Build finished with no errors.
- Step 5: Run pre-commit
Expected: all hooks pass
- Step 6: Run the full test suite one final time
Expected: 73 passed, 0 failed
- Step 7: Commit