Smolagents Adapter Implementation Plan¶
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add a SmolagentsAdapter so users can drop a smolagents agent into pytest-llm-eval the same way they would a pydantic-ai, LangChain, or OpenAI agent.
Architecture: A duck-typed adapter wraps any object exposing `.run(task, reset=...)` and `.memory.steps`. It maps our async `(history) -> (reply, tool_calls)` contract onto smolagents' sync API by running `agent.run` in a worker thread. First-turn detection (`len(history) == 1`) drives `reset=True`/`reset=False` so multi-turn transcripts work. Tool calls are gathered from new entries in `agent.memory.steps`; smolagents internals (`python_interpreter`, `final_answer`) are filtered out by default.
Tech Stack: Python 3.11+, `asyncio.to_thread` for thread offloading, pytest/pytest-asyncio for tests using a hand-rolled fake agent (no real smolagents dependency in the test path).
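For orientation, here is a minimal sketch of the agent-callable contract the adapter must satisfy. The `history` shape and the `(reply, tool_calls)` return come from the architecture note above; the echo behaviour is purely illustrative:

```python
import asyncio
from typing import Any


async def echo_adapter(history: list[dict[str, Any]]) -> tuple[str, list[str]]:
    """Minimal stand-in for the contract: async, takes the full message
    history, returns (reply, tool_calls)."""
    last_user = history[-1]["content"]
    return f"echo: {last_user}", []


reply, tool_calls = asyncio.run(echo_adapter([{"role": "user", "content": "hi"}]))
print(reply, tool_calls)  # echo: hi []
```

`SmolagentsAdapter` is one implementation of this shape; the evaluator side only ever sees the `(reply, tool_calls)` tuple.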
File Structure¶
- Create: `src/pytest_llm_eval/adapters/smolagents.py` — `SmolagentsAdapter` class with reset logic, tool-call extraction, and internal-tool filtering.
- Create: `tests/test_smolagents_adapter.py` — unit tests against a duck-typed fake agent built from `types.SimpleNamespace`.
- Modify: `pyproject.toml` — add `smolagents = ["smolagents>=1.0"]` optional extra.
- Modify: `docs/adapters.md` — add the `SmolagentsAdapter` section with install tabs and constructor table.
- Modify: `docs/index.md` — add a "Smolagents" tab to the "Supported frameworks" tabbed block.
- Modify: `README.md` — add smolagents to the framework table and add a `[smolagents]` install line.
Task 1: Adapter shell with reset behaviour¶
Files:
- Create: tests/test_smolagents_adapter.py
- Create: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Create `tests/test_smolagents_adapter.py`:

```python
import types
from typing import Any

import pytest

from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter


def _make_fake_agent(reply: Any = "ok", new_steps: list[Any] | None = None):
    """Build a duck-typed fake smolagents agent that records `run` calls."""
    fake = types.SimpleNamespace()
    fake.memory = types.SimpleNamespace(steps=[])
    fake.calls: list[tuple[str, bool]] = []

    def run(task: str, reset: bool = True) -> Any:
        fake.calls.append((task, reset))
        if reset:
            fake.memory.steps = []
        for step in new_steps or []:
            fake.memory.steps.append(step)
        return reply

    fake.run = run
    return fake


async def test_first_turn_passes_reset_true():
    fake = _make_fake_agent()
    adapter = SmolagentsAdapter(fake)
    history = [{"role": "user", "content": "hello"}]
    await adapter(history)
    assert fake.calls == [("hello", True)]


async def test_subsequent_turn_passes_reset_false():
    fake = _make_fake_agent()
    adapter = SmolagentsAdapter(fake)
    history = [
        {"role": "user", "content": "hello"},
        {"role": "assistant", "content": "hi there"},
        {"role": "user", "content": "follow up"},
    ]
    await adapter(history)
    assert fake.calls == [("follow up", False)]


async def test_returns_reply_string():
    fake = _make_fake_agent(reply=42)
    adapter = SmolagentsAdapter(fake)
    reply, _ = await adapter([{"role": "user", "content": "hi"}])
    assert reply == "42"
```
- Step 2: Run tests to verify they fail
Expected: `ModuleNotFoundError: No module named 'pytest_llm_eval.adapters.smolagents'`
- Step 3: Implement the adapter shell
Create `src/pytest_llm_eval/adapters/smolagents.py`:

````python
"""Adapter for smolagents agents (ToolCallingAgent, CodeAgent, ...)."""

from __future__ import annotations

import asyncio
from typing import Any


class SmolagentsAdapter:
    """Wrap a smolagents agent to conform to the agent callable contract.

    Duck-typed: works with any object exposing ``.run(task, reset=...)`` and
    ``.memory.steps``. Smolagents' sync ``run`` is offloaded with
    ``asyncio.to_thread`` so the event loop stays responsive.

    Args:
        agent: A smolagents agent (e.g. ``ToolCallingAgent``, ``CodeAgent``).
        include_internal_tools: When ``True``, smolagents-internal pseudo-tools
            (``python_interpreter``, ``final_answer``) are included in the
            returned tool-call list. Defaults to ``False``.

    Example:
        ```python
        from smolagents import ToolCallingAgent, InferenceClientModel

        from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

        model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
        agent = ToolCallingAgent(tools=[...], model=model)

        @pytest.fixture
        def llm_eval_agent():
            return SmolagentsAdapter(agent)
        ```
    """

    def __init__(self, agent: Any, *, include_internal_tools: bool = False) -> None:
        """Store the smolagents agent and the internal-tool filter setting."""
        self._agent = agent
        self._include_internal_tools = include_internal_tools

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        return str(result), []
````
- Step 4: Run tests to verify they pass
Expected: 3 PASS
- Step 5: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: add SmolagentsAdapter shell with reset detection"
```
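Outside pytest, the reset detection can also be sanity-checked by hand with the same duck-typed fake. This sketch inlines a copy of the Task 1 adapter shell (named `_AdapterShell` here, a hypothetical stand-in) so it runs standalone, without `pytest_llm_eval` installed:

```python
import asyncio
import types
from typing import Any


class _AdapterShell:
    """Inlined copy of the Task 1 adapter shell, for a standalone demo."""

    def __init__(self, agent: Any) -> None:
        self._agent = agent

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        reset = len(history) == 1  # first turn of a transcript resets memory
        result = await asyncio.to_thread(
            self._agent.run, history[-1]["content"], reset=reset
        )
        return str(result), []


fake = types.SimpleNamespace(memory=types.SimpleNamespace(steps=[]), calls=[])
fake.run = lambda task, reset=True: fake.calls.append((task, reset)) or "ok"

adapter = _AdapterShell(fake)
asyncio.run(adapter([{"role": "user", "content": "hello"}]))
asyncio.run(adapter([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "ok"},
    {"role": "user", "content": "follow up"},
]))
print(fake.calls)  # [('hello', True), ('follow up', False)]
```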
Task 2: Tool-call extraction from memory.steps¶
Files:
- Modify: tests/test_smolagents_adapter.py
- Modify: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Append to `tests/test_smolagents_adapter.py`:

```python
def _step(*tool_call_names: str) -> Any:
    """Build a fake step with a `.tool_calls` list of objects exposing `.name`."""
    return types.SimpleNamespace(
        tool_calls=[types.SimpleNamespace(name=n) for n in tool_call_names]
    )


def _step_no_tool_calls() -> Any:
    """Build a fake step that has no `tool_calls` attribute (e.g. a planning step)."""
    return types.SimpleNamespace()


async def test_extracts_new_tool_calls_only():
    fake = _make_fake_agent(new_steps=[_step("web_search"), _step("create_booking")])
    fake.memory.steps.append(_step("ignored_prior_step"))
    adapter = SmolagentsAdapter(fake)
    history = [
        {"role": "user", "content": "first"},
        {"role": "assistant", "content": "ok"},
        {"role": "user", "content": "second"},
    ]
    _, tool_calls = await adapter(history)
    assert tool_calls == ["web_search", "create_booking"]


async def test_handles_steps_without_tool_calls():
    fake = _make_fake_agent(
        new_steps=[_step_no_tool_calls(), _step("create_booking"), _step_no_tool_calls()]
    )
    adapter = SmolagentsAdapter(fake)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["create_booking"]
```
- Step 2: Run tests to verify they fail
```bash
uv run pytest tests/test_smolagents_adapter.py::test_extracts_new_tool_calls_only tests/test_smolagents_adapter.py::test_handles_steps_without_tool_calls -v
```
Expected: both FAIL with AssertionError (current implementation returns [])
- Step 3: Implement extraction
Replace the `__call__` method in `src/pytest_llm_eval/adapters/smolagents.py` with:

```python
    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        prev = len(self._agent.memory.steps)
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        new_steps = self._agent.memory.steps[prev:] if not reset else self._agent.memory.steps
        tool_calls = [
            tc.name
            for step in new_steps
            for tc in getattr(step, "tool_calls", None) or []
        ]
        return str(result), tool_calls
```
Note the `if not reset else self._agent.memory.steps` branch: when `reset=True` the agent clears its own memory before adding new steps, so the pre-run snapshot index would be wrong. Every step in `memory.steps` after the run is new.
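The effect of that branch can be seen with plain lists standing in for `memory.steps` (illustrative values only):

```python
def new_steps_after_run(steps_before, steps_after, reset):
    """Mirror the adapter's slicing rule: on reset the whole post-run
    memory is new; otherwise only the entries past the snapshot are."""
    prev = len(steps_before)
    return steps_after if reset else steps_after[prev:]


# reset=False: the snapshot index is valid, so slice off the old prefix.
assert new_steps_after_run(["old"], ["old", "a", "b"], reset=False) == ["a", "b"]

# reset=True: the agent cleared memory first, so steps_after[prev:] would
# wrongly drop the first new step; the branch returns everything instead.
assert new_steps_after_run(["old"], ["a", "b"], reset=True) == ["a", "b"]
```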
- Step 4: Run all adapter tests to verify
Expected: 5 PASS
- Step 5: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: extract tool calls from smolagents memory.steps"
```
Task 3: Filter smolagents-internal tool calls¶
Files:
- Modify: tests/test_smolagents_adapter.py
- Modify: src/pytest_llm_eval/adapters/smolagents.py
- Step 1: Write the failing tests
Append to `tests/test_smolagents_adapter.py`:

```python
async def test_filters_python_interpreter_and_final_answer_by_default():
    fake = _make_fake_agent(
        new_steps=[
            _step("python_interpreter"),
            _step("create_booking"),
            _step("final_answer"),
        ]
    )
    adapter = SmolagentsAdapter(fake)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["create_booking"]


async def test_include_internal_tools_returns_them():
    fake = _make_fake_agent(
        new_steps=[
            _step("python_interpreter"),
            _step("create_booking"),
            _step("final_answer"),
        ]
    )
    adapter = SmolagentsAdapter(fake, include_internal_tools=True)
    _, tool_calls = await adapter([{"role": "user", "content": "hi"}])
    assert tool_calls == ["python_interpreter", "create_booking", "final_answer"]
```
- Step 2: Run tests to verify they fail
```bash
uv run pytest tests/test_smolagents_adapter.py::test_filters_python_interpreter_and_final_answer_by_default tests/test_smolagents_adapter.py::test_include_internal_tools_returns_them -v
```
Expected: `test_filters_python_interpreter_and_final_answer_by_default` fails (returns the unfiltered list); `test_include_internal_tools_returns_them` passes (no filter applied yet)
- Step 3: Add filter constants and apply them
Edit `src/pytest_llm_eval/adapters/smolagents.py` so the file reads:

````python
"""Adapter for smolagents agents (ToolCallingAgent, CodeAgent, ...)."""

from __future__ import annotations

import asyncio
from typing import Any

_INTERNAL_TOOLS = frozenset({"python_interpreter", "final_answer"})


class SmolagentsAdapter:
    """Wrap a smolagents agent to conform to the agent callable contract.

    Duck-typed: works with any object exposing ``.run(task, reset=...)`` and
    ``.memory.steps``. Smolagents' sync ``run`` is offloaded with
    ``asyncio.to_thread`` so the event loop stays responsive.

    Args:
        agent: A smolagents agent (e.g. ``ToolCallingAgent``, ``CodeAgent``).
        include_internal_tools: When ``True``, smolagents-internal pseudo-tools
            (``python_interpreter``, ``final_answer``) are included in the
            returned tool-call list. Defaults to ``False``.

    Example:
        ```python
        from smolagents import ToolCallingAgent, InferenceClientModel

        from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

        model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
        agent = ToolCallingAgent(tools=[...], model=model)

        @pytest.fixture
        def llm_eval_agent():
            return SmolagentsAdapter(agent)
        ```
    """

    def __init__(self, agent: Any, *, include_internal_tools: bool = False) -> None:
        """Store the smolagents agent and the internal-tool filter setting."""
        self._agent = agent
        self._include_internal_tools = include_internal_tools

    async def __call__(self, history: list[dict[str, Any]]) -> tuple[str, list[str]]:
        """Run the agent against the latest user message and return (reply, tool_calls)."""
        user_msg = history[-1]["content"]
        reset = len(history) == 1
        prev = len(self._agent.memory.steps)
        result = await asyncio.to_thread(self._agent.run, user_msg, reset=reset)
        new_steps = self._agent.memory.steps[prev:] if not reset else self._agent.memory.steps
        names = [
            tc.name
            for step in new_steps
            for tc in getattr(step, "tool_calls", None) or []
        ]
        if not self._include_internal_tools:
            names = [n for n in names if n not in _INTERNAL_TOOLS]
        return str(result), names
````
- Step 4: Run all adapter tests to verify
Expected: 7 PASS
- Step 5: Run the full suite to confirm no regressions
Expected: 73 passed (66 at the start of this plan, plus the 7 new adapter tests)
- Step 6: Run pre-commit
Expected: all hooks pass
- Step 7: Commit
```bash
git add src/pytest_llm_eval/adapters/smolagents.py tests/test_smolagents_adapter.py
git commit -m "feat: filter smolagents-internal tool calls with opt-in flag"
```
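The filtering rule itself is a plain set-membership check. A standalone sketch of just that rule (the tool names are the plan's examples, the helper name is illustrative):

```python
_INTERNAL_TOOLS = frozenset({"python_interpreter", "final_answer"})


def visible_tool_calls(names, include_internal_tools=False):
    """Drop smolagents-internal pseudo-tools unless explicitly requested."""
    if include_internal_tools:
        return list(names)
    return [n for n in names if n not in _INTERNAL_TOOLS]


calls = ["python_interpreter", "create_booking", "final_answer"]
assert visible_tool_calls(calls) == ["create_booking"]
assert visible_tool_calls(calls, include_internal_tools=True) == calls
```

A `frozenset` keeps the constant immutable and makes the membership test O(1).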
Task 4: Register the smolagents optional extra¶
Files:
- Modify: pyproject.toml
- Step 1: Add the optional extra
Edit `pyproject.toml` so `[project.optional-dependencies]` reads:

```toml
[project.optional-dependencies]
langchain = ["langchain-core>=0.3"]
openai = ["openai>=1.0"]
smolagents = ["smolagents>=1.0"]
xdist = ["pytest-xdist>=3.0"]
```
- Step 2: Confirm the extra resolves
Expected: dry-run output lists smolagents (version ≥ 1.0) among the would-install packages, no resolution errors.
- Step 3: Run pre-commit (catches TOML formatting issues)
Expected: all hooks pass
- Step 4: Commit
Task 5: Document the adapter¶
Files:
- Modify: docs/adapters.md
- Modify: docs/index.md
- Modify: README.md
- Step 1: Add the `SmolagentsAdapter` section to `docs/adapters.md`
Insert this block immediately after the existing `## OpenAIAdapter` section (and before `## Writing a custom adapter`):
## `SmolagentsAdapter`
Wraps a [smolagents](https://github.com/huggingface/smolagents) agent — `ToolCallingAgent`, `CodeAgent`, or any duck-typed agent exposing `.run()` and `.memory.steps`.
```python
import pytest
from smolagents import ToolCallingAgent, InferenceClientModel

from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
agent = ToolCallingAgent(tools=[...], model=model)

@pytest.fixture
def llm_eval_agent():
    return SmolagentsAdapter(agent)
```
Install the optional extra for smolagents support:
=== "pip"

    ```bash
    pip install "pytest-llm-eval[smolagents]"
    ```

=== "uv"

    ```bash
    uv add "pytest-llm-eval[smolagents]"
    ```
The adapter offloads the sync `agent.run` to a worker thread with `asyncio.to_thread`. It detects the first turn of a transcript via `len(history) == 1` and passes `reset=True` so each transcript starts with fresh agent memory; subsequent turns pass `reset=False` to continue the conversation.
Tool-call names are collected from new entries in `agent.memory.steps`. Smolagents-internal pseudo-tools (`python_interpreter`, used by `CodeAgent`, and `final_answer`, the termination tool) are filtered by default — pass `include_internal_tools=True` to see them.
!!! note "CodeAgent and tool-call assertions"

    `CodeAgent` runs tools by executing generated Python; smolagents records only the `python_interpreter` step, not the inner tool calls. If you need fine-grained tool-call assertions with `ToolCallEvaluator`, use `ToolCallingAgent`.
**Constructor:**
| Parameter | Type | Default | Description |
|--------------------------|--------|---------|------------------------------------------------------------------------------|
| `agent` | `Any` | required | A smolagents agent (`ToolCallingAgent`, `CodeAgent`, or duck-typed equivalent) |
| `include_internal_tools` | `bool` | `False` | When `True`, return `python_interpreter` and `final_answer` in tool calls |
- Step 2: Add a Smolagents tab to `docs/index.md`
In `docs/index.md`, find the `=== "OpenAI SDK"` block inside the "Supported frameworks" section. Immediately after the closing of that block (and before `=== "Custom"`), insert:
=== "Smolagents"

    === "pip"

        ```bash
        pip install "pytest-llm-eval[smolagents]"
        ```

    === "uv"

        ```bash
        uv add "pytest-llm-eval[smolagents]"
        ```

    ```python
    from smolagents import ToolCallingAgent, InferenceClientModel

    from pytest_llm_eval.adapters.smolagents import SmolagentsAdapter

    agent = ToolCallingAgent(tools=[...], model=InferenceClientModel(model_id="..."))

    @pytest.fixture
    def llm_eval_agent():
        return SmolagentsAdapter(agent)
    ```
- Step 3: Update the framework table in `README.md`
Find the existing table:
| Framework | Extra | Adapter |
|---|---|---|
| [pydantic-ai](https://ai.pydantic.dev/) | _(default)_ | `pytest_llm_eval.adapters.pydantic_ai.PydanticAIAdapter` |
| [LangChain / LangGraph](https://python.langchain.com/) | `langchain` | `pytest_llm_eval.adapters.langchain.LangChainAdapter` |
| [OpenAI SDK](https://github.com/openai/openai-python) | `openai` | `pytest_llm_eval.adapters.openai.OpenAIAdapter` |
Replace it with:
| Framework | Extra | Adapter |
|---|---|---|
| [pydantic-ai](https://ai.pydantic.dev/) | _(default)_ | `pytest_llm_eval.adapters.pydantic_ai.PydanticAIAdapter` |
| [LangChain / LangGraph](https://python.langchain.com/) | `langchain` | `pytest_llm_eval.adapters.langchain.LangChainAdapter` |
| [OpenAI SDK](https://github.com/openai/openai-python) | `openai` | `pytest_llm_eval.adapters.openai.OpenAIAdapter` |
| [smolagents](https://github.com/huggingface/smolagents) | `smolagents` | `pytest_llm_eval.adapters.smolagents.SmolagentsAdapter` |
Then find the install snippet right under that table:
```bash
pip install "pytest-llm-eval[langchain]"
pip install "pytest-llm-eval[openai]"
# or with uv:
uv add "pytest-llm-eval[langchain]"
```
Replace it with:
```bash
pip install "pytest-llm-eval[langchain]"
pip install "pytest-llm-eval[openai]"
pip install "pytest-llm-eval[smolagents]"
# or with uv:
uv add "pytest-llm-eval[langchain]"
uv add "pytest-llm-eval[smolagents]"
```
- Step 4: Build the docs to verify rendering
Expected: Build finished with no errors.
- Step 5: Run pre-commit
Expected: all hooks pass
- Step 6: Run the full test suite one final time
Expected: 73 passed, 0 failed
- Step 7: Commit