The three loop topologies (ReAct, plan-and-execute, orchestrator/subagent), why your agent goes in circles, and what Runner.run is actually doing for you.
The first time I shipped an agent into something that resembled production, it called the same tool 17 times. Same name, slightly different arguments each call. The token meter was a slot machine. The Teams channel filled up with retries before anyone noticed. The final answer, when it eventually arrived, was the same answer the second tool call had already produced.
A week earlier I had watched the opposite failure with the opposite agent: one tool call, then a confident final answer, even though the task obviously needed three more. Same missing thing, both times. Neither agent had a working loop.
This post is about that loop. What it is, why your agent sometimes goes in circles inside it, and what your SDK is actually doing for you when it hides the loop behind something called Runner.run.
TL;DR
- The loop is the agent. Tools are inert until something keeps calling the model with the latest tool result.
- Three shapes cover almost everything: ReAct (think, act, observe, decide), Plan-and-Execute (planner emits steps, executor walks them), and Orchestrator/Subagent (one agent dispatches to specialists). The first one is the default; the others are bolted on top.
- SDKs hide the loop inside Runner.run or its equivalent. Bugs live where it's hidden.
- Cap iterations, detect repeats, and decide your termination condition before you ship. The agent that won't stop is more expensive than the agent that stops early.
Three failure modes, in the order most people hit them.
It doesn’t know when to stop. The model keeps calling tools because nothing told it what “done” looks like. Each tool result reveals something new, the system prompt is silent on termination, and there is no iteration cap on the outside, so the loop runs until something else breaks: a rate limit, a budget alert, a tired developer hitting Ctrl-C. This is the failure that ReAct’s explicit while iter < MAX and final-answer detection were designed to fix. Without them, every agent is one verbose tool result away from a runaway.
It can’t see the whole task at once. For multi-step work (triage every failing test in a repo, write a coverage report across a directory, walk a long checklist), the agent loses the early sub-goals once the context fills with intermediate tool output. It finishes the steps it can still see and stops, leaving the rest undone, and from outside it looks like the agent gave up. The fix is to materialise the plan somewhere durable. Plan-and-Execute does this the simplest way it can: a planner agent produces a numbered list, and an executor walks it. The plan is the memory the loop didn’t have.
One agent shouldn’t do everything. When the same agent has tools for shell, web search, SQL, and a calendar, the prompt it needs to reason well about all four becomes a brick wall. The model gets worse at every individual tool because the instructions for each one are competing for attention. The fix is to split the work. An orchestrator that knows how to dispatch, and specialist subagents that each see only the tools they need. We will sketch the shape here and build it properly in Post 7.
flowchart TB
classDef model fill:#f3f4f6,stroke:#374151,color:#111827;
classDef act fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a;
classDef tool fill:#fce7f3,stroke:#be185d,color:#831843;
classDef done fill:#dcfce7,stroke:#15803d,color:#14532d;
subgraph A["ReAct"]
A1((Model))
A2["Tool call"]
A3["Observation"]
A4{Done?}
A5([Answer])
A1 --> A2 --> A3 --> A4
A4 -- no --> A1
A4 -- yes --> A5
class A1 model
class A2 act
class A3 tool
class A5 done
end
subgraph B["Plan-and-Execute"]
B1((Planner))
B2["Step 1
Step 2
Step 3"]
B3((Executor))
B4([Answer])
B1 --> B2 --> B3 --> B4
B3 -. per step .-> B2
class B1 model
class B3 model
class B2 act
class B4 done
end
subgraph C["Orchestrator / Subagent"]
C1((Orchestrator))
C2((Subagent A))
C3((Subagent B))
C4["Aggregate"]
C5([Answer])
C1 --> C2 --> C4
C1 --> C3 --> C4
C4 --> C5
class C1 model
class C2 model
class C3 model
class C4 act
class C5 done
end
A ~~~ B
B ~~~ C
ReAct on the top is the loop you already have. The model decides each next step from what it just saw. Plan-and-Execute in the middle pushes the decisions to the front: a planner produces the list, an executor walks it. Orchestrator/Subagent on the bottom separates dispatch from work: one model decides who handles what, several models do the parts they each see clearly. We will build the first two and tease the third; Post 7 takes orchestration apart properly.
Strip the SDKs away and the loop is small. One function per turn, plus a while wrapper with a budget. The whole runnable file is at raw_react.py; the agent it drives investigates a small seeded sample_repo/ (three deliberately failing pytests against a buggy Calculator) so the loop has something real to chew on.
Here is one turn: call the model with the conversation so far and the tool spec; if the response has tool calls, dispatch them and append the results; otherwise the response is your final answer.
def main() -> int:
client = OpenAI()
messages = [
{"role": "system", "content": SYSTEM},
{"role": "user", "content": "Triage the failing tests in sample_repo/."},
]
last_action = None
for i in range(MAX_ITERS):
resp = client.chat.completions.create(
model=MODEL, messages=messages, tools=TOOLS,
)
msg = resp.choices[0].message
if not msg.tool_calls:
print(f"iter {i}: final answer\n")
print(msg.content)
return 0
...
The for i in range(MAX_ITERS) is the budget; without it the agent has no upper bound on cost. The early return when msg.tool_calls is empty is the termination condition; without it the loop keeps calling the model after the model has already finished. And the messages.append(msg) (inside the dispatch branch, shown next) is what makes the model's next call see the result of the previous tool, which is what makes this a loop and not a sequence of unrelated turns.
The dispatch branch fills in the rest:
messages.append(msg)
for call in msg.tool_calls:
args = json.loads(call.function.arguments or "{}")
action = f"{call.function.name}({args})"
print(f"iter {i}: calling {action}")
if action == last_action:
print(f"iter {i}: stuck (same call as last iter); breaking")
return 0
last_action = action
messages.append(
{
"role": "tool",
"tool_call_id": call.id,
"content": dispatch(call.function.name, args),
}
)
print("(stopped: hit MAX_ITERS without a final answer)")
return 0
The stuck detector is one extra line and it earns its keep on the first runaway you hit.
If the model called run_pytest() last iteration and is calling run_pytest() again with no new context, nothing useful is going to happen on iteration N+1, so break out. Real production guards are smarter (compare structured args, look at last K calls, surface to a human), but the one-liner stops the bleeding.
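Here is what a slightly smarter guard looks like. A sketch, not what raw_react.py ships: a structural fingerprint of each call, checked against the last K, so reordered JSON keys and two-call cycles don't slip past the comparison.

import hashlib
import json
from collections import deque

K = 4
recent: deque[str] = deque(maxlen=K)

def fingerprint(name: str, args: dict) -> str:
    # sort_keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically
    payload = json.dumps({"tool": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def is_stuck(name: str, args: dict) -> bool:
    fp = fingerprint(name, args)
    seen = fp in recent
    recent.append(fp)
    return seen

Swap is_stuck in where raw_react.py compares action == last_action and the guard survives argument reshuffles and A-B-A-B loops.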
Grab the companion bundle: agentic-loops-why-your-agent-goes-in-circles.zip (raw_react.py, triage.py, the seeded sample_repo/, and a pyproject.toml). Unzip, uv sync, set your key, and watch the loop:
uv run python raw_react.py
Trimmed stdout from a real run:
iter 0: calling run_pytest({})
iter 1: calling read_file({'path': 'tests/test_calculator.py'})
iter 1: calling read_file({'path': 'calculator.py'})
iter 2: final answer
Summary of failing tests and root causes
Failures observed:
- tests/test_calculator.py::test_add_returns_sum
- tests/test_calculator.py::test_subtract_returns_difference
- tests/test_calculator.py::test_divide_by_zero_returns_inf
Root cause clusters:
1) Off-by-one errors in Calculator.add and Calculator.subtract (return a + b + 1 and a - b + 1)
2) Missing zero-division handling in Calculator.divide (raises ZeroDivisionError instead of returning math.inf)
The two iter 1 lines aren’t a bug. The model returned both read_file calls in a single response (parallel tool calls), and the dispatcher prints one line per call within the iteration. Each loop iteration is one model call; a model call can carry several tool calls.
A few lines of stdout, and the loop is visible. The agent decided what to do, did it, looked at the result, decided again, decided it was done, returned an answer. SDKs hide every line of this. Once you have read it, you can read the SDK source.
The SDK does the loop for you. You declare an Agent with its tools, hand it the user’s input, and Runner.run walks the iterations until the model returns a final answer (or hits max_turns). Two things to set deliberately:
- max_turns=10, so the loop has a budget you control, and
- instructions that spell out what "done" looks like, so the model returns a final answer instead of another tool call.

Both go into triage.py, the same agent that drives the rest of this post:
REACT_PROMPT = """You investigate failing pytests in sample_repo/.
Use run_pytest to see the failures, read_file to inspect each failing
test and the implementation it imports, then cluster the failures by
root cause. When you have clustered all failing tests and produced a
summary, stop.
"""
async def run_react(user_input: str) -> str:
agent = Agent(
name="triage",
instructions=REACT_PROMPT,
tools=[run_pytest, read_file],
model=MODEL,
)
result = await Runner.run(agent, user_input, max_turns=MAX_TURNS)
return result.final_output
The full file is at triage.py. Install and run:
uv sync
uv run python triage.py --mode react
Trimmed transcript from a real run:
triage> Here's the triage of failing tests, clustered by root cause:
Cluster 1: Off-by-one in arithmetic methods
- Affected tests:
- tests/test_calculator.py::test_add_returns_sum (got 6 instead of 5)
- tests/test_calculator.py::test_subtract_returns_difference (got 3 instead of 2)
- Root cause:
- sample_repo/calculator.py
- add returns a + b + 1
- subtract returns a - b + 1
Cluster 2: No zero-division handling in divide
- Affected test:
- tests/test_calculator.py::test_divide_by_zero_returns_inf (raised ZeroDivisionError)
- Root cause:
- sample_repo/calculator.py
- divide directly performs a / b without handling b == 0; test expects math.inf when dividing by zero
One Runner.run call, several iterations behind the scenes. That is the same loop you read in raw_react.py ten minutes ago, with the bookkeeping (message list, tool dispatch, termination check) moved inside the SDK.
You don’t have to take that on faith. Here is the actual while from openai-agents 0.17.0, in src/agents/run.py:
while True: # line 757
# ... run guardrails, call model, process tools ...
if isinstance(turn_result.next_step, NextStepFinalOutput): # line 950
# ... run output guardrails, build RunResult ...
return _finalize_result(result) # line 999
# ...
current_turn += 1 # line 1046
if max_turns is not None and current_turn > max_turns: # line 1047
max_turns_error = MaxTurnsExceeded(
f"Max turns ({max_turns}) exceeded"
)
raise max_turns_error # line 1070
Same shape, same while, same max_turns check, same break-on-final-output. The SDK adds plumbing (guardrails, session management, hooks, retries), but the loop at the centre is the one you would write yourself if you had not installed the SDK.
ReAct decides each step while the loop is running. Plan-and-Execute pushes the decisions to the front: a planner agent produces a numbered list, an executor walks it. The plan is the part of the agent’s memory that does not get overwritten by tool output, which is exactly the failure mode it was invented to fix.
The wiring is two agents and one driver:
PLANNER_PROMPT = """You are a planner. Given the user's task, output a
short numbered list of 3-5 steps that an executor agent will walk in
order. Each step is one sentence and names which tool to use
(run_pytest or read_file). Output only the list."""
EXECUTOR_PROMPT = """You are the executor. Walk one step of a plan.
Use run_pytest or read_file as the step requires. Reply in one short
paragraph summarising what the step found."""
async def run_plan(user_input: str) -> str:
planner = Agent(name="planner", instructions=PLANNER_PROMPT, model=MODEL)
executor = Agent(
name="executor",
instructions=EXECUTOR_PROMPT,
tools=[run_pytest, read_file],
model=MODEL,
)
plan = await Runner.run(planner, user_input, max_turns=2)
steps = [
line.strip().lstrip("0123456789.- )")
for line in plan.final_output.splitlines()
if line.strip() and any(c.isalpha() for c in line)
]
notes = []
for i, step in enumerate(steps, 1):
prompt = f"Original task: {user_input}\nThis step: {step}\nDo it."
out = await Runner.run(executor, prompt, max_turns=8)
notes.append(f"step {i}: {out.final_output}")
return "\n\n".join(notes)
The list comprehension strips numbering off each line and skips blanks. Print statements that surface the parsed plan to stdout are elided here for brevity; see triage.py for the full driver.
Run it and you see the plan first, then the steps:
uv run python triage.py --mode plan
plan (4 steps):
1. Run the test suite to see which tests fail and capture error messages and stack traces (run_pytest).
2. Open each failing test file and lines referenced in the failures to understand the assertions and expected behavior (read_file).
3. Open the source files referenced in the stack traces to locate the code paths causing the failures (read_file).
4. Re-run the test suite to verify insights and see if additional/contextual failures appear (run_pytest).
executing step 1: Run the test suite to see which tests fail and capture error messages and stack traces (run_pytest).
... (steps 2, 3, 4 elided)
triage> step 1: Pytest ran and 3 tests failed in sample_repo/tests/test_calculator.py: test_add_returns_sum expected 5 but got 6 from Calculator.add(2, 3); test_subtract_returns_difference expected 2 but got 3 from Calculator.subtract(5, 3); and test_divide_by_zero_returns_inf raised ZeroDivisionError at calculator.py:9 (return a / b) when dividing by zero instead of returning math.inf.
step 3: Pytest shows 3 failures from tests/test_calculator.py pointing into calculator.py: add returns 6 (adds +1), subtract returns 3 (adds +1), and divide raises ZeroDivisionError on divide(1,0). Opening the files confirms that add and subtract incorrectly add 1, and divide directly returns a / b without zero handling; tests expect exact sums/differences and divide-by-zero to return math.inf.
(step 2 and step 4 elided for brevity)
When this beats ReAct: the steps are predictable, you want an audit trail, or you need a human approval gate between “we have a plan” and “we are doing the plan.” When it doesn’t: step 2’s output doesn’t match what step 3 expects, and the executor has no way to notice. Plan drift is the failure mode here, and we will name it in the footgun section. Multi-agent topologies generalise this further (one orchestrator, several specialist subagents, a dispatch policy); Post 7 takes that apart properly.
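For a taste of the shape before Post 7 builds it properly, here is a sketch against the same Agent/Runner API as triage.py. The as_tool helper wraps a whole agent as a callable tool; the prompts are illustrative, not tuned.

test_runner = Agent(
    name="test_runner",
    instructions="Run pytest and report which tests fail and why.",
    tools=[run_pytest],
    model=MODEL,
)
code_reader = Agent(
    name="code_reader",
    instructions="Read a source file and explain the relevant code.",
    tools=[read_file],
    model=MODEL,
)
orchestrator = Agent(
    name="orchestrator",
    instructions=(
        "Dispatch: test_runner finds failures, code_reader inspects the "
        "code they implicate. Cluster the failures by root cause."
    ),
    tools=[
        test_runner.as_tool(
            tool_name="test_runner",
            tool_description="Runs the test suite and reports failures.",
        ),
        code_reader.as_tool(
            tool_name="code_reader",
            tool_description="Reads and explains one source file.",
        ),
    ],
    model=MODEL,
)

Each subagent sees only its own tool and its own short prompt, which is the point: the brick-wall system prompt from failure mode three never gets built.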
| SDK | Where the loop lives | How you bound it |
|---|---|---|
| OpenAI Agents SDK | Inside Runner.run | max_turns= |
| Anthropic Messages API | You write the loop yourself | Your own iteration counter |
| Google ADK | Inside the runner | Step limit on the runner |
Three different vocabularies, one shape. Append the model’s tool calls to the message list, run the tools, append the results, call the model again, stop when the model returns a final answer or you hit your cap. Whether the SDK draws the box around Runner.run, hands you a runner with a different name, or hands you nothing at all and asks you to call client.messages.create in a while, the work underneath is the same.
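The middle row deserves a sketch, since Anthropic hands you nothing. Assuming TOOLS and dispatch are raw_react.py's definitions translated to Anthropic's tool schema (input_schema in place of parameters), and MODEL and MAX_ITERS as before:

import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user", "content": "Triage the failing tests in sample_repo/."}]

for i in range(MAX_ITERS):  # your own iteration counter is the budget
    resp = client.messages.create(
        model=MODEL, max_tokens=2048, messages=messages, tools=TOOLS,
    )
    if resp.stop_reason != "tool_use":  # your own termination check
        print(resp.content[0].text)
        break
    messages.append({"role": "assistant", "content": resp.content})
    messages.append({
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": dispatch(block.name, block.input),
            }
            for block in resp.content
            if block.type == "tool_use"
        ],
    })

Different field names (stop_reason instead of a tool_calls list, tool_result blocks instead of role: tool messages), same loop.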
Useful starting points: the OpenAI Agents SDK Running guide, Anthropic's tool-use guide (you write the loop), and Google ADK's runtime docs.
Three things to keep on a sticky note next to the ones from Posts 3 and 4.
No iteration budget is a runaway cost. Every loop needs a cap. Set max_turns (or whatever your SDK calls it). When the cap fires, raise loudly; do not silently truncate the conversation and pretend the agent finished. The agent that hit the cap has not finished, and a quietly truncated answer is worse than a visible failure because nobody knows to look.
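With the OpenAI Agents SDK, raising loudly can be as small as not swallowing the exception. A sketch, reusing the agent and MAX_TURNS from triage.py and assuming the agents.exceptions module path:

import logging

from agents.exceptions import MaxTurnsExceeded  # what Runner.run raises at the cap

log = logging.getLogger("triage")

async def run_react_loud(user_input: str) -> str:
    try:
        result = await Runner.run(agent, user_input, max_turns=MAX_TURNS)
    except MaxTurnsExceeded:
        # No truncated answer, no pretending: log for the operator and re-raise.
        log.error("hit max_turns=%s without a final answer; task=%r",
                  MAX_TURNS, user_input)
        raise
    return result.final_output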
Same observation, same action. The model re-tries an identical call when nothing changed because it has no memory of what it just tried. The cheap fix is the one-line stuck detector from raw_react.py: compare the current action to the previous one, break out if they match. The thoughtful fix compares against the last K calls or against a structural hash of the args. The careful fix surfaces the loop to a human. Pick one before you ship; do not let “it’ll figure it out” be the policy.
Plan drift. In Plan-and-Execute, step N’s output may not match what step N+1 expects, and the executor walks past the mismatch silently. The plan said “for each failing test, read the test and the implementation it imports.” Step 1 (run pytest) hit an import error before any of the tests ran, so the failure list is empty. Step 2 reads an empty list and keeps going. The executor reports “no failing tests found” and the agent confidently concludes the suite is healthy. Either validate between steps or let the executor escalate to the planner when a step’s output looks wrong.
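A sketch of the validate-between-steps option, grafted onto run_plan's executor walk. The SUSPICIOUS markers are illustrative only, and a fancier version would hand the bad output back to the planner instead of raising:

SUSPICIOUS = ("no failing tests", "errors during collection", "no tests ran")

def step_looks_wrong(output: str) -> bool:
    # Cheap heuristic: for a triage task, these phrases suggest the suite
    # never actually ran, so the next step would walk on bad data.
    text = output.lower()
    return any(marker in text for marker in SUSPICIOUS)

async def walk_checked(user_input: str, steps: list[str]) -> str:
    notes = []
    for i, step in enumerate(steps, 1):
        out = await Runner.run(
            executor,
            f"Original task: {user_input}\nThis step: {step}\nDo it.",
            max_turns=8,
        )
        if step_looks_wrong(out.final_output):
            # Stop the walk instead of feeding a bad observation to step N+1.
            raise RuntimeError(f"plan drift at step {i}: {out.final_output[:200]}")
        notes.append(f"step {i}: {out.final_output}")
    return "\n\n".join(notes)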
We will go deeper on guardrails for runaway agents, escalation, and human-in-the-loop in Post 8. Production observability of loops (per-iteration token cost, retry budgets, kill switches) is Post 10’s territory. For now: cap your turns, detect repeats, and make sure the agent that hit the cap is the agent the operator can see.
The next post pulls everything from all of the previous posts into a single tutorial. One agent, end to end, built with an SDK and understood without one. Tools, memory, the loop, and the small-but-load-bearing decisions about prompt, model choice, and termination that turn a working demo into something you would actually trust with a task.
If you have a runaway-agent story that bit you in production, write to me at sumit at allthingsagentic dot org. The next posts' examples lean toward what people are actually struggling with.