ReAct vs. CodeAct: Two Core Approaches to Agent Tool Calling

Introduction: Why Agents Need Tool Calling

Although large language models possess powerful reasoning capabilities, without external tool support, they cannot perform practical tasks like searching the web or manipulating local files. Therefore, how to make LLMs "know which tool to call" and "successfully call it" has become the central challenge in building Agents.

This article provides an in-depth analysis of two mainstream Agent tool-calling architectures — ReAct and CodeAct — covering everything from paper principles to hands-on code, helping you understand their design philosophies, trade-offs, and real-world use cases.

ReAct and CodeAct Overview

ReAct: The Alternating Cycle of Reasoning and Action

Core Idea: Reasoning + Acting

ReAct was published in October 2022, with its name derived from the combination of Reasoning and Acting. Its key insight is that reasoning alone or acting alone each has obvious shortcomings:

Pure reasoning models (e.g., Chain of Thought): Can think but cannot access external tools, potentially reaching wrong conclusions due to gaps in training data
Pure action models (e.g., early WebGPT): Can call tools but lack reasoning ability, unable to connect search results together
ReAct's solution: Combine both — the model first reasons (Thought) → then executes an action (Action) → receives an observation (Observation) → and repeats the cycle

The Chain of Thought (CoT) mentioned here is a prompting technique proposed by Google in January 2022. By adding guiding phrases like "Let's think step by step" to prompts, it encourages the model to break complex problems into intermediate reasoning steps. CoT significantly improved model performance on tasks like mathematical reasoning and logical judgment, but its fundamental limitation is that all reasoning happens internally — it can only leverage knowledge seen during training and cannot access real-time information or perform external operations. ReAct builds on CoT by adding the critical capability of "interacting with the external world."

ReAct's Workflow

Each round of ReAct consists of three parts:

Thought: The model's thinking process, determining which tool is needed
Action: The actual external tool call
Observation: The result returned by the tool, serving as context input for the next round

A classic example from the paper involves querying "What other devices can Apple Remote control?" The model first searches for Apple Remote, discovers it was designed for Front Row software, then searches for Front Row, and ultimately finds the answer: "keyboard function keys." The entire process demonstrates the alternation between reasoning and tool calling.

Key Technical Details: From Prompt Engineering to Function Calling

You might not have noticed that ReAct is essentially a prompt engineering technique that doesn't require model fine-tuning. When it was proposed in 2022, OpenAI's Function Calling hadn't been released yet (Function Calling launched in June 2023), so the original implementation relied entirely on:

Constraining model output format through prompts
Parsing output strings to separate Thought and Action
Executing the corresponding tool based on parsed results

The downside of this approach is that it demands strong instruction-following ability from the model, and parsing success rates drop when output formats are unstable.

Modern ReAct Implementation: Combined with Function Calling

Function Calling is a capability OpenAI introduced in June 2023 alongside the GPT-3.5/GPT-4 API. It allows developers to define available tools in API requests using JSON Schema (function names, parameter types, descriptions, etc.). When generating responses, the model determines whether a tool call is needed, and if so, outputs structured JSON parameters instead of natural language text. This fundamentally solved the instability of early ReAct's reliance on string parsing — the output format is guaranteed at the API level, and developers no longer need to use regular expressions to "guess" which tool the model wants to call. Today, virtually all major LLM APIs (Claude, Gemini, Qwen, etc.) support similar tool-calling mechanisms.

In practice, you can combine ReAct with modern Function Calling: retain the model's ability to output Thoughts, but delegate the Action portion to Function Calling's structured JSON output. The core implementation logic is as follows:

Wrap the OpenAI API, passing in tool definitions (including name, description, parameter)
Call the LLM in the Agent's main loop
Check finish_reason: if it's stop, terminate; if it's tool_calls, execute the tool
Add the tool execution result as an Observation to the context and enter the next loop iteration

In testing, when asked to calculate the complex expression (48÷6) + (15-9)×5 + 12, Gemini 2.0 sequentially calls addition, subtraction, multiplication, and division tools, ultimately arriving at the correct answer of 65. But the problem is obvious — too many calls, since each step depends on the result of the previous one.

CodeAct: Replacing Tool Calls with Code Execution

Design Motivation: Solving ReAct's Three Pain Points

CodeAct was proposed by Apple's machine learning research team. Its core idea is: since LLMs are good at writing code, why not let them directly output code and execute it in a sandbox?

CodeAct specifically addresses three pain points of ReAct:

Limited tool set: ReAct requires all tools to be predefined — even simple math operations need to be wrapped as tools. Programming languages come with rich standard libraries built in
Poor composability: ReAct cannot nest multiple tools in a single call and must execute step by step. Code can freely nest function calls
Limited expressiveness: Code can implement loops, conditional logic, and other complex patterns, far exceeding the fixed function-call paradigm

The Relationship Between CodeAct and Manus

The driving force behind the CodeAct paper is a co-founder of Manus (a globally renowned AI Agent product). Before the MCP protocol emerged, Manus used the CodeAct approach for tool calling, which speaks to CodeAct's practical value in industry.

How CodeAct Works

CodeAct's architecture replaces ReAct's "tool calling" with "sandbox code execution":

The System Prompt instructs the model to output JavaScript/Python code blocks
The model outputs results via console.log (or Python's print)
Code is executed in a sandbox environment, with output returned as Observations
If the code throws an error, the error message is also passed back as context, and the model self-corrects

Key Considerations for Sandbox Implementation

Code execution security is the core challenge CodeAct faces in production environments. A sandbox refers to a restricted execution environment isolated from the host system, preventing malicious or erroneous code from causing damage. In practice, common sandbox solutions include: Docker container isolation (e.g., E2B, Modal, and other cloud services), WebAssembly runtimes (e.g., Wasmer), and restricted interpreter environments (e.g., Python's RestrictedPython). Production-grade Agents typically implement multiple layers of protection including execution timeouts, memory limits, network access whitelists, and filesystem isolation, ensuring that even if model-generated code contains vulnerabilities, it won't compromise the main system's security.

In code implementation, the core sandbox design includes:

Injecting tool functions into the global scope so model-generated code can call them directly
Overriding console.log to capture all output as return values
Using eval to execute code (for demonstration only; production environments require a secure sandbox)
Restoring the original console.log after execution completes

ReAct vs. CodeAct: Practical Comparison

When calculating the same expression (48÷6) + (15-9)×5 + 12, the difference between the two approaches is very intuitive:

ReAct approach: Requires multiple tool calls (first division, then subtraction, then multiplication, then addition...), progressing step by step
CodeAct approach: The model outputs nested code in one shot — sum(divide(48,6), multiply(subtract(15,9),5), 12) — and gets the result in a single execution

The paper also demonstrates more complex scenarios: finding and comparing product prices across multiple countries. ReAct needs to query each country individually and calculate exchange rates one by one; CodeAct simply writes a for loop to iterate over all countries, completing all the work in a single Action. Experimental data shows that CodeAct's success rate is significantly higher than JSON-based Action (i.e., the ReAct pattern).

Selection Guidelines for the Two Approaches

Dimension	ReAct	CodeAct
Tool Definition	All tools must be predefined	Can directly use language standard libraries
Number of Calls	Multiple iterative calls	Can complete complex logic in one shot
Output Stability	Function Calling guarantees JSON format	Requires parsing Markdown code blocks
Error Handling	Handled at the tool level	Code errors can be auto-repaired
Model Requirements	Lower	Requires strong code generation ability
Security	Tool calls are controllable	Requires a secure sandbox environment

In short, if your scenario has a limited number of tools and high security requirements, ReAct + Function Calling is the safer choice. If your tasks involve complex data processing and multi-step compositional logic, CodeAct's efficiency advantage becomes very significant.

From Tool Calling to Complete Agent Architecture

Using ReAct or CodeAct as a foundation, and adding capabilities like Memory, RAG, and long context, you can build powerful Agent architectures similar to AutoGPT. The MCP (Model Context Protocol) provides Agents with a unified tool ecosystem. MCP is an open protocol launched by Anthropic in late 2024, designed to standardize how LLMs connect with external tools and data sources — similar to how USB-C provides a universal interface for hardware devices, MCP provides AI Agents with unified mechanisms for tool discovery, invocation, and authentication. Through MCP, developers can wrap any service (database queries, API calls, file operations, etc.) as a standardized MCP Server, and any MCP-compatible Agent can plug-and-play with these tools without writing custom adapter code for each one. Whether using ReAct or CodeAct, tool implementations can integrate with MCP to access richer external capabilities.

Various Agent tool-calling methods and architectures are still evolving rapidly. As two foundational paradigms, understanding the principles and trade-offs of ReAct and CodeAct is an essential step toward mastering Agent development.