ReAct vs. CodeAct: Two Core Approaches to Agent Tool Calling

Deep dive comparing ReAct and CodeAct: two fundamental paradigms for Agent tool calling.
This article compares two mainstream Agent tool-calling architectures: ReAct implements tool calling through a "Reasoning → Action → Observation" loop and ensures output stability with Function Calling, but suffers from excessive call counts and poor composability. CodeAct lets the model output code for sandbox execution, completing complex logic in one shot with higher efficiency, but demands stronger code generation ability and a secure sandbox. Each has its ideal use cases, and together they form the foundational paradigms for building Agents.
Introduction: Why Agents Need Tool Calling
Although large language models possess powerful reasoning capabilities, without external tool support, they cannot perform practical tasks like searching the web or manipulating local files. Therefore, how to make LLMs "know which tool to call" and "successfully call it" has become the central challenge in building Agents.
This article provides an in-depth analysis of two mainstream Agent tool-calling architectures — ReAct and CodeAct — covering everything from paper principles to hands-on code, helping you understand their design philosophies, trade-offs, and real-world use cases.

ReAct: The Alternating Cycle of Reasoning and Action
Core Idea: Reasoning + Acting
ReAct was published in October 2022, with its name derived from the combination of Reasoning and Acting. Its key insight is that reasoning alone or acting alone each has obvious shortcomings:
- Pure reasoning models (e.g., Chain of Thought): Can think but cannot access external tools, potentially reaching wrong conclusions due to gaps in training data
- Pure action models (e.g., early WebGPT): Can call tools but lack reasoning ability, unable to connect search results together
- ReAct's solution: Combine both — the model first reasons (Thought) → then executes an action (Action) → receives an observation (Observation) → and repeats the cycle
The Chain of Thought (CoT) mentioned here is a prompting technique proposed by Google in January 2022. By adding guiding phrases like "Let's think step by step" to prompts, it encourages the model to break complex problems into intermediate reasoning steps. CoT significantly improved model performance on tasks like mathematical reasoning and logical judgment, but its fundamental limitation is that all reasoning happens internally — it can only leverage knowledge seen during training and cannot access real-time information or perform external operations. ReAct builds on CoT by adding the critical capability of "interacting with the external world."
ReAct's Workflow
Each round of ReAct consists of three parts:
- Thought: The model's thinking process, determining which tool is needed
- Action: The actual external tool call
- Observation: The result returned by the tool, serving as context input for the next round
A classic example from the paper involves querying "What other devices can Apple Remote control?" The model first searches for Apple Remote, discovers it was designed for Front Row software, then searches for Front Row, and ultimately finds the answer: "keyboard function keys." The entire process demonstrates the alternation between reasoning and tool calling.
Key Technical Details: From Prompt Engineering to Function Calling
You might not have noticed that ReAct is essentially a prompt engineering technique that doesn't require model fine-tuning. When it was proposed in 2022, OpenAI's Function Calling hadn't been released yet (Function Calling launched in June 2023), so the original implementation relied entirely on:
- Constraining model output format through prompts
- Parsing output strings to separate Thought and Action
- Executing the corresponding tool based on parsed results
The downside of this approach is that it demands strong instruction-following ability from the model, and parsing success rates drop when output formats are unstable.
Modern ReAct Implementation: Combined with Function Calling
Function Calling is a capability OpenAI introduced in June 2023 alongside the GPT-3.5/GPT-4 API. It allows developers to define available tools in API requests using JSON Schema (function names, parameter types, descriptions, etc.). When generating responses, the model determines whether a tool call is needed, and if so, outputs structured JSON parameters instead of natural language text. This fundamentally solved the instability of early ReAct's reliance on string parsing — the output format is guaranteed at the API level, and developers no longer need to use regular expressions to "guess" which tool the model wants to call. Today, virtually all major LLM APIs (Claude, Gemini, Qwen, etc.) support similar tool-calling mechanisms.
In practice, you can combine ReAct with modern Function Calling: retain the model's ability to output Thoughts, but delegate the Action portion to Function Calling's structured JSON output. The core implementation logic is as follows:
- Wrap the OpenAI API, passing in tool definitions (including name, description, parameter)
- Call the LLM in the Agent's main loop
- Check
finish_reason: if it'sstop, terminate; if it'stool_calls, execute the tool - Add the tool execution result as an Observation to the context and enter the next loop iteration
In testing, when asked to calculate the complex expression (48÷6) + (15-9)×5 + 12, Gemini 2.0 sequentially calls addition, subtraction, multiplication, and division tools, ultimately arriving at the correct answer of 65. But the problem is obvious — too many calls, since each step depends on the result of the previous one.
CodeAct: Replacing Tool Calls with Code Execution
Design Motivation: Solving ReAct's Three Pain Points
CodeAct was proposed by Apple's machine learning research team. Its core idea is: since LLMs are good at writing code, why not let them directly output code and execute it in a sandbox?
CodeAct specifically addresses three pain points of ReAct:
- Limited tool set: ReAct requires all tools to be predefined — even simple math operations need to be wrapped as tools. Programming languages come with rich standard libraries built in
- Poor composability: ReAct cannot nest multiple tools in a single call and must execute step by step. Code can freely nest function calls
- Limited expressiveness: Code can implement loops, conditional logic, and other complex patterns, far exceeding the fixed function-call paradigm
The Relationship Between CodeAct and Manus
The driving force behind the CodeAct paper is a co-founder of Manus (a globally renowned AI Agent product). Before the MCP protocol emerged, Manus used the CodeAct approach for tool calling, which speaks to CodeAct's practical value in industry.
How CodeAct Works
CodeAct's architecture replaces ReAct's "tool calling" with "sandbox code execution":
- The System Prompt instructs the model to output JavaScript/Python code blocks
- The model outputs results via
console.log(or Python'sprint) - Code is executed in a sandbox environment, with output returned as Observations
- If the code throws an error, the error message is also passed back as context, and the model self-corrects
Key Considerations for Sandbox Implementation
Code execution security is the core challenge CodeAct faces in production environments. A sandbox refers to a restricted execution environment isolated from the host system, preventing malicious or erroneous code from causing damage. In practice, common sandbox solutions include: Docker container isolation (e.g., E2B, Modal, and other cloud services), WebAssembly runtimes (e.g., Wasmer), and restricted interpreter environments (e.g., Python's RestrictedPython). Production-grade Agents typically implement multiple layers of protection including execution timeouts, memory limits, network access whitelists, and filesystem isolation, ensuring that even if model-generated code contains vulnerabilities, it won't compromise the main system's security.
In code implementation, the core sandbox design includes:
- Injecting tool functions into the global scope so model-generated code can call them directly
- Overriding
console.logto capture all output as return values - Using
evalto execute code (for demonstration only; production environments require a secure sandbox) - Restoring the original
console.logafter execution completes
ReAct vs. CodeAct: Practical Comparison
When calculating the same expression (48÷6) + (15-9)×5 + 12, the difference between the two approaches is very intuitive:
- ReAct approach: Requires multiple tool calls (first division, then subtraction, then multiplication, then addition...), progressing step by step
- CodeAct approach: The model outputs nested code in one shot —
sum(divide(48,6), multiply(subtract(15,9),5), 12)— and gets the result in a single execution
The paper also demonstrates more complex scenarios: finding and comparing product prices across multiple countries. ReAct needs to query each country individually and calculate exchange rates one by one; CodeAct simply writes a for loop to iterate over all countries, completing all the work in a single Action. Experimental data shows that CodeAct's success rate is significantly higher than JSON-based Action (i.e., the ReAct pattern).
Selection Guidelines for the Two Approaches
| Dimension | ReAct | CodeAct |
|---|---|---|
| Tool Definition | All tools must be predefined | Can directly use language standard libraries |
| Number of Calls | Multiple iterative calls | Can complete complex logic in one shot |
| Output Stability | Function Calling guarantees JSON format | Requires parsing Markdown code blocks |
| Error Handling | Handled at the tool level | Code errors can be auto-repaired |
| Model Requirements | Lower | Requires strong code generation ability |
| Security | Tool calls are controllable | Requires a secure sandbox environment |
In short, if your scenario has a limited number of tools and high security requirements, ReAct + Function Calling is the safer choice. If your tasks involve complex data processing and multi-step compositional logic, CodeAct's efficiency advantage becomes very significant.
From Tool Calling to Complete Agent Architecture
Using ReAct or CodeAct as a foundation, and adding capabilities like Memory, RAG, and long context, you can build powerful Agent architectures similar to AutoGPT. The MCP (Model Context Protocol) provides Agents with a unified tool ecosystem. MCP is an open protocol launched by Anthropic in late 2024, designed to standardize how LLMs connect with external tools and data sources — similar to how USB-C provides a universal interface for hardware devices, MCP provides AI Agents with unified mechanisms for tool discovery, invocation, and authentication. Through MCP, developers can wrap any service (database queries, API calls, file operations, etc.) as a standardized MCP Server, and any MCP-compatible Agent can plug-and-play with these tools without writing custom adapter code for each one. Whether using ReAct or CodeAct, tool implementations can integrate with MCP to access richer external capabilities.
Various Agent tool-calling methods and architectures are still evolving rapidly. As two foundational paradigms, understanding the principles and trade-offs of ReAct and CodeAct is an essential step toward mastering Agent development.
Related articles
TutorialsCursor + Codex Dual-IDE Collaboration: A Practical Methodology for Open-Source Project Customization
A complete methodology for open-source project customization based on real-world experience, detailing the Cursor+Codex dual-IDE workflow, seven-stage process, MVP validation, and AI source code reading techniques.
TutorialsCursor Multi-Agent in Practice: Building a Full-Stack Next.js Blog in 50 Minutes
Build a full-stack blog in 50 minutes using Cursor IDE's multi-Agent mode with Next.js, Clerk auth, and Supabase. Learn the 4-phase AI Agent workflow and key integration pitfalls.
TutorialsBuilding an AI Software Factory from Scratch: A Cursor Engineer's Hands-On Experience with Multi-Agent Collaboration
Cursor engineer Eric shares practical insights on building an AI software factory: automation levels, guardrail design, parallel Agent management, and scaling to 1000+ Agents for 24/7 development.