Practical Guide to Building Multi-Agent Collaborative Applications with CrewAI + FastAPI
Practical Guide to Building Multi-Agen…
Practical guide to building multi-Agent systems with CrewAI and packaging them as FastAPI services
This article details how to build a multi-Agent collaborative system using the CrewAI framework, explaining the four core concepts of Agent, Task, Process, and Crew. Through a research report generation case study, it demonstrates the complete workflow from defining Agent roles to FastAPI service packaging. The project supports three model integration methods: GPT, domestic Chinese LLMs (One API forwarding), and local Ollama. Real-world testing shows GPT-4o-mini performs best, while local 7B models are insufficient.
Introduction
In the field of AI Agent development, enabling multiple Agents with different roles and skills to work collaboratively has always been a core challenge. Multi-Agent Systems (MAS) represent a classic research direction in artificial intelligence, with the core idea of decomposing complex problems into multiple subtasks completed collaboratively by intelligent agents with different specializations. Before the rise of large language models, MAS was primarily applied in areas like robot control and distributed computing. With the emergence of powerful LLMs like GPT-4, LLM-based Agents gained natural language understanding, reasoning, and tool-calling capabilities, making it possible to build truly "intelligent" multi-agent systems. Frameworks like CrewAI, AutoGen, and LangGraph are products of this trend, and CrewAI was specifically designed to solve this problem — it allows developers to define multiple Agents, assign them different roles, goals, and tasks, and ultimately accomplish complex workflows through collaboration.
This article provides a detailed walkthrough of building a multi-agent collaborative system using CrewAI, combined with FastAPI to package it as an externally accessible API service. The solution supports three integration modes: GPT, domestic Chinese LLMs (Tongyi Qianwen), and local open-source models (Ollama), offering strong practicality and extensibility.
CrewAI Core Concepts Explained
To use CrewAI effectively, you first need to understand its four core concepts: Agent, Task, Process, and Crew.
Agent: Roles in the Team
An Agent is an autonomous controllable unit in CrewAI, analogous to a team member. Each Agent has three key attributes:
- Role: The Agent's functional position in the team, such as "Data Researcher" or "Report Analyst"
- Goal: The specific objective the Agent needs to achieve
- Backstory: Provides contextual information to help the Agent better understand its positioning
Under the hood, these three attributes are concatenated into a System Prompt injected into the LLM's conversation context, thereby "shaping" the Agent's behavioral style and professional inclination. This is why the quality of the backstory directly impacts the Agent's output — the more detailed and specific the role definition, the more likely the model will produce outputs that match expectations.
Task: Specific Work Units
A Task is a specific piece of work assigned to an Agent, containing attributes like task description, expected output, assigned Agent, and available tool list. A key feature is that Tasks support context passing — the output of a previous Task can serve as input for the next Task, providing the foundation for building chained workflows.
Process: Task Coordination Mechanism
Process is responsible for coordinating Agent task execution, similar to a project manager's role. CrewAI provides two execution mechanisms:
- Sequential Process: Tasks execute in a predetermined order, with the output of the previous task serving as context for the next
- Hierarchical Process: A designated manager Agent oversees task allocation and execution, dynamically assigning tasks based on each Agent's capabilities
The hierarchical process draws on the classic "Plan-and-Execute" Agent architecture: the manager Agent first decomposes and plans the overall goal into subtasks, then dynamically assigns subtasks to the most suitable executor Agents, and finally aggregates results. This pattern is particularly suited for complex scenarios where task boundaries are unclear and execution strategies need dynamic adjustment.
Crew: The Collaborative Whole
A Crew represents a collection of Agents collaborating to complete tasks. It combines the Agent list, Task list, and Process strategy together, defining the overall workflow.

Development Environment and LLM Configuration
Three LLM Integration Options
This project supports three LLM integration methods, allowing developers to choose flexibly based on actual needs:
Option 1: GPT Models (via Proxy)
Access OpenAI's GPT series models (e.g., GPT-4o-mini) through an API proxy. This approach offers fast response times and stable results, suitable for scenarios requiring high output quality.
Option 2: Domestic Chinese LLMs (One API Forwarding)
One API is an open-source OpenAI interface management and distribution system. Its core principle is adapting various model providers' APIs into a unified OpenAI-format interface specification. Since OpenAI's API has become the de facto industry standard, most AI development frameworks (including CrewAI) natively support the OpenAI interface. Through the One API middleware layer, developers can integrate Tongyi Qianwen, Wenxin Yiyan, Zhipu GLM, and other domestic models without modifying any business code. This adapter layer design is extremely valuable in engineering practice, especially for scenarios requiring multi-model comparison testing or flexible switching between different providers. Deployment is straightforward — download the compiled package for your system from GitHub, execute it to start the service, which runs on port 3000 by default.
Option 3: Local Open-Source Models (Ollama)
Ollama makes running large models on consumer-grade hardware possible through support for quantized model formats like GGUF (GPT-Generated Unified Format). Quantization technology compresses model weights from FP32/FP16 to low-precision formats like INT4/INT8, reducing memory usage by 50%-75% at the cost of slight precision loss. For example, Llama 3.1 7B requires only about 4-5GB of VRAM after quantization. Ollama also includes a built-in OpenAI-compatible REST API (default port 11434), allowing frameworks like CrewAI to seamlessly connect to local models simply by setting the OPENAI_BASE_URL environment variable, without additional adapter code. Ollama is a lightweight cross-platform tool that, once installed, allows downloading and launching models through simple command-line operations without relying on external APIs, making it suitable for scenarios with data privacy requirements.
Environment Setup Key Points
The development environment requires Anaconda (for Python virtual environment management) and PyCharm (IDE). The project uses Python 3.11, with core dependencies including crewai, crewai-tools, fastapi, etc.
Practical Case: Research Report Generation System
Case Architecture Design
This case extends the official CrewAI starter example. The core functionality is: the user inputs a topic, and the system automatically completes two phases of work — information research and report writing.

The system defines two Agents:
| Agent | Role | Responsibility |
|---|---|---|
| Researcher | Senior Data Researcher | Explore cutting-edge developments on the topic, identify the 10 most relevant key points |
| Report Analyst | Report Writing Expert | Expand research results into a complete analytical report |
The corresponding two Tasks execute sequentially: the research task's output serves as the input context for the report task, forming a complete work chain.
Crew Core Code Analysis
The Crew implementation is encapsulated in a class, defining Agents and Tasks through the decorator pattern. CrewAI uses Python Decorators to define Agents and Tasks — a typical declarative programming style. The @agent, @task, @crew decorators essentially perform metaprogramming annotations on methods. At runtime, the framework automatically collects these annotated methods through reflection to build the Agent list and Task execution graph. The advantage of this design pattern is clear code structure and separation of concerns, while the framework handles complex logic like dependency injection and execution order management behind the scenes — developers only need to focus on defining business logic. Similar patterns are widely used in FastAPI route definitions, pytest test discovery, and other scenarios.
@agent
def research(self) -> Agent:
# Load configuration from agents.yaml, create Researcher Agent
return Agent(config=self.agents_config['research'], verbose=True)
@agent
def reporting_analyst(self) -> Agent:
# Create Report Analyst Agent
return Agent(config=self.agents_config['reporting_analyst'], verbose=True)
@task
def research_task(self) -> Task:
return Task(config=self.tasks_config['research_task'])
@crew
def crew(self) -> Crew:
# Combine Agents and Tasks, use sequential execution process
return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)
The specific parameters for Agents and Tasks are managed through YAML configuration files, including role descriptions, goals, backstories, task descriptions, and expected outputs, achieving separation of configuration and code.

FastAPI Service Packaging
Packaging CrewAI as an API service is a major highlight of this case. The core logic is:
- On service startup: Initialize environment variables based on the configured model type (OpenAI/One API/Ollama)
- Receive POST request: Parse the Topic parameter from the user
- Execute Crew: Call
crew().kickoff(inputs={'topic': topic})to start multi-Agent collaboration - Return results: Support both streaming and non-streaming response modes

Model switching is implemented through configuration flags, requiring no changes to business code:
# Switch LLM through model_type flag
if model_type == 'oneapi':
# Use One API to forward to domestic models
elif model_type == 'ollama':
# Use local Ollama model
else:
# Default to OpenAI GPT model
This design allows developers to freely switch between different LLMs without modifying any business logic code.
Comparison of Three Model Results
Under the same task (researching cutting-edge developments in AI/LLMs), the three models showed notable performance differences:
| Model | Speed | Output Quality | Instruction Following |
|---|---|---|---|
| GPT-4o-mini | Fastest | Excellent, detailed content | Strictly output 10 items as required |
| Tongyi Qianwen Max | Relatively fast | Good | Output 15 items (exceeded requirements) |
| Llama 3.1 (7B) | Relatively slow | Average | Only output 3 items, unsatisfactory |
These real-world test results reveal a core capability dimension of LLMs — Instruction Following. Research shows that a model's instruction-following ability is positively correlated with parameter scale, though not linearly. Models that have undergone RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization) alignment training can demonstrate strong instruction-following capabilities even with relatively fewer parameters. The 7B local model's poor performance in complex multi-step instruction scenarios is primarily due to its relatively limited context understanding and instruction decomposition capabilities. Based on test results, GPT-4o-mini performs best in both speed and quality; Tongyi Qianwen Max is generally usable but slightly deviates in instruction following; the local 7B model is clearly insufficient due to parameter scale and hardware resource limitations. If using local models, it's recommended to choose versions with 13B+ parameters, or try models specifically optimized for instruction following like Qwen2 that perform better in Chinese scenarios.
Summary and Recommendations
CrewAI provides a clear abstraction framework for multi-Agent collaboration. Combined with FastAPI, you can quickly build deployable AI services. In practice, several points are worth noting:
- Model selection is crucial: Agent performance is highly dependent on the underlying LLM's capabilities — always conduct thorough evaluation testing before production use
- Separation of configuration and code: Use YAML configuration files to manage Agent and Task parameters for easier maintenance and adjustment
- Flexible model switching: Through middleware layers like One API that unify interfaces, you can conveniently switch and compare between different models
- Pipeline extension: For more complex scenarios, leverage CrewAI's Pipeline feature to chain or parallelize multiple Crews, building more sophisticated workflows
Once you've mastered CrewAI's core concepts and FastAPI service packaging methods, you can extend more Agent roles and task types on this foundation to build multi-Agent collaborative applications suited to your business scenarios.
Key Takeaways
- CrewAI builds multi-agent collaborative systems through four core concepts: Agent, Task, Process, and Crew, supporting both sequential and hierarchical task execution mechanisms
- The project supports three integration methods: GPT, domestic Chinese LLMs (via One API forwarding), and local open-source models (Ollama), with seamless switching through configuration flags
- Combining FastAPI to package CrewAI as an API service with streaming and non-streaming response support, providing standardized HTTP interfaces
- Real-world testing shows GPT-4o-mini performs best, Tongyi Qianwen Max is usable but slightly deviates in instruction following, while local 7B models are clearly insufficient due to limited RLHF alignment and parameter scale
- YAML configuration files manage Agent roles and Task parameters, achieving separation of configuration and code for easier maintenance and extension
Related articles
TutorialsCursor + Codex Dual-IDE Collaboration: A Practical Methodology for Open-Source Project Customization
A complete methodology for open-source project customization based on real-world experience, detailing the Cursor+Codex dual-IDE workflow, seven-stage process, MVP validation, and AI source code reading techniques.
TutorialsCursor Multi-Agent in Practice: Building a Full-Stack Next.js Blog in 50 Minutes
Build a full-stack blog in 50 minutes using Cursor IDE's multi-Agent mode with Next.js, Clerk auth, and Supabase. Learn the 4-phase AI Agent workflow and key integration pitfalls.
TutorialsBuilding an AI Software Factory from Scratch: A Cursor Engineer's Hands-On Experience with Multi-Agent Collaboration
Cursor engineer Eric shares practical insights on building an AI software factory: automation levels, guardrail design, parallel Agent management, and scaling to 1000+ Agents for 24/7 development.