Practical Guide to Building Multi-Agent Collaborative Applications with CrewAI + FastAPI

Introduction

In the field of AI Agent development, enabling multiple Agents with different roles and skills to work collaboratively has always been a core challenge. Multi-Agent Systems (MAS) represent a classic research direction in artificial intelligence, with the core idea of decomposing complex problems into multiple subtasks completed collaboratively by intelligent agents with different specializations. Before the rise of large language models, MAS was primarily applied in areas like robot control and distributed computing. With the emergence of powerful LLMs like GPT-4, LLM-based Agents gained natural language understanding, reasoning, and tool-calling capabilities, making it possible to build truly "intelligent" multi-agent systems. Frameworks like CrewAI, AutoGen, and LangGraph are products of this trend, and CrewAI was specifically designed to solve this problem — it allows developers to define multiple Agents, assign them different roles, goals, and tasks, and ultimately accomplish complex workflows through collaboration.

This article provides a detailed walkthrough of building a multi-agent collaborative system using CrewAI, combined with FastAPI to package it as an externally accessible API service. The solution supports three integration modes: GPT, domestic Chinese LLMs (Tongyi Qianwen), and local open-source models (Ollama), offering strong practicality and extensibility.

CrewAI Core Concepts Explained

To use CrewAI effectively, you first need to understand its four core concepts: Agent, Task, Process, and Crew.

Agent: Roles in the Team

An Agent is an autonomous controllable unit in CrewAI, analogous to a team member. Each Agent has three key attributes:

Role: The Agent's functional position in the team, such as "Data Researcher" or "Report Analyst"
Goal: The specific objective the Agent needs to achieve
Backstory: Provides contextual information to help the Agent better understand its positioning

Under the hood, these three attributes are concatenated into a System Prompt injected into the LLM's conversation context, thereby "shaping" the Agent's behavioral style and professional inclination. This is why the quality of the backstory directly impacts the Agent's output — the more detailed and specific the role definition, the more likely the model will produce outputs that match expectations.

Task: Specific Work Units

A Task is a specific piece of work assigned to an Agent, containing attributes like task description, expected output, assigned Agent, and available tool list. A key feature is that Tasks support context passing — the output of a previous Task can serve as input for the next Task, providing the foundation for building chained workflows.

Process: Task Coordination Mechanism

Process is responsible for coordinating Agent task execution, similar to a project manager's role. CrewAI provides two execution mechanisms:

Sequential Process: Tasks execute in a predetermined order, with the output of the previous task serving as context for the next
Hierarchical Process: A designated manager Agent oversees task allocation and execution, dynamically assigning tasks based on each Agent's capabilities

The hierarchical process draws on the classic "Plan-and-Execute" Agent architecture: the manager Agent first decomposes and plans the overall goal into subtasks, then dynamically assigns subtasks to the most suitable executor Agents, and finally aggregates results. This pattern is particularly suited for complex scenarios where task boundaries are unclear and execution strategies need dynamic adjustment.

Crew: The Collaborative Whole

A Crew represents a collection of Agents collaborating to complete tasks. It combines the Agent list, Task list, and Process strategy together, defining the overall workflow.

CrewAI Core Concepts

Development Environment and LLM Configuration

Three LLM Integration Options

This project supports three LLM integration methods, allowing developers to choose flexibly based on actual needs:

Option 1: GPT Models (via Proxy)

Access OpenAI's GPT series models (e.g., GPT-4o-mini) through an API proxy. This approach offers fast response times and stable results, suitable for scenarios requiring high output quality.

Option 2: Domestic Chinese LLMs (One API Forwarding)

One API is an open-source OpenAI interface management and distribution system. Its core principle is adapting various model providers' APIs into a unified OpenAI-format interface specification. Since OpenAI's API has become the de facto industry standard, most AI development frameworks (including CrewAI) natively support the OpenAI interface. Through the One API middleware layer, developers can integrate Tongyi Qianwen, Wenxin Yiyan, Zhipu GLM, and other domestic models without modifying any business code. This adapter layer design is extremely valuable in engineering practice, especially for scenarios requiring multi-model comparison testing or flexible switching between different providers. Deployment is straightforward — download the compiled package for your system from GitHub, execute it to start the service, which runs on port 3000 by default.

Option 3: Local Open-Source Models (Ollama)

Ollama makes running large models on consumer-grade hardware possible through support for quantized model formats like GGUF (GPT-Generated Unified Format). Quantization technology compresses model weights from FP32/FP16 to low-precision formats like INT4/INT8, reducing memory usage by 50%-75% at the cost of slight precision loss. For example, Llama 3.1 7B requires only about 4-5GB of VRAM after quantization. Ollama also includes a built-in OpenAI-compatible REST API (default port 11434), allowing frameworks like CrewAI to seamlessly connect to local models simply by setting the OPENAI_BASE_URL environment variable, without additional adapter code. Ollama is a lightweight cross-platform tool that, once installed, allows downloading and launching models through simple command-line operations without relying on external APIs, making it suitable for scenarios with data privacy requirements.

Environment Setup Key Points

The development environment requires Anaconda (for Python virtual environment management) and PyCharm (IDE). The project uses Python 3.11, with core dependencies including crewai, crewai-tools, fastapi, etc.

Practical Case: Research Report Generation System

Case Architecture Design

This case extends the official CrewAI starter example. The core functionality is: the user inputs a topic, and the system automatically completes two phases of work — information research and report writing.

Project Source Code Structure

The system defines two Agents:

Agent	Role	Responsibility
Researcher	Senior Data Researcher	Explore cutting-edge developments on the topic, identify the 10 most relevant key points
Report Analyst	Report Writing Expert	Expand research results into a complete analytical report

The corresponding two Tasks execute sequentially: the research task's output serves as the input context for the report task, forming a complete work chain.

Crew Core Code Analysis

The Crew implementation is encapsulated in a class, defining Agents and Tasks through the decorator pattern. CrewAI uses Python Decorators to define Agents and Tasks — a typical declarative programming style. The @agent, @task, @crew decorators essentially perform metaprogramming annotations on methods. At runtime, the framework automatically collects these annotated methods through reflection to build the Agent list and Task execution graph. The advantage of this design pattern is clear code structure and separation of concerns, while the framework handles complex logic like dependency injection and execution order management behind the scenes — developers only need to focus on defining business logic. Similar patterns are widely used in FastAPI route definitions, pytest test discovery, and other scenarios.

@agent
def research(self) -> Agent:
    # Load configuration from agents.yaml, create Researcher Agent
    return Agent(config=self.agents_config['research'], verbose=True)

@agent  
def reporting_analyst(self) -> Agent:
    # Create Report Analyst Agent
    return Agent(config=self.agents_config['reporting_analyst'], verbose=True)

@task
def research_task(self) -> Task:
    return Task(config=self.tasks_config['research_task'])

@crew
def crew(self) -> Crew:
    # Combine Agents and Tasks, use sequential execution process
    return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)

The specific parameters for Agents and Tasks are managed through YAML configuration files, including role descriptions, goals, backstories, task descriptions, and expected outputs, achieving separation of configuration and code.

Crew Execution Chain

FastAPI Service Packaging

Packaging CrewAI as an API service is a major highlight of this case. The core logic is:

On service startup: Initialize environment variables based on the configured model type (OpenAI/One API/Ollama)
Receive POST request: Parse the Topic parameter from the user
Execute Crew: Call crew().kickoff(inputs={'topic': topic}) to start multi-Agent collaboration
Return results: Support both streaming and non-streaming response modes

FastAPI Service Call Flow

Model switching is implemented through configuration flags, requiring no changes to business code:

# Switch LLM through model_type flag
if model_type == 'oneapi':
    # Use One API to forward to domestic models
elif model_type == 'ollama':
    # Use local Ollama model
else:
    # Default to OpenAI GPT model

This design allows developers to freely switch between different LLMs without modifying any business logic code.

Comparison of Three Model Results

Under the same task (researching cutting-edge developments in AI/LLMs), the three models showed notable performance differences:

Model	Speed	Output Quality	Instruction Following
GPT-4o-mini	Fastest	Excellent, detailed content	Strictly output 10 items as required
Tongyi Qianwen Max	Relatively fast	Good	Output 15 items (exceeded requirements)
Llama 3.1 (7B)	Relatively slow	Average	Only output 3 items, unsatisfactory

These real-world test results reveal a core capability dimension of LLMs — Instruction Following. Research shows that a model's instruction-following ability is positively correlated with parameter scale, though not linearly. Models that have undergone RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization) alignment training can demonstrate strong instruction-following capabilities even with relatively fewer parameters. The 7B local model's poor performance in complex multi-step instruction scenarios is primarily due to its relatively limited context understanding and instruction decomposition capabilities. Based on test results, GPT-4o-mini performs best in both speed and quality; Tongyi Qianwen Max is generally usable but slightly deviates in instruction following; the local 7B model is clearly insufficient due to parameter scale and hardware resource limitations. If using local models, it's recommended to choose versions with 13B+ parameters, or try models specifically optimized for instruction following like Qwen2 that perform better in Chinese scenarios.

Summary and Recommendations

CrewAI provides a clear abstraction framework for multi-Agent collaboration. Combined with FastAPI, you can quickly build deployable AI services. In practice, several points are worth noting:

Model selection is crucial: Agent performance is highly dependent on the underlying LLM's capabilities — always conduct thorough evaluation testing before production use
Separation of configuration and code: Use YAML configuration files to manage Agent and Task parameters for easier maintenance and adjustment
Flexible model switching: Through middleware layers like One API that unify interfaces, you can conveniently switch and compare between different models
Pipeline extension: For more complex scenarios, leverage CrewAI's Pipeline feature to chain or parallelize multiple Crews, building more sophisticated workflows

Once you've mastered CrewAI's core concepts and FastAPI service packaging methods, you can extend more Agent roles and task types on this foundation to build multi-Agent collaborative applications suited to your business scenarios.

Key Takeaways

CrewAI builds multi-agent collaborative systems through four core concepts: Agent, Task, Process, and Crew, supporting both sequential and hierarchical task execution mechanisms
The project supports three integration methods: GPT, domestic Chinese LLMs (via One API forwarding), and local open-source models (Ollama), with seamless switching through configuration flags
Combining FastAPI to package CrewAI as an API service with streaming and non-streaming response support, providing standardized HTTP interfaces
Real-world testing shows GPT-4o-mini performs best, Tongyi Qianwen Max is usable but slightly deviates in instruction following, while local 7B models are clearly insufficient due to limited RLHF alignment and parameter scale
YAML configuration files manage Agent roles and Task parameters, achieving separation of configuration and code for easier maintenance and extension