Frontend AI Full-Stack Development in Practice: Building Multimodal Applications with PNPM MonoRepo Architecture

Why Big Tech Companies Use MonoRepo Architecture for AI Full-Stack Development

With the surge in demand for AI full-stack positions, how can frontend developers break into this field? One key answer is mastering engineering architecture based on PNPM MonoRepo. From the source code organization of React and Vue3, to AI product engineering at companies like ByteDance and Alibaba, MonoRepo has become the de facto standard.

MonoRepo (short for "mono repository", a single repository that holds many projects) has been practiced at scale by Google since the early 2000s; by the mid-2010s Google reported roughly 2 billion lines of code stored in a single unified repository. Facebook, Microsoft, Twitter, and other tech giants adopted similar strategies. The core philosophy of this architectural pattern is that code visibility and accessibility promote collaboration and reduce redundant work. In the JavaScript ecosystem, Lerna was the first popular MonoRepo management tool (2015), but as projects grew, its performance bottlenecks became apparent, giving rise to next-generation tools like PNPM Workspace, Nx, and TurboRepo.


In the traditional Multi-Repo approach, each project has its own independent Git repository, which makes keeping shared packages in sync expensive. After modifying a Utils package, you need to publish it to NPM and then install the update in each consuming project one by one, and missed updates are common. MonoRepo consolidates multiple projects into a single workspace with unified dependency management, atomic commits, and instant updates, which removes most of these pain points.

MonoRepo vs Multi-Repo Core Comparison

Dimension	Multi-Repo	MonoRepo
Code Reuse	Copy-paste or publish private NPM packages	Direct references, instant effect
Dependency Management	Each repo manages independently, version conflicts common	Unified management, strong consistency
Development Flow	Must publish before integration testing	Save and update instantly, easy debugging
CI/CD	Each repo configured independently	Incremental builds, only compile changed packages

Both React and Vue3 use MonoRepo. React's packages directory contains dozens of sub-packages (such as react-dom, react-reconciler, and scheduler, each with clear responsibilities), while Vue3 is cleanly divided into three major areas: the compiler (compiler-core, compiler-dom), reactivity (reactivity), and the runtime (runtime-core, runtime-dom). This organizational approach allows core teams to iterate on subsystems independently while ensuring cross-version compatibility.

Technology Selection: PNPM + TurboRepo as the Ideal Combination

For MonoRepo toolchain selection, the recommended combination is PNPM Workspace + TurboRepo:

PNPM: Uses hard links so that regardless of how many projects exist, the same version of a dependency is stored only once on disk, greatly saving space; it also solves NPM's phantom dependency problem

PNPM stands for Performant NPM, and its core innovation is content-addressable storage. Traditionally, npm and yarn copy dependencies into each project's own node_modules, while PNPM maintains a single global store (its location varies by platform and version, for example ~/.pnpm-store in older releases; pnpm store path prints the actual directory), and every project points to the same physical files through hard links. A hard link is a filesystem-level concept: multiple filenames point to the same inode on disk, so the extra names consume no additional space for file data. PNPM also builds a strict node_modules structure out of symlinks so that a project can only access the dependencies it explicitly declares in package.json. This eliminates phantom dependencies, where a project accidentally uses an undeclared package that happened to be installed indirectly by another dependency.
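The store-plus-hard-link idea can be sketched in a few lines of Node.js. This is only an illustration under stated assumptions, not PNPM's actual implementation: the helper names `addToStore` and `linkIntoProject`, the temporary directory layout, and the choice of SHA-256 are all hypothetical.

```typescript
import * as crypto from 'node:crypto';
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: store a file once, keyed by the hash of its content.
function addToStore(storeDir: string, content: string): string {
  const hash = crypto.createHash('sha256').update(content).digest('hex');
  const storedPath = path.join(storeDir, hash);
  if (!fs.existsSync(storedPath)) fs.writeFileSync(storedPath, content);
  return storedPath;
}

// Hypothetical helper: expose the stored file inside a project via a hard link.
function linkIntoProject(storedPath: string, projectDir: string, name: string): string {
  fs.mkdirSync(projectDir, { recursive: true });
  const target = path.join(projectDir, name);
  fs.linkSync(storedPath, target); // new filename, same inode
  return target;
}

// Two "projects" receive the same dependency file from one store entry.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'store-demo-'));
const storeDir = path.join(demoRoot, 'store');
fs.mkdirSync(storeDir);

const stored = addToStore(storeDir, 'module.exports = 42;');
const linkA = linkIntoProject(stored, path.join(demoRoot, 'app-a'), 'index.js');
const linkB = linkIntoProject(stored, path.join(demoRoot, 'app-b'), 'index.js');

// Both project files resolve to the same inode, so the data exists once on disk.
const sameInode = fs.statSync(linkA).ino === fs.statSync(linkB).ino;
// The store entry now has three names: the original plus two links.
const linkCount = fs.statSync(stored).nlink;
```

Hard links only work within a single filesystem, which is one reason a real package manager needs fallbacks (such as copying) that this sketch leaves out.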

TurboRepo: Handles task orchestration, build caching, and incremental updates, only building sub-packages that have changed

TurboRepo was acquired and is maintained by Vercel. Its core capabilities are intelligent task scheduling and build caching, both local and remote. It analyzes the dependency relationships declared in package.json to build a Directed Acyclic Graph (DAG), which determines the order in which tasks must run and which of them can run in parallel. The build cache is based on content hashing: TurboRepo computes a hash from each package's source files, environment variables, and dependency versions, and if the hash is unchanged it reuses the previous build artifacts and skips compilation. In CI/CD scenarios, Remote Caching lets team members and CI machines share build results, so a package built by one developer does not need to be rebuilt by others; in large MonoRepos this can reduce CI time by 60%-80%.
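The two mechanisms described above, ordering tasks from a dependency graph and keying a cache on hashed inputs, can be sketched as follows. This is a simplified illustration and not TurboRepo's real algorithm: `buildOrder`, `cacheKey`, the sample `workspaceGraph`, and the exact hash inputs are assumptions.

```typescript
import { createHash } from 'node:crypto';

// Package name -> names of the workspace packages it depends on.
type DepGraph = Record<string, string[]>;

// Depth-first topological sort: every package appears after its dependencies.
function buildOrder(graph: DepGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();
  const visit = (pkg: string): void => {
    if (state.get(pkg) === 'done') return;
    if (state.get(pkg) === 'visiting') throw new Error(`Dependency cycle at ${pkg}`);
    state.set(pkg, 'visiting');
    for (const dep of graph[pkg] ?? []) visit(dep);
    state.set(pkg, 'done');
    order.push(pkg);
  };
  for (const pkg of Object.keys(graph)) visit(pkg);
  return order;
}

// Cache key: a hash over sorted file contents, env vars, and dependency versions.
function cacheKey(
  files: Record<string, string>,
  env: Record<string, string>,
  depVersions: Record<string, string>,
): string {
  const hash = createHash('sha256');
  for (const group of [files, env, depVersions]) {
    for (const key of Object.keys(group).sort()) {
      hash.update(`${key}\0${group[key]}\0`);
    }
    hash.update('\x01'); // separator so entries cannot move between groups unnoticed
  }
  return hash.digest('hex');
}

// Hypothetical workspace mirroring a packages/apps split.
const workspaceGraph: DepGraph = {
  types: [],
  utils: ['types'],
  'ai-engine': ['types', 'utils'],
  server: ['ai-engine'],
  web: ['types', 'utils'],
};
const order = buildOrder(workspaceGraph);
```

If the key computed for a package matches one already in the cache, the build can be skipped and the stored artifacts restored; any change to a source file, an environment variable, or a dependency version produces a different key.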

The two complement each other: PNPM manages dependencies and TurboRepo manages builds, a pairing that is widely regarded as best practice.

Architecture Design Principles for AI Full-Stack Applications

Designing a frontend-driven AI full-stack application requires progressive thinking, one step at a time:

Step 1: Sub-Package Planning

# pnpm-workspace.yaml
packages:
  - 'packages/*'
  - 'apps/*'

Packages Layer (shared capabilities):

ai-engine: AI engine core, encapsulating model calls and chain processing
types: TypeScript type definitions
utils: General utility functions
config: Shared configurations for ESLint, CommitLint, etc.

Apps Layer (application entry points):

web: Frontend interface
server: Backend service (NestJS)
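As an illustration of why a shared types package is useful, the sketch below shows a request and response contract that both apps/web and apps/server could import. The file path, the interface names, and the `isChatRequest` guard are hypothetical and not taken from the original project.

```typescript
// packages/types/src/chat.ts (hypothetical file in the shared types package)

// Contract for a chat call, imported by both the web app and the server.
export interface ChatRequest {
  prompt: string;
  imageBase64?: string; // optional image for multimodal prompts
}

export interface ChatResponse {
  answer: string;
}

// Runtime guard so the server can check untrusted JSON against the shared shape.
export function isChatRequest(value: unknown): value is ChatRequest {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.prompt === 'string' &&
    (v.imageBase64 === undefined || typeof v.imageBase64 === 'string')
  );
}
```

Because both apps compile against the same definitions, a change to the contract surfaces as a type error on both sides in the same commit.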

Step 2: Define Business Direction

Current mainstream AI application business directions include:

Text-to-text, text-to-image, image-to-image (Figma AI, V0, etc.)
Text-to-SQL (BI intelligent analytics)
Text-to-Code (code generation)
Text-to-music, text-to-video, auto-editing, digital humans

Step 3: Choose Development Mode

Workflow Mode: Human-defined nodes and processes with strong determinism, suitable for enterprise-level orchestration engines.

Agent Mode: Possesses autonomous awareness, capable of intent recognition and tool calling, more flexible but with more uncontrollable factors.

Workflow mode is essentially a Deterministic Finite State Machine (FSM), where each node's inputs, outputs, and transition conditions are determined at design time. Typical examples include the visual orchestration interfaces of Dify and Coze. Its advantages are predictability, debuggability, and auditability, making it suitable for enterprise scenarios like customer service flows and data processing pipelines. Agent mode is based on the ReAct (Reasoning + Acting) paradigm, where the model autonomously decides the next action at each step: observe environment → reason → select tool → execute action → observe result, forming a loop. An Agent's core components include a Planner, Tools, Memory system, and Reflection mechanism. The two modes are not mutually exclusive—a common production approach is to use Workflow as the outer skeleton while embedding Agents within specific nodes to handle uncertain tasks.
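The contrast between the two modes can be sketched without any AI framework. In this simplified illustration every name (`runWorkflow`, `runAgent`, `supportFlow`, `decideStub`) is hypothetical, and the "model" is a hard-coded stub standing in for an LLM call.

```typescript
// Workflow mode: nodes and transitions are fixed at design time.
type FlowContext = Record<string, unknown>;
interface WorkflowNode {
  run: (ctx: FlowContext) => FlowContext;
  next: (ctx: FlowContext) => string | null; // null ends the workflow
}

function runWorkflow(nodes: Record<string, WorkflowNode>, start: string, input: FlowContext, maxSteps = 50) {
  let ctx = input;
  let current: string | null = start;
  const path: string[] = [];
  while (current !== null) {
    if (path.length >= maxSteps) throw new Error('Step limit exceeded');
    const node = nodes[current];
    if (!node) throw new Error(`Unknown node: ${current}`);
    path.push(current);
    ctx = node.run(ctx);
    current = node.next(ctx);
  }
  return { ctx, path };
}

// Agent mode: a decision function picks the next action at every step.
type AgentAction = { tool: string; input: string } | { finish: string };
type Decide = (goal: string, observations: string[]) => AgentAction;
type ToolSet = Record<string, (input: string) => string>;

function runAgent(goal: string, decide: Decide, tools: ToolSet, maxSteps = 5): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(goal, observations); // reason and select a tool
    if ('finish' in action) return action.finish;
    const tool = tools[action.tool];
    // Execute the tool and record the result for the next round of reasoning.
    observations.push(tool ? tool(action.input) : `Unknown tool: ${action.tool}`);
  }
  return 'Stopped: step limit reached';
}

// Workflow example: a customer-service flow with a fixed branch.
const supportFlow: Record<string, WorkflowNode> = {
  classify: {
    run: (ctx) => ({ ...ctx, intent: String(ctx.text ?? '').toLowerCase().includes('refund') ? 'refund' : 'faq' }),
    next: (ctx) => String(ctx.intent),
  },
  refund: { run: (ctx) => ({ ...ctx, reply: 'Routing to the refund process.' }), next: () => null },
  faq: { run: (ctx) => ({ ...ctx, reply: 'Here is the FAQ.' }), next: () => null },
};
const flowResult = runWorkflow(supportFlow, 'classify', { text: 'I want a refund' });

// Agent example: the stub "model" calls a tool once, then answers.
const decideStub: Decide = (_goal, observations) =>
  observations.length === 0
    ? { tool: 'add', input: '12,1' }
    : { finish: `The answer is ${observations[0]}` };
const tools: ToolSet = {
  add: (input) => String(input.split(',').map(Number).reduce((a, b) => a + b, 0)),
};
const agentAnswer = runAgent('What is 12 plus 1?', decideStub, tools);
```

The workflow's path is fully determined by its node table, while the agent's path depends on what the decision function returns at run time. Embedding an agent inside a workflow amounts to calling something like `runAgent` from within one node's `run` function.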

Step 4: Technology Stack Selection

Layer	Recommended Solution
Frontend	React / Vue
Backend	NestJS / Next.js
Database	PostgreSQL + Prisma ORM
Service Orchestration	Docker + Docker Compose
AI Framework	LangChain.js / LangGraph.js
Model Deployment	Ollama (local) / SaaS API

NestJS is a progressive server-side framework based on Node.js, heavily influenced by Angular's architectural philosophy, employing Dependency Injection (DI), decorator patterns, and modular design. It natively supports TypeScript and provides a Controller-Service-Module layered architecture, making it ideal for building enterprise-level API services. In AI full-stack scenarios, NestJS's advantages include: built-in support for Server-Sent Events (SSE) and WebSocket gateways (for streaming model output), powerful middleware and interceptor mechanisms (for request rate limiting and token billing), and smooth integration with Prisma ORM (for conversation history persistence). Compared to Express's free-form approach, NestJS's constrained architecture is better suited for multi-person collaborative AI product engineering.
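To show what streaming output over SSE looks like on the wire, here is a framework-independent sketch of how model tokens could be framed as Server-Sent Events. The function names and the `[DONE]` end marker are assumptions for illustration; in a NestJS app the framework's SSE support would handle this framing.

```typescript
// Frame one message in the SSE wire format: optional "event:" line,
// one "data:" line per line of payload, then a blank line.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of data.split('\n')) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n';
}

// Turn a sequence of model tokens into SSE frames, ending with a marker event.
function* streamAsSse(tokens: Iterable<string>): Generator<string> {
  for (const token of tokens) {
    yield formatSseEvent(JSON.stringify({ token }));
  }
  yield formatSseEvent('[DONE]', 'end'); // hypothetical end-of-stream marker
}

const sseFrames = Array.from(streamAsSse(['Hel', 'lo']));
```

A browser client would consume these frames with `EventSource` and append each token to the chat bubble as it arrives.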

Hands-On: Complete Call Chain for a Multimodal AI Application

Using a multimodal conversation application as an example, the complete call chain is: User → Frontend → NestJS Service → AI Core Package → Ollama Model Service.

AI Engine Layer Core Implementation

Encapsulate model calls in packages/ai-engine:

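// ChatOllama ships in the dedicated @langchain/ollama package; older
// releases exposed it from '@langchain/community/chat_models/ollama'.
import { ChatOllama } from '@langchain/ollama';

// Assumes a local Ollama server with the qwen3-vl:2b model already pulled.
const llm = new ChatOllama({ model: 'qwen3-vl:2b' });

// Send a plain-text prompt and return the content of the model's reply.
export async function invoke(prompt: string) {
  const res = await llm.invoke(prompt);
  return res.content;
}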

LangChain was created by Harrison Chase in 2022 as an AI application development framework, initially in Python and later in JavaScript/TypeScript (LangChain.js). Its core abstractions include: Chain (chaining multiple LLM operations together), Agent (intelligent agents with tool-calling and reasoning capabilities), Memory (conversation memory management), and Retriever (for RAG scenarios). LangGraph.js is a companion library from the same team that introduces state machine concepts: developers define complex multi-step AI workflows as graphs, with support for conditional branches, loops, and human-in-the-loop nodes, which makes it particularly suitable for building production-grade Agent systems.

Multimodal scenarios also support image input: read the image as a Buffer, encode it as Base64, and pass the resulting image URL (a Base64 data URL) together with the text prompt in the content array of a HumanMessage. The model can then analyze the image and answer questions about its content.
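The Buffer-to-Base64 step can be sketched as a small helper that builds the content array. The helper name `buildImagePrompt` and the `ContentPart` type are hypothetical; the part shapes follow the text and image_url convention that HumanMessage content is assumed to accept, so verify them against the LangChain.js version you use.

```typescript
import { Buffer } from 'node:buffer';

// Assumed shape of the entries in a multimodal message's content array.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: string };

// Encode raw image bytes as a Base64 data URL and pair it with a text question.
function buildImagePrompt(image: Uint8Array, mimeType: string, question: string): ContentPart[] {
  const base64 = Buffer.from(image).toString('base64');
  return [
    { type: 'text', text: question },
    { type: 'image_url', image_url: `data:${mimeType};base64,${base64}` },
  ];
}

// Example with the first four bytes of a PNG file standing in for a real image.
const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const imageParts = buildImagePrompt(pngHeader, 'image/png', 'What is in this picture?');
```

In the engine package, the returned array would be passed as `new HumanMessage({ content: imageParts })`, with the bytes coming from something like `fs.readFileSync('photo.png')`.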

Ollama is an open-source runtime for running large models locally. Much as Docker packages containers, it reduces model downloading, quantization, and inference to simple command-line operations. Running ollama run qwen3-vl:2b pulls the model if needed and opens an interactive session, while the Ollama server (port 11434 by default) exposes an HTTP API that includes OpenAI-compatible endpoints. Ollama supports GGUF-format quantized models and uses llama.cpp as the underlying inference engine, so it can run on consumer-grade GPUs or even in CPU-only environments. The qwen3-vl:2b mentioned in this article is a 2B-parameter quantized version of Alibaba's Qwen vision-language model; it accepts multimodal image-text input and is suitable for local development and debugging. Compared to calling cloud APIs, local deployment has no per-call API fees, avoids the network round trip to a cloud provider, and keeps data on your own machine.
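As a rough sketch of the OpenAI-compatible API, the snippet below builds and sends a chat request to a local Ollama server without any framework. It assumes the default address http://localhost:11434 and the /v1/chat/completions path; `buildChatRequest` and `askLocalModel` are hypothetical helper names.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Build the URL and fetch options for a non-streaming chat completion.
function buildChatRequest(
  prompt: string,
  model = 'qwen3-vl:2b',
  baseUrl = 'http://localhost:11434', // assumed default Ollama address
) {
  const messages: ChatMessage[] = [{ role: 'user', content: prompt }];
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Send the request; this only works while an Ollama server is running locally.
async function askLocalModel(prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(prompt);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
  const json = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return json.choices[0].message.content;
}
```

Because the request format matches the OpenAI chat API, switching between a local model and a hosted one is mostly a matter of changing `baseUrl` and `model` (a hosted API would also need an authorization header).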

Service Layer Invocation

Expose the API through a Node.js HTTP service in apps/server:

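import http from 'node:http';
import { invoke } from '@miaoma/ai-engine';

const server = http.createServer(async (req, res) => {
  try {
    // Fixed demo prompt; a real service would read it from the request body.
    const result = await invoke('Help me calculate 12 plus 1');
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(result));
  } catch (err) {
    // Without this, a failed model call would leave the request hanging.
    res.statusCode = 500;
    res.end(JSON.stringify({ error: String(err) }));
  }
});

server.listen(3200);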

The key point is that the server depends on the AI engine package via workspace:*. This uses the workspace protocol, which tells PNPM to link directly to the matching package in the local workspace rather than download it from the NPM registry. As a result, any change to the ai-engine package is visible to the server as soon as the engine is rebuilt, with no NPM publish step, which comes close to a "save and it works" development experience.

Architectural Thinking Matters More Than AI Coding Tools

Many developers report that AI coding tools (like Cursor, Claude Code) provide limited help, and the core reason is a lack of architectural thinking. Whether you can clearly decompose a large project into modules, plan the development pipeline, and write planning documents (like PLANNING.md) directly determines the effectiveness of AI Coding.

Once you master this architectural capability, you can not only guide AI to develop efficiently but also demonstrate system design skills in interviews—this is the true core competitiveness for frontend developers in the AI era.

Summary

The core path for frontend AI full-stack development: MonoRepo engineering architecture → AI engine layer encapsulation (LangChain/custom SDK) → Service API exposure → Frontend orchestration interface. Mastering this pipeline, combined with experience implementing Workflow orchestration engines, provides significant competitive advantage in job hunting amid the current explosion in AI position demand.

Key Takeaways

PNPM MonoRepo + TurboRepo is the standard engineering architecture for big tech AI full-stack projects, solving the difficulties of code reuse and high dependency synchronization costs in multi-repo setups
The complete chain for AI full-stack applications is: User → Frontend → NestJS Service → AI Core Package → Ollama Model Service, with five progressive layers forming a closed loop
Recommended tech stack: React + NestJS + PostgreSQL/Prisma + Docker + LangChain.js, combined with Ollama local deployment for multimodal conversations
Architectural thinking (module decomposition, sub-package planning, development pipeline design) is more important than AI coding tools themselves—it's the prerequisite for effective Vibe Coding
Core design of Workflow orchestration engines includes plugin-based node registration, execution context management, and strategy pattern validators, demonstrating practical application of advanced design patterns