Agent Middleware: Adding Interceptors to Model Calls

When building AI Agents, we often need to insert custom logic before and after model calls — such as logging, security checks, or even intercepting dangerous requests outright. This is where the Middleware mechanism comes into play. It acts as an interceptor, allowing you to flexibly inject any logic you want into the lifecycle of an Agent's model calls.

This article walks you through two practical examples — a logging middleware and a security check middleware — to help you thoroughly understand how Middleware works and how to use it.

What Is Agent Middleware?

Middleware is essentially an application of Aspect-Oriented Programming (AOP) within the Agent framework. It allows you to insert custom logic at two key points — Before Model and After Model — without modifying the Agent's core code.

Aspect-Oriented Programming (AOP) was first proposed by Gregor Kiczales' team at Xerox PARC in 1997. Its core idea is to separate cross-cutting concerns from the main business logic. In traditional software development, features like logging, transaction management, and security validation tend to be scattered across various modules, leading to code duplication and increased coupling. AOP addresses this by defining pointcuts and advice, enabling these cross-cutting concerns to be centrally managed and uniformly woven in. In AI Agent frameworks, model calls serve as natural pointcuts — each LLM inference represents a clear execution boundary, making it ideal for inserting additional logic before and after. This shares the same design philosophy as interceptors in Java Spring, middleware in Python Django, and the middleware chain in Node.js Express — just different implementations of the same concept.

The benefits of this design are obvious:

Decoupling: Cross-cutting concerns like logging and security checks are separated from business logic
Composability: Multiple Middlewares can be stacked together without interfering with each other
Flexible control: A Middleware can act as a mere "Observer" (logging) or as a "Guardian" (intercepting requests)

Example 1: Logging Middleware

The goal of the first Middleware is simple — record key information before and after model calls to facilitate debugging and tracking conversation progress.

Before Model: Logging Before the Call

Define a class that inherits from AgentMiddleware and implement the before_model method. Before the model call, print the length of the current message list so you can clearly see which round of conversation you're on.

Before Model triggers before the model call

There's a key detail here: the return type is None. Because the logging middleware is just an "Observer" — it doesn't need to intervene in any process. Returning None tells the framework: "I'm just watching. Carry on."

After Model: Logging After the Call

The after_model method executes after the model responds. We can grab the last message, truncate it to the first 50 characters, and print it for a quick preview of the model's response.

Grab the last message and print the first 50 characters

This approach is extremely useful during debugging, allowing you to quickly understand the model's behavior without opening full logs.

Example 2: Security Check Middleware

The second Middleware goes beyond just "watching" — it's a true Guardian, responsible for intercepting dangerous operations.

The Key Difference: Change in Return Type

It also inherits from AgentMiddleware, but the focus is on the before_model method implementation. The biggest difference from the logging middleware is: the return type is Dict | None, not just None.

Return type is Dict or None

Why this design? Because the security check middleware may need to actively intervene in the process. When it detects a problem, it needs to return a dictionary to tell the framework what to do. This design pattern is known in computer science as "Short-Circuit Evaluation" or "Early Return." In the context of an Agent framework, this means the middleware can completely bypass the LLM inference process and directly inject a preset response. This capability is critically important in production environments: on one hand, it prevents sensitive content from being sent to external APIs (reducing data leak risks); on the other hand, it saves unnecessary API call costs (every LLM call incurs token fees). From an architectural perspective, this is similar to an HTTP middleware returning 403 Forbidden to reject a request outright, rather than letting it reach the backend service.

Implementation of the Interception Logic

The specific check logic works as follows:

Get the content of the last user message
Check whether it contains sensitive keywords like "delete" or "danger"
If a match is found, return a dictionary containing control directives

The returned dictionary contains two key fields:

jump_to: "end": Tells the framework to skip the model call and end the current flow immediately
An AI Message: Injects a preset reply message informing the user that the operation has been terminated

Inject an AI Message telling the user the operation was terminated

The meaning of this return value is crystal clear: Don't call the model. End immediately. And add this safety notice to the conversation. If the check passes without issues, it returns None, meaning "No objections from me — proceed as normal."

Combining Middlewares: Giving Your Agent Multiple Capabilities

The final step is to combine both Middlewares and pass them to create_agent:

agent = create_agent(
    model=model,
    middleware=[LoggingMiddleware(), SecurityMiddleware()]
)

This way, the Agent simultaneously has both logging and security interception capabilities. On each model call, the request passes through every Middleware's before_model in sequence, and the response passes through every Middleware's after_model in sequence.

When multiple Middlewares are combined, their execution follows an order similar to the Onion Model. During the request phase (before_model), they execute in registration order; during the response phase (after_model), they execute in reverse order, forming a symmetric call stack. This pattern is very common in web frameworks — for example, Koa.js's middleware mechanism. In the example above, the before_model execution order is Logging → Security, while the after_model execution order is Security → Logging. This means that if the security check middleware intercepts a request during the before_model phase, the subsequent model call and all after_model logic will not execute. Understanding this execution order is crucial for correctly designing the registration order of your middlewares.

Two Roles of Middleware: Observer and Guardian

Through these two examples, we can summarize two typical roles for Middleware:

Role	Return Value	Typical Scenarios
Observer	Always returns `None`	Logging, performance monitoring, data analytics
Guardian	Conditionally returns `Dict`	Security checks, permission validation, content filtering

The elegance of this design pattern lies in: distinguishing middleware responsibilities through different return value types. Returning None means "let it pass"; returning Dict means "I'm taking control." This allows the framework to support vastly different use cases through a unified interface.

In real-world projects, you can extend this to create many more types of Middleware, such as:

Token counting middleware: Track the number of tokens consumed per call
Caching middleware: Return cached results directly for identical queries
Rate limiting middleware: Control model call frequency
Content moderation middleware: Perform compliance checks on model output

When deploying these middlewares in production, several key engineering concerns need to be addressed. First, performance overhead: each middleware adds call latency, and middlewares involving I/O operations (such as writing to log databases or calling external moderation APIs) should use asynchronous execution. Second, error handling: exceptions within a middleware itself should not crash the entire Agent — robust try-catch mechanisms and fallback strategies are essential. Third, observability: the execution state of middlewares themselves also needs to be monitored, forming meta-monitoring. Additionally, in multi-Agent collaboration scenarios (such as frameworks like CrewAI and AutoGen), middlewares also need to handle interception of inter-Agent message passing, further increasing complexity. Currently, mainstream Agent frameworks like LangChain's Callbacks and OpenAI Agents SDK's Hooks mechanism are essentially different implementation forms of the middleware concept.

The Middleware mechanism makes Agent capability extension modular and pluggable — it's an indispensable piece of infrastructure for building production-grade AI Agents.

Key Takeaways

Middleware is an implementation of AOP in Agent frameworks, inserting custom logic before and after model calls through before_model and after_model hooks
The Observer pattern (returning None) is suited for non-intrusive scenarios like logging and monitoring
The Guardian pattern (returning Dict) is suited for scenarios requiring flow interception, such as security checks and permission validation
Multiple middlewares execute in Onion Model order, where registration order determines execution priority
In production environments, attention must be paid to middleware performance overhead, error isolation, and observability