Spring AI + MCP Protocol in Practice: Tool Calling, Security Authentication, and Enterprise Deployment

At Spring I/O 2026, James Ward and Maximilian Schellhorn from AWS delivered a talk on the deep integration of MCP (Model Context Protocol) with Spring AI. From protocol fundamentals to enterprise-grade scaling, from security authentication to context optimization, this presentation covered virtually every aspect of the MCP ecosystem. This article systematically distills the core takeaways to help Java developers quickly master MCP in practice.

Why We Need the MCP Protocol

Although Large Language Models (LLMs) possess powerful reasoning capabilities, they are fundamentally stateless—they cannot proactively fetch real-time information or directly execute external operations. Building a truly functional AI Agent requires three core components: memory (conversation tracking and context management), knowledge (additional documents and data), and tools (the ability to fetch information and perform actions on demand).

The basic flow of Tool Calling isn't complicated: after a user asks a question, the LLM decides which tool to call, the application layer executes the actual call, and the result is returned to the LLM, which uses it to compose the final answer. The mechanism was popularized by the Function Calling feature OpenAI introduced in 2023. The LLM doesn't execute functions itself. Instead, it outputs a structured JSON object declaring which function it wants to call and with which arguments, and the application layer is responsible for executing the call and returning the result. This design cleanly separates the LLM's reasoning from the execution capabilities of external systems, and it has become the foundational paradigm for building AI Agents.
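The division of labor is easy to see in code. Below is a minimal sketch of the application-side half of the loop, in plain Java with no Spring AI or LLM SDK involved. It assumes the model's output has already been parsed into a `ToolCall` record, and the `add` and `get_weather` tools are hypothetical examples.

```java
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of application-side tool dispatch (not Spring AI's API).
// The model never runs code: it names a tool and its arguments, and this layer executes it.
class ToolDispatch {

    // Stand-in for the structured JSON an LLM emits, already parsed into Java types.
    record ToolCall(String name, Map<String, Object> arguments) {}

    // Hypothetical registry mapping tool names to implementations.
    static final Map<String, Function<Map<String, Object>, String>> TOOLS = Map.of(
        "add", args -> String.valueOf(
            ((Number) args.get("a")).intValue() + ((Number) args.get("b")).intValue()),
        "get_weather", args -> "Sunny in " + args.get("city") // fake data for illustration
    );

    // Execute the requested tool and return its result as text for the next LLM turn.
    static String dispatch(ToolCall call) {
        var tool = TOOLS.get(call.name());
        if (tool == null) {
            // Report unknown tools back to the model instead of crashing the loop.
            return "Error: unknown tool '" + call.name() + "'";
        }
        return tool.apply(call.arguments());
    }

    public static void main(String[] args) {
        // Pretend the model asked for add(2, 3); the result would be appended to the conversation.
        var call = new ToolCall("add", Map.of("a", 2, "b", 3));
        System.out.println(dispatch(call));
    }
}
```

Returning an error string for unknown tools, rather than throwing, lets the model see its mistake and try again. This is a design choice for the sketch and is not required by any protocol.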

The problem is that if every developer has to manually write various tool functions (fetching weather, querying customers, managing tasks, etc.) in their own codebase, it's both inefficient and hard to maintain.

MCP was created to solve this problem. It uses a standardized JSON-RPC protocol so that any MCP client can connect to any MCP server, making tool integration plug-and-play. JSON-RPC is a lightweight remote procedure call protocol that uses JSON as its data format. It defines standard structures for requests (a method name plus parameters) and responses (a result or an error). MCP chose JSON-RPC over REST or gRPC primarily for two reasons. First, it supports bidirectional communication, meaning either peer can send requests, not just the client. Second, its protocol overhead is minimal: a JSON-RPC 2.0 request needs only a "jsonrpc": "2.0" version marker plus method, params, and id fields (notifications omit the id). This makes it naturally suited to cross-language, cross-platform communication.
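For illustration, here is what an MCP tool invocation looks like on the wire. The example assumes a hypothetical `add` tool that takes integer arguments `a` and `b`. `tools/call` is the MCP method for invoking a tool, and the `content` array is where MCP tool results carry text. The JSON is assembled with plain string formatting (no JSON library, no escaping) purely so that the message structure stays visible.

```java
// Sketch of the JSON-RPC 2.0 messages for one MCP tool invocation, built by hand
// so the wire format is visible. Only safe for simple, unescaped values.
class JsonRpcSketch {

    // Request: asks the server to run the tool. The id lets the client match the response.
    static String toolsCallRequest(int id, String toolName, int a, int b) {
        return String.format(
            "{\"jsonrpc\":\"2.0\",\"id\":%d,\"method\":\"tools/call\","
          + "\"params\":{\"name\":\"%s\",\"arguments\":{\"a\":%d,\"b\":%d}}}",
            id, toolName, a, b);
    }

    // Successful response: echoes the same id and wraps the tool output in a content array.
    static String toolsCallResponse(int id, String text) {
        return String.format(
            "{\"jsonrpc\":\"2.0\",\"id\":%d,\"result\":{\"content\":[{\"type\":\"text\",\"text\":\"%s\"}]}}",
            id, text);
    }

    public static void main(String[] args) {
        System.out.println(toolsCallRequest(1, "add", 2, 3));
        System.out.println(toolsCallResponse(1, "5"));
    }
}
```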

MCP in Practice with Spring AI

Basic Tool Definition and Invocation

Creating an MCP server in Spring AI is remarkably concise. Simply add the spring-ai-starter-mcp-server-webflux dependency, then define tools using the @McpTool annotation. The Spring AI team created a Java version of the MCP SDK and wrapped Spring-specific features on top of it.
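The exact attributes of Spring AI's `@McpTool` are best checked in the Spring AI reference documentation. The sketch below shows something more general, using a hypothetical `@ToolDef` annotation and plain reflection. It illustrates roughly what any annotation-driven framework does behind the scenes: scan an object for annotated methods, advertise their names and descriptions, and invoke them by name. It is not Spring AI's implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an annotation like @McpTool: marks a method as an exposed tool.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ToolDef {
    String name();
    String description();
}

// The "server" class: plain methods become tools just by being annotated.
class CalculatorTools {
    @ToolDef(name = "add", description = "Add two integers")
    public int add(int a, int b) {
        return a + b;
    }

    public int notATool(int x) { // not annotated, so never exposed
        return x;
    }
}

// What a framework does behind the annotation: scan, register, and invoke by name.
class ToolRegistry {
    private final Map<String, Method> methods = new HashMap<>();
    private final Map<String, String> descriptions = new HashMap<>();
    private final Object target;

    ToolRegistry(Object target) {
        this.target = target;
        for (Method m : target.getClass().getDeclaredMethods()) {
            ToolDef def = m.getAnnotation(ToolDef.class);
            if (def != null) {
                methods.put(def.name(), m);
                descriptions.put(def.name(), def.description());
            }
        }
    }

    // Names and descriptions are what a tools/list response would advertise to clients.
    Map<String, String> list() {
        return Map.copyOf(descriptions);
    }

    Object call(String name, Object... args) {
        Method m = methods.get(name);
        if (m == null) throw new IllegalArgumentException("Unknown tool: " + name);
        try {
            return m.invoke(target, args); // reflection unboxes Integer arguments to int
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Tool invocation failed: " + name, e);
        }
    }

    public static void main(String[] args) {
        ToolRegistry registry = new ToolRegistry(new CalculatorTools());
        System.out.println(registry.list());
        System.out.println(registry.call("add", 2, 3));
    }
}
```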

The presenters demonstrated a simple addition tool and showed that it could be invoked from AWS's Kiro coding assistant, from IntelliJ IDEA, and from the MCP Inspector. Because MCP is standardized, the same server connects seamlessly to different clients. This is the core value of the protocol.

Figure: MCP Tool Calling Demo

Advanced Tool Features Explained

Beyond basic tool calling, Spring AI's MCP implementation supports several advanced features:

Output Schema Definition: By setting generateOutputSchema = true, tools can return structured type objects rather than raw text, allowing clients to perform type validation accordingly.
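In the MCP specification, a tool can declare an `outputSchema` (a JSON Schema) and return matching `structuredContent`, which is what makes client-side validation possible. The sketch below is a deliberately simplified, hypothetical schema generator that derives a flat JSON Schema from a Java record via reflection. It only illustrates the idea and is not what Spring AI does internally. Nested types, nullability, and field descriptions are ignored.

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

// Simplified sketch: map each record component to a JSON Schema type.
class OutputSchemaSketch {

    // Example structured result a tool might return instead of free text.
    record WeatherReport(String city, double temperatureC, boolean raining) {}

    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == long.class
                || type == Integer.class || type == Long.class) return "integer";
        if (type == double.class || type == float.class
                || type == Double.class || type == Float.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        return "object"; // nested types are out of scope for this sketch
    }

    // Record components are reported in declaration order, so the output is deterministic.
    static String schemaFor(Class<? extends Record> recordType) {
        StringJoiner props = new StringJoiner(",", "{", "}");
        StringJoiner required = new StringJoiner(",", "[", "]");
        for (RecordComponent c : recordType.getRecordComponents()) {
            props.add("\"" + c.getName() + "\":{\"type\":\"" + jsonType(c.getType()) + "\"}");
            required.add("\"" + c.getName() + "\"");
        }
        return "{\"type\":\"object\",\"properties\":" + props + ",\"required\":" + required + "}";
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(WeatherReport.class));
    }
}
```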

Logging and Progress Push: By injecting McpSyncRequestContext, the server can send log messages and progress updates to clients, which is especially useful for long-running tasks.
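On the wire, progress updates are JSON-RPC notifications with the method `notifications/progress`. The `progressToken` comes from the client's original request, and notifications carry no `id` because no reply is expected. In the sketch below, a `Consumer<String>` stands in for the channel to the client (roughly the role the injected `McpSyncRequestContext` plays in Spring AI). The task and token values are hypothetical.

```java
import java.util.function.Consumer;

// Sketch: a long-running tool that reports progress through MCP-style notifications.
class ProgressSketch {

    // JSON-RPC notification: no "id", because the client sends no reply.
    static String progressNotification(String token, int progress, int total, String message) {
        return String.format(
            "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/progress\","
          + "\"params\":{\"progressToken\":\"%s\",\"progress\":%d,\"total\":%d,\"message\":\"%s\"}}",
            token, progress, total, message);
    }

    // Process items one by one and push a progress update after each.
    static int processAll(int items, String token, Consumer<String> sendToClient) {
        int done = 0;
        for (int i = 1; i <= items; i++) {
            // ... real work for item i would happen here ...
            done++;
            sendToClient.accept(progressNotification(token, i, items, "Processed item " + i));
        }
        return done;
    }

    public static void main(String[] args) {
        processAll(3, "job-42", System.out::println);
    }
}
```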

Elicitation (Dynamic Prompting): This is a relatively new MCP feature. When certain parameters aren't always required but need user input under specific conditions (e.g., when a user hasn't set a preferred airline), the server can dynamically request input from the user through the elicitation mechanism, then continue executing the tool call.
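The conditional flow from the airline example can be sketched in plain Java. Here the `Elicitor` interface is a hypothetical stand-in for sending an MCP `elicitation/create` request to the client. The three outcomes mirror the accept, decline, and cancel actions a user can take in an MCP elicitation response. Tool and field names are illustrative.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the elicitation pattern: ask the user only when a needed value is missing.
class ElicitationSketch {

    // Mirrors the three user actions MCP defines for an elicitation response.
    enum Action { ACCEPT, DECLINE, CANCEL }

    record ElicitResult(Action action, Map<String, String> content) {}

    // Hypothetical stand-in for "send elicitation/create to the client and wait for the answer".
    @FunctionalInterface
    interface Elicitor {
        ElicitResult ask(String message, String field);
    }

    static String bookFlight(String destination, Optional<String> preferredAirline, Elicitor elicitor) {
        String airline = preferredAirline.orElse(null);
        if (airline == null) {
            // The preference is unset, so pause the tool call and ask the user.
            ElicitResult r = elicitor.ask("Which airline do you prefer?", "airline");
            if (r.action() != Action.ACCEPT) {
                return "Booking cancelled: no airline chosen";
            }
            airline = r.content().get("airline");
        }
        return "Booked flight to " + destination + " on " + airline;
    }

    public static void main(String[] args) {
        // Simulated client that accepts and answers the question.
        Elicitor client = (msg, field) -> new ElicitResult(Action.ACCEPT, Map.of(field, "ExampleAir"));
        System.out.println(bookFlight("Barcelona", Optional.empty(), client));
    }
}
```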

Sampling: This more advanced feature reverses the usual direction of the call. The MCP server asks the client to run a completion on the client's LLM. When the server has no LLM access of its own but needs AI capabilities, it can borrow the host's LLM through the sampling mechanism.
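Concretely, the server sends a `sampling/createMessage` JSON-RPC request to the client. The client runs the completion with its own model, ideally letting the user review the request, and returns the result. The sketch below assembles a minimal request of that shape with plain string building. The prompt, id, and `maxTokens` value are hypothetical, and optional fields such as model preferences are omitted.

```java
// Sketch: the JSON-RPC request an MCP server sends *to the client* to borrow its LLM.
class SamplingSketch {

    // Minimal JSON string escaping: backslashes first, then quotes and newlines.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String createMessageRequest(int id, String prompt, int maxTokens) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"sampling/createMessage\",\"params\":{"
             + "\"messages\":[{\"role\":\"user\",\"content\":{\"type\":\"text\",\"text\":\""
             + escape(prompt) + "\"}}],"
             + "\"maxTokens\":" + maxTokens + "}}";
    }

    public static void main(String[] args) {
        // e.g. a server without its own model asking the host to summarize tool output
        System.out.println(createMessageRequest(7, "Summarize these flight options: ...", 200));
    }
}
```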

Figure: Sampling Flow Demo

Resources, Prompts, and MCP Apps

Resources represent content such as files and database records, available in both static and dynamic forms. Unlike tools, which are model-controlled (the model decides when to call them), how resources are used depends on the application layer—for example, Claude Desktop displays remote resources to users as if they were local files.
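Static resources have fixed URIs, while dynamic ones are exposed through URI templates (MCP uses RFC 6570 templates for this). The sketch below resolves both kinds the way a `resources/read` handler might. The URIs are hypothetical. As a simplification, template support is limited to a single trailing variable such as `{id}`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Sketch of static vs. dynamic (templated) resources, resolved by URI.
class ResourceSketch {
    private final Map<String, String> staticResources = new LinkedHashMap<>();
    // Template prefix (the part before "{var}") -> function computing content from the variable.
    private final Map<String, Function<String, String>> templates = new LinkedHashMap<>();

    void addStatic(String uri, String content) {
        staticResources.put(uri, content);
    }

    // Simplification: only templates ending in one variable, like "db://customers/{id}".
    void addTemplate(String uriTemplate, Function<String, String> reader) {
        int brace = uriTemplate.indexOf('{');
        if (brace < 0 || !uriTemplate.endsWith("}")) {
            throw new IllegalArgumentException("Expected a trailing {variable}: " + uriTemplate);
        }
        templates.put(uriTemplate.substring(0, brace), reader);
    }

    // Exact static match first, then the first template whose prefix fits.
    Optional<String> read(String uri) {
        if (staticResources.containsKey(uri)) {
            return Optional.of(staticResources.get(uri));
        }
        for (var entry : templates.entrySet()) {
            String prefix = entry.getKey();
            if (uri.startsWith(prefix)) {
                String value = uri.substring(prefix.length());
                if (!value.isEmpty() && !value.contains("/")) {
                    return Optional.of(entry.getValue().apply(value));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        var server = new ResourceSketch();
        server.addStatic("file:///docs/README.md", "# Project readme");
        server.addTemplate("db://customers/{id}", id -> "{\"id\":\"" + id + "\",\"name\":\"Customer " + id + "\"}");
        System.out.println(server.read("db://customers/42").orElse("not found"));
    }
}
```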

Prompts (prompt templates) let an MCP server offer predefined, reusable prompts to users. They are valuable for guiding users through specific workflows, similar to how a command-line --help shows users what a tool can do.
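A prompt template is essentially server-owned text with named arguments that the client fills in when the user selects it (via the `prompts/get` request in MCP). The sketch below shows only the expansion step. The `{name}` placeholder syntax and the example template are assumptions for illustration, not an MCP or Spring AI format.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: expand a prompt template with user-supplied arguments.
class PromptSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> arguments) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = arguments.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing prompt argument: " + name);
            }
            // quoteReplacement stops '$' or '\' in user input from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Review pull request {pr} in repository {repo}, focusing on security issues.";
        System.out.println(render(template, Map.of("pr", "#123", "repo", "acme/shop")));
    }
}
```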

MCP Apps is the latest extension capability, allowing HTML content to be rendered in Agents that support rich UI. The presenters demonstrated a shopping list application that uses MCP Resources to deliver HTML and triggers rendering through tool calls. Shopify showcased a similar product search scenario at the MCP Developer Summit—users can interact with external systems without leaving the chat environment.

Transport Methods and Remote Deployment

MCP's transport layer is decoupled from the protocol layer. Local servers communicate via stdio (standard input/output), while remote servers use Streamable HTTP. For stateful operations that require server-initiated messages (such as sampling and elicitation), a Server-Sent Events (SSE) channel is established.

SSE is a mechanism defined in the HTML specification (introduced with HTML5) for one-way server-to-client push over a long-lived HTTP response. Unlike WebSocket's full-duplex communication, SSE only streams from server to client. Its advantages are protocol simplicity, built-in automatic reconnection, and native compatibility with existing HTTP infrastructure (proxies, load balancers, etc.). In MCP's Streamable HTTP transport, SSE carries server-initiated messages (such as progress notifications and log messages), while client-to-server messages are still sent as standard HTTP POST requests.
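SSE framing itself is simple enough to show in a few lines. The sketch below encodes JSON-RPC messages as SSE events and parses the `data` payloads back out, following the basic rules of the format:

- Each line is a `field: value` pair.
- One leading space is stripped from the value.
- Repeated `data:` lines are joined with newlines.
- A blank line ends each event.

The `message` event name is an assumption, and other fields such as `id` and `retry` are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Server-Sent Events framing for messages pushed over an HTTP response stream.
class SseSketch {

    static String encode(String eventName, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(eventName).append('\n');
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n'); // multi-line data uses repeated data: lines
        }
        return sb.append('\n').toString(); // blank line terminates the event
    }

    // Minimal client-side parser: returns the data payload of each complete event.
    static List<String> parseData(String stream) {
        List<String> events = new ArrayList<>();
        StringBuilder data = null;
        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                if (data != null) events.add(data.toString());
                data = null;
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) value = value.substring(1); // strip one leading space
                if (data == null) data = new StringBuilder(value);
                else data.append('\n').append(value);
            }
            // other fields (event:, id:, retry:) are ignored in this sketch
        }
        return events;
    }

    public static void main(String[] args) {
        String stream = encode("message", "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/progress\"}")
                      + encode("message", "line one\nline two");
        System.out.print(stream);
        System.out.println(parseData(stream));
    }
}
```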

Figure: Local vs Remote MCP Server Comparison

The presenters explicitly recommended prioritizing remote MCP servers for enterprise environments, primarily because:

Security: Installing arbitrary MCP servers locally carries supply chain attack risks. In a supply chain attack, attackers inject malicious code by tampering with software dependencies, build tools, or distribution channels, and package ecosystems such as npm and PyPI have seen many such attacks in recent years. Local MCP servers are typically distributed as npm packages or binaries, so if a popular MCP server package is hijacked, every developer machine that installed it could be compromised. Remote MCP servers significantly reduce this attack surface by centralizing the execution environment: users only need to trust the service provider, not the entire distribution chain.
Ease of use: Users only need to configure a URL to connect to the service
Maintainability: Centralized updates and deployment, no need to distribute local binaries
Scalability: Supports horizontal scaling and load balancing

Enterprise Security: OAuth Authentication Flow

MCP's authorization framework is based on OAuth (the specification builds on the OAuth 2.1 draft). OAuth 2.0 is the most widely used authorization framework on the internet. It lets third-party applications access protected resources on a user's behalf without exposing the user's credentials. In MCP's security architecture, OAuth answers the question "who is allowed to call which tools". This is critical for enterprise deployment, because you don't want just anyone executing database operations or triggering business processes through an MCP server.

The complete authentication flow works as follows:

1. The client's first request is rejected with a 401 whose WWW-Authenticate header points to the protected resource metadata.
2. From that metadata, the client discovers the authorization server.
3. After client registration and the authorization flow complete, the client receives an access token, typically a JWT.
4. The client presents that token on every request to the MCP server.

A JWT (JSON Web Token) encodes user identity and permission information into a self-contained, signed JSON structure, so the server can verify a token's validity by checking its signature instead of querying a database. This stateless verification fits naturally with MCP's distributed architecture.
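Two steps of this flow lend themselves to a short sketch. The first is the client extracting the metadata URL from the `WWW-Authenticate` header; the `resource_metadata` parameter comes from RFC 9728 (OAuth 2.0 Protected Resource Metadata). The second is the server verifying a token's signature without a database lookup. To stay self-contained, the sketch uses HS256 with a shared secret. Real deployments typically verify asymmetric signatures (such as RS256) against keys published by the authorization server, and they also check claims such as expiry and audience, which this sketch skips. In a Spring application, Spring Security's OAuth2 resource-server support would normally handle this.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of two steps in an MCP OAuth flow (simplified, not production-ready).
class McpAuthSketch {

    private static final Pattern RESOURCE_METADATA = Pattern.compile("resource_metadata=\"([^\"]+)\"");

    // Client side: after a 401, find where the protected resource metadata document lives.
    static Optional<String> resourceMetadataUrl(String wwwAuthenticate) {
        Matcher m = RESOURCE_METADATA.matcher(wwwAuthenticate);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Server side: verify an HS256 JWT signature locally, with no database lookup.
    // Simplified: does not check the "alg" header or claims such as exp and aud.
    static boolean verifyHs256(String jwt, byte[] secret) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) return false;
        try {
            byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            return MessageDigest.isEqual(expected, actual); // constant-time comparison
        } catch (IllegalArgumentException | GeneralSecurityException e) {
            return false; // malformed Base64 or unusable key
        }
    }

    // Demo helper: issue an HS256 token roughly the way an authorization server might.
    static String signHs256(String payloadJson, byte[] secret) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        try {
            return header + "." + payload + "." + enc.encodeToString(hmacSha256(secret, header + "." + payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] hmacSha256(byte[] secret, String signingInput) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String header = "Bearer resource_metadata=\"https://mcp.example.com/.well-known/oauth-protected-resource\"";
        System.out.println(resourceMetadataUrl(header).orElse("none"));
        byte[] secret = "demo-secret-demo-secret-demo-secret".getBytes(StandardCharsets.UTF_8);
        String token = signHs256("{\"sub\":\"alice\",\"scope\":\"mcp:tools\"}", secret);
        System.out.println(verifyHs256(token, secret));
    }
}
```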

Figure: MCP Security Authentication Architecture

Client registration is currently the most debated aspect. The presenters introduced three mainstream approaches:

Pre-registered clients: Best suited for internal enterprise environments, providing precise control over which clients can access the system
Dynamic client registration: Now considered insufficiently secure (public endpoints are easily abused)
Client ID metadata documents (new approach): The client ID itself is an HTTPS URL, and the authorization server can verify the client information within it. Spring Authorization Server is currently developing support for this.
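The core idea of client ID metadata documents can be sketched as a handful of checks. The sketch assumes the authorization server has already fetched the JSON document from the `client_id` URL; the hypothetical `ClientMetadata` record represents that document. The server then verifies three things:

- The URL uses HTTPS.
- The document names itself with the same `client_id`.
- The requested redirect URI is one the document lists.

This is a simplified reading of the approach, not a complete implementation of the draft specification.

```java
import java.net.URI;
import java.util.List;

// Sketch of checks an authorization server might run when the client_id is an HTTPS URL.
class ClientIdMetadataSketch {

    // Stand-in for the JSON document fetched from the client_id URL (fetching is omitted).
    record ClientMetadata(String clientId, String clientName, List<String> redirectUris) {}

    static boolean isHttpsUrl(String clientId) {
        try {
            URI uri = URI.create(clientId);
            return "https".equals(uri.getScheme()) && uri.getHost() != null;
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
    }

    static boolean accept(String clientId, ClientMetadata fetched, String requestedRedirectUri) {
        return isHttpsUrl(clientId)
            && clientId.equals(fetched.clientId())                    // document must describe itself
            && fetched.redirectUris().contains(requestedRedirectUri); // exact redirect URI match
    }

    public static void main(String[] args) {
        String clientId = "https://agent.example.com/oauth/client-metadata.json";
        var doc = new ClientMetadata(clientId, "Example Agent", List.of("http://localhost:3000/callback"));
        System.out.println(accept(clientId, doc, "http://localhost:3000/callback"));
    }
}
```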

Challenges and Solutions for Horizontal Scaling

Horizontally scaling MCP servers faces a core challenge: MCP protocol initialization requests and subsequent requests need to be routed to the same server instance. The traditional solution is Sticky Sessions, where a load balancer ensures all requests from the same client are routed to the same backend server, typically implemented through cookies or IP hashing. However, this approach conflicts with cloud-native architecture's elastic scaling philosophy: when an instance is scaled down or restarted, sessions bound to that instance are lost. In Serverless environments (such as AWS Lambda, Azure Functions), instance lifecycles are entirely platform-controlled, making Sticky Sessions nearly impossible to implement reliably.
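The conflict between sticky sessions and elastic scaling shows up even in the simplest hash-based router. The sketch below routes by a session identifier. With Streamable HTTP this could be the `Mcp-Session-Id` header value, which is an assumption about how a load balancer would be configured. The sketch then counts how many existing sessions land on a different instance after scaling from three instances to two. Each moved session reaches an instance that never saw its initialization, so any in-memory session state is effectively lost.

```java
import java.util.List;
import java.util.stream.IntStream;

// Sketch of hash-based sticky routing and what happens to it when the instance count changes.
class StickyRoutingSketch {

    static int route(String sessionId, int instanceCount) {
        return Math.floorMod(sessionId.hashCode(), instanceCount);
    }

    // How many existing sessions would be sent to a different instance after scaling.
    static long remapped(List<String> sessionIds, int before, int after) {
        return sessionIds.stream()
            .filter(id -> route(id, before) != route(id, after))
            .count();
    }

    public static void main(String[] args) {
        List<String> sessions = IntStream.range(0, 1000).mapToObj(i -> "session-" + i).toList();
        System.out.println(remapped(sessions, 3, 2) + " of 1000 sessions moved when scaling 3 -> 2");
    }
}
```

Consistent hashing would move fewer sessions, but sessions pinned to a removed instance are still lost. For that reason the solution paths listed next focus on the statefulness itself.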

There are currently three viable solution paths:

Wait for protocol evolution: Specification Enhancement Proposals (SEPs) already exist to eliminate the initialization call, which would make the protocol itself stateless
Enable stateless mode: Spring AI provides a stateless configuration option, sacrificing stateful features like sampling and elicitation
Infrastructure-level solutions: For example, Amazon Bedrock AgentCore Runtime allocates an isolated sandbox for each MCP session

The presenters noted that most current MCP servers are actually stateless, but as demand for long-running stateful interactions grows, both the protocol and infrastructure need to continue evolving to support these scenarios.

Context Efficiency Optimization: MCP Is Not Dead

Addressing the community narrative that "MCP is dead, use CLI instead," the presenters offered an in-depth rebuttal. When 10 MCP servers each carry 50 tools, tool descriptions can indeed consume a significant portion of the context window (reaching 31% in the demo), leading to decreased accuracy and increased costs.

Here it's important to understand the relationship between context windows and token economics. The context window is the maximum number of tokens an LLM can process in a single inference. Modern models have expanded their context windows to 128K tokens or more, but context usage still directly affects both inference cost (billed per token) and response quality. Research shows that models do not use long contexts uniformly. Information placed in the middle of a long prompt is more likely to be overlooked (the so-called "Lost in the Middle" phenomenon), and excessive irrelevant content further dilutes the model's attention. When 500 tool descriptions occupy 31% of the context window, every request costs noticeably more, and the model also becomes less accurate at selecting the correct tool.

The CLI approach is viable for local development scenarios but has clear limitations: it cannot be used in remote enterprise deployments, lacks standardized authorization mechanisms, and LLMs using CLI often require multiple trial-and-error attempts (--help → try parameters → adjust parameters), actually consuming no less context.

The real solution is making the MCP protocol itself more efficient:

Tool filtering: Only provide the Agent with the subset of tools it actually needs, manageable through Spring's tool filtering mechanism or a centralized gateway
Progressive discovery: Use the "Tool Search Tool" pattern, exposing only a single search tool in the initial context and dynamically loading relevant tools on demand
Code Mode (experimental): Let the Agent write code to batch-execute tool calls, avoiding multiple round-trip requests and only retrieving final results
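The first two techniques can be sketched in a few lines. In the sketch below, `filter` applies a static allowlist. `searchTools` plays the role of the single "tool search tool" the agent sees at first: given a query, it returns only the most relevant tool definitions to load into context. The keyword scoring and tool names are hypothetical placeholders; a production implementation might rank with BM25 or embeddings instead.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of two context-saving techniques: an allowlist filter and a "tool search tool".
class ToolDiscoverySketch {

    record ToolSpec(String name, String description) {}

    // Tool filtering: expose only the tools this agent is expected to use.
    static List<ToolSpec> filter(List<ToolSpec> all, Set<String> allowed) {
        return all.stream().filter(t -> allowed.contains(t.name())).toList();
    }

    // Progressive discovery: return only the few tool definitions relevant to the query.
    static List<ToolSpec> searchTools(List<ToolSpec> all, String query, int limit) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return all.stream()
            .filter(t -> score(t, terms) > 0)
            .sorted(Comparator.comparingInt((ToolSpec t) -> score(t, terms)).reversed())
            .limit(limit)
            .toList();
    }

    // Naive relevance score: number of query terms found in the name or description.
    static int score(ToolSpec tool, String[] terms) {
        String text = (tool.name() + " " + tool.description()).toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String term : terms) {
            if (!term.isEmpty() && text.contains(term)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<ToolSpec> catalog = List.of(
            new ToolSpec("get_weather", "Get the weather forecast for a city"),
            new ToolSpec("create_task", "Create a task in the project tracker"),
            new ToolSpec("query_customer", "Look up a customer record by id"));
        // Only the matching definitions would be added to the model's context.
        System.out.println(searchTools(catalog, "customer record", 3));
        System.out.println(filter(catalog, Set.of("create_task")));
    }
}
```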

Conclusion

MCP is evolving from a simple tool calling protocol into a complete AI Agent integration standard. Spring AI's deep integration enables Java developers to build and deploy MCP servers with minimal barriers. From OAuth security authentication to horizontal scaling strategies, from stateful interactions to context efficiency optimization, the entire MCP ecosystem is rapidly maturing. For teams building AI Agents, now is an excellent time to dive deep into and adopt MCP.