
Building Agentic Orchestration with AWS Strands and SAP GenAI Hub

Three days. That’s all it took to go from building a deep research agent to creating sophisticated AWS Strands multi-server agents orchestrating three specialized MCP servers. Using SAP Generative AI Hub with Amazon Bedrock, we built a financial intelligence demo that reduces analysis time by roughly 30%. Not because the technology is simple, but because the right architectural decisions, combining AWS Strands agent orchestration with enterprise platforms, make complex scenarios tractable.

When Michelle Mei-Li Pfister and I sat down to create our Devtoberfest session on building multi-tool research agents, we wanted to showcase something real. Not a toy example destined to gather dust. We wanted to demonstrate how SAP’s Generative AI Hub integration with AWS Bedrock, combined with the AWS Strands SDK, enables enterprise developers to build production-grade agentic systems, fast.

The SAP Generative AI Hub in SAP AI Core provides enterprise-grade access to leading foundation models. By integrating with Amazon Bedrock to deliver models like Anthropic’s Claude 3.5 and Amazon Titan through a unified interface, it centrally enforces content filtering, SAP-specific risk mitigation, and safety guardrails across the SAP ecosystem.

📊 Demo Performance Snapshot

Financial Analysis Time Reduction: Our demo system can reduce the time needed for comprehensive financial analysis and report generation across multiple systems by approximately 30%, with individual stock analysis showing 10-20% efficiency gains.

Orchestration Efficiency: AWS Strands SDK’s automatic metrics tracking enables monitoring of token usage, latency, tool execution times, and success rates—essential for production readiness assessment.

The real story isn’t just the tech stack. It’s what happens when you combine proven enterprise platforms with modern agent orchestration patterns.

Watch the Full Session

Special thanks to Michelle Mei-Li Pfister for co-creating this session, and to Nora von Thenen for championing this work and making this Devtoberfest session possible.

The Foundation: Deep Research with Tavily and AWS Strands

Our first notebook demonstrates a research agent that can search, extract, crawl, and synthesize information from the web. By leveraging the Tavily API for web intelligence and the AWS Strands Agents SDK, we created an agent that orchestrates multiple capabilities:

from strands import Agent

# bedrock_model, SYSTEM_PROMPT, and the web_* tools are defined in earlier notebook cells.
deep_researcher_agent = Agent(
    model=bedrock_model,              # Claude served via SAP GenAI Hub / Amazon Bedrock
    system_prompt=SYSTEM_PROMPT,      # research behavior guidance
    tools=[
        web_search,                   # Tavily-backed web search
        web_extract,                  # pull full page content from URLs
        web_crawl,                    # follow links for deeper coverage
        format_research_response,     # structure the final answer
    ],
)
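The tools themselves are plain Python functions exposed through Strands’ @tool decorator. As a hedged sketch (not the notebook’s exact code), web_search could wrap the Tavily Python client like this, assuming a TAVILY_API_KEY environment variable:

import os

from strands import tool
from tavily import TavilyClient

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@tool
def web_search(query: str, max_results: int = 5) -> str:
    """Search the web and return titled snippets with source URLs."""
    # Tavily returns ranked results with titles, URLs, and content excerpts.
    response = tavily_client.search(query=query, max_results=max_results)
    snippets = [
        f"{r['title']} ({r['url']}): {r['content']}"
        for r in response.get("results", [])
    ]
    return "\n\n".join(snippets) or "No results found."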
💡 Key Insight

AWS Strands takes a model-driven approach to building AI agents. Rather than defining complex workflows, you provide a model, system prompt, and tools. The framework embraces state-of-the-art model capabilities to plan, chain thoughts, call tools, and reflect. This shifts complexity from code into the LLM’s weights.

Multiple teams at AWS use Strands for production AI agents, including Amazon Q Developer, AWS Glue, and VPC Reachability Analyzer. When internal AWS teams trust a framework for production workloads, that’s a signal worth noting.

The Research Agent Architecture

The system prompt guides behavior without constraining it. Like the architectural patterns in building web services on AWS, we define clear interfaces while allowing flexibility in implementation.
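For illustration, a system prompt in this style might look like the following; the wording here is a sketch, not the exact prompt from the notebook:

SYSTEM_PROMPT = """You are a deep research assistant.

For every research request:
1. Use web_search to find relevant, recent sources.
2. Use web_extract or web_crawl to pull full content from the most promising URLs.
3. Cross-check important claims across at least two independent sources.
4. Call format_research_response to produce a cited, structured summary.

Prefer primary sources, state uncertainty explicitly, and never invent citations."""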

What makes this interesting? The agent reasons about which tools to use and when. It adapts its research strategy based on what it discovers. This emergent behavior comes from the model’s reasoning capabilities, not from explicit programming.

AWS Strands Built-In Observability

One of AWS Strands’ production-ready features is comprehensive observability using OpenTelemetry standards. The framework automatically tracks:

| Metric Category | What It Tracks | Production Value |
|---|---|---|
| Token Usage | Input tokens, output tokens, and total token consumption | Cost optimization and budget control |
| Performance Metrics | Latency and execution time measurements | SLA compliance monitoring |
| Tool Usage | Call counts, success rates, and execution times | Reliability and failure detection |
| Event Loop Cycles | Number of reasoning cycles and their durations | Agent efficiency optimization |

Table 1: AWS Strands Built-in Observability Metrics

This telemetry integrates with AWS X-Ray for distributed tracing and Amazon CloudWatch for real-time monitoring. You can set up CloudWatch alarms on metrics like tool error rates or latency per agent call to alert operations teams of anomalies.
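In practice, every agent invocation returns this telemetry alongside the answer. A minimal sketch, assuming the metrics surface described in the Strands documentation (attribute and method names may differ between SDK versions):

result = deep_researcher_agent("Summarize recent developments in agentic AI frameworks.")

# The result carries an event loop metrics object aggregating token usage,
# latency, per-tool call counts and durations, and reasoning cycles.
print(result.metrics.get_summary())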

The Innovation: Multi-Server Financial Intelligence

After building the research agent, we had a working foundation. But financial analysis requires coordinating multiple specialized systems. That’s where the Model Context Protocol (MCP) enters the picture.

Architecture in Action: Four-Layer Intelligence

The diagram above illustrates our production-ready architecture. Enterprise users connect through an AWS Strands Agent powered by SAP GenAI Hub and Anthropic Claude, which orchestrates requests through the AWS Strands MCP Client. The MCP Session Manager maintains persistent connections to three specialized financial MCP servers covering real-time data, document analysis, and risk analytics. The agent then delivers comprehensive outputs, including investment reports, risk matrices, and sentiment analysis, to end users.

Understanding MCP: The Missing Standard

Anthropic open-sourced the Model Context Protocol in November 2024 to address a fundamental challenge: connecting AI assistants to data systems. Before MCP, each data source required custom integration. This created an “N×M” problem: every new model needed connectors to every data source.

MCP provides a universal standard. One protocol, any model, any data source. Major AI providers including OpenAI and Google DeepMind adopted it within months of the announcement.

🔌 MCP: The USB-C of AI Integration

Just as USB-C standardized device connectivity, MCP standardizes AI-data integration. Rather than building N×M custom connectors, you implement the protocol once and connect to any MCP-compatible system.

MCP uses a client-server architecture built on JSON-RPC 2.0. Servers expose three primitives: Tools (executable functions), Resources (structured data), and Prompts (instruction templates). Clients coordinate access to these capabilities on behalf of AI agents.
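Concretely, a tool invocation is just a JSON-RPC request. The sketch below posts a tools/call message to the manually implemented server on port 8001; get_stock_quote is a hypothetical tool name, and a full MCP client would also perform the initialize handshake that the streamable HTTP transport expects:

import requests

# JSON-RPC 2.0 request invoking a (hypothetical) tool on the Financial Data server.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_stock_quote",
        "arguments": {"symbol": "SAP"},
    },
}

response = requests.post("http://127.0.0.1:8001/mcp", json=payload)
print(response.json())  # {"jsonrpc": "2.0", "id": 1, "result": {...}}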

Three Specialized MCP Servers

For our demo, we built three specialized MCP servers, each handling distinct aspects of financial intelligence:

| Server | Port | Implementation | Purpose | Key Tools |
|---|---|---|---|---|
| Financial Data | 8001 | FastAPI (Manual JSON-RPC) | Real-time market data | Stock quotes, company fundamentals, health scoring |
| Document Analysis | 8002 | FastMCP Framework | Sentiment analysis | PDF parsing, report analysis, metric extraction |
| Analytics & Reporting | 8003 | FastMCP Framework | Advanced analytics | Comparison charts, risk assessment, trend analysis |

Table 2: Three-Server Financial Intelligence Architecture

Why Two Implementation Approaches?

We deliberately chose different implementations for our demo to demonstrate both approaches:

  • FastAPI (Manual): Full control over the JSON-RPC protocol. Perfect for learning MCP fundamentals or when you need precise control over message handling.
  • FastMCP Framework: Streamlined development with automatic protocol handling. Better for rapid prototyping and when you want to focus on tool logic rather than protocol details.

Let’s dive deeper into these two open-source frameworks:

| Aspect | Manual FastAPI (Port 8001) | FastMCP Framework (Ports 8002/8003) |
|---|---|---|
| Implementation Complexity | Full JSON-RPC 2.0 protocol implementation required | Automatic protocol handling via decorators |
| Code Lines for Basic Server | ~150-200 lines | ~50-75 lines |
| Development Speed | Slower, explicit control | 3-4x faster time-to-market |
| Protocol Understanding | Deep MCP protocol knowledge required | Abstracted away; focus on business logic |
| Best For | Learning MCP fundamentals, custom requirements | Production deployments, rapid prototyping |
| Production Readiness | Requires additional error handling and logging | Built-in production features |

Table 3: FastAPI vs FastMCP Implementation Comparison

Both approaches are production-viable. The choice depends on your specific requirements around control versus development velocity.
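To make the FastMCP side of that comparison concrete, here is a hedged sketch in the spirit of the analytics server on port 8003. The tool name and risk logic are illustrative rather than the notebook’s actual implementation, and FastMCP’s run() arguments vary slightly between versions:

from fastmcp import FastMCP

mcp = FastMCP("analytics-reporting")

@mcp.tool()
def assess_risk(symbol: str, volatility: float, beta: float) -> dict:
    """Return a simple risk classification for a stock (illustrative logic)."""
    score = 0.6 * volatility + 0.4 * abs(beta - 1.0)
    level = "high" if score > 0.5 else "moderate" if score > 0.2 else "low"
    return {"symbol": symbol, "risk_score": round(score, 3), "risk_level": level}

if __name__ == "__main__":
    # Serve the tools over the streamable HTTP transport on port 8003.
    mcp.run(transport="http", host="127.0.0.1", port=8003)

The decorator registers the function’s signature and docstring as the tool schema, which is exactly the protocol boilerplate the manual FastAPI server has to hand-write.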

However, for a full end-to-end platform to develop and orchestrate MCP servers at scale, please review My AgentCore Guide!

The Session Manager Pattern

Managing connections to three MCP servers presented an interesting challenge. We needed persistent connections that work seamlessly across Jupyter cells without context manager complexity.

The solution? A custom MCPSessionManager that uses Python’s ExitStack to maintain persistent client connections:

mcp_manager = MCPSessionManager()

# Establish persistent connections to all servers
mcp_manager.start_sessions({
    "financial_data": "http://127.0.0.1:8001/mcp",
    "document_analysis": "http://127.0.0.1:8002/mcp",
    "analytics_reporting": "http://127.0.0.1:8003/mcp"
})

# Aggregate tools from all servers
all_financial_tools = mcp_manager.get_all_tools()

# Create unified agent with cross-server capabilities
financial_intelligence_agent = Agent(
    model=financial_model,
    tools=all_financial_tools,
    system_prompt=financial_expert_prompt
)
🔑 Key Design Decision

The session manager eliminates connection boilerplate while maintaining enterprise requirements for connection pooling, error recovery, and audit logging. This pattern scales from demo notebooks to production deployments.
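For reference, a minimal sketch of what such a session manager could look like, assuming Strands’ MCPClient and the MCP SDK’s streamable HTTP transport; the notebook’s actual class adds error recovery and audit logging on top of this:

from contextlib import ExitStack

from mcp.client.streamable_http import streamablehttp_client
from strands.tools.mcp import MCPClient

class MCPSessionManager:
    """Keep MCP client sessions open across notebook cells (illustrative sketch)."""

    def __init__(self):
        self._stack = ExitStack()
        self._clients = {}

    def start_sessions(self, servers: dict) -> None:
        # Open one persistent MCP client per server URL; the ExitStack
        # guarantees all sessions are closed together later.
        for name, url in servers.items():
            client = MCPClient(lambda url=url: streamablehttp_client(url))
            self._stack.enter_context(client)
            self._clients[name] = client

    def get_all_tools(self) -> list:
        # Aggregate the tool lists exposed by every connected server.
        tools = []
        for client in self._clients.values():
            tools.extend(client.list_tools_sync())
        return tools

    def close(self) -> None:
        self._stack.close()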

Cross-Server Orchestration in Action

The real power emerges when a single query triggers coordination across all three servers. When asked for comprehensive investment analysis, the agent automatically:

  1. Fetches current stock data (Financial Data Server)
  2. Analyzes sentiment from recent reports (Document Analysis Server)
  3. Calculates risk metrics and generates visualizations (Analytics Server)
  4. Synthesizes findings into an executive-ready report

No explicit orchestration logic. No hardcoded workflows. The agent reasons about which tools to use and coordinates across servers automatically.
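A single natural-language request is enough to trigger that chain; the company and phrasing below are illustrative:

report = financial_intelligence_agent(
    "Prepare a comprehensive investment analysis for SAP SE: current price and "
    "fundamentals, sentiment from the latest annual report, a risk assessment "
    "with visualizations, and an executive summary."
)
print(report)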

Secure Enterprise Integration with SAP GenAI Hub

Throughout our demo, the SAP Generative AI Hub handles critical security and governance requirements. This isn’t just API access; it’s a comprehensive orchestration layer that:

  • Enforces Content Filtering: Screens both input prompts and LLM outputs for inappropriate content
  • Implements Data Masking: Anonymizes personal and confidential information before sending to models
  • Provides Centralized Governance: Maintains consistent policies across the SAP ecosystem
  • Enables Compliance: Supports regulatory requirements through built-in safeguards

The Generative AI Hub sits between your application and Amazon Bedrock. This ensures data protection, regulatory compliance, and responsible AI usage across all SAP integrations. But never forget the basics of security at scale.

When to Use This Architecture

✅ Yes, Build This When You:
  • Coordinate 3+ specialized systems or data sources
  • Need rapid prototyping with a clear path to production
  • Value maintainability and model-driven flexibility
  • Have diverse tool capabilities that benefit from specialization
  • Want to leverage standard protocols (MCP) for future extensibility
  • Need built-in observability for production monitoring

What’s Next

Our demo system demonstrates cross-server orchestration at scale. The real power emerges when you consider enterprise possibilities:

  • SAP Integration: Connect MCP servers directly to SAP business processes
  • Multi-Tenant Deployments: Serve multiple organizations from shared MCP infrastructure
  • Hybrid Architectures: Combine on-premises SAP systems with cloud-native AI services
  • Domain-Specific Agents: Build specialized agents for procurement, finance, and HR, each orchestrating its own MCP ecosystem

Try It Yourself

Both notebooks are available in our GitHub repository. The progression from research agent to multi-server orchestration provides a practical learning path for enterprise developers.

Key takeaways:

  1. Start Simple: Build single-agent systems first (notebook 05)
  2. Learn the Protocol: Implement manual MCP to understand fundamentals
  3. Scale Thoughtfully: Use FastMCP and session managers for production (notebook 06)
  4. Secure by Design: Implement proper authentication, authorization, and audit logging
  5. Monitor Everything: Leverage AWS Strands’ built-in observability for production readiness

Frequently Asked Questions

What is AWS Strands Agents SDK?

AWS Strands is an open-source SDK that takes a model-driven approach to building AI agents. Rather than requiring complex workflow definitions, you provide a model, system prompt, and tools. The framework leverages modern LLM capabilities for planning, reasoning, and tool coordination. AWS teams including Amazon Q Developer and AWS Glue use Strands in production.

How does MCP protocol enable agent orchestration?

MCP (Model Context Protocol) provides a standard for connecting AI assistants to data systems. Instead of building custom integrations for each data source, you implement the MCP protocol once. The protocol uses JSON-RPC 2.0 with a client-server architecture, enabling any MCP client to communicate with any MCP server. This eliminates the N×M integration problem.

What are the production requirements for multi-server agents?

Production multi-server agent systems require: comprehensive observability (traces, metrics, logs); persistent session management with error recovery; proper authentication and authorization across all servers; rate limiting and cost controls; audit logging for compliance; and integration with enterprise monitoring systems like CloudWatch or X-Ray. AWS Strands provides built-in observability using OpenTelemetry standards.

How does SAP GenAI Hub improve security?

SAP Generative AI Hub sits between your application and foundation models, providing centralized governance. It enforces content filtering on inputs and outputs, implements data masking for sensitive information, maintains consistent policies across the SAP ecosystem, and supports regulatory compliance requirements. This orchestration layer ensures responsible AI usage across all SAP integrations.


The future of enterprise AI isn’t monolithic models doing everything. It’s distributed intelligence: specialized agents working together, orchestrated through open protocols, secured by enterprise platforms.

Three days proved it’s possible. Now it’s your turn.


Abraham Arellano Tavara is a Senior Solutions Architect at AWS, specializing in enterprise AI solutions and SAP integrations. Connect on LinkedIn for more insights on building production-grade agentic systems.