Every tutorial on Spring AI and Amazon Bedrock starts the same way: add a dependency, hardcode an access key, call a model, print “Hello World.” Then the tutorial ends and you’re on your own for everything that actually matters in production—credential management, cost control, observability, error handling, testing.

This guide skips the hello world. If you're an enterprise Java team evaluating Spring AI with Bedrock, or you've already built a prototype and need to harden it for production, you're in the right place.

Why Spring AI Over the Raw AWS SDK

You can call Bedrock directly with the BedrockRuntimeClient from the AWS SDK. It works. But if you’re in a Spring Boot application, there are real reasons to prefer Spring AI’s abstraction layer:

Portability. Spring AI’s ChatModel interface is provider-agnostic. Your service layer calls chatModel.call() regardless of whether the backing model is Bedrock, Azure OpenAI, or a local Ollama instance. When your team decides to evaluate a different provider next quarter, you change configuration—not code.

Testability. You can inject a mock ChatModel in your tests without standing up an AWS connection. With the raw SDK, you’re either mocking the Bedrock client (which means your tests couple to AWS internals) or hitting real endpoints in CI (which means flaky tests and a growing AWS bill).

Spring ecosystem integration. Auto-configuration, health indicators, Micrometer metrics, structured logging—Spring AI plugs into the same observability stack you already have. The raw SDK gives you none of this by default.

Function calling. Spring AI’s tool abstraction lets you define functions as Spring beans and wire them into AI conversations without building the JSON schema mapping yourself.

The tradeoff is an additional abstraction layer. If you need low-level Bedrock features that Spring AI hasn’t exposed yet, you might need to drop down to the SDK for those specific calls. In practice, the Converse API covers the vast majority of enterprise use cases.

Project Setup

Dependencies

Add the Spring AI Bedrock Converse starter. If you’re using the Spring AI BOM (recommended), you don’t need to specify a version:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-bedrock-converse</artifactId>
    </dependency>
</dependencies>

Or with Gradle:

dependencies {
    implementation platform('org.springframework.ai:spring-ai-bom:1.0.3')
    implementation 'org.springframework.ai:spring-ai-starter-model-bedrock-converse'
}

Configuration

Here’s a production-oriented application.yml configuration:

spring:
  ai:
    bedrock:
      aws:
        region: us-east-1
        timeout: 30s
        connection-timeout: 5s
      converse:
        chat:
          options:
            model: us.anthropic.claude-sonnet-4-20250514-v1:0
            temperature: 0.3
            max-tokens: 2048

A few things to note:

  • No access keys in config. In production, use IAM roles. The Bedrock auto-configuration uses the default AWS credential provider chain, which resolves instance profiles, ECS task roles, and EKS IRSA automatically. For local development, configure AWS SSO with aws configure sso and set AWS_PROFILE.
  • Low temperature for enterprise use cases. Most enterprise applications (document processing, classification, extraction) want deterministic output. Set temperature to 0.1–0.3, not the default 0.8.
  • Explicit timeout. The default 5-minute timeout is far too long for a synchronous API call. Set it based on your SLA. For chat, 30 seconds is usually reasonable; for batch processing, you might go longer.
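The AWS_PROFILE workflow mentioned above assumes an SSO profile in ~/.aws/config. Here's a minimal sketch; every value is a placeholder for your organization's own SSO setup:

```ini
# ~/.aws/config -- all values are placeholders
[profile bedrock-dev]
sso_start_url  = https://my-org.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 123456789012
sso_role_name  = BedrockDeveloper
region         = us-east-1
```

Log in with aws sso login --profile bedrock-dev and export AWS_PROFILE=bedrock-dev; the default credential chain picks it up with no long-lived keys on disk.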

Enable Model Access in Bedrock

Before your application can call a model, you need to enable it in the AWS Console. Go to Amazon Bedrock → Model access → Request access for the models you plan to use. This is a one-time step per model per region, but it catches people every time.

The Converse API: What Changed

Spring AI originally had per-model clients (one for Claude, one for Titan, one for Llama). The Converse API replaced all of them with a single unified client. This matters because:

  1. One dependency, any model. Switch from Claude to Nova by changing a config property.
  2. Consistent feature set. Tool calling, streaming, system messages, and multimodal input work the same way regardless of the underlying model.
  3. Simpler upgrades. When AWS adds a new model, you don’t wait for a new Spring AI module.

If you’re looking at older tutorials that use spring-ai-bedrock-ai-anthropic-spring-boot-starter or similar per-model starters, those are deprecated. Use spring-ai-starter-model-bedrock-converse for all new development.

Building a Production Service Layer

Here’s how a real service layer looks—not a controller-calls-model demo, but a service that handles errors, provides metadata, and is testable:

@Service
public class DocumentSummaryService {

    private final ChatClient chatClient;

    public DocumentSummaryService(ChatModel chatModel) {
        this.chatClient = ChatClient.builder(chatModel)
            .defaultSystem("""
                You are a document summarizer for a financial services firm.
                Summarize the provided document in 3-5 bullet points.
                Focus on material facts, dates, and obligations.
                Do not include opinions or speculation.
                """)
            .build();
    }

    public SummaryResult summarize(String documentText) {
        ChatResponse response = chatClient.prompt()
            .user(documentText)
            .call()
            .chatResponse();

        String summary = response.getResult().getOutput().getText();

        // Extract token usage for cost tracking
        Usage usage = response.getMetadata().getUsage();

        return new SummaryResult(
            summary,
            usage.getPromptTokens(),
            usage.getCompletionTokens()
        );
    }
}

public record SummaryResult(
    String summary,
    long inputTokens,
    long outputTokens
) {}

Key decisions:

  • System prompt in the builder, not per-request. The system prompt defines the model’s role and constraints. Set it once when the service is constructed.
  • Return token counts. You’ll need these for cost dashboards and budget enforcement. Don’t discard response metadata.
  • Use ChatClient, not ChatModel directly. ChatClient is the higher-level API that supports fluent configuration, tool binding, and advisors. ChatModel is the lower-level interface you’d use for custom implementations.

Function Calling for Internal APIs

Function calling (tool use) is where AI in enterprise Java gets genuinely useful. Instead of the model guessing at data, it calls your internal APIs to get real answers.

@Service
public class OrderLookupTools {

    private final OrderRepository orderRepository;

    public OrderLookupTools(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Tool(description = "Look up an order by order ID. Returns order status, items, and shipping information.")
    public OrderDetails getOrder(
            @ToolParam(description = "The order ID, e.g. ORD-12345") String orderId) {
        return orderRepository.findByOrderId(orderId)
            .map(order -> new OrderDetails(
                order.getId(),
                order.getStatus().name(),
                order.getItems().size(),
                order.getShippingAddress().getCity()
            ))
            .orElseThrow(() -> new OrderNotFoundException(orderId));
    }
}
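The OrderDetails type referenced above is just a serializable carrier that gets turned into JSON and handed back to the model as the tool result. A sketch matching the constructor call in getOrder (field names are assumptions, not from the original):

```java
// Sketch of the OrderDetails payload returned to the model.
// Field names are illustrative; whatever you return here is
// serialized and becomes the tool call's output.
public record OrderDetails(
    String orderId,
    String status,
    int itemCount,
    String shippingCity
) {}
```

Keep this payload small and factual; everything in it counts against your token budget on the follow-up model call.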

Then wire the tools into your chat client:

@Service
public class CustomerSupportService {

    private final ChatClient chatClient;
    private final OrderLookupTools orderTools;

    public CustomerSupportService(ChatModel chatModel, OrderLookupTools orderTools) {
        this.orderTools = orderTools;
        this.chatClient = ChatClient.builder(chatModel)
            .defaultSystem("You are a customer support agent. Use the available tools to look up real order data. Never guess at order status or tracking information.")
            .build();
    }

    public String handleQuery(String customerMessage) {
        return chatClient.prompt()
            .user(customerMessage)
            .tools(orderTools)
            .call()
            .content();
    }
}

The model decides when to call getOrder based on the conversation. If a customer asks “where’s my order ORD-12345?”, the model invokes the tool, gets real data, and responds with facts instead of hallucinations. The @Tool annotation and @ToolParam descriptions are critical—they become the function schema that the model uses to decide what to call and how.

Streaming for Chat Interfaces

For real-time chat UIs, streaming sends tokens as they’re generated instead of waiting for the complete response:

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatModel chatModel) {
        this.chatClient = ChatClient.create(chatModel);
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }
}

This uses Server-Sent Events (SSE), which works with any frontend framework. The response streams token-by-token over HTTP—the user sees text appear in real time rather than staring at a spinner for 5 seconds.

Cost Governance

AI costs in production can grow fast if you’re not paying attention. Here’s how to stay in control:

Model Selection by Use Case

Don’t use a single model for everything. Define profiles per use case:

spring:
  profiles:
    group:
      classification: classification-model
      summarization: summarization-model

---
spring:
  config:
    activate:
      on-profile: classification-model
  ai:
    bedrock:
      converse:
        chat:
          options:
            model: amazon.nova-micro-v1:0
            max-tokens: 100

---
spring:
  config:
    activate:
      on-profile: summarization-model
  ai:
    bedrock:
      converse:
        chat:
          options:
            model: us.anthropic.claude-sonnet-4-20250514-v1:0
            max-tokens: 2048

Route cheap, high-volume tasks (intent classification, sentiment analysis) to smaller models. Reserve expensive models for tasks that need them.
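The routing decision itself can live in code. A minimal sketch that makes the cost tiering explicit (this is plain Java, not a Spring AI API; the model IDs mirror the profiles above):

```java
// Illustrative task-to-model routing. In a real application you would
// hold one configured ChatClient per profile and select it by task,
// rather than passing raw model ID strings around.
public enum AiTask {
    CLASSIFICATION, SENTIMENT, SUMMARIZATION, CODE_GENERATION;

    public String modelId() {
        return switch (this) {
            // cheap, high-volume tasks -> small model
            case CLASSIFICATION, SENTIMENT ->
                "amazon.nova-micro-v1:0";
            // complex reasoning -> larger, more expensive model
            case SUMMARIZATION, CODE_GENERATION ->
                "us.anthropic.claude-sonnet-4-20250514-v1:0";
        };
    }
}
```

The enum is deliberately exhaustive: adding a new task type forces a conscious decision about which cost tier it belongs to.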

Token Budget Tracking

Use the token counts from response metadata to build a cost dashboard:

@Component
public class AiCostTracker {

    private final MeterRegistry meterRegistry;

    public AiCostTracker(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    public void recordUsage(String model, String operation, Usage usage) {
        meterRegistry.counter("ai.tokens.input",
            "model", model,
            "operation", operation
        ).increment(usage.getPromptTokens());

        meterRegistry.counter("ai.tokens.output",
            "model", model,
            "operation", operation
        ).increment(usage.getCompletionTokens());
    }
}

Feed these metrics into your existing monitoring stack (Prometheus, CloudWatch, Datadog) and set alerts when spending exceeds thresholds.

Observability

AI calls are I/O-heavy and nondeterministic. You need visibility into what’s happening.

Structured Logging

Log every AI call with enough context to debug issues later:

@Aspect
@Component
public class AiCallLoggingAspect {

    private static final Logger log = LoggerFactory.getLogger(AiCallLoggingAspect.class);

    @Around("@within(org.springframework.stereotype.Service) && execution(org.springframework.ai.chat.model.ChatResponse *(..))")
    public Object logAiCalls(ProceedingJoinPoint joinPoint) throws Throwable {
        // The pointcut restricts matching to service methods that return
        // ChatResponse, so only AI calls pay the instrumentation cost
        long start = System.currentTimeMillis();
        try {
            Object result = joinPoint.proceed();
            long duration = System.currentTimeMillis() - start;

            if (result instanceof ChatResponse response) {
                Usage usage = response.getMetadata().getUsage();
                log.info("AI call completed: method={} duration={}ms inputTokens={} outputTokens={}",
                    joinPoint.getSignature().getName(),
                    duration,
                    usage.getPromptTokens(),
                    usage.getCompletionTokens());
            }
            return result;
        } catch (Exception e) {
            long duration = System.currentTimeMillis() - start;
            log.error("AI call failed: method={} duration={}ms error={}",
                joinPoint.getSignature().getName(), duration, e.getMessage());
            throw e;
        }
    }
}

Health Indicators

Add a health check that verifies Bedrock connectivity at startup and during runtime. Be aware that this makes a real, billable model call on every probe: point it at a cheap model, keep the health-check interval modest, or cache the result.

@Component
public class BedrockHealthIndicator implements HealthIndicator {

    private final ChatModel chatModel;

    public BedrockHealthIndicator(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @Override
    public Health health() {
        try {
            chatModel.call("ping");
            return Health.up()
                .withDetail("provider", "bedrock")
                .build();
        } catch (Exception e) {
            return Health.down()
                .withDetail("provider", "bedrock")
                .withException(e)
                .build();
        }
    }
}

Security and Compliance

Enterprise AI integrations need guardrails. Here’s what matters:

IAM Least Privilege

Create a policy that only allows the models your application uses:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-*",
                "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-micro-v1:0"
            ]
        }
    ]
}

Don’t grant bedrock:*. Scope it to the specific models and actions your application needs.
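One wrinkle worth knowing: if you invoke a cross-region inference profile (the us.-prefixed model IDs used earlier), IAM evaluates the inference-profile ARN as well as the foundation-model ARNs in the regions the profile can route to. A sketch of the additional statement; the account ID and region list are placeholders, so verify them against your profile's actual definition:

```json
{
    "Effect": "Allow",
    "Action": "bedrock:InvokeModel",
    "Resource": [
        "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0"
    ]
}
```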

PII Filtering

Filter sensitive data before it hits the model. Bedrock Guardrails can handle some of this, but for enterprise compliance, do it in your application layer where you have full control:

@Component
public class PiiFilter {

    private static final Pattern SSN_PATTERN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern EMAIL_PATTERN =
        Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,}\\b",
            Pattern.CASE_INSENSITIVE);

    public String redact(String input) {
        String redacted = SSN_PATTERN.matcher(input).replaceAll("[SSN REDACTED]");
        redacted = EMAIL_PATTERN.matcher(redacted).replaceAll("[EMAIL REDACTED]");
        return redacted;
    }
}

Call this before passing user input to the model. Log the redacted version, not the original.
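A quick standalone check of the redaction behavior, using the same regexes outside of Spring for illustration:

```java
import java.util.regex.Pattern;

// Standalone demonstration of the redaction logic above.
public class PiiFilterDemo {
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern EMAIL = Pattern.compile(
        "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,}\\b", Pattern.CASE_INSENSITIVE);

    static String redact(String input) {
        String redacted = SSN.matcher(input).replaceAll("[SSN REDACTED]");
        return EMAIL.matcher(redacted).replaceAll("[EMAIL REDACTED]");
    }

    public static void main(String[] args) {
        String out = redact("Contact jane.doe@example.com, SSN 123-45-6789.");
        System.out.println(out);
        // -> Contact [EMAIL REDACTED], SSN [SSN REDACTED].
    }
}
```

Regex-based redaction catches structured identifiers like SSNs and emails; for free-form PII (names, addresses) you'll need an NER-based scrubber or Bedrock Guardrails on top.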

Data Residency

Bedrock processes data in the AWS region you configure. For compliance (GDPR, SOC 2, HIPAA), ensure:

  1. Your spring.ai.bedrock.aws.region matches your data residency requirements.
  2. Not all models are available in all regions. Check the Bedrock console for model availability in your target region.
  3. Cross-region inference profiles (e.g., us.anthropic.claude-*) can route requests to any region within their geography based on available capacity; you do not control which region serves a given request. For strict residency, use the region-specific model ID instead.
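For example, pinning inference to a single EU region means configuring the bare model ID under that region. The model ID shown is illustrative; confirm availability for your target region in the Bedrock console:

```yaml
spring:
  ai:
    bedrock:
      aws:
        region: eu-central-1          # all inference stays in this region
      converse:
        chat:
          options:
            # bare model ID, no "us."/"eu." inference-profile prefix
            model: anthropic.claude-sonnet-4-20250514-v1:0
```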

Testing

Unit Tests with Mock Models

Test your service logic without calling Bedrock:

@ExtendWith(MockitoExtension.class)
class DocumentSummaryServiceTest {

    @Mock
    ChatModel chatModel;

    DocumentSummaryService service;

    @BeforeEach
    void setUp() {
        service = new DocumentSummaryService(chatModel);
    }

    @Test
    void summarizeReturnsBulletPoints() {
        String expectedSummary = "• Revenue increased 15%\n• New product launched Q3";

        ChatResponse mockResponse = mock(ChatResponse.class, RETURNS_DEEP_STUBS);
        when(mockResponse.getResult().getOutput().getText()).thenReturn(expectedSummary);
        when(mockResponse.getMetadata().getUsage())
            .thenReturn(new DefaultUsage(150, 50));

        when(chatModel.call(any(Prompt.class))).thenReturn(mockResponse);

        SummaryResult result = service.summarize("Annual report text...");

        assertThat(result.summary()).contains("Revenue increased");
        assertThat(result.inputTokens()).isEqualTo(150L);
    }
}

Integration Tests with Testcontainers and LocalStack

For integration tests that verify the full call chain without hitting real AWS:

@SpringBootTest
@Testcontainers
class BedrockIntegrationTest {

    @Container
    static LocalStackContainer localstack = new LocalStackContainer(
        DockerImageName.parse("localstack/localstack:latest"))
        .withServices(LocalStackContainer.Service.STS);

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.ai.bedrock.aws.region", () -> "us-east-1");
        registry.add("spring.ai.bedrock.aws.access-key", () -> "test");
        registry.add("spring.ai.bedrock.aws.secret-key", () -> "test");
    }

    // Test your configuration wiring, credential resolution,
    // and error handling paths here
}

For tests that need to verify actual model behavior (prompt quality, response format), use a dedicated AWS account with budget alerts and run those tests in a separate CI stage—not on every push.

Migration Path: Raw SDK to Spring AI

If you’re already using BedrockRuntimeClient directly, here’s how to migrate incrementally:

  1. Add the Spring AI starter alongside your existing SDK usage. They can coexist.
  2. Create a new service using ChatClient for one use case. Run it in parallel with the old implementation.
  3. Compare outputs. Verify the Spring AI version produces equivalent results.
  4. Migrate remaining call sites one at a time. Don’t do a big-bang rewrite.
  5. Remove the raw SDK dependency once all call sites are migrated.

The key insight: you don’t have to migrate everything at once. Spring AI and the raw SDK use the same underlying AWS credentials and can coexist in the same application.
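Step 3 can be as simple as a shadow-mode wrapper that serves the old implementation's result while logging divergence. A sketch with plain Suppliers standing in for the two code paths (names are illustrative):

```java
import java.util.function.Supplier;

// Shadow-mode comparison: serve the proven implementation's result
// while logging whether the Spring AI version would answer differently.
public class ShadowComparator {

    public static String compare(Supplier<String> oldImpl, Supplier<String> newImpl) {
        String oldResult = oldImpl.get();
        try {
            String newResult = newImpl.get();
            if (!oldResult.equals(newResult)) {
                System.out.println("DIVERGENCE: old=" + oldResult + " new=" + newResult);
            }
        } catch (Exception e) {
            // the new path must never break production traffic
            System.out.println("Shadow call failed: " + e.getMessage());
        }
        return oldResult; // always serve the old implementation during migration
    }
}
```

For model output, exact string equality is usually too strict; compare extracted fields, classifications, or embeddings-based similarity instead of raw text.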

Common Mistakes

Hardcoding access keys in application.yml. Use IAM roles. Always. If your keys leak to a public repo, someone will run up a five-figure bill mining crypto before you notice.

Using the default 0.8 temperature for structured tasks. If you’re extracting data from documents or classifying text, you want deterministic output. Set temperature to 0.1–0.3.

Not setting max-tokens. The default is 500, which is fine for short responses but will silently truncate longer outputs. Set it explicitly based on your expected output length.

Ignoring token counts. Every response includes usage metadata. Track it from day one. Retrofitting cost monitoring after your CFO asks why the AWS bill doubled is not a fun afternoon.

One model for all use cases. A model that’s great at summarization is overkill for yes/no classification. Match model capability to task complexity, and your costs will thank you.

Frequently Asked Questions

Should I use the AWS SDK directly or Spring AI for Bedrock integration?

Spring AI if you’re in a Spring Boot application. The raw BedrockRuntimeClient works, but you lose Spring’s auto-configuration, dependency injection, testability with mock models, and the ability to swap providers without rewriting your service layer. Spring AI’s ChatModel abstraction means your business logic doesn’t care whether it’s talking to Bedrock, OpenAI, or a local model.

Which Bedrock model should I use for enterprise Java applications?

It depends on the task. For high-volume classification or routing, use a smaller model like Amazon Nova Micro or Nova Lite to keep costs down. For complex reasoning, summarization, or code generation, use Claude Sonnet or Nova Pro. Define model selection per use case in your configuration, not hard-coded across the application.

How do I handle credentials for Spring AI Bedrock in production?

Never use access keys in production. Use IAM roles attached to your compute—ECS task roles, EKS service accounts with IRSA, or EC2 instance profiles. Spring AI’s Bedrock auto-configuration uses the default AWS credential chain, which resolves IAM roles automatically. For local development, use AWS SSO with named profiles.

Does Spring AI support streaming responses from Amazon Bedrock?

Yes. Use ChatClient.stream() to get a Flux<String> of responses for real-time UIs. This works through the Converse API and is particularly useful for chat interfaces where you want token-by-token display rather than waiting for the full response.

How do I control AI costs with Spring AI and Amazon Bedrock?

Three mechanisms: set max-tokens per request to cap individual calls, track token usage via response metadata for cost dashboards, and route low-complexity tasks to cheaper models. Set up CloudWatch alarms on Bedrock invocation metrics for budget enforcement.

For a broader look at how AI fits into Java development workflows beyond model integration, see our guide on AI code review for Java teams and our walkthrough of AI-assisted Java development with Claude Code.

15 questions your team should answer before starting a migration. Takes 10 minutes. Could save you months.