You can’t fix what you can’t see. And yet a surprising number of Spring Boot applications running in production have exactly zero monitoring beyond “the load balancer health check returns 200.” Then something goes wrong—memory leaks slowly, request latency climbs, a third-party integration starts timing out—and you’re debugging by reading logs and guessing.

Spring Boot ships with everything you need to build serious observability from day one. Actuator exposes health and metrics endpoints. Micrometer provides a vendor-neutral metrics API. Prometheus scrapes and stores those metrics. Grafana visualizes them. This guide walks through setting all of it up in a way that actually holds up in production.

What Actuator Gives You Out of the Box

Add the dependency:

Maven:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Gradle:

implementation 'org.springframework.boot:spring-boot-starter-actuator'

Restart the application and hit http://localhost:8080/actuator. You’ll get a list of exposed endpoints. In recent Spring Boot versions, only /actuator/health is exposed over HTTP by default (older 2.x versions also exposed /actuator/info); everything else has to be exposed explicitly.
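The response is a small HAL-style document of links to the exposed endpoints. Abridged, it looks roughly like this (the templated health-path entry lets you query individual components):

```json
{
  "_links": {
    "self": { "href": "http://localhost:8080/actuator", "templated": false },
    "health": { "href": "http://localhost:8080/actuator/health", "templated": false },
    "health-path": { "href": "http://localhost:8080/actuator/health/{*path}", "templated": true }
  }
}
```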

The endpoints worth knowing:

Endpoint                    What it tells you
/actuator/health            Application health status and component statuses
/actuator/metrics           All registered Micrometer metrics
/actuator/metrics/{name}    Specific metric with tags and values
/actuator/info              Application metadata (version, build time, git commit)
/actuator/env               Active configuration properties
/actuator/loggers           Current log levels; changeable at runtime
/actuator/threaddump        JVM thread states snapshot
/actuator/heapdump          Downloadable heap dump
/actuator/prometheus        Metrics in Prometheus exposition format

Most tutorials show you how to expose all of them with management.endpoints.web.exposure.include=*. Don’t do that in production. /actuator/env leaks your configuration. /actuator/heapdump lets anyone download your heap. /actuator/shutdown can kill your application if POST is enabled (it’s disabled by default, but still). Expose what you need, nothing more.

A sensible production configuration:

management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized
      show-components: when-authorized
    prometheus:
      enabled: true
  server:
    port: 8081  # Run management on a separate port

Running management on a separate port (8081) is the cleanest approach. Your application traffic stays on 8080; your monitoring infrastructure talks to 8081. You firewall 8081 to your monitoring subnet and your main port never exposes internal state.

Health Indicators

The /actuator/health endpoint aggregates health checks from every registered HealthIndicator. Spring Boot registers these automatically based on what’s on your classpath:

  • DataSourceHealthIndicator if you have a datasource configured — pings the database
  • RedisHealthIndicator for Redis
  • RabbitHealthIndicator for RabbitMQ
  • DiskSpaceHealthIndicator — always present, checks available disk space
  • PingHealthIndicator — always UP, used as the simplest liveness check

A response with details enabled looks like this:

{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "PostgreSQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963174912,
        "free": 312891580416,
        "threshold": 10485760,
        "exists": true
      }
    },
    "redis": {
      "status": "UP",
      "details": {
        "version": "7.2.4"
      }
    }
  }
}

If any component reports DOWN, the overall status is DOWN and the HTTP response code becomes 503. Load balancers and Kubernetes liveness probes use this status—so a database connectivity blip will pull your instance from the load balancer rotation if you’re not careful. More on that below.

Custom Health Indicators

You’ll often want health checks that reflect your own application’s state—is the external payment gateway reachable? Is there space in a queue? Is a background job still running?

@Component
public class PaymentGatewayHealthIndicator implements HealthIndicator {

    private final PaymentGatewayClient client;

    public PaymentGatewayHealthIndicator(PaymentGatewayClient client) {
        this.client = client;
    }

    @Override
    public Health health() {
        try {
            GatewayStatus status = client.ping();
            if (status.isOperational()) {
                return Health.up()
                        .withDetail("responseTime", status.getResponseTimeMs() + "ms")
                        .withDetail("region", status.getRegion())
                        .build();
            }
            return Health.down()
                    .withDetail("reason", status.getDegradedReason())
                    .build();
        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}

Spring picks this up automatically. It appears in /actuator/health under paymentGateway (the bean name with the HealthIndicator suffix stripped).

One thing to be careful about: if your custom health indicator performs a live check against an external dependency, a degraded dependency takes your application’s health endpoint DOWN, which can trigger Kubernetes to kill and restart your pod. That’s usually not what you want when a third-party payment API is having a bad morning.

The solution is health groups. Spring Boot has supported dedicated liveness and readiness groups since 2.3, and auto-configures them when it detects a Kubernetes environment:

management:
  endpoint:
    health:
      group:
        liveness:
          include: ping, diskSpace
        readiness:
          include: db, redis, paymentGateway

Your liveness probe hits /actuator/health/liveness—only fails if the application itself is broken. Your readiness probe hits /actuator/health/readiness—fails if the application can’t serve traffic. A failed readiness probe removes the pod from service without killing it. That’s the right behavior when an external dependency is down.
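In the pod spec, each probe then points at its group endpoint. A sketch, assuming the separate management port 8081 from earlier (the delays and periods are illustrative, not recommendations):

```yaml
# Fragment of a container spec; tune timings to your startup profile.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 10
  periodSeconds: 5
```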

Micrometer: The Metrics Layer

Micrometer is the metrics abstraction that sits between your code and whatever monitoring backend you use—Prometheus, Datadog, CloudWatch, Dynatrace. You write counter.increment() and Micrometer handles the translation to whatever the backend expects.

Spring Boot auto-configures Micrometer when it’s on the classpath. If you add spring-boot-starter-actuator, you get micrometer-core transitively. Spring Boot then registers a bunch of metrics automatically:

  • JVM metrics — heap usage, GC pauses, thread counts, class loading
  • HTTP server metrics — request count, duration, status codes per endpoint
  • Datasource metrics — connection pool size, acquire time, timeouts
  • System metrics — CPU usage, file descriptors, uptime
  • Cache metrics — hit rate, miss rate, evictions (if you’re using Spring Cache)

Hit /actuator/metrics and you’ll see a list of every registered metric name. Hit /actuator/metrics/jvm.memory.used to see the current heap usage with tags broken down by memory area.
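The per-metric response is a JSON document with the current measurement plus the tags you can drill into. Roughly (values illustrative):

```json
{
  "name": "jvm.memory.used",
  "description": "The amount of used memory",
  "baseUnit": "bytes",
  "measurements": [
    { "statistic": "VALUE", "value": 1.2345678E8 }
  ],
  "availableTags": [
    { "tag": "area", "values": ["heap", "nonheap"] },
    { "tag": "id", "values": ["G1 Eden Space", "G1 Old Gen", "Metaspace"] }
  ]
}
```

Appending a tag filter, e.g. /actuator/metrics/jvm.memory.used?tag=area:heap, narrows the measurement to that tag value.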

Prometheus Integration

Prometheus scrapes metrics by polling an HTTP endpoint at a configurable interval. Spring Boot exposes that endpoint via the Actuator Prometheus endpoint, but you need an additional dependency to get metrics in Prometheus exposition format:

Maven:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Gradle:

implementation 'io.micrometer:micrometer-registry-prometheus'

No version needed—Spring Boot manages the version through the BOM.

With this on the classpath and prometheus in your exposure list, /actuator/prometheus starts returning metrics in Prometheus text format:

# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 2.9360128E7
jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 4.5088768E7
jvm_memory_used_bytes{area="heap",id="G1 Survivor Space",} 1048576.0
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 8.1620992E7

# HELP http_server_requests_seconds Duration of HTTP server request handling
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/orders",} 1423.0
http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/orders",} 14.832

Now configure Prometheus to scrape it. A minimal prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8081']  # Use your app's management host:port
    scrape_interval: 10s

If you’re running multiple instances, list them all under targets, or better, use service discovery:

scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: $1:$2

Add the annotation to your pod spec and Prometheus finds it automatically:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/actuator/prometheus"
  prometheus.io/port: "8081"

Application Metadata in Metrics

Tag your metrics with application metadata so you can filter by service, version, and environment in Prometheus queries. Configure this in application.yaml:

management:
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active:default}
      version: ${build.version:unknown}

Every metric Micrometer registers will include these tags. In Grafana you can then filter by application="order-service" or compare version="1.4.2" versus version="1.4.3" side by side.

Grafana Dashboards

Grafana connects to Prometheus as a data source and lets you build dashboards. Once connected, you have two options: build dashboards from scratch using PromQL, or import pre-built ones.

For Spring Boot specifically, import dashboard 4701 (“JVM (Micrometer)”) from grafana.com. It covers JVM metrics comprehensively—heap, GC, threads, class loading, CPU. It’s a widely used community dashboard that works out of the box with the tags Micrometer generates.

For HTTP request metrics, dashboard 12900 (Spring Boot Statistics) gives you request rate, error rate, and latency percentiles by endpoint.

Useful PromQL queries to write yourself:

# Request rate per second (averaged over 5 minutes)
rate(http_server_requests_seconds_count{application="order-service"}[5m])

# 95th percentile latency (needs percentile histograms enabled, e.g.
# management.metrics.distribution.percentiles-histogram.http.server.requests=true)
histogram_quantile(0.95,
  rate(http_server_requests_seconds_bucket{application="order-service"}[5m])
)

# Error rate (5xx responses)
rate(http_server_requests_seconds_count{
  application="order-service",
  status=~"5.."
}[5m])

# Heap utilization percentage
jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} * 100

# GC pause time per second
rate(jvm_gc_pause_seconds_sum[5m])

Build alerts in Grafana (or Prometheus alerting rules) around these. Error rate above 1% is usually worth paging someone. Heap utilization above 85% sustained is a warning sign. P95 latency doubling from baseline warrants investigation.
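As a sketch, the 1% error-rate threshold could be encoded as a Prometheus alerting rule like this (the group name, labels, and exact threshold are illustrative):

```yaml
groups:
  - name: spring-boot-app
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses over the last 5 minutes
        expr: |
          sum(rate(http_server_requests_seconds_count{application="order-service",status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{application="order-service"}[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 1% for 5 minutes"
```

The for: 5m clause keeps a single bad scrape from paging anyone; the condition has to hold continuously before the alert fires.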

Custom Metrics

The auto-configured metrics cover infrastructure. Business metrics—orders placed per minute, payment failures, queue depth, active sessions—you have to add yourself.

Inject MeterRegistry anywhere you need to record metrics:

@Service
public class OrderService {

    private final Counter ordersPlaced;
    private final Counter ordersFailed;
    private final Timer orderProcessingTimer;
    private final AtomicInteger activeOrders;

    public OrderService(MeterRegistry registry) {
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Total number of orders successfully placed")
                .tag("region", "us-east")
                .register(registry);

        this.ordersFailed = Counter.builder("orders.failed")
                .description("Total number of failed order attempts")
                .register(registry);

        this.orderProcessingTimer = Timer.builder("orders.processing.duration")
                .description("Time taken to process an order end to end")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);

        this.activeOrders = registry.gauge("orders.active",
                new AtomicInteger(0));
    }

    public Order placeOrder(OrderRequest request) {
        activeOrders.incrementAndGet();
        try {
            return orderProcessingTimer.record(() -> {
                Order order = processOrderInternal(request);
                ordersPlaced.increment();
                return order;
            });
        } catch (Exception e) {
            ordersFailed.increment();
            throw e;
        } finally {
            activeOrders.decrementAndGet();
        }
    }
}

The four metric types you’ll use most often:

Counter — monotonically increasing count. Orders placed, emails sent, errors thrown. Never goes down (reset to zero on restart). Use rate(counter[5m]) in Prometheus to get events per second.

Gauge — current value that can go up or down. Active connections, queue depth, cache size, thread pool utilization. Useful for point-in-time snapshots.

Timer — measures duration and count of events. Automatically tracks count, sum, and optionally percentiles. Best for measuring operation latency. Recording percentiles client-side (via publishPercentiles) is less accurate than histogram quantiles but cheaper.

DistributionSummary — like Timer but for any unit, not just duration. Useful for payload sizes, batch sizes, number of items processed.
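A quick DistributionSummary sketch, using Micrometer’s SimpleMeterRegistry so it runs standalone (the metric name, base unit, and recorded values are illustrative):

```java
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class BatchSizeMetrics {

    // Builds a summary tracking how many items each processed batch contained.
    static DistributionSummary buildSummary(SimpleMeterRegistry registry) {
        return DistributionSummary.builder("orders.batch.size")
                .description("Number of items per processed order batch")
                .baseUnit("items")
                .register(registry);
    }

    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        DistributionSummary batchSizes = buildSummary(registry);

        // Record one observation per processed batch
        batchSizes.record(3);
        batchSizes.record(7);
        batchSizes.record(5);

        // count() is the number of observations; totalAmount() their sum
        System.out.println(batchSizes.count() + " batches, "
                + (long) batchSizes.totalAmount() + " items");
    }
}
```

In Prometheus this surfaces as orders_batch_size_count and orders_batch_size_sum, so rate(orders_batch_size_sum[5m]) / rate(orders_batch_size_count[5m]) gives the average batch size over time.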

A Cleaner Pattern for Multiple Metrics

If you have many metrics in a service, pulling them all through the constructor gets messy. Use @PostConstruct or implement MeterBinder:

@Component
public class OrderMetrics implements MeterBinder {

    private Counter ordersPlaced;
    private Counter ordersFailed;
    private Timer processingTimer;

    @Override
    public void bindTo(MeterRegistry registry) {
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Successfully placed orders")
                .register(registry);

        this.ordersFailed = Counter.builder("orders.failed")
                .description("Failed order attempts")
                .register(registry);

        this.processingTimer = Timer.builder("orders.processing.duration")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
    }

    public void recordOrderPlaced() { ordersPlaced.increment(); }
    public void recordOrderFailed() { ordersFailed.increment(); }
    public Timer.Sample startTimer() { return Timer.start(); }
    public void stopTimer(Timer.Sample sample) { sample.stop(processingTimer); }
}

Inject OrderMetrics into your services rather than MeterRegistry directly. This keeps metric definitions in one place and gives you a typed API instead of magic strings scattered through your codebase.

Securing the Management Endpoints

If you can’t run on a separate port, at minimum put the actuator endpoints behind Spring Security:

@Configuration
@EnableWebSecurity
public class ActuatorSecurityConfig {

    @Bean
    public SecurityFilterChain actuatorSecurity(HttpSecurity http) throws Exception {
        http
            .securityMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeHttpRequests(auth -> auth
                .requestMatchers(EndpointRequest.to(HealthEndpoint.class)).permitAll()
                .requestMatchers(EndpointRequest.to(PrometheusScrapeEndpoint.class))
                    .hasRole("MONITORING")
                .anyRequest().hasRole("ADMIN")
            )
            .httpBasic(Customizer.withDefaults());
        return http.build();
    }
}

Then configure your Prometheus scrape config with credentials:

scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    basic_auth:
      username: prometheus
      # Prometheus doesn't expand environment variables in its config file;
      # read the secret from a file instead
      password_file: /etc/prometheus/scrape-password
    static_configs:
      - targets: ['app:8080']

Or use a bearer token if your auth setup uses JWT.
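For the bearer-token case, recent Prometheus versions accept an authorization block in the scrape config. A sketch (the token file path is illustrative):

```yaml
scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    authorization:
      type: Bearer
      # Token is read from a file, for the same reason as password_file above
      credentials_file: /etc/prometheus/app-token
    static_configs:
      - targets: ['app:8080']
```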

Production Checklist

Things that bite teams who skipped ahead to “working” without thinking about production:

Separate management port. management.server.port=8081. Keep operational endpoints off the main application port.

Expose only what you need. health, info, metrics, prometheus. Not env, not heapdump, not threaddump unless you have a specific reason.

Kubernetes health groups. Configure liveness and readiness groups separately. Don’t let a degraded external dependency kill your pod.

Tag your metrics with application, environment, version. Without these tags you can’t filter in Grafana when you have more than one service.

Set Prometheus scrape interval appropriate to your SLAs. 15s is fine for most things. If you’re tracking sub-second latency, consider 5s—but be aware this increases Prometheus storage and query load.

Alert on meaningful signals. Error rate, P95 latency, heap utilization, GC pause time. Don’t alert on every metric; you’ll get pager fatigue and start ignoring alerts.

Test your health checks. Kill your database connection, stop a dependency, fill your disk. Verify the health endpoint responds correctly and your orchestrator does the right thing.

Spring Boot monitoring isn’t complicated to set up. The hard part is deciding which metrics matter for your application and building alerts around them that actually signal problems rather than noise. Start with the auto-configured metrics, add business metrics for the operations that matter most to your domain, and build your dashboards around the questions you’d ask at 3am when something is wrong.