Code review is where good intentions go to die. You spend two days writing a feature, open the PR, and your brain immediately goes blind to every problem in it. You read your own code the way you meant to write it, not the way you actually wrote it. Your reviewer catches the obvious stuff — the naming, the missing null check — but the N+1 query buried in the service layer? That one ships to production.

I’ve been using AI as a first pass before submitting PRs for about six months now. It doesn’t replace review — nothing replaces having another engineer look at your code. But it consistently catches the category of bugs I miss when I’m too close to the work. This article covers the workflow I use with Java 21 and Spring Boot 3.x projects.

Why Self-Review Fails and AI Helps

The problem with reviewing your own code is that you know what you were trying to do. That knowledge fills in gaps. You skim the lines where the logic gets tricky because you already understand it. You skip over the error handling because you remember that you handled it. Except sometimes you didn’t.

AI doesn’t know what you intended. It reads the code as written. That’s both its limitation and its strength.

The prompt I use for pre-PR review is deliberately adversarial. I don’t ask for suggestions. I ask for criticism:

You are a senior Java developer doing a code review. Be direct and critical.
I'm submitting this for review and I want you to find problems before my
teammates do. Focus on: correctness, performance anti-patterns, exception
handling, thread safety, and resource management. Java 21, Spring Boot 3.2.

[paste code]

Framing it as “find problems before my teammates do” works better than “review my code” in my experience. It shifts the AI from suggesting improvements to actively looking for bugs.

Catching Java Anti-Patterns

The categories where AI consistently earns its keep in Java reviews:

N+1 Query Problems

Hibernate makes it easy to write code that looks fine and performs terribly. The classic pattern:

// This looks reasonable until you see the SQL it generates
public List<OrderSummary> getOrderSummaries() {
    List<Order> orders = orderRepository.findAll();
    return orders.stream()
        .map(order -> new OrderSummary(
            order.getId(),
            order.getCustomer().getName(),  // SELECT for each order
            order.getItems().size()          // SELECT for each order
        ))
        .toList();
}

One call to getOrderSummaries() on a table with 500 orders generates 1,001 queries: one for the orders, plus two per order for the customer and the items collection. The AI flags this immediately and suggests the fix — a JOIN FETCH or an @EntityGraph:

@Query("SELECT o FROM Order o JOIN FETCH o.customer JOIN FETCH o.items")
List<Order> findAllWithDetails();
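
The @EntityGraph alternative mentioned above can be sketched as follows — assuming the repository extends JpaRepository and the Order entity has the customer and items fields shown earlier. Attaching the graph to an overridden findAll() loads both associations in the same query without writing JPQL:

```java
// Alternative to JOIN FETCH: an entity graph declares which associations
// to fetch eagerly for this query only (attribute names must match the
// Order entity's field names)
@Override
@EntityGraph(attributePaths = {"customer", "items"})
List<Order> findAll();
```

Either approach collapses the 1,001 queries into one; the entity graph version keeps the fetch strategy out of the query string, which some teams prefer.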

Thread Safety Issues

Spring beans are singletons by default. Any mutable instance state is shared across all threads. This is a genuine footgun:

@Service
public class ReportService {
    // PROBLEM: mutable instance state in a singleton
    private List<String> currentBatch = new ArrayList<>();

    public void addToBatch(String item) {
        currentBatch.add(item);  // Not thread-safe
    }

    public List<String> flushBatch() {
        List<String> result = new ArrayList<>(currentBatch);
        currentBatch.clear();
        return result;
    }
}

Two concurrent requests hitting addToBatch can lose items or throw ArrayIndexOutOfBoundsException, since ArrayList makes no thread-safety guarantees. AI catches this pattern and flags the singleton/mutable-state combination. The fix is making the state method-local, wrapping it in a ThreadLocal, or switching to a thread-safe data structure.
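
One possible fix — a sketch, not the only option — is to swap the ArrayList for a lock-free queue. The @Service annotation is omitted here so the sketch compiles without Spring on the classpath:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ReportService {
    // Thread-safe replacement for the shared ArrayList
    private final Queue<String> currentBatch = new ConcurrentLinkedQueue<>();

    public void addToBatch(String item) {
        currentBatch.offer(item);  // safe under concurrent calls, no locking needed
    }

    public List<String> flushBatch() {
        List<String> result = new ArrayList<>();
        String item;
        // poll() removes one element atomically, so items added while the
        // flush is running are neither lost nor double-counted
        while ((item = currentBatch.poll()) != null) {
            result.add(item);
        }
        return result;
    }
}
```

The drain-with-poll() loop also fixes a second race in the original: the copy-then-clear in flushBatch could drop items added between the copy and the clear.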

Resource Leaks

Pre-Java 7 patterns still show up in codebases that have been around a while:

public String readConfig(String path) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(path));
    // If an exception is thrown here, reader is never closed
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
    reader.close();
    return sb.toString();
}

Modern Java uses try-with-resources. AI will point this out and often suggest switching to Files.readString(Path.of(path)) for the whole thing.
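
A sketch of both fixes side by side (the ConfigReader wrapper class is just scaffolding so the example compiles standalone):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConfigReader {
    // try-with-resources closes the reader even if readLine() throws
    public static String readConfig(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    // Or replace the whole method: Files.readString handles open, read,
    // and close in one call (note it preserves newlines, unlike the loop)
    public static String readConfigSimple(String path) throws IOException {
        return Files.readString(Path.of(path));
    }
}
```

Worth noting when you apply the one-liner: the original loop silently drops newlines, so the two versions are not byte-for-byte equivalent.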

Swallowed Exceptions

The exception handling anti-pattern that causes the most production incidents:

try {
    processPayment(order);
} catch (Exception e) {
    log.error("Payment failed");
    // e is never logged, stack trace is gone forever
}

When this hits production and payments start silently failing, you have no stack trace to debug with. AI catches the missing e parameter in the log call every time.
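
With SLF4J (which the snippet above appears to use), the one-line fix is passing the exception as the last argument: log.error("Payment failed", e). A standalone sketch of the corrected shape, using java.util.logging so it compiles without dependencies — PaymentHandler and tryProcess are hypothetical names, not from the article:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PaymentHandler {
    private static final Logger log = Logger.getLogger(PaymentHandler.class.getName());

    public static boolean tryProcess(Runnable payment, String orderId) {
        try {
            payment.run();
            return true;
        } catch (RuntimeException e) {
            // Passing the exception object preserves the stack trace in the log
            log.log(Level.SEVERE, "Payment failed for order " + orderId, e);
            return false;
        }
    }
}
```

Whether you rethrow, return a failure result, or translate to a domain exception depends on the caller; the non-negotiable part is that the exception object reaches the log.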

Pulling PR Comments into AI Context

The other place AI saves significant time: addressing PR review comments faster.

After a reviewer leaves comments, you typically need to: read each comment, find the relevant code, understand the suggestion, figure out the best fix, and then check whether your fix is actually correct. That loop is slow.

Here’s the workflow I use with the GitHub CLI:

# Get all review comments on your PR
gh pr view 123 --json reviews --jq '.reviews[].body' > review-comments.txt

# Or get inline comments specifically
gh api repos/OWNER/REPO/pulls/123/comments \
  --jq '.[].body' >> review-comments.txt

# Get the current diff
gh pr diff 123 > pr-diff.txt

Then paste both into the AI with this prompt:

Here are PR review comments I need to address:

[paste review-comments.txt]

Here is the current code diff:

[paste pr-diff.txt]

For each review comment:
1. Identify which part of the diff it refers to
2. Explain whether the reviewer's concern is valid
3. Suggest the most appropriate fix
4. Flag any comments that conflict with each other

The “flag conflicts” part is useful when you have multiple reviewers with different opinions. The AI won’t resolve the conflict for you — that requires a conversation — but it surfaces it so you don’t accidentally implement one reviewer’s suggestion in a way that violates another’s.

Reviewing Architecture Decisions

Line-by-line review and architecture review are different tasks that require different prompts.

For line-by-line, you’re looking for bugs, anti-patterns, and style. For architecture, you’re asking whether the structure of the code is right. Same AI, different question.

When I’m evaluating whether a service is doing too much:

Here is a Spring Boot service class. Tell me if it violates the Single
Responsibility Principle and explain what each responsibility is. If it
should be split, suggest how.

[paste service class]

For evaluating a new module boundary decision:

We're splitting our monolith. This is the proposed service boundary for
the "Notifications" service. Here's what it owns:
- Sending emails and SMS
- Managing user notification preferences
- Storing notification history
- Subscribing users to event types

Is this a good boundary? What are the likely coupling problems?
What's missing that this service probably needs?

Architecture review prompts work better when you give the AI context about what you’re trying to achieve, not just the code. The AI can’t see your organization, your team’s skill set, or the business constraints. You have to bring that context to the conversation.

Common Mistakes Java Developers Make with AI Code Review

Taking AI output at face value. The AI will confidently suggest fixes that are wrong for your context. It might suggest adding @Transactional to a method that’s already in a transaction. It might recommend a caching strategy that doesn’t fit your data access patterns. Read its output critically. If a suggestion doesn’t make sense to you, ask it to explain the reasoning — often the explanation reveals the assumption that doesn’t apply.

Pasting code without context. A method pulled out of its class is harder to review than the same method with its imports, its class-level annotations, and a sentence about what the feature does. The more context you provide, the better the output. I include the full class file for anything non-trivial, plus a sentence about what I changed and why.

Using it as a substitute for understanding. If the AI says your code has an N+1 problem and you just apply the fix without understanding why, you’ll write the same bug again next week. The AI output is most useful when it names the pattern and you go look it up. Understanding the underlying problem is what sticks.

Running it once and calling it done. AI review is a tool in the process, not the process. After addressing the AI’s findings, I still do a manual pass. After the manual pass, human reviewers still look at it. The AI catches things I miss; human reviewers catch things the AI misses; I catch things after sleeping on it.

Limitations: When You Still Need Human Review

There are categories of review where AI is genuinely not useful:

Business domain correctness. The AI doesn’t know your domain rules. It can’t tell you whether the calculation in your PricingService is correct according to your business requirements. It doesn’t know that your company applies discounts before taxes, not after.

Security-sensitive code. AI gives you a starting point for security review, but it’s not a security audit. Authentication flows, authorization logic, and anything handling credentials need a human who understands your threat model.

Historical context. Sometimes code looks wrong but was written that way intentionally to work around a known infrastructure limitation. The AI doesn’t know your history. Your senior engineers do. Don’t rip out the workaround because the AI flagged it.

Interpersonal dynamics. Code review is also a teaching moment and a team alignment exercise. A junior developer needs different feedback than a senior who’s been on the project for three years. The AI gives the same feedback to everyone. That’s sometimes what you want; sometimes it’s not.

Complex distributed system interactions. Whether a change to service A correctly handles failure modes in service B requires understanding both services, the network between them, and your retry and circuit-breaking strategy. That context rarely fits in a prompt.

Frequently Asked Questions

What is AI code review?

AI code review is using a large language model — tools like Claude, GitHub Copilot, or GPT-4 — to analyze code for bugs, anti-patterns, and style issues. It can be done interactively (pasting code into a chat interface) or through automated tools integrated into your CI/CD pipeline.

Is AI code review accurate for Java?

Reasonably accurate for well-known anti-patterns: N+1 queries, resource leaks, thread safety basics, missing null checks, and common exception handling mistakes. Less accurate for domain-specific correctness, complex concurrency patterns, and anything requiring understanding of your specific system architecture. Use it as a first pass, not a final verdict.

Can AI replace human code reviewers?

No. Human reviewers bring domain knowledge, historical context, understanding of team conventions, and interpersonal judgment that AI doesn’t have. AI is useful as a pre-review step to catch obvious problems before the human review, which makes the human review more efficient. It’s a tool that makes the process better, not a replacement for the process.

What’s the best AI tool for Java code review?

Depends what you’re optimizing for. Claude (via Claude.ai or the API) handles large context windows well, which matters when you’re pasting full service classes. GitHub Copilot integrates directly into your IDE and provides inline suggestions. SonarQube and Checkstyle are deterministic tools that catch specific anti-patterns reliably without the variability of an LLM. Most teams end up using a combination.

How do I use Claude for code review?

The most effective approach: paste the full class or file (not just the changed lines), include a one-sentence description of what you changed and why, then ask it to act as a senior Java developer finding problems before your PR goes out. The more context you give, the more useful the output. For Spring Boot projects, specify the Spring Boot version so it doesn’t suggest deprecated APIs.