
Spring AI tutorial: Get started with Spring AI



Use the following context to answer the user's question.
If the question cannot be answered from the context, state that clearly.

Context:
{context}

Question:
{question}

Then I created a new SpringAIRagService:

package com.infoworld.springaidemo.service;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class SpringAIRagService {
    @Value("classpath:/templates/rag-template.st")
    private Resource promptTemplate;
    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public SpringAIRagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.vectorStore = vectorStore;
    }

    public String query(String question) {
        // Retrieve the two documents most similar to the question
        SearchRequest searchRequest = SearchRequest.builder()
                .query(question)
                .topK(2)
                .build();
        List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);
        String context = similarDocuments.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));

        // Render the externalized template with the retrieved context
        Prompt prompt = new PromptTemplate(promptTemplate)
                .create(Map.of("context", context, "question", question));

        return chatClient.prompt(prompt)
                .call()
                .content();
    }
}

The SpringAIRagService wires in a ChatClient.Builder, which we use to build a ChatClient, along with our VectorStore. The query() method accepts a question and uses the VectorStore to build the context. First, we need to build a SearchRequest, which we do by:

  • Invoking its static builder() method.
  • Passing the question as the query.
  • Using the topK() method to specify how many documents we want to retrieve from the vector store.
  • Calling its build() method.

In this case, we want to retrieve the top two documents that are most similar to the question. In practice, you'll use something larger, such as the top three or top five, but since we only have three documents, I limited it to two.
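
Beyond capping the count, the builder can also filter out weak matches. Here's a minimal sketch, assuming the similarityThreshold() option available on recent Spring AI SearchRequest builders (scores range from 0 to 1); verify it against your Spring AI version:

// A sketch: return up to five documents, but only those scoring at least 0.5.
// similarityThreshold() is assumed from recent Spring AI releases.
SearchRequest stricterRequest = SearchRequest.builder()
        .query(question)
        .topK(5)
        .similarityThreshold(0.5)
        .build();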

Next, we invoke the vector store's similaritySearch() method, passing it our SearchRequest. The similaritySearch() method will use the vector store's embedding model to create a multidimensional vector of the question. It will then compare that vector to each document and return the documents that are most similar to the question. We stream over all similar documents, get their text, and build a context String.
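
If your documents carry metadata, you can fold it into the context so the model can attribute its answer. A minimal sketch, assuming a hypothetical "source" metadata key was set on each Document at ingestion time:

// Prefix each chunk with its source. The "source" key is an assumption;
// use whatever metadata keys you set when loading the documents.
String contextWithSources = similarDocuments.stream()
        .map(doc -> "[" + doc.getMetadata().getOrDefault("source", "unknown") + "]\n" + doc.getText())
        .collect(Collectors.joining("\n\n"));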

Next, we create our prompt, which tells the LLM to answer the question using the context. Note that it is important to tell the LLM to use the context to answer the question and, if it can't, to state that it cannot answer the question from the context. If we don't provide these instructions, the LLM will use the knowledge it was trained on to answer the question, which means it will use information not in the context we've provided.
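
PromptTemplate can also be constructed from a plain String rather than a Resource, which is handy while you're iterating on the wording before externalizing it to a .st file. A minimal sketch of the inline equivalent of rag-template.st:

// Inside query(): an inline alternative to loading the .st resource.
PromptTemplate inlineTemplate = new PromptTemplate("""
        Use the following context to answer the user's question.
        If the question cannot be answered from the context, state that clearly.

        Context:
        {context}

        Question:
        {question}
        """);
Prompt prompt = inlineTemplate.create(Map.of("context", context, "question", question));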

Finally, we build the prompt, setting its context and question, and invoke the ChatClient. I added a SpringAIRagController to handle POST requests and pass them to the SpringAIRagService:

package com.infoworld.springaidemo.web;

import com.infoworld.springaidemo.model.SpringAIQuestionRequest;
import com.infoworld.springaidemo.model.SpringAIQuestionResponse;
import com.infoworld.springaidemo.service.SpringAIRagService;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SpringAIRagController {
    private final SpringAIRagService springAIRagService;

    public SpringAIRagController(SpringAIRagService springAIRagService) {
        this.springAIRagService = springAIRagService;
    }

    @PostMapping("/springAIQuestion")
    public ResponseEntity<SpringAIQuestionResponse> askAIQuestion(@RequestBody SpringAIQuestionRequest questionRequest) {
        String answer = springAIRagService.query(questionRequest.question());
        return ResponseEntity.ok(new SpringAIQuestionResponse(answer));
    }
}

The askAIQuestion() method accepts a SpringAIQuestionRequest, which is a Java record:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionRequest(String question) {
}

The controller returns a SpringAIQuestionResponse:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionResponse(String answer) {
}

Now restart your application and execute a POST to /springAIQuestion. In my case, I sent the following request body:

{
    "question": "Does Spring AI support RAG?"
}

And received the following response:

{
    "answer": "Yes. Spring AI explicitly supports Retrieval Augmented Generation (RAG), including chat memory, integrations with major vector stores, a portable vector store API with metadata filtering, and a document injection ETL framework to build RAG pipelines."
}

As you can see, the LLM used the context of the documents we loaded into the vector store to answer the question. We can further test whether it's following our instructions by asking a question that isn't in our context:

{
    "question": "Who created Java?"
}

Here is the LLM's response:

{
    "answer": "The provided context does not include information about who created Java."
}

This is an important validation that the LLM is only using the provided context to answer the question and not using its training data or, worse, attempting to make up an answer.
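
As an aside, Spring AI can also wire this retrieve-then-prompt pattern directly into the ChatClient through its advisor API. The sketch below uses QuestionAnswerAdvisor; its package location and construction details have shifted between Spring AI releases, so treat them as assumptions to verify against your version:

// A sketch, inside a class with chatClient and vectorStore wired in.
// import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor; // location varies by version
// The advisor runs the similarity search and augments the prompt with the
// retrieved context before the model is called.
String answer = chatClient.prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(question)
        .call()
        .content();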

Conclusion

This article introduced you to using Spring AI to incorporate large language model capabilities into Spring-based applications. You can configure LLMs and other AI technologies using Spring's standard application.yaml file, then wire them into Spring components. Spring AI provides an abstraction for interacting with LLMs, so you don't need to use LLM-specific SDKs. For experienced Spring developers, the whole process is similar to how Spring Data abstracts database interactions using Spring Data interfaces.

In this example, you saw how to configure and use a large language model in a Spring MVC application. We configured OpenAI to answer simple questions, introduced prompt templates to externalize LLM prompts, and concluded by using a vector store to implement a simple RAG service in our example application.

Spring AI has a robust set of capabilities, and we've only scratched the surface of what you can do with it. I hope the examples in this article provide enough foundational knowledge to help you start building AI applications using Spring. Once you're comfortable with configuring and accessing large language models in your applications, you can dive into more advanced AI programming, such as building AI agents to improve your business processes.

