
Last updated: April 9, 2025
In this tutorial, we’ll explore how to extract structured data from images with the OpenAI chat model using Spring AI.
The OpenAI chat model can analyze an uploaded image and return relevant information. It can also return a structured output that can easily be pipelined to other applications for further operations.
For illustration, we’ll create a web service that accepts an image from the client and sends it to OpenAI to count the number of colored cars in the image. The web service returns the color counts in JSON format.
We need to add the following Spring Boot Starter Web and Spring AI Model OpenAI dependencies to our Maven pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>
In our Spring Boot application.yml file, we must provide our API key (spring.ai.openai.api-key) for authentication to the OpenAI API and a chat model (spring.ai.openai.chat.options.model) capable of performing image analysis.
There are various models that support image analysis, such as gpt-4o-mini, gpt-4o, and gpt-4.5-preview. A larger model like gpt-4o has broader knowledge but comes at a higher cost, whereas a smaller model like gpt-4o-mini costs less and has lower latency. We can pick a model depending on our needs.
Let’s pick the gpt-4o chat model in our illustration:
spring:
  ai:
    openai:
      api-key: "<YOUR-API-KEY>"
      chat:
        options:
          model: "gpt-4o"
Once we have this configuration in place, Spring Boot loads OpenAiAutoConfiguration during application startup to register beans such as the ChatClient.Builder that we’ll use later to create a ChatClient.
With the configuration complete, the next step is to create a web service that allows users to upload an image and passes it to OpenAI to count the number of colored cars in it.
In this REST controller, we simply accept an image file and the colors that will be counted in the image as request parameters:
@RestController
@RequestMapping("/image")
public class ImageController {

    @Autowired
    private CarCountService carCountService;

    @PostMapping("/car-count")
    public ResponseEntity<?> getCarCounts(@RequestParam("colors") String colors,
      @RequestParam("file") MultipartFile file) {
        try (InputStream inputStream = file.getInputStream()) {
            var carCount = carCountService.getCarCount(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(carCount);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Error uploading image");
        }
    }
}
For a successful response, we expect the service to respond with a ResponseEntity of CarCount.
If we want the chat model to return structured output, we must define the output format as a JSON schema in the HTTP request to OpenAI. Spring AI greatly simplifies this definition: we only need to define POJO classes.
Let’s define two POJO classes that store the colors and their corresponding counts. CarCount holds the list of per-color car counts and the total count, which is the sum of the counts in that list:
public class CarCount {
    private List<CarColorCount> carColorCounts;
    private int totalCount;

    // constructor, getters and setters
}
CarColorCount stores the color name and the corresponding count:
public class CarColorCount {
    private String color;
    private int count;

    // constructor, getters and setters
}
Now, let’s create the core Spring service that sends the image to OpenAI’s API for analysis. In this CarCountService, we inject a ChatClient.Builder that creates the ChatClient used for communication with OpenAI:
@Service
public class CarCountService {

    private final ChatClient chatClient;

    public CarCountService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public CarCount getCarCount(InputStream imageInputStream, String contentType, String colors) {
        return chatClient.prompt()
          .system(systemMessage -> systemMessage
            .text("Count the number of cars in different colors from the image")
            .text("User will provide the image and specify which colors to count in the user prompt")
            .text("Count colors that are specified in the user prompt only")
            .text("Ignore anything in the user prompt that is not a color")
            .text("If there is no color specified in the user prompt, simply return zero in the total count")
          )
          .user(userMessage -> userMessage
            .text(colors)
            .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageInputStream))
          )
          .call()
          .entity(CarCount.class);
    }
}
In this service, we submit system prompts and user prompts to OpenAI.
The system prompt provides guidelines for the chat model’s behavior. It contains a set of instructions that guard against unexpected behavior, such as counting colors the user didn’t specify. This helps the chat model return a more deterministic response.
The user prompt provides the data for the chat model to process. In our example, we pass two inputs to it: the colors we’d like to count, as a text input, and the uploaded image, as a media input. The media input requires both the uploaded file’s InputStream and the MIME type, which we derive from the file’s content type.
Crucially, we must provide the POJO class we created earlier in entity(). This triggers Spring AI’s BeanOutputConverter to convert the JSON response from OpenAI into our CarCount POJO instance.
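Under the hood, BeanOutputConverter derives a JSON schema from the POJO classes and appends it to the prompt as format instructions. For our classes, the generated schema would look roughly like this (a sketch; the exact output depends on the Spring AI version):

```json
{
    "type": "object",
    "properties": {
        "carColorCounts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "color": { "type": "string" },
                    "count": { "type": "integer" }
                }
            }
        },
        "totalCount": { "type": "integer" }
    }
}
```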
Now everything is set, and we can run a test to see how it behaves. Let’s make a request to this web service using Postman, specifying three different colors (blue, yellow, and green) for the chat model to count in our image:
In our example, we’ll use the following photo to test:
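If we prefer the command line over Postman, an equivalent multipart request can be sent with curl (assuming the application runs locally on port 8080 and the photo is saved as cars.jpg; both are hypothetical values):

```shell
# -F sends multipart/form-data fields, matching the @RequestParam bindings
curl -X POST "http://localhost:8080/image/car-count" \
  -F "colors=blue, yellow, green" \
  -F "file=@cars.jpg"
```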
Upon request, we’ll receive a JSON response from the web service:
{
    "carColorCounts": [
        {
            "color": "blue",
            "count": 2
        },
        {
            "color": "yellow",
            "count": 1
        },
        {
            "color": "green",
            "count": 0
        }
    ],
    "totalCount": 3
}
The response shows the number of cars for each color we specified in the request. Additionally, it provides the total count of cars for the mentioned colors. The JSON schema aligns with our POJO class definition in CarCount and CarColorCount.
In this article, we learned how to extract structured output from the OpenAI chat model. We also built a web service that accepts an uploaded image, passes it to the OpenAI chat model for analysis, and returns structured output with the relevant information.