
Last updated: April 9, 2025
In this tutorial, we’ll explore how to extract structured data from images with the OpenAI chat model using Spring AI.
The OpenAI chat model can analyze an uploaded image and return relevant information. It can also return a structured output that can easily be pipelined to other applications for further operations.
For illustration, we’ll create a web service that accepts an image from the client and sends it to OpenAI to count the number of colored cars in the image. The web service returns the color counts in JSON format.
We need to add the following Spring Boot Starter Web and Spring AI Model OpenAI dependencies to our Maven pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>
In our Spring Boot application.yml file, we must provide our API key (spring.ai.openai.api-key) for authentication to the OpenAI API and a chat model (spring.ai.openai.chat.options.model) capable of performing image analysis.
There are various models that support image analysis, such as gpt-4o-mini, gpt-4o, and gpt-4.5-preview. A larger model like gpt-4o has broader knowledge but comes at a higher cost, whereas a smaller model like gpt-4o-mini costs less and has lower latency. We can pick a model depending on our needs.
Let’s pick the gpt-4o chat model in our illustration:
spring:
  ai:
    openai:
      api-key: "<YOUR-API-KEY>"
      chat:
        options:
          model: "gpt-4o"
Once we have this configuration in place, Spring Boot loads OpenAiAutoConfiguration during application startup to register beans such as the ChatClient.Builder that we’ll use later to create a ChatClient.
With the configuration complete, the next step is to create a web service that allows users to upload an image and passes it to OpenAI to count the number of colored cars in it.
In this REST controller, we simply accept an image file and the colors that will be counted in the image as request parameters:
@RestController
@RequestMapping("/image")
public class ImageController {

    @Autowired
    private CarCountService carCountService;

    @PostMapping("/car-count")
    public ResponseEntity<?> getCarCounts(@RequestParam("colors") String colors,
      @RequestParam("file") MultipartFile file) {
        try (InputStream inputStream = file.getInputStream()) {
            var carCount = carCountService.getCarCount(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(carCount);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Error uploading image");
        }
    }
}
For a successful response, we expect the service to respond with a ResponseEntity of CarCount.
If we want the chat model to return structured output, we must define the output format as a JSON schema in the HTTP request to OpenAI. Spring AI greatly simplifies this definition: we only need to define POJO classes.
Let’s define two POJO classes that store the colors and their corresponding counts. CarCount holds the list of per-color car counts and the total count, which is the sum of the counts in that list:
public class CarCount {
    private List<CarColorCount> carColorCounts;
    private int totalCount;

    // constructor, getters and setters
}
CarColorCount stores the color name and the corresponding count:
public class CarColorCount {
    private String color;
    private int count;

    // constructor, getters and setters
}
Now, let’s create the core Spring service that sends the image to OpenAI’s API for analysis. In this CarCountService, we inject a ChatClient.Builder that creates the ChatClient used for communication with OpenAI:
@Service
public class CarCountService {

    private final ChatClient chatClient;

    public CarCountService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public CarCount getCarCount(InputStream imageInputStream, String contentType, String colors) {
        return chatClient.prompt()
          .system(systemMessage -> systemMessage
            .text("Count the number of cars in different colors from the image")
            .text("User will provide the image and specify which colors to count in the user prompt")
            .text("Count colors that are specified in the user prompt only")
            .text("Ignore anything in the user prompt that is not a color")
            .text("If there is no color specified in the user prompt, simply return zero in the total count")
          )
          .user(userMessage -> userMessage
            .text(colors)
            .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageInputStream))
          )
          .call()
          .entity(CarCount.class);
    }
}
In this service, we submit system prompts and user prompts to OpenAI.
The system prompt provides guidelines for the chat model’s behavior. It contains a set of instructions that guard against unexpected behavior, such as counting colors the user didn’t specify. This helps the chat model return a more deterministic response.
The user prompt provides the data for the chat model to process. In our example, we pass two inputs to it: the colors we’d like to count, as a text input, and the uploaded image, as a media input. The media input requires both the uploaded file’s InputStream and the MIME type, which we derive from the file’s content type.
Crucially, we must provide the POJO class we created earlier in entity(). This triggers Spring AI’s BeanOutputConverter to convert the JSON response from OpenAI into our CarCount POJO instance.
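Under the hood, BeanOutputConverter derives a JSON schema from the POJO classes and appends it to the prompt as format instructions. For our classes, the generated schema would look roughly like this (a sketch; the exact output depends on the Spring AI version):

```json
{
    "type": "object",
    "properties": {
        "carColorCounts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "color": { "type": "string" },
                    "count": { "type": "integer" }
                }
            }
        },
        "totalCount": { "type": "integer" }
    }
}
```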
Now everything is set, and we can run a test to see how it behaves. Let’s make a request to this web service using Postman, specifying three different colors (blue, yellow, and green) for the chat model to count in our image:
In our example, we’ll use the following photo to test:
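If we prefer the command line over Postman, an equivalent multipart request can be sent with curl (assuming the application runs locally on port 8080 and the photo is saved as cars.jpg; both are hypothetical values):

```shell
# -F sends multipart/form-data fields, matching the @RequestParam bindings
curl -X POST "http://localhost:8080/image/car-count" \
  -F "colors=blue, yellow, green" \
  -F "file=@cars.jpg"
```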
Upon request, we’ll receive a JSON response from the web service:
{
    "carColorCounts": [
        {
            "color": "blue",
            "count": 2
        },
        {
            "color": "yellow",
            "count": 1
        },
        {
            "color": "green",
            "count": 0
        }
    ],
    "totalCount": 3
}
The response shows the number of cars for each color we specified in the request. Additionally, it provides the total count of cars for the mentioned colors. The JSON schema aligns with our POJO class definition in CarCount and CarColorCount.
In this article, we learned how to extract structured output from the OpenAI chat model. We also built a web service that accepts an uploaded image, passes it to the OpenAI chat model for analysis, and returns structured output with the relevant information.