Introduction to Spring Batch

1. Overview

In this tutorial, we’ll look at a practical, code-focused intro to Spring Batch. Spring Batch is a processing framework designed for the robust execution of jobs.

Its current version, 5.2.0, supports Spring 6.2.0 and Java 17+.

Here are a few interesting and practical use cases of the framework.

How to Trigger and Stop a Scheduled Spring Batch Job

Explore three different ways to trigger and stop a scheduled Spring Batch job.

Conditional Flow in Spring Batch

Learn how to create Spring Batch jobs with conditional flow.

Testing a Spring Batch Job

Explore various approaches to test a Spring Batch job.

2. Workflow Basics

Spring Batch follows the traditional batch architecture, in which a job launcher starts jobs and a job repository keeps track of their state.

A job can have more than one step. And every step typically follows the sequence of reading data, processing it and writing it.

Of course, the framework does most of the heavy lifting for us here, especially the low-level persistence work of tracking job state. We use H2 for the job repository.

2.1. Example Use Case

The simple use case we’re going to tackle here is migrating some financial transaction data from CSV to XML.

The input file has a very simple structure.

It contains a transaction per line, made up of a username, the user ID, the date of the transaction and the amount:

username, userid, transaction_date, transaction_amount
devendra, 1234, 31/10/2015, 10000
john, 2134, 3/12/2015, 12321
robin, 2134, 2/02/2015, 23411
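
To make the layout concrete, the following plain-Java sketch splits the sample lines into named fields, skipping the header row and trimming the space that follows each comma. It uses only the JDK. The class CsvLineSketch and its methods are hypothetical names for illustration, and the code only approximates what the tokenizer configured later in this article does for us:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not a Spring Batch class.
class CsvLineSketch {

    // Field names in column order (the same names the tokenizer uses later in the article).
    static final String[] NAMES = { "username", "userid", "transactiondate", "amount" };

    // Splits one data line on commas and maps each trimmed token to its field name.
    static Map<String, String> tokenize(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != NAMES.length) {
            throw new IllegalArgumentException("Expected " + NAMES.length + " fields: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], tokens[i].trim());
        }
        return fields;
    }

    // Tokenizes every line except the first, which holds the column headers.
    static List<Map<String, String>> tokenizeAll(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        int firstDataLine = Math.min(1, lines.size());
        for (String line : lines.subList(firstDataLine, lines.size())) {
            rows.add(tokenize(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
          "username, userid, transaction_date, transaction_amount",
          "devendra, 1234, 31/10/2015, 10000",
          "john, 2134, 3/12/2015, 12321",
          "robin, 2134, 2/02/2015, 23411");
        tokenizeAll(lines).forEach(System.out::println);
    }
}
```

Every value is still a String at this point. Converting the user ID, date, and amount into typed fields is a separate mapping step.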

3. The Maven POM

Dependencies required for this project are Spring Batch Core, Spring Object/XML Marshalling (OXM), and H2 database:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-oxm</artifactId>
    <version>6.2.0</version>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>2.3.232</version>
</dependency>
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
    <version>5.2.0</version>
</dependency>

4. Auto Create Spring Batch Schema

When we use Spring Batch, we can use the pre-packaged SQL initialization scripts to auto-create the schema on startup. Furthermore, when we use the embedded H2 database, Spring Boot automatically runs the corresponding SQL initialization script to initialize the database. However, when we use one of the other supported databases we need to configure Spring Boot properties for it to automatically detect the database and run the corresponding SQL initialization script on startup.

We can configure Spring Boot’s application.properties file to enable automatic database initialization:

spring.batch.jdbc.initialize-schema=always

Alternatively, we can configure Spring Boot’s application.yml file to enable automatic database initialization:

spring:
  batch:
    jdbc:
      initialize-schema: "always"

Additionally, we should not annotate the BatchConfig class with @EnableBatchProcessing. Leaving the annotation off lets Spring Boot take control of configuring Spring Batch, including creating the Batch schema in the auto-configured data source.

Conversely, we can switch off the auto-creation of the Spring Batch schema, including for the embedded H2 database, by setting the same property in Spring Boot’s application.properties file to never:

spring.batch.jdbc.initialize-schema=never

Again, we can alternatively turn off automatic database initialization in Spring Boot’s application.yml file:

spring:
  batch:
    jdbc:
      initialize-schema: "never"

5. Spring Batch and Job Config

The basic Spring Batch configuration is displayed below, along with our job description for the CSV to XML functionality.

Java-based job configuration:

@Configuration
@Profile("spring")
public class SpringBatchConfig {

    @Value("input/record.csv")
    private Resource inputCsv;

    @Value("file:xml/output.xml")
    private Resource outputXml;

    @Bean
    public ItemReader<Transaction> itemReader()
      throws UnexpectedInputException, ParseException {
        FlatFileItemReader<Transaction> reader = new FlatFileItemReader<Transaction>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        String[] tokens = { "username", "userid", "transactiondate", "amount" };
        tokenizer.setNames(tokens);
        reader.setResource(inputCsv);
        // Skip the header row, as the XML configuration does with linesToSkip
        reader.setLinesToSkip(1);
        DefaultLineMapper<Transaction> lineMapper = 
          new DefaultLineMapper<Transaction>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new RecordFieldSetMapper());
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemProcessor<Transaction, Transaction> itemProcessor() {
        return new CustomItemProcessor();
    }

    @Bean
    public ItemWriter<Transaction> itemWriter(Marshaller marshaller)
      throws MalformedURLException {
        StaxEventItemWriter<Transaction> itemWriter = 
          new StaxEventItemWriter<Transaction>();
        itemWriter.setMarshaller(marshaller);
        itemWriter.setRootTagName("transactionRecord");
        itemWriter.setResource(outputXml);
        return itemWriter;
    }

    @Bean
    public Marshaller marshaller() {
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setClassesToBeBound(new Class[] { Transaction.class });
        return marshaller;
    }

    @Bean
    protected Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager, 
      ItemReader<Transaction> reader, ItemProcessor<Transaction, Transaction> processor, 
      ItemWriter<Transaction> writer) {
        return new StepBuilder("step1", jobRepository)
          .<Transaction, Transaction> chunk(10, transactionManager)
          .reader(reader).processor(processor).writer(writer).build();
    }

    @Bean(name = "firstBatchJob")
    public Job job(JobRepository jobRepository, @Qualifier("step1") Step step1) {
        return new JobBuilder("firstBatchJob", jobRepository).preventRestart().start(step1).build();
    }
    
    @Bean
    public DataSource dataSource() {
        EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
        return builder.setType(EmbeddedDatabaseType.H2)
          .addScript("classpath:org/springframework/batch/core/schema-drop-h2.sql")
          .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
          .build();
    }
    
    @Bean(name = "transactionManager")
    public PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }
    
    @Bean(name = "jobRepository")
    public JobRepository getJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource());
        factory.setTransactionManager(getTransactionManager());
        factory.afterPropertiesSet();
        return factory.getObject();
    }
    
    @Bean(name = "jobLauncher")
    public JobLauncher getJobLauncher() throws Exception {
       TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
       jobLauncher.setJobRepository(getJobRepository());
       jobLauncher.afterPropertiesSet();
       return jobLauncher;
    }
}

And the XML-based configuration:

<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="input/record.csv" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="username,userid,transactiondate,amount" />
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.baeldung.batch.service.RecordFieldSetMapper" />
            </property>
        </bean>
    </property>
    <property name="linesToSkip" value="1" />
</bean>

<bean id="itemProcessor" class="com.baeldung.batch.service.CustomItemProcessor" />

<bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
    <property name="resource" value="file:xml/output.xml" />
    <property name="marshaller" ref="marshaller" />
    <property name="rootTagName" value="transactionRecord" />
</bean>

<bean id="marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="classesToBeBound">
        <list>
            <value>com.baeldung.batch.model.Transaction</value>
        </list>
    </property>
</bean>

<batch:job id="firstBatchJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="itemReader" writer="itemWriter"
              processor="itemProcessor" commit-interval="10">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
</batch:job>


<!-- connect to H2 database -->
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.h2.Driver" />
    <property name="url" value="jdbc:h2:file:~/repository" />
    <property name="username" value="" />
    <property name="password" value="" />
</bean>

<!-- stored job-meta in database -->
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
    <property name="dataSource" ref="dataSource" />
    <property name="transactionManager" ref="transactionManager" />
    <property name="databaseType" value="h2" />
</bean>

<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.TaskExecutorJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
</bean>

Now that we have the whole config, let’s break it down and start discussing it.

5.1. Read Data and Create Objects With ItemReader

First, we configured the itemReader bean, which reads the data from record.csv and converts each line into a Transaction object:

@SuppressWarnings("restriction")
@XmlRootElement(name = "transactionRecord")
public class Transaction {
    private String username;
    private int userId;
    private LocalDateTime transactionDate;
    private double amount;

    /* getters and setters for the attributes */

    @Override
    public String toString() {
        return "Transaction [username=" + username + ", userId=" + userId
          + ", transactionDate=" + transactionDate + ", amount=" + amount
          + "]";
    }
}

To do so, it uses a custom mapper:

public class RecordFieldSetMapper implements FieldSetMapper<Transaction> {

    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("d/M/yyyy");

    @Override
    public Transaction mapFieldSet(FieldSet fieldSet) throws BindException {
        Transaction transaction = new Transaction();

        // Read each column by the name configured on the tokenizer
        transaction.setUsername(fieldSet.readString("username"));
        transaction.setUserId(fieldSet.readInt("userid"));
        transaction.setAmount(fieldSet.readDouble("amount"));
        String dateString = fieldSet.readString("transactiondate");
        // The file holds dates only, so we use midnight as the time component
        transaction.setTransactionDate(LocalDate.parse(dateString, FORMATTER).atStartOfDay());
        return transaction;
    }
}
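
The date handling is the least obvious part of the mapper, so here is a small JDK-only sketch of it in isolation. It assumes the conventional four-letter year pattern d/M/yyyy, and the class name TransactionDateSketch is hypothetical:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of the article's project.
class TransactionDateSketch {

    // A single 'd' or 'M' accepts one or two digits, so "3/12/2015" and "2/02/2015" both parse.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("d/M/yyyy");

    // Parses a day/month/year string and returns midnight at the start of that day.
    static LocalDateTime parse(String raw) {
        return LocalDate.parse(raw.trim(), FORMATTER).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(parse("31/10/2015")); // prints 2015-10-31T00:00
        System.out.println(parse(" 2/02/2015")); // prints 2015-02-02T00:00
    }
}
```

A string that does not match the pattern, such as "2015-10-31", makes LocalDate.parse throw a DateTimeParseException, so in a real job we would want to decide how such records are handled.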

5.2. Processing Data With ItemProcessor

We created our own item processor, CustomItemProcessor. It doesn't modify the Transaction object in any way.

All it does is pass the original object coming from the reader to the writer:

public class CustomItemProcessor implements ItemProcessor<Transaction, Transaction> {

    @Override
    public Transaction process(Transaction item) {
        return item;
    }
}
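
A processor that does real work usually transforms items, filters them, or both. In Spring Batch, returning null from process() tells the framework to drop the item so that it never reaches the writer. The sketch below illustrates that contract with plain JDK types. ProcessorSketch is a hypothetical stand-in that mirrors the shape of ItemProcessor, and the username rules are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in that mirrors the shape of Spring Batch's ItemProcessor<I, O>.
interface ProcessorSketch<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical processor: upper-cases usernames and filters out blank ones.
class UsernameProcessorSketch implements ProcessorSketch<String, String> {

    @Override
    public String process(String username) {
        if (username == null || username.trim().isEmpty()) {
            return null; // null means "filter this item out"
        }
        return username.trim().toUpperCase(Locale.ROOT);
    }

    // Applies the processor to every item and keeps only the non-null results.
    static List<String> processAll(List<String> usernames) {
        UsernameProcessorSketch processor = new UsernameProcessorSketch();
        List<String> kept = new ArrayList<>();
        for (String username : usernames) {
            String result = processor.process(username);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("devendra", " ", "john"))); // prints [DEVENDRA, JOHN]
    }
}
```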

5.3. Writing Objects to the FS With ItemWriter

Finally, we write the transactions to an XML file located at xml/output.xml:

<bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
    <property name="resource" value="file:xml/output.xml" />
    <property name="marshaller" ref="marshaller" />
    <property name="rootTagName" value="transactionRecord" />
</bean>

5.4. Configuring the Batch Job

So, all we have to do is connect the dots with a job using the batch:job syntax.

Note the commit-interval. That's the number of items the step reads and processes before it hands the whole chunk to the itemWriter and commits.

The step holds those items in memory until that point (or until it reaches the end of the input data).
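
The following JDK-only sketch shows the idea behind the commit interval: read up to N items, process them, then write them as one chunk. It is a simplified model under our own assumptions, with no transactions, skips, or retries, and ChunkLoopSketch is a hypothetical class rather than anything Spring Batch ships:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model of a chunk-oriented step; not Spring Batch code.
class ChunkLoopSketch {

    // Returns the chunks that would be handed to the writer, in order.
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int commitInterval) {
        List<List<O>> written = new ArrayList<>();
        while (reader.hasNext()) {
            // Read up to commitInterval items into memory
            List<I> items = new ArrayList<>();
            while (items.size() < commitInterval && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process each item; a null result filters the item out
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                O result = processor.apply(item);
                if (result != null) {
                    processed.add(result);
                }
            }
            // One write per chunk
            written.add(processed);
        }
        return written;
    }

    // Convenience for the demo: sizes of the chunks produced for itemCount items.
    static List<Integer> chunkSizes(int itemCount, int commitInterval) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= itemCount; i++) {
            input.add(i);
        }
        Function<Integer, Integer> passThrough = item -> item;
        List<Integer> sizes = new ArrayList<>();
        for (List<Integer> chunk : run(input.iterator(), passThrough, commitInterval)) {
            sizes.add(chunk.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(25, 10)); // prints [10, 10, 5]
    }
}
```

With 25 input items and an interval of 10, the writer is called three times, and the last chunk is smaller because the input ran out.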

The Java bean and the corresponding XML configuration are shown below:

@Bean
protected Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager,
  ItemReader<Transaction> reader, ItemProcessor<Transaction, Transaction> processor,
  ItemWriter<Transaction> writer) {
    return new StepBuilder("step1", jobRepository)
      .<Transaction, Transaction> chunk(10, transactionManager)
      .reader(reader)
      .processor(processor)
      .writer(writer)
      .build();
}

<batch:job id="firstBatchJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="itemReader" writer="itemWriter" processor="itemProcessor" commit-interval="10">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
</batch:job>

5.5. Running the Batch Job

Now let’s set up and run everything:

@Profile("spring")
public class App {
    public static void main(String[] args) {
        // Spring Java config
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext();
        context.register(SpringConfig.class);
        context.register(SpringBatchConfig.class);
        context.refresh();
        
        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("firstBatchJob");
        System.out.println("Starting the batch job");
        try {
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Job Status : " + execution.getStatus());
            System.out.println("Job completed");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Job failed");
        }
    }
}

We run our Spring application with the spring profile active by passing -Dspring.profiles.active=spring.

Later, in Section 7, we configure our example in a Spring Boot application.

6. Run Batch Jobs in a Particular Order

In Spring Batch, we can run units of work in a specific order by creating a parent job that orchestrates them. This is particularly useful when the output of one unit is required as input for another, or when business logic requires sequential execution. In this example, the units are steps inside a single parent job.

For demonstration purposes, we’ll create two steps that simply log messages:

@Bean
public Step firstStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("firstStep", jobRepository)
      .<String, String>chunk(1, transactionManager)
      .reader(new IteratorItemReader<>(Stream.of("Data from Step 1").iterator()))
      .processor(item -> {
          System.out.println("Processing: " + item);
          return item;
      })
      .writer(items -> items.forEach(System.out::println))
      .build();
}

@Bean
public Step secondStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("secondStep", jobRepository)
      .<String, String>chunk(1, transactionManager)
      .reader(new IteratorItemReader<>(Stream.of("Data from Step 2").iterator()))
      .processor(item -> {
          System.out.println("Processing: " + item);
          return item;
      })
      .writer(items -> items.forEach(System.out::println))
      .build();
}

Next, we create a parent job that executes the defined steps in the desired order. The parent job starts with firstStep and then proceeds to secondStep.

We can use the JobBuilder to define the sequence of steps:

@Bean(name = "parentJob")
public Job parentJob(JobRepository jobRepository,
    @Qualifier("firstStep") Step firstStep,
    @Qualifier("secondStep") Step secondStep) {
    return new JobBuilder("parentJob", jobRepository)
      .start(firstStep)
      .next(secondStep)
      .build();
}

Now, let’s run the job using the JobLauncher within our application context. When these steps run, we’ll see logs similar to the following:

Processing: Data from Step 1
Data from Step 1

Processing: Data from Step 2
Data from Step 2

By following this approach, we can run multiple Spring Batch steps in a specified order within a single job, without relying on complex item processing or reading mechanisms.
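
A job built with start(...).next(...) runs a later step only after the previous one has completed successfully. The JDK-only sketch below models just that ordering rule. SequentialStepsSketch is a hypothetical class, and representing each step as a BooleanSupplier that reports success or failure is our simplification:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of sequential step execution; not Spring Batch code.
class SequentialStepsSketch {

    // Runs the steps in insertion order and stops after the first one that reports failure.
    // Returns the names of the steps that were executed.
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            executed.add(step.getKey());
            if (!step.getValue().getAsBoolean()) {
                break; // a failed step ends the job; later steps never start
            }
        }
        return executed;
    }

    // Builds a two-step "job" whose first step succeeds or fails on demand.
    static List<String> demo(boolean firstStepSucceeds) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("firstStep", () -> firstStepSucceeds);
        steps.put("secondStep", () -> true);
        return run(steps);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints [firstStep, secondStep]
        System.out.println(demo(false)); // prints [firstStep]
    }
}
```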

7. Spring Boot Configuration

In this section, we’ll create a Spring Boot application and convert the previous Spring Batch Config to run in the Spring Boot environment. In fact, this is roughly the equivalent of the previous Spring Batch example.

7.1. Maven Dependencies

Let’s start by declaring the spring-boot-starter-batch dependency in the pom.xml of our Spring Boot application:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
    <version>3.4.0</version>
</dependency>

We need a database to store the Spring Batch job information. In this tutorial, we use an in-memory H2 database, as configured before.

7.2. Spring Boot Config

We use the @Profile annotation to distinguish between the Spring and Spring Boot configurations. We set the spring-boot profile in our application:

@SpringBootApplication
public class SpringBatchApplication {

    public static void main(String[] args) {
        SpringApplication springApp = new SpringApplication(SpringBatchApplication.class);
        springApp.setAdditionalProfiles("spring-boot");
        springApp.run(args);
    }

}

7.3. Spring Batch Job Config

We use the same batch job configuration as in the SpringBatchConfig class from earlier:

@Configuration
public class SpringBootBatchConfig {

    @Value("input/record.csv")
    private Resource inputCsv;

    @Value("input/recordWithInvalidData.csv")
    private Resource invalidInputCsv;

    @Value("file:xml/output.xml")
    private Resource outputXml;

    // ...
}

Starting with Spring Boot 3.0, the @EnableBatchProcessing annotation is discouraged. We manually declare the JobRepository, JobLauncher, and TransactionManager beans. In addition, JobBuilderFactory and StepBuilderFactory are deprecated, and it’s recommended to use the JobBuilder and StepBuilder classes directly, passing the name of the job or step to the builder.

8. Conclusion

In this article, we learned how to work with Spring Batch and how to use it in a simple use case.

We saw how we can easily develop our batch processing pipeline and how we can customize different stages in reading, processing, and writing.

The code backing this article is available on GitHub.
