Guide to Java Collectors

Last updated: January 8, 2024

Written by: Grzegorz Piwowarek

Java Streams


1. Overview

In this tutorial, we’ll be going through Java 8’s collectors, which are used in the final step of processing a Stream. To read more about the Stream API itself, please refer to this article.

If we want to see how to leverage the power of collectors for parallel processing, we can look at this project.

2. Stream.collect() Method

The collect() method is one of Java 8’s Stream API terminal methods. It allows us to perform mutable fold operations (repackaging elements to some data structures and applying some additional logic, concatenating them, etc.) on data elements held in a Stream instance.

The strategy for this operation is provided via the Collector interface implementation.
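To make the idea of a mutable fold more concrete, here is a minimal sketch that skips the Collector interface and passes the supplier, accumulator, and combiner straight to the three-argument overload of collect(). The class and method names are only illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThreeArgCollectSketch {

    // Supplier creates the container, accumulator adds one element,
    // combiner merges two partial containers (used by parallel streams).
    static List<String> copyToList(List<String> input) {
        return input.stream()
          .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(copyToList(givenList)); // prints [a, bb, ccc, dd]
    }
}
```

A Collector bundles these same functions, plus an optional finishing step, into a single reusable object.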

3. Collectors

Typically, we can find all the predefined implementations in the Collectors class. It’s common practice to use the following static import with them to improve readability:

import static java.util.stream.Collectors.*;

Alternatively, we can statically import only the collectors we need:

import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;
import static java.util.stream.Collectors.toSet;

In the following examples, we’ll be reusing the following list:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");

3.1. Collectors.toCollection()

When using the toSet() and toList() collectors, we can’t make any assumptions about their implementations. If we want to use a custom implementation, we’ll need to use the toCollection() collector with a provided collection of our choice.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a LinkedList instance:

List<String> result = givenList.stream()
  .collect(toCollection(LinkedList::new));

Notice that this will not work with any immutable collections. In such a case, we would need to either write a custom collector implementation or use collectingAndThen().
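Any mutable collection with a no-argument constructor can serve as the target. As a minimal sketch, assuming we want sorted output without duplicates, we can pass TreeSet::new as the supplier. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToCollectionSketch {

    // Collects into a TreeSet, so the result is sorted and free of duplicates.
    static TreeSet<String> toSortedSet(List<String> input) {
        return input.stream()
          .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("ccc", "a", "bb", "a");
        System.out.println(toSortedSet(unsorted)); // prints [a, bb, ccc]
    }
}
```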

3.2. Collectors.toList()

As the name implies, the main purpose of the toList() method is to collect all Stream elements into a List instance. The important thing to remember is that we can’t assume any particular List implementation with this method. If we want to have more control over this, we can use toCollection() instead.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a List instance:

List<String> result = givenList.stream()
  .collect(toList());

3.3. Collectors.toUnmodifiableList()

Java 10 introduced a convenient way to accumulate the Stream elements into an unmodifiable List:

List<String> result = givenList.stream()
  .collect(toUnmodifiableList());

So, if we try to modify the result List, we’ll get an UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);

3.4. Collectors.toSet()

The toSet() collector can be used to collect all Stream elements in a Set instance. The important thing to remember is that we can’t assume any particular Set implementation with this method. If we want to have more control over this, we can use toCollection() instead.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a Set instance:

Set<String> result = givenList.stream()
  .collect(toSet());

A Set doesn’t contain duplicate elements. If our collection contains elements equal to each other, they appear in the resulting Set only once:

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
Set<String> result = listWithDuplicates.stream()
  .collect(toSet());
assertThat(result)
  .hasSize(4);

3.5. Collectors.toUnmodifiableSet()

Since Java 10, we can easily create an unmodifiable Set using the toUnmodifiableSet() collector:

Set<String> result = givenList.stream()
  .collect(toUnmodifiableSet());

Any attempt to modify the resulting Set will end up with an UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);

3.6. Collectors.toMap()

The toMap() collector can be used to collect Stream elements into a Map instance. To do so, we need to provide two functions: keyMapper and valueMapper.

First, we’ll use keyMapper to extract a Map key from a Stream element. Then, we can use valueMapper to extract a value associated with a given key.

So, let’s collect those elements into a Map that stores strings as keys and their lengths as values:

Map<String, Integer> result = givenList.stream()
  .collect(toMap(Function.identity(), String::length));

In a nutshell, Function.identity() is just a shortcut for defining a function that accepts and returns the same value.

So what happens if our collection contains duplicate elements? Contrary to toSet(), the toMap() method doesn’t silently filter duplicates. This is understandable, because it has no way of knowing which value to pick for a duplicated key:

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
assertThatThrownBy(() -> {
    listWithDuplicates.stream().collect(toMap(Function.identity(), String::length));
}).isInstanceOf(IllegalStateException.class);

Note that toMap() doesn’t even evaluate whether the values are also equal. If it sees duplicate keys, it immediately throws an IllegalStateException.

When key collisions are possible, we should use the toMap() overload that takes a merge function:

Map<String, Integer> result = listWithDuplicates.stream()
  .collect(toMap(Function.identity(), String::length, (item, identicalItem) -> item));

The third argument here is a BinaryOperator, where we can specify how we want to handle collisions. In this case, we’ll just pick either of the two colliding values because we know that the same strings will always have the same lengths too.
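The merge function doesn’t have to discard a value. As a minimal sketch, assuming we key the map by string length instead, we can combine colliding values and also pass a map supplier as a fourth argument to control the Map implementation. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeSketch {

    // Keys are string lengths; strings of equal length are joined with a comma.
    // LinkedHashMap keeps the keys in encounter order.
    static Map<Integer, String> byLength(List<String> input) {
        return input.stream()
          .collect(Collectors.toMap(
            String::length,
            Function.identity(),
            (first, second) -> first + "," + second,
            LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(byLength(givenList)); // prints {1=a, 2=bb,dd, 3=ccc}
    }
}
```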

3.7. Collectors.toUnmodifiableMap()

As with Lists and Sets, Java 10 introduced an easy way to collect Stream elements into an unmodifiable Map:

Map<String, Integer> result = givenList.stream()
  .collect(toUnmodifiableMap(Function.identity(), String::length));

If we try to put a new entry into the result Map, we’ll get an UnsupportedOperationException:

assertThatThrownBy(() -> result.put("foo", 3))
  .isInstanceOf(UnsupportedOperationException.class);

3.8. Collectors.collectingAndThen()

collectingAndThen() is a special collector that allows us to perform another action on a result straight after collecting ends.

Let’s collect Stream elements to a List instance, and then convert the result into a Guava ImmutableList instance:

List<String> result = givenList.stream()
  .collect(collectingAndThen(toList(), ImmutableList::copyOf));
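If Guava isn’t on the classpath, the same pattern works with the JDK alone. The following is a minimal sketch that wraps the collected list with Collections.unmodifiableList() as the finishing step. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenSketch {

    // Collects into a regular List first, then wraps it in a read-only view.
    static List<String> toReadOnlyList(List<String> input) {
        return input.stream()
          .collect(Collectors.collectingAndThen(
            Collectors.toList(),
            Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> result = toReadOnlyList(Arrays.asList("a", "bb", "ccc", "dd"));
        System.out.println(result); // prints [a, bb, ccc, dd]
        try {
            result.add("foo");
        } catch (UnsupportedOperationException e) {
            System.out.println("result is read-only");
        }
    }
}
```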

3.9. Collectors.joining()

The joining() collector can be used for joining Stream<String> elements.

We can join them together by doing:

String result = givenList.stream()
  .collect(joining());

This will result in:

"abbcccdd"

We can also specify custom separators, prefixes, and postfixes:

String result = givenList.stream()
  .collect(joining(" "));

This will result in:

"a bb ccc dd"

We can also write:

String result = givenList.stream()
  .collect(joining(" ", "PRE-", "-POST"));

This will result in:

"PRE-a bb ccc dd-POST"

3.10. Collectors.counting()

counting() is a simple collector that allows for the counting of all Stream elements.

Now we can write:

Long result = givenList.stream()
  .collect(counting());

3.11. Collectors.summarizingDouble/Long/Int()

summarizingDouble/Long/Int() is a collector that returns a special class containing statistical information about numerical data in a Stream of extracted elements.

We can obtain information about string lengths by doing:

DoubleSummaryStatistics result = givenList.stream()
  .collect(summarizingDouble(String::length));

In this case, the following will be true:

assertThat(result.getAverage()).isEqualTo(2);
assertThat(result.getCount()).isEqualTo(4);
assertThat(result.getMax()).isEqualTo(3);
assertThat(result.getMin()).isEqualTo(1);
assertThat(result.getSum()).isEqualTo(8);

3.12. Collectors.averagingDouble/Long/Int()

averagingDouble/Long/Int() is a collector that simply returns an average of extracted elements.

We can get the average string length by doing:

Double result = givenList.stream()
  .collect(averagingDouble(String::length));

3.13. Collectors.summingDouble/Long/Int()

summingDouble/Long/Int() is a collector that simply returns a sum of extracted elements.

We can get the sum of all string lengths by doing:

Double result = givenList.stream()
  .collect(summingDouble(String::length));
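Since string lengths are integers, the Int variants are arguably a more natural fit here. The following minimal sketch shows that summingInt() yields an Integer, while averagingInt() still yields a Double (0.0 for an empty Stream). The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IntVariantsSketch {

    // summingInt() produces an Integer, unboxed here to int.
    static int totalLength(List<String> input) {
        return input.stream()
          .collect(Collectors.summingInt(String::length));
    }

    // averagingInt() always produces a Double, even for int inputs.
    static double averageLength(List<String> input) {
        return input.stream()
          .collect(Collectors.averagingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(totalLength(givenList));   // prints 8
        System.out.println(averageLength(givenList)); // prints 2.0
    }
}
```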

3.14. Collectors.maxBy/minBy

The maxBy() and minBy() collectors return the biggest/smallest element of a Stream according to a provided Comparator instance.

We can pick the biggest element by doing:

Optional<String> result = givenList.stream()
  .collect(maxBy(Comparator.naturalOrder()));

We can see that the returned value is wrapped in an Optional instance. This forces us to handle the corner case of an empty collection explicitly.
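The Comparator doesn’t have to be the natural order. As a minimal sketch, assuming we want the longest string rather than the lexicographically greatest one, we can compare by length and see how the empty case surfaces as an empty Optional. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxBySketch {

    // Picks the longest string; empty input yields Optional.empty().
    static Optional<String> longest(List<String> input) {
        return input.stream()
          .collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(longest(givenList));                            // prints Optional[ccc]
        System.out.println(longest(Collections.<String>emptyList()));      // prints Optional.empty
    }
}
```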

3.15. Collectors.groupingBy()

Typically, we can use the groupingBy() collector to group objects by a given property and then store the results in a Map instance.

We can group them by string length, and store the grouping results in Set instances:

Map<Integer, Set<String>> result = givenList.stream()
  .collect(groupingBy(String::length, toSet()));

This will result in the following being true:

assertThat(result)
  .containsEntry(1, newHashSet("a"))
  .containsEntry(2, newHashSet("bb", "dd"))
  .containsEntry(3, newHashSet("ccc"));

We can see that the second argument of the groupingBy() method is itself a Collector, so we’re free to use any downstream collector of our choice.
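As a minimal sketch of a different downstream collector, we can count the elements in each group instead of storing them. The three-argument overload used here also accepts a map factory, so we pass TreeMap::new to keep the keys sorted. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingByCountSketch {

    // Groups strings by length and counts how many fall into each group.
    static Map<Integer, Long> countByLength(List<String> input) {
        return input.stream()
          .collect(Collectors.groupingBy(
            String::length,
            TreeMap::new,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        System.out.println(countByLength(givenList)); // prints {1=1, 2=2, 3=1}
    }
}
```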

3.16. Collectors.partitioningBy()

partitioningBy() is a specialized case of groupingBy() that accepts a Predicate instance, and then collects Stream elements into a Map instance that stores Boolean values as keys and collections as values. Under the “true” key, we can find a collection of elements matching the given Predicate, and under the “false” key, we can find a collection of elements not matching the given Predicate.

We can write:

Map<Boolean, List<String>> result = givenList.stream()
  .collect(partitioningBy(s -> s.length() > 2));

This results in a Map containing:

{false=["a", "bb", "dd"], true=["ccc"]}
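Like groupingBy(), partitioningBy() has an overload that accepts a downstream collector. The following minimal sketch counts the elements on each side of the Predicate instead of listing them. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningBySketch {

    // Counts strings longer than two characters (true) and the rest (false).
    static Map<Boolean, Long> countLongAndShort(List<String> input) {
        return input.stream()
          .collect(Collectors.partitioningBy(
            s -> s.length() > 2,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        Map<Boolean, Long> result = countLongAndShort(givenList);
        System.out.println(result.get(true));  // prints 1
        System.out.println(result.get(false)); // prints 3
    }
}
```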

3.17. Collectors.teeing()

Let’s find the maximum and minimum numbers from a given Stream using the collectors we’ve learned so far:

List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
Optional<Integer> min = numbers.stream().collect(minBy(Integer::compareTo));
Optional<Integer> max = numbers.stream().collect(maxBy(Integer::compareTo));
// do something useful with min and max

Here we’re using two different collectors, and then combining the results of those two to create something meaningful. Before Java 12, in order to cover such use cases, we had to operate on the given Stream twice, store the intermediate results into temporary variables, and then combine those results afterward.

Fortunately, Java 12 offers a built-in collector that takes care of these steps on our behalf; all we have to do is provide the two collectors and the combiner function.

Since this new collector tees the given stream towards two different directions, it’s called teeing():

numbers.stream().collect(teeing(
  minBy(Integer::compareTo), // The first collector
  maxBy(Integer::compareTo), // The second collector
  (min, max) -> // Receives the result from those collectors and combines them
));

This example is available on GitHub in the core-java-12 project.
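The combiner body is left open in the snippet above. As a minimal, runnable sketch, assuming we want the spread between the largest and smallest number, the combiner can subtract one result from the other. This requires Java 12 or later, and the class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TeeingSketch {

    // Computes max - min in a single pass; empty input yields Optional.empty().
    static Optional<Integer> spread(List<Integer> numbers) {
        Comparator<Integer> order = Comparator.naturalOrder();
        return numbers.stream().collect(Collectors.teeing(
          Collectors.minBy(order),
          Collectors.maxBy(order),
          (min, max) -> min.flatMap(lo -> max.map(hi -> hi - lo))));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
        System.out.println(spread(numbers)); // prints Optional[40]
    }
}
```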

4. Custom Collectors

If we want to write our own Collector implementation, we need to implement the Collector interface, and specify its three generic parameters:

public interface Collector<T, A, R> {...}

T – the type of objects that will be available for collection
A – the type of a mutable accumulator object
R – the type of a final result

Let’s write an example Collector for collecting elements into an ImmutableSet instance. We start by specifying the right types:

private class ImmutableSetCollector<T>
  implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {...}

Since we need a mutable collection for internal collection operation handling, we can’t use ImmutableSet. Instead, we need to use some other mutable collection or any other class that could temporarily accumulate objects for us. In this case, we’ll go with an ImmutableSet.Builder, and now we need to implement five methods:

Supplier<ImmutableSet.Builder<T>> supplier()
BiConsumer<ImmutableSet.Builder<T>, T> accumulator()
BinaryOperator<ImmutableSet.Builder<T>> combiner()
Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher()
Set<Characteristics> characteristics()

The supplier() method returns a Supplier instance that generates an empty accumulator instance. So in this case, we can simply write:

@Override
public Supplier<ImmutableSet.Builder<T>> supplier() {
    return ImmutableSet::builder;
}

The accumulator() method returns a function that is used for adding a new element to an existing accumulator object. So let’s just use the Builder’s add method:

@Override
public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
    return ImmutableSet.Builder::add;
}

The combiner() method returns a function that is used for merging two accumulators together:

@Override
public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
    return (left, right) -> left.addAll(right.build());
}

The finisher() method returns a function that is used for converting an accumulator to the final result type. So in this case, we’ll just use the Builder’s build method:

@Override
public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
    return ImmutableSet.Builder::build;
}

The characteristics() method is used to provide Stream with some additional information that will be used for internal optimizations. In this case, the order of elements in a Set doesn’t matter, so we’ll use Characteristics.UNORDERED. To obtain more information regarding this subject, check the Characteristics JavaDoc:

@Override
public Set<Characteristics> characteristics() {
    return Sets.immutableEnumSet(Characteristics.UNORDERED);
}

Here is the complete implementation along with the usage:

import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Sets;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class ImmutableSetCollector<T>
  implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {

    @Override
    public Supplier<ImmutableSet.Builder<T>> supplier() {
        return ImmutableSet::builder;
    }

    @Override
    public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
        return ImmutableSet.Builder::add;
    }

    @Override
    public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
        return (left, right) -> left.addAll(right.build());
    }

    @Override
    public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
        return ImmutableSet.Builder::build;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Sets.immutableEnumSet(Characteristics.UNORDERED);
    }

    // Static factory so callers can write collect(toImmutableSet())
    public static <T> ImmutableSetCollector<T> toImmutableSet() {
        return new ImmutableSetCollector<>();
    }
}

Finally, here it is in action:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dddd");

ImmutableSet<String> result = givenList.stream()
  .collect(toImmutableSet());
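For simple cases, we may not need a dedicated class at all. The Collector.of() factory method accepts the same supplier, accumulator, combiner, finisher, and characteristics directly. The following is a minimal sketch that uses only the JDK, so it produces an unmodifiable view of a HashSet rather than Guava’s ImmutableSet. The names ReadOnlySetCollectorSketch and toReadOnlySet() are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

class ReadOnlySetCollectorSketch {

    // Accumulates into a HashSet and wraps it in a read-only view at the end.
    static <T> Collector<T, Set<T>, Set<T>> toReadOnlySet() {
        return Collector.of(
          HashSet::new,                                  // supplier
          Set::add,                                      // accumulator
          (left, right) -> {                             // combiner
              left.addAll(right);
              return left;
          },
          Collections::unmodifiableSet,                  // finisher
          Collector.Characteristics.UNORDERED);
    }

    static Set<String> collect(List<String> input) {
        return input.stream().collect(toReadOnlySet());
    }

    public static void main(String[] args) {
        Set<String> result = collect(Arrays.asList("a", "bb", "ccc", "a"));
        System.out.println(result.size()); // prints 3
    }
}
```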

5. Java 9 Improvements

Here, we’re going to explore two new collectors added in Java 9: filtering() and flatMapping(). Both are designed to be used as downstream collectors in combination with Collectors.groupingBy().

5.1. Collectors.filtering()

Collectors.filtering() is similar to Stream.filter(). Both filter input elements, but they’re used in different scenarios. The filter() method from the Stream API is used in the stream chain, whereas the filtering() method is a collector designed to be used along with groupingBy().

With filter(), the values are filtered first and then grouped. The values that are filtered out are gone, along with any trace of the groups they would have belonged to. If we need to keep that trace, we have to group first and then filter, which is exactly what Collectors.filtering() does.

Collectors.filtering() takes a predicate for filtering the input elements and a collector to collect the filtered elements:

@Test
public void givenList_whenSatisfyPredicate_thenMapValueWithOccurrences() {
    List<Integer> numbers = List.of(1, 2, 3, 5, 5);

    Map<Integer, Long> result = numbers.stream()
      .filter(val -> val > 3)
      .collect(Collectors.groupingBy(i -> i, Collectors.counting()));

    assertEquals(1, result.size());

    result = numbers.stream()
      .collect(Collectors.groupingBy(i -> i,
        Collectors.filtering(val -> val > 3, Collectors.counting())));

    assertEquals(4, result.size());
}
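The preserved groups become easier to see when the downstream collector gathers the elements themselves. As a minimal sketch, assuming Java 9 or later, we group strings by length and keep only those containing the letter "d"; groups with no matching element remain in the Map with an empty List. The class and method names are only illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class FilteringSketch {

    // Every length stays as a key, even when no string in that group matches.
    static Map<Integer, List<String>> withLetterD(List<String> input) {
        return input.stream()
          .collect(Collectors.groupingBy(
            String::length,
            TreeMap::new,
            Collectors.filtering(s -> s.contains("d"), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> givenList = List.of("a", "bb", "ccc", "dd");
        System.out.println(withLetterD(givenList)); // prints {1=[], 2=[dd], 3=[]}
    }
}
```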

5.2. Collectors.flatMapping()

Collectors.flatMapping() is similar to Collectors.mapping(), but has a more fine-grained objective. Both collectors take a function and a downstream collector that gathers the results. The difference is that the function passed to flatMapping() returns a Stream of elements, which is then flattened and accumulated by the collector.

Let’s see the following model class:

class Blog {
    private String authorName;
    private List<String> comments;
      
    // constructor and getters
}

Collectors.flatMapping() lets us skip the intermediate collection and write directly to the single container mapped to each group defined by Collectors.groupingBy():

@Test
public void givenListOfBlogs_whenAuthorName_thenMapAuthorWithComments() {
    Blog blog1 = new Blog("1", "Nice", "Very Nice");
    Blog blog2 = new Blog("2", "Disappointing", "Ok", "Could be better");
    List<Blog> blogs = List.of(blog1, blog2);
        
    Map<String,  List<List<String>>> authorComments1 = blogs.stream()
     .collect(Collectors.groupingBy(Blog::getAuthorName, 
       Collectors.mapping(Blog::getComments, Collectors.toList())));
       
    assertEquals(2, authorComments1.size());
    assertEquals(2, authorComments1.get("1").get(0).size());
    assertEquals(3, authorComments1.get("2").get(0).size());

    Map<String, List<String>> authorComments2 = blogs.stream()
      .collect(Collectors.groupingBy(Blog::getAuthorName, 
        Collectors.flatMapping(blog -> blog.getComments().stream(), 
        Collectors.toList())));

    assertEquals(2, authorComments2.size());
    assertEquals(2, authorComments2.get("1").size());
    assertEquals(3, authorComments2.get("2").size());
}

Collectors.mapping() adds each blog’s whole comment list to the collector’s container, producing a List of Lists. flatMapping() removes this intermediate collection: it streams each comment list and adds the individual comments directly to the collector’s container.
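The difference matters most when one author has several blogs. The following minimal sketch, assuming Java 9 or later, uses a simplified stand-in for the Blog class with a varargs constructor and shows that the comments of both blogs end up in a single flat List. The class name FlatMappingSketch is hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMappingSketch {

    // Simplified stand-in for the Blog model, for illustration only.
    static final class Blog {
        private final String authorName;
        private final List<String> comments;

        Blog(String authorName, String... comments) {
            this.authorName = authorName;
            this.comments = Arrays.asList(comments);
        }

        String getAuthorName() { return authorName; }
        List<String> getComments() { return comments; }
    }

    // Merges the comments of all blogs by the same author into one list.
    static Map<String, List<String>> commentsByAuthor(List<Blog> blogs) {
        return blogs.stream()
          .collect(Collectors.groupingBy(Blog::getAuthorName,
            Collectors.flatMapping(blog -> blog.getComments().stream(),
              Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Blog> blogs = List.of(
          new Blog("1", "Nice", "Very Nice"),
          new Blog("1", "Ok"));
        System.out.println(commentsByAuthor(blogs).get("1")); // prints [Nice, Very Nice, Ok]
    }
}
```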

6. Conclusion

In this article, we explored Java’s Collectors in depth and showed how to implement a custom one. Along the way, we covered the additions made in Java 9, 10, and 12. Make sure to check out one of my projects that enhances the capabilities of parallel processing in Java.

The code backing this article is available on GitHub.
