Java Streams are not for Seniors
A few days ago I stumbled upon this LinkedIn post (in Spanish, but easily understandable), which included an image with two solutions to the same problem.
This isn’t a new topic: doubts about how Java Streams work and when to use them are very common. But I don’t think they have anything to do with seniority.
The Streams API is not a “clever construct to hide loops” that only senior developers understand. It is probably the greatest revolution ever implemented in the Java language: one that took a purely imperative language and made it compatible with declarative programming.
Declarative vs. Imperative programming
Imperative programming focuses on writing code that tells the computer exactly how to solve a problem. We start with a system in an initial state, and describe a series of steps that modify that state until we get the desired final one.
Declarative programming, on the other hand, means listing what operations we need to apply to an initial system to get the desired solution. How those operations work is not part of our program; we leave that to the programming language implementation. That’s why the declarative approach is sometimes harder to debug: we do not see the actual algorithm working.
We use declarative languages every day to describe things: think of YAML, JSON, XML or HTML. They are perfect for defining system configurations or document object models. SQL is also a declarative language. You use it to describe the operations the database needs to perform on an initial set of tables to produce the result you need, but not how to implement them.
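To make the difference concrete, here is a small sketch of my own that solves one toy problem both ways: adding up the squares of the even numbers in a list. It is not the example from the LinkedIn post, and the method names are hypothetical.

```java
import java.util.List;

public class ImperativeVsDeclarative {

    // Imperative: we spell out HOW, mutating an accumulator step by step.
    static int sumOfEvenSquaresImperative(List<Integer> numbers) {
        int total = 0;                  // initial state
        for (int n : numbers) {         // explicit iteration
            if (n % 2 == 0) {
                total += n * n;         // the state is modified on each step
            }
        }
        return total;                   // final state
    }

    // Declarative: we only list WHAT to apply; the iteration is left to the Streams library.
    static int sumOfEvenSquaresDeclarative(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquaresImperative(numbers));  // 56
        System.out.println(sumOfEvenSquaresDeclarative(numbers)); // 56
    }
}
```

Both methods return the same result. Only the second one leaves the "how" to the library.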
Functional programming
If you use a declarative approach to describe the solution to a problem, with the function as its main building block, you have what we call functional programming.
In this context, a function is a first-class construct that represents an operation: it can be stored in a variable, passed as an argument and returned from other functions. Introducing functional programming in Java is beyond the scope of this article, but there are plenty of articles on the topic.
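Here is a minimal illustration of functions as first-class values, built on the standard `java.util.function.Function` interface. The constants `DOUBLE` and `INCREMENT` and the helpers `applyToAll` and `doubleThenIncrement` are hypothetical names chosen for this sketch. `Stream.toList()` requires Java 16 or later.

```java
import java.util.List;
import java.util.function.Function;

public class FirstClassFunctions {

    // A function is a value: it can be stored in a variable...
    static final Function<Integer, Integer> DOUBLE = x -> x * 2;
    static final Function<Integer, Integer> INCREMENT = x -> x + 1;

    // ...passed as an argument to another method...
    static List<Integer> applyToAll(List<Integer> values, Function<Integer, Integer> f) {
        return values.stream().map(f).toList();
    }

    // ...and returned from a method, here by composing two existing functions.
    static Function<Integer, Integer> doubleThenIncrement() {
        return DOUBLE.andThen(INCREMENT); // x -> (x * 2) + 1
    }

    public static void main(String[] args) {
        System.out.println(applyToAll(List.of(1, 2, 3), doubleThenIncrement())); // [3, 5, 7]
    }
}
```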
Another core principle in functional programming is Immutability. Functional code describes the sequence of operations (functions) which, when executed in order over a source data structure, produce a new data structure with the desired shape.
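The following sketch shows this principle with Streams. The pipeline reads a source list and returns a new one, and the source stays unchanged even when it is a mutable `ArrayList`. The class and method names are illustrative only, and the sketch assumes Java 16 or later for `Stream.toList()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutabilityExample {

    // Produces a NEW list; the source list is only read, never modified.
    static List<String> upperCased(List<String> source) {
        return source.stream()
                .map(String::toUpperCase)
                .toList(); // Stream.toList() returns an unmodifiable list
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("ada", "grace", "linus"));
        List<String> result = upperCased(source);
        System.out.println(source); // [ada, grace, linus] (unchanged)
        System.out.println(result); // [ADA, GRACE, LINUS]
        try {
            result.add("ALAN"); // mutator methods on the result are rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("The result cannot be modified either");
        }
    }
}
```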
Why do Streams seem difficult?
Before JDK 8, you could only write imperative code in Java. A whole generation of developers who learned to code with Java in the 90s had this way of thinking and solving problems rooted deep in their minds.
The Streams API is part of an effort to bring the functional paradigm into Java. Together with lambda expressions and Records (previewed in Java 14 and finalized in Java 16), they are the three pillars of functional programming:
- First-class function support
- Declarative style for data manipulation
- Immutable data structures
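Here is a small sketch in which the three pillars appear together. A record serves as an immutable data structure, a lambda stored in a variable serves as a first-class function, and a stream pipeline handles the declarative data manipulation. The `Employee` record and the `adultNames` method are hypothetical, and the sketch assumes Java 16 or later.

```java
import java.util.List;
import java.util.function.Predicate;

public class ThreePillars {

    // Immutable data structure: record components are final and no setters are generated.
    record Employee(String name, int age) {}

    // First-class function: a lambda stored in a variable like any other value.
    static final Predicate<Employee> IS_ADULT = e -> e.age() > 18;

    // Declarative data manipulation: describe WHAT we want, not how to loop.
    static List<String> adultNames(List<Employee> employees) {
        return employees.stream()
                .filter(IS_ADULT)
                .map(Employee::name)
                .toList();
    }

    public static void main(String[] args) {
        List<Employee> staff = List.of(new Employee("Ana", 17), new Employee("Luis", 30));
        System.out.println(adultNames(staff)); // [Luis]
    }
}
```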
The reason this seems difficult at first is that we usually learn Java by writing imperative code. We write loops, add getters and setters to our classes, and use Lists and Maps to store and manipulate data. By the time we learn about immutability and functions, the imperative style seems much more natural to us. And since Java allows both programming styles, we tend to mix them. We then struggle to understand the benefits of each one, or even to tell which one we are using at a given time.
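To illustrate the kind of mixing I mean, here is a hypothetical sketch with two versions of the same transformation. The first uses a stream but still mutates an external list from `forEach`. The second stays declarative from end to end. It assumes Java 16 or later for `Stream.toList()`.

```java
import java.util.ArrayList;
import java.util.List;

public class MixedStyles {

    // Mixed style: a stream pipeline that mutates external state from inside forEach.
    // It works on a sequential stream, but the side effect hides the data flow,
    // and it would not be safe if someone later added .parallel() (ArrayList is not thread-safe).
    static List<String> mixed(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream()
                .filter(w -> w.length() > 3)
                .forEach(w -> result.add(w.toUpperCase()));
        return result;
    }

    // Consistently declarative: the pipeline itself produces the result.
    static List<String> declarative(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .toList();
    }

    public static void main(String[] args) {
        List<String> words = List.of("java", "is", "not", "only", "imperative");
        System.out.println(mixed(words));       // [JAVA, ONLY, IMPERATIVE]
        System.out.println(declarative(words)); // [JAVA, ONLY, IMPERATIVE]
    }
}
```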
What about performance?
Another popular concern about the functional approach is performance. In an imperative algorithm we see the logic in our code and can optimize it if we need to. Functional code is declarative, though, so the algorithm is not visible.
Some comments on the LinkedIn post worried that the declarative approach would require iterating through the list three times, given the presence of the filter, map and collect operations. That is not how Java Streams work. Eager collection APIs in some languages, such as JavaScript’s array filter and map methods, do build an intermediate collection at each step. A Java Stream, however, is lazy and traverses its source only once.
A Java Stream represents a pipeline that applies a sequence of operations to a source collection. Intermediate operations such as filter and map are lazy: the stream only starts iterating the list when we call the terminal operation (in this case, collect). A collecting terminal operation such as collect or toList produces a new collection with the result and leaves the source untouched.
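A minimal way to observe this laziness is to count how many times the filter predicate runs before and after the terminal operation. The `Observation` record and the `observe` method are hypothetical helpers for this sketch. It assumes Java 16 or later for records and `Stream.toList()`, and it uses an `AtomicInteger` because a lambda cannot modify a plain local variable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessExample {

    // Hypothetical holder for what we observe before and after the terminal operation.
    record Observation(int filterCallsBefore, int filterCallsAfter, List<Integer> result) {}

    static Observation observe(List<Integer> source) {
        AtomicInteger filterCalls = new AtomicInteger();

        // Building the pipeline only describes the operations; nothing runs yet.
        Stream<Integer> pipeline = source.stream()
                .filter(n -> {
                    filterCalls.incrementAndGet(); // count every predicate evaluation
                    return n % 2 == 1;
                })
                .map(n -> n * 10);
        int before = filterCalls.get();

        // The terminal operation triggers a single traversal of the source.
        List<Integer> result = pipeline.toList();
        return new Observation(before, filterCalls.get(), result);
    }

    public static void main(String[] args) {
        Observation o = observe(List.of(1, 2, 3, 4, 5));
        System.out.println(o.filterCallsBefore()); // 0
        System.out.println(o.filterCallsAfter());  // 5: once per element
        System.out.println(o.result());            // [10, 30, 50]
    }
}
```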
The declarative nature of Java Streams makes them much more flexible and powerful than their imperative counterpart. Since we only describe what needs to be done, the library is free to decide how to traverse the source. For example, it can split the source into chunks and process them on several threads. Processing big collections in parallel is complex in imperative code, but it becomes almost trivial when you use immutable data structures and pure functions.
Besides, Java runs on a JVM whose JIT compiler optimizes the native code it generates at runtime. As a result, the same Java Streams expression can end up following different execution paths depending on the size and nature of the collection we iterate.
Ultimately, choosing between a for loop and a stream comes down to your specific requirements. Are there scenarios where the imperative approach outperforms the declarative one in speed or resource usage? Of course there are. Can Java Streams be considered the preferred choice in most scenarios, thanks to their flexibility, robustness, scalability and maintainability? In my view, yes.
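Here is a rough sketch of how little code parallelism requires when the function is pure. It computes the same sum of squares sequentially and in parallel, and the method names are hypothetical. Whether the parallel version is actually faster depends on the data size and the cost per element, so treat this as an illustration of the programming model, not of performance.

```java
import java.util.stream.LongStream;

public class ParallelExample {

    // Pure function: the result depends only on the input and there are no side effects.
    static long square(long n) {
        return n * n;
    }

    static long sumOfSquares(long upTo, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, upTo);
        if (parallel) {
            numbers = numbers.parallel(); // the only change needed to split the work across threads
        }
        return numbers.map(ParallelExample::square).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false)); // 333833500
        System.out.println(sumOfSquares(1_000, true));  // 333833500
    }
}
```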
Let’s write some code
To visualize how the declarative approach works, we can use the same example as in the image. Let’s define the Person class, adding a bit of instrumentation to get some insight into how it works. I’ve added a countMethodCall method that keeps track of how many times each method is called on the class. It also writes a line to the standard output, so that we can actually see the calls. Then, I’ve modified all the getters and setters to invoke that method.
import java.util.HashMap;
import java.util.Map;

public class Person {
    // Shared counter: method name -> number of calls across all Person instances
    private static final Map<String, Integer> methodCalls = new HashMap<>();

    private int age;
    private String name;

    public Person(int age, String name) {
        this.age = age;
        this.name = name;
    }

    // Prints the call to standard output and increments the counter for that method
    private void countMethodCall(String methodName) {
        System.out.printf(
                "Calling method <%s> on object <%s>%n",
                methodName,
                this.hashCode());
        var previousValue = methodCalls.getOrDefault(methodName, 0);
        methodCalls.put(methodName, previousValue + 1);
    }

    public int getAge() {
        countMethodCall("getAge");
        return age;
    }

    public void setAge(int age) {
        countMethodCall("setAge");
        this.age = age;
    }

    public String getName() {
        countMethodCall("getName");
        return name;
    }

    public void setName(String name) {
        countMethodCall("setName");
        this.name = name;
    }

    // Returns how many times the given method has been called (0 if never).
    // Returning int (not Integer) keeps JUnit's assertEquals(int, int) overload unambiguous.
    public static int getMethodCalls(String methodName) {
        return methodCalls.getOrDefault(methodName, 0);
    }
}
To run this exercise, I’ve written a test that creates a list of 100 Person instances, where each one’s age equals its position in the list (0 to 99). Then, in a functional fashion, it transforms the list of people into a list of Strings containing the names of those whose age is greater than 18:
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import java.util.stream.IntStream;
import org.junit.jupiter.api.Test;

public class StreamsLazinessTest {
    @Test
    void testStreamsLaziness() {
        var people = IntStream
                .range(0, 100) // <- create a range of ints, from 0 to 99
                .boxed()       // <- box them as Integer objects
                // replace each Integer with a Person that has the value as age
                .map(value -> new Person(value, "Person with age %d".formatted(value)))
                .toList();     // <- return the result as a List

        List<String> names = people
                .stream()      // <- create a stream from the list of people
                // filter the stream, keeping only those with age > 18
                .filter(person -> person.getAge() > 18)
                .map(Person::getName) // <- replace each Person object with a String given by its name
                .toList();     // <- return the result as a List

        // Verify that getAge is called once on every Person object (ages 0 to 99) ...
        assertEquals(100, Person.getMethodCalls("getAge"));
        // ... but getName is only called on those that passed the filter (ages 19 to 99)
        assertEquals(81, Person.getMethodCalls("getName"));
        assertEquals(81, names.size());
    }
}
If we run this test, we will see something like this in the console (the hash codes may differ between runs):
Calling method <getAge> on object <1473611564>
Calling method <getAge> on object <107456312>
Calling method <getAge> on object <921760190>
Calling method <getAge> on object <360067785>
Calling method <getAge> on object <1860250540>
Calling method <getAge> on object <1426329391>
Calling method <getAge> on object <1690859824>
Calling method <getAge> on object <1074593562>
Calling method <getAge> on object <660017404>
Calling method <getAge> on object <1381965390>
Calling method <getAge> on object <1979313356>
Calling method <getAge> on object <1386883398>
Calling method <getAge> on object <1306854175>
Calling method <getAge> on object <1742920067>
Calling method <getAge> on object <1564984895>
Calling method <getAge> on object <1587819720>
Calling method <getAge> on object <1002191352>
Calling method <getAge> on object <1256440269>
Calling method <getAge> on object <704024720>
Calling method <getAge> on object <1452012306>
Calling method <getName> on object <1452012306>
Calling method <getAge> on object <211968962>
Calling method <getName> on object <211968962>
Calling method <getAge> on object <1486566962>
Calling method <getName> on object <1486566962>
Calling method <getAge> on object <1173643169>
...
Calling method <getName> on object <973576304>
Process finished with exit code 0
When the terminal toList operation is invoked on the Stream, it starts iterating over the Person objects. Each one first reaches the filter operation, which removes it from the pipeline if the provided expression returns false. There is no getName call for the first 19 items (ages 0 to 18), as they are removed in the filter step.
The other important thing to note is that, for ages > 18, the getName call comes right after the getAge call. The collection is iterated only once, with all the work done on each item before passing to the next.
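The same interleaving can be reproduced in a smaller sketch without the Person instrumentation, by recording which stage sees each element. The `trace` method is hypothetical and assumes Java 16 or later for `Stream.toList()`. The side effects inside the lambdas exist only for tracing, and they are safe here only because the stream is sequential.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessingOrderExample {

    // Records the order in which the pipeline stages see each element.
    static List<String> trace(List<Integer> source) {
        List<String> events = new ArrayList<>();
        source.stream()
                .filter(n -> {
                    events.add("filter " + n);
                    return n > 1;            // 1 is dropped here and never reaches map
                })
                .map(n -> {
                    events.add("map " + n);
                    return n * 2;
                })
                .toList();                   // terminal operation: triggers the traversal
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(List.of(1, 2, 3)));
        // [filter 1, filter 2, map 2, filter 3, map 3]
    }
}
```

Each element goes through every stage before the next one starts, exactly like the getAge/getName pairs in the console output above.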
So which approach is better?
As always, it depends. Both declarative and imperative programming styles have advantages and disadvantages. As I said, I prefer the declarative approach for most problems, as I find it more concise, scalable and easier to maintain.
If performance or memory usage is critical for your solution, you need to measure. Do not assume that Java Streams will run slower or consume more memory just because they introduce abstractions and a declarative approach; this strongly depends on your particular case. Write proof-of-concept versions in both styles, measure memory and time, and make a data-driven decision.
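As a starting point, here is a deliberately naive sketch of such a comparison. It checks that both versions agree, then times each one with `System.nanoTime` after a few warm-up runs. The methods are hypothetical. Single-shot timings like these are easily distorted by JIT compilation and garbage collection, so for real decisions a dedicated harness such as JMH is a much better tool.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.stream.IntStream;

public class NaiveComparison {

    static long loopVersion(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            if (v % 3 == 0) {
                total += v;
            }
        }
        return total;
    }

    static long streamVersion(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 3 == 0)
                .mapToLong(Integer::longValue)
                .sum();
    }

    // Keeps results "used" so the JIT cannot simply discard the measured work.
    static volatile long sink;

    // Very naive timing: a few warm-up rounds, then one measured run.
    static long timeNanos(LongSupplier task) {
        for (int i = 0; i < 5; i++) {
            sink = task.getAsLong();
        }
        long start = System.nanoTime();
        sink = task.getAsLong();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.range(0, 1_000_000).boxed().toList();
        // First make sure both versions agree, then compare timings.
        if (loopVersion(values) != streamVersion(values)) {
            throw new IllegalStateException("The two versions disagree");
        }
        System.out.printf("loop:   %d ns%n", timeNanos(() -> loopVersion(values)));
        System.out.printf("stream: %d ns%n", timeNanos(() -> streamVersion(values)));
    }
}
```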
Final thoughts
I have written this article because I am one of those developers who started coding with Java in the 90s, with the imperative paradigm as the natural way to solve problems. I have walked the path from there to the functional style, and I now understand its benefits and the reasons behind it. But I know it is hard to grasp at first, so I hope reading this makes the journey a bit easier for some of you.
Happy coding!