1. Overview
Working with Strings is an essential part of many Java applications, and one of the most powerful tools for this is the String.replaceAll() method.
In this tutorial, we’ll learn how this method works and explore some practical examples.
2. The String.replaceAll() Method
The String.replaceAll() method allows us to search for patterns within a String and replace them with a desired value. It’s more than a simple search-and-replace tool, as it leverages regular expressions (regex) to identify patterns, making it highly flexible for a variety of use cases.
Let’s look at its signature:
public String replaceAll(String regex, String replacement)
It accepts two parameters:
- regex – a regular expression used to match parts of the String
- replacement – the String that replaces each match found
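For context, the Javadoc for String.replaceAll() describes str.replaceAll(regex, repl) as yielding the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). The following is a minimal sketch of that equivalence; the class name ReplaceAllEquivalence, the helper viaPattern(), and the sample input are hypothetical and chosen only for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the sample input are illustrative only.
class ReplaceAllEquivalence {

    // Performs the same replacement through the java.util.regex API.
    static String viaPattern(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a1b22c333";

        // Replace every run of digits with '#', once directly and once via Pattern.
        String direct = input.replaceAll("\\d+", "#");
        String explicit = viaPattern(input, "\\d+", "#");

        System.out.println(direct);                  // a#b#c#
        System.out.println(direct.equals(explicit)); // true
    }
}
```

When the same regex is applied many times, compiling the Pattern once and reusing it avoids recompiling the expression on every call.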
Next, let’s see an example:
String input = "Hello w o r l d";
String result = input.replaceAll("\\s", "_");
assertEquals("Hello_w_o_r_l_d", result);
In this example, the regex “\\s” matches a single whitespace character, so replaceAll() replaces each whitespace character in input with a “_” character.
Let’s take the same input and perform a different replacement:
result = input.replaceAll("e.*o", "X");
assertEquals("HX r l d", result);
In the above code, the regex “e.*o” matches any segments that begin with “e” and end with “o”. Therefore, “ello w o” from input gets replaced with “X“.
It’s worth noting that “.*” performs a greedy match. That is to say, it matches from the first “e” to the last “o” in the String. If we want the match to stop at the next “o” instead of the last one, we can use the non-greedy regex “e.*?o” or the negated character class “e[^o]*o”:
result = input.replaceAll("e.*?o", "X");
assertEquals("HX w o r l d", result);
result = input.replaceAll("e[^o]*o", "X");
assertEquals("HX w o r l d", result);
“e.*?o” matches any substring that starts with an ‘e’, ends with an ‘o’, and has the shortest possible sequence of characters in between. The non-greedy “*?” quantifier keeps the match as short as possible, so it stops at the first ‘o’ after the ‘e’.
On the other hand, “e[^o]*o” matches Strings that:
- Start with an ‘e’
- Contain zero or more characters that aren’t ‘o’ between ‘e’ and ‘o’
- End with an ‘o‘
As we can see, both approaches produce the expected result: “HX w o r l d“.
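The difference between the greedy and non-greedy variants becomes more visible when the input contains more than one candidate segment. The sketch below assumes a hypothetical input, “hello echo”, and hypothetical helper names; it isn’t part of the original examples:

```java
// Hypothetical demo class; the input "hello echo" is an assumed example.
class GreedyVsReluctant {

    // ".*" expands as far as possible: one match spans the first 'e' to the last 'o'.
    static String greedy(String input) {
        return input.replaceAll("e.*o", "X");
    }

    // ".*?" expands as little as possible: each match stops at the next 'o'.
    static String reluctant(String input) {
        return input.replaceAll("e.*?o", "X");
    }

    // "[^o]*" can't run past an 'o', so each match also stops at the next 'o'.
    static String negatedClass(String input) {
        return input.replaceAll("e[^o]*o", "X");
    }

    public static void main(String[] args) {
        String input = "hello echo";
        System.out.println(greedy(input));       // hX
        System.out.println(reluctant(input));    // hX X
        System.out.println(negatedClass(input)); // hX X
    }
}
```

With the greedy pattern, both segments collapse into a single match, whereas the other two patterns replace “ello” and “echo” separately.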
3. String.replace() vs String.replaceAll()
The String class provides replace() and replaceAll(). The two methods look similar, and they can produce the same results sometimes:
String input = "hello.java.hello.world";
String replaceResult = input.replace("hello", "hi");
assertEquals("hi.java.hi.world", replaceResult);
String replaceAllResult = input.replaceAll("hello", "hi");
assertEquals("hi.java.hi.world", replaceAllResult);
In this example, both replace(“hello”, “hi”) and replaceAll(“hello”, “hi”) replace every occurrence of “hello” in input with “hi”. So what’s the difference between them?
The key difference between replace() and replaceAll() is that replace() always performs literal String replacement. In contrast, replaceAll() uses regex to match patterns, making it a more advanced tool for string manipulation.
The two methods produced the same result in the above example since the characters in the regex pattern “hello” have no special meanings.
Now, let’s say we want to replace all dots “.” with colons “:”. If we pass the same arguments to both replace() and replaceAll() this time, they produce different results:
replaceResult = input.replace(".", ":");
assertEquals("hello:java:hello:world", replaceResult);
replaceAllResult = input.replaceAll(".", ":");
assertEquals("::::::::::::::::::::::", replaceAllResult);
As we can see, the replaceAll() method replaces every character in input with “:“. This is because “.” has a special meaning in regex: matching any character.
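The difference also extends to the second argument. replace() treats the replacement literally, while replaceAll() gives ‘$’ and ‘\’ a special meaning in the replacement String (‘$’ introduces a reference to a captured group). A minimal sketch of one way to keep the replacement literal uses Matcher.quoteReplacement(); the class name, helper names, and the “USD” input are assumptions made for illustration:

```java
import java.util.regex.Matcher;

// Hypothetical demo class; the "USD" input is an assumed example.
class LiteralReplacementDemo {

    // replace() treats both the target and the replacement literally.
    static String withReplace(String input) {
        return input.replace("USD", "$");
    }

    // replaceAll() interprets '$' in the replacement as a group reference,
    // so we quote the replacement to get a literal dollar sign.
    static String withReplaceAll(String input) {
        return input.replaceAll("USD", Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        String input = "USD 5 and USD 10";
        System.out.println(withReplace(input));    // $ 5 and $ 10
        System.out.println(withReplaceAll(input)); // $ 5 and $ 10
    }
}
```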
Next, let’s see how to tell the regex engine to treat special characters as literal.
4. Handling Special Characters in Regex
Many characters have special meanings in regex, for example:
- ‘.’ – Matches any character
- ‘[‘ and ‘]’ – Define a character class
- ‘(‘ and ‘)’ – Define a capture group
- ‘|’ – Logical OR
- …
Sometimes, when our regex pattern contains these characters, we want the regex engine to treat them as literal characters. We have two options to achieve that: escaping the character with a backslash or putting it in a character class.
Next, let’s solve the problem in a previous example: replacing all “.” characters with “:“:
String input = "hi.java.hi.world";
String result = input.replaceAll("\\.", ":");
assertEquals("hi:java:hi:world", result);
result = input.replaceAll("[.]", ":");
assertEquals("hi:java:hi:world", result);
As we can see, we can get the expected result by escaping “.” or putting “.” in a character class.
Next, let’s see another example. Let’s say we have the String “(debug) hello.world” and we want to replace “(debug)” with “[info]”:
input = "(debug) hello.world";
result = input.replaceAll("(debug)", "[info]");
assertEquals("([info]) hello.world", result);
result = input.replaceAll("[(]debug[)]", "[info]");
assertEquals("[info] hello.world", result);
result = input.replaceAll("\\(debug\\)", "[info]");
assertEquals("[info] hello.world", result);
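As the above code shows, since ‘(’ and ‘)’ are special characters in regex, replaceAll(“(debug)”, “[info]”) doesn’t give us the expected output: the regex engine treats “(debug)” as a capture group that matches only “debug”, so the parentheses in input stay in place.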
However, escaping ‘(‘ and ‘)‘ or adding them to character classes solves the problem.
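When the whole search term should be treated literally, another option is Pattern.quote(), which wraps a String so that the regex engine matches it character by character. The following is only a sketch; the class QuoteLiteralDemo and the helper replaceLiteral() are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; replaceLiteral() is an illustrative helper.
class QuoteLiteralDemo {

    // Quotes the search term so that '(', ')', '.', and other metacharacters
    // are matched literally. The replacement is still processed by
    // replaceAll(), so '$' and '\' keep their special meaning there.
    static String replaceLiteral(String input, String literal, String replacement) {
        return input.replaceAll(Pattern.quote(literal), replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("(debug) hello.world", "(debug)", "[info]"));
        // [info] hello.world
        System.out.println(replaceLiteral("hi.java.hi.world", ".", ":"));
        // hi:java:hi:world
    }
}
```

If both the search term and the replacement are plain text, String.replace() is usually the simpler choice.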
5. When the Regex Is Invalid
We understand that String.replaceAll() works based on regex. Next, let’s figure out what happens if the regex we passed to the method is invalid:
String input = "Hello world";
assertThrows(PatternSyntaxException.class, () -> input.replaceAll("e**", "X"));
In this example, we pass the regex “e**” to replaceAll(). But, “e**” is an invalid regex due to the improper usage of the “*” quantifiers.
The first “*” is a quantifier, so “e*” means “zero or more contiguous ‘e’ characters”. The second “*” is another quantifier, but it doesn’t have a valid preceding element to act on. Therefore, “e**” is an invalid regex.
As the test above shows, when we pass an invalid regex to String.replaceAll(), a PatternSyntaxException is raised.
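If the regex comes from an untrusted or user-supplied source, we may want to handle this exception instead of letting it propagate. The sketch below shows one possible approach: it falls back to the unchanged input when the regex is invalid. This fallback is an assumption made for illustration, and the class and method names are hypothetical:

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; returning the input unchanged is an assumed policy.
class SafeReplaceDemo {

    // Returns the replaced String, or the original input if the regex is invalid.
    static String replaceAllOrOriginal(String input, String regex, String replacement) {
        try {
            return input.replaceAll(regex, replacement);
        } catch (PatternSyntaxException e) {
            // The exception exposes the offending pattern, a description of
            // the error, and the approximate index where it occurred.
            System.err.println("Invalid regex '" + e.getPattern() + "': "
              + e.getDescription() + " (index " + e.getIndex() + ")");
            return input;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceAllOrOriginal("Hello world", "e**", "X")); // Hello world
        System.out.println(replaceAllOrOriginal("Hello world", "o", "X"));   // HellX wXrld
    }
}
```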
6. It’s More Than Just a Search-And-Replace Tool
Although String.replaceAll() sounds like a simple search-and-replace tool, it can do much more than that. In this section, let’s look at some advanced usages.
6.1. Referencing Capture Groups in the Replacement
Let’s say we have a nine-digit input:
String input = "123456789";
We aim to split the input into three three-digit groups, reverse their order, and join them with “-”:
String expected = "789-456-123";
Next, let’s solve this problem using String.replaceAll():
String result = input.replaceAll("(\\d{3})(\\d{3})(\\d{3})", "$3-$2-$1");
assertEquals(expected, result);
As the code shows, we created three capturing groups in the regex. Further, we can use $1, $2, and $3 to refer to those groups in the replacement String. This allows us to rearrange groups easily.
Additionally, we can use named capturing groups, “(?<groupName>...)”, in the regex and reference them by name in the replacement using “${groupName}”:
result = input.replaceAll("(?<first>\\d{3})(?<second>\\d{3})(?<third>\\d{3})", "${third}-${second}-${first}");
assertEquals(expected, result);
In this example, we defined names for the three capturing groups (“first“, “second” and “third“), and referenced them by name in the replacement String.
Named capturing groups have several advantages regarding readability, maintainability, and clarity, particularly when dealing with complex patterns.
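To illustrate the readability point, here is a sketch that reorders ISO-style dates embedded in a longer text. The class DateReorderDemo, the helper isoToDayFirst(), and the sample sentence are hypothetical, and the pattern only checks the digit layout; it doesn’t validate that the dates are real:

```java
// Hypothetical demo class; the sample text is an assumed example.
class DateReorderDemo {

    // Rewrites every yyyy-MM-dd occurrence as dd/MM/yyyy using named groups.
    static String isoToDayFirst(String text) {
        return text.replaceAll(
          "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
          "${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        String text = "Released 2024-01-15, patched 2024-02-03";
        System.out.println(isoToDayFirst(text));
        // Released 15/01/2024, patched 03/02/2024
    }
}
```

Since replaceAll() applies the replacement to every match, both dates are rewritten in a single call.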
6.2. Inserting Hyphens Between Any Two Contiguous Characters
Let’s say we’re given a String of unknown length:
String input = "abcdefg";
Now, our task is to insert a “-” between every two adjacent characters:
String expected = "a-b-c-d-e-f-g";
We can use String.replaceAll() with a lookaround assertion to solve it:
String result = input.replaceAll("(?<=.)(?=.)", "-");
assertEquals(expected, result);
In the regex “(?<=.)(?=.)“:
- (?<=.) – a lookbehind assertion that requires a character (.) immediately before the current position.
- (?=.) – a lookahead assertion that requires a character (.) immediately after the current position.
It’s important to note that lookbehind and lookahead assertions don’t consume the character before or after them. Therefore, this regex matches the position between any contiguous characters.
When this pattern is passed to replaceAll(), it replaces each of those in-between positions with a hyphen (“-”).
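A few edge cases are worth checking with this pattern. The sketch below uses hypothetical helper names and inputs. It relies on the fact that “.” doesn’t match line terminators by default, and that the embedded “(?s)” flag (DOTALL mode) changes this:

```java
// Hypothetical demo class; helper names and inputs are illustrative.
class HyphenateDemo {

    // Inserts '-' at every position that has a character on both sides.
    static String hyphenate(String input) {
        return input.replaceAll("(?<=.)(?=.)", "-");
    }

    // Same idea, but "(?s)" lets '.' match line terminators as well.
    static String hyphenateDotAll(String input) {
        return input.replaceAll("(?s)(?<=.)(?=.)", "-");
    }

    public static void main(String[] args) {
        System.out.println(hyphenate("").isEmpty());                 // true
        System.out.println(hyphenate("a"));                          // a
        System.out.println(hyphenate("ab"));                         // a-b
        // By default, positions next to '\n' don't match.
        System.out.println(hyphenate("a\nb").equals("a\nb"));        // true
        System.out.println(hyphenateDotAll("a\nb").equals("a-\n-b")); // true
    }
}
```

Empty and single-character inputs come back unchanged, since no position in them has a character on both sides.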
7. Conclusion
String.replaceAll() is a powerful tool for manipulating text in Java. Its power lies in its ability to work with regular expressions, enabling us to perform complex replacements in just a few lines of code.
In this article, we’ve explored its typical usage through examples and discussed the difference between String.replace() and String.replaceAll().