1. Introduction
In this tutorial, we’ll learn about the String.split() method in Java. This method helps us break a string into smaller pieces, based on a specified delimiter.
A delimiter is a character or a sequence of characters that marks the boundaries between the pieces. For example, we might want to split a sentence into words or a list of values separated by commas.
2. Understanding String.split()
The split() method is part of the String class in Java. Here are its two signatures:
String[] split(String regex)
String[] split(String regex, int limit)
This method returns an array of strings. Each element in the array is a substring from the original string. The regex is the regular expression that defines the delimiter.
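The second overload also accepts a limit, which controls how many times the pattern is applied and, as a result, the length of the returned array:
- If the limit is greater than 0, the pattern is applied at most limit - 1 times. The array has at most limit elements, and the last element contains the rest of the string.
- If the limit is 0, the string is split as many times as possible, and trailing empty strings are discarded. This is the behavior of the single-argument split(regex).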
- If the limit is negative, the method splits the string as many times as possible, including trailing empty strings.
Let’s start with a simple example. Suppose we have a sentence, and we want to split it into words:
@Test
void whenSplit_thenCorrect() {
    String s = "Welcome to Baeldung";
    String[] expected1 = new String[] { "Welcome", "to", "Baeldung" };
    String[] expected2 = new String[] { "Welcome", "to Baeldung" };
    assertArrayEquals(expected1, s.split(" "));
    assertArrayEquals(expected2, s.split(" ", 2));
}
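To see the three limit rules side by side, here is a minimal sketch. The class name SplitLimitDemo and the helper splitWithLimit are hypothetical names used only for illustration. The input ends with two commas so that the treatment of trailing empty strings is visible:

```java
import java.util.Arrays;

class SplitLimitDemo {

    // Hypothetical helper: splits on commas using the given limit.
    static String[] splitWithLimit(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";

        // Positive limit: at most limit - 1 splits, the rest stays in the last element.
        System.out.println(Arrays.toString(splitWithLimit(s, 2)));  // [a, b,c,,]

        // Zero: split everywhere, then drop trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(s, 0)));  // [a, b, c]

        // Negative: split everywhere and keep trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(s, -1))); // [a, b, c, , ]
    }
}
```

With a limit of -1, the array has five elements: the last two are the empty strings that follow the final two commas.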
3. Empty Strings and Missing Delimiters
If the string doesn’t contain the delimiter, the split() method returns a one-element array containing the original string. An empty string is a special case of this rule: the result is an array with a single element, the empty string:
@Test
void whenSplitEmptyString_thenGetExpectedArray() {
    String s = "";
    String[] expected = new String[] { "" };
    String[] result = s.split(",");
    assertArrayEquals(expected, result);
}
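The missing-delimiter case can be checked the same way. The following is a small sketch with a hypothetical class SplitMissingDelimiterDemo and helper splitOnComma. It also contrasts the empty string with an input that consists only of delimiters, which behaves differently because every piece is then a trailing empty string:

```java
import java.util.Arrays;

class SplitMissingDelimiterDemo {

    // Hypothetical helper: splits on commas with the default limit of 0.
    static String[] splitOnComma(String input) {
        return input.split(",");
    }

    public static void main(String[] args) {
        // No comma in the input: the whole string is the only element.
        System.out.println(Arrays.toString(splitOnComma("Baeldung"))); // [Baeldung]

        // Empty input: one element, the empty string.
        System.out.println(splitOnComma("").length); // 1

        // Only delimiters: all pieces are trailing empty strings and are dropped.
        System.out.println(splitOnComma(",,").length); // 0
    }
}
```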
4. Special Characters and Regular Expressions
The split() method treats the delimiter as a regular expression, so we can use regex patterns for splitting. However, this also means that certain characters, like dots (.), backslashes (\), and other metacharacters, need to be escaped.
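For example, to split on a literal dot, we write the pattern as "\\." in Java source. The regex engine needs \. to treat the dot as a literal character rather than as "any character", and the Java string literal needs a second backslash to produce that single backslash: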
@Test
void whenSplitWithDotDelimiter_thenGetExpectedArray() {
    String s = "www.example.com";
    String[] expected = new String[] { "www", "example", "com" };
    String[] result = s.split("\\.");
    assertArrayEquals(expected, result);
}
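When the delimiter should always be treated as plain text, an alternative to manual escaping is Pattern.quote() from java.util.regex, which wraps the text so that the regex engine reads it literally. Here is a minimal sketch; the class SplitLiteralDemo and the helper splitLiteral are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLiteralDemo {

    // Hypothetical helper: treats the delimiter as literal text, not as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("www.example.com", "."))); // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));           // [a, b, c]
    }
}
```

By contrast, an unescaped dot matches every character, so "www.example.com".split(".") returns an empty array: every piece is an empty string, and trailing empty strings are discarded.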
5. Different Character Encodings
Once data is in a Java String, it’s already decoded into a sequence of Unicode characters. Therefore, we don’t need to explicitly handle character encodings in the split() method:
@Test
void whenSplitWithNonAsciiDelimiter_thenGetExpectedArray() {
    // The delimiter is a non-ASCII character; no charset handling is needed here.
    String greeting = "Hello, 你好, Bonjour";
    String[] expected = new String[] { "Hello, ", "好, Bonjour" };
    String[] result = greeting.split("你");
    assertArrayEquals(expected, result);
}
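Encoding only matters at the point where raw bytes become a String. As a sketch of that boundary, the hypothetical helper decodeAndSplit below decodes the bytes with an explicit charset and then splits on ", ". The same text encoded as UTF-8 and as UTF-16 gives identical pieces once decoded. The Unicode escapes spell the two-character greeting from the test above and are used only to keep the source file ASCII:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SplitDecodedBytesDemo {

    // Hypothetical helper: decodes raw bytes with the given charset, then splits on ", ".
    static String[] decodeAndSplit(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return decoded.split(", ");
    }

    public static void main(String[] args) {
        String text = "Hello, \u4f60\u597d, Bonjour";

        // Two different byte representations of the same text.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);

        String[] fromUtf8 = decodeAndSplit(utf8Bytes, StandardCharsets.UTF_8);
        String[] fromUtf16 = decodeAndSplit(utf16Bytes, StandardCharsets.UTF_16);

        System.out.println(fromUtf8.length);                     // 3
        System.out.println(Arrays.equals(fromUtf8, fromUtf16));  // true
    }
}
```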
6. Multiple Delimiters
In more complex scenarios, we may need to split a string on several different delimiters. Because the argument is a regular expression, we can combine them into a single pattern, for example with a character class:
@Test
void whenSplitWithMultipleDelimiters_thenGetExpectedArray() {
    String s = "apple,banana;orange|grape";
    String[] expected = new String[] { "apple", "banana", "orange", "grape" };
    String[] result = s.split("[,;|]");
    assertArrayEquals(expected, result);
}
In this case, the character class [,;|] matches a comma, a semicolon, or a pipe. Inside the brackets, the pipe is a literal character, so it doesn’t need escaping.
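Real input often has whitespace around the delimiters, or delimiters longer than one character. The sketch below shows two possible patterns for those situations. The class name, the constant DELIMITERS, and the helper splitFields are assumptions made for illustration:

```java
import java.util.Arrays;

class SplitMultipleDelimitersDemo {

    // Assumed pattern: optional whitespace, one of , ; | and optional whitespace again.
    static final String DELIMITERS = "\\s*[,;|]\\s*";

    // Hypothetical helper: splits on any of the delimiters and absorbs surrounding whitespace.
    static String[] splitFields(String input) {
        return input.split(DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("apple , banana;orange |grape")));
        // [apple, banana, orange, grape]

        // Multi-character delimiters can't go in a character class; alternation handles them.
        System.out.println(Arrays.toString("a->b=>c".split("->|=>"))); // [a, b, c]
    }
}
```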
7. Leading/Trailing Delimiters
When the string starts with a delimiter, the split() method produces a leading empty string ("") in the resulting array. Trailing empty strings, however, are removed when we call split(regex) or pass a limit of 0. To keep them, we pass a negative limit such as -1:
@Test
void whenSplitWithLeadingAndTrailingCommas_thenGetExpectedArray() {
    String s = ",apple,banana,";
    String[] expected = new String[] { "", "apple", "banana" };
    String[] result = s.split(",");
    assertArrayEquals(expected, result);
    String[] expectedWithLimit = new String[] { "", "apple", "banana", "" };
    String[] resultWithLimit = s.split(",", -1);
    assertArrayEquals(expectedWithLimit, resultWithLimit);
}
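Two related details are worth a quick check: empty strings between consecutive delimiters are always kept, and a positive limit keeps trailing empty strings too, since only a limit of 0 discards them. The sketch below uses a hypothetical helper countPieces to make the array lengths easy to compare:

```java
import java.util.Arrays;

class SplitEmptyPiecesDemo {

    // Hypothetical helper: number of pieces produced for a comma delimiter and the given limit.
    static int countPieces(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        // Interior empty strings survive; the two trailing ones are dropped with limit 0.
        System.out.println(Arrays.toString("a,,b,,".split(",")));     // [a, , b]
        System.out.println(countPieces("a,,b,,", 0));                 // 3

        // A negative limit keeps the trailing empty strings.
        System.out.println(countPieces("a,,b,,", -1));                // 5

        // A positive limit that is large enough also keeps them.
        System.out.println(Arrays.toString(",apple,banana,".split(",", 10))); // [, apple, banana, ]
    }
}
```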
8. Exception Handling
When using the split() method, we need to make sure the delimiter pattern is a valid regular expression. If it isn’t, the method throws a PatternSyntaxException, an unchecked exception that signals a syntax error in the pattern.
Here’s an example:
@Test
void whenPassInvalidParameterToSplit_thenPatternSyntaxExceptionThrown() {
    String s = "Welcome*to Baeldung";
    // A lone "*" is not a valid regex, so split() throws before any splitting happens.
    assertThrows(PatternSyntaxException.class, () -> s.split("*"));
}
In this case, we try to split a string using * as the delimiter. In a regular expression, * is a quantifier that must follow the element it repeats, so a pattern consisting of * alone is invalid and the code throws a PatternSyntaxException. To split on a literal asterisk, we escape it as "\\*".
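When the delimiter comes from user input or configuration, one possible way to guard against invalid patterns is to catch the exception and retry with the delimiter quoted as literal text. This is only a sketch: the class SafeSplitDemo and the helper splitOrLiteral are hypothetical, and silently reinterpreting an invalid pattern may hide mistakes, so logging or rethrowing could be the better choice in real code:

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeSplitDemo {

    // Hypothetical helper: tries the delimiter as a regex first and
    // falls back to a literal match if the regex syntax is invalid.
    static String[] splitOrLiteral(String input, String delimiter) {
        try {
            return input.split(delimiter);
        } catch (PatternSyntaxException e) {
            return input.split(Pattern.quote(delimiter));
        }
    }

    public static void main(String[] args) {
        // "*" is invalid as a regex, so the literal fallback is used.
        System.out.println(Arrays.toString(splitOrLiteral("Welcome*to Baeldung", "*"))); // [Welcome, to Baeldung]

        // A valid regex is used as is.
        System.out.println(Arrays.toString(splitOrLiteral("a1b22c", "\\d+"))); // [a, b, c]
    }
}
```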
9. Conclusion
In this article, we explored the String.split() method, which allows us to divide strings into smaller substrings based on specified delimiters. We learned how to use this method with regular expressions, handle different character encodings, and work with multiple delimiters.