
Last updated: May 11, 2024
When dealing with Strings in Java, we sometimes need to encode them into a specific charset.
This tutorial is a practical guide showing different ways to encode a String to the UTF-8 charset.
For a more technical deep-dive, see our Guide to Character Encoding.
To showcase the Java encoding, we’ll work with the German String “Entwickeln Sie mit Vergnügen”:
String germanString = "Entwickeln Sie mit Vergnügen";
byte[] germanBytes = germanString.getBytes();
String asciiEncodedString = new String(germanBytes, StandardCharsets.US_ASCII);
assertNotEquals(asciiEncodedString, germanString);
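Decoding these bytes as US_ASCII gives us something like “Entwickeln Sie mit Vergn?gen” when printed. The bytes that represent ü aren’t valid ASCII, so the decoder substitutes each of them with the replacement character (U+FFFD), which many consoles display as a question mark. How many substitutions we see depends on the charset that getBytes() used, which is the platform default when we pass no argument.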
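To see where the substituted characters come from, it can help to look at the raw bytes. The following is a minimal sketch, assuming the bytes are produced with UTF-8 explicitly rather than with the platform default; the class name ReplacementCharacterDemo and the helper toHex are made up for illustration, and the ü is written as a Unicode escape so the source file compiles regardless of its own encoding:

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharacterDemo {

    // Hypothetical helper: renders each byte as two uppercase hex digits
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Encodes the input as UTF-8, then decodes those bytes as US-ASCII
    static String decodeUtf8BytesAsAscii(String input) {
        byte[] utf8Bytes = input.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The u-umlaut takes two bytes in UTF-8, and neither is valid ASCII
        System.out.println(toHex("\u00FC".getBytes(StandardCharsets.UTF_8))); // C3 BC

        // Each invalid byte is replaced with U+FFFD, so we get two characters back
        String decoded = decodeUtf8BytesAsAscii("\u00FC");
        System.out.println(decoded.length());              // 2
        System.out.println(decoded.charAt(0) == '\uFFFD'); // true
    }
}
```

With a single-byte default charset such as ISO-8859-1, the same character would occupy one byte and produce only one replacement character.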
But when the String contains only ASCII characters, as this English text does, decoding its bytes as US_ASCII gives us back the same String:
String englishString = "Develop with pleasure";
byte[] englishBytes = englishString.getBytes();
String asciiEncodedEnglishString = new String(englishBytes, StandardCharsets.US_ASCII);
assertEquals(asciiEncodedEnglishString, englishString);
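The English text survives because UTF-8 encodes every ASCII character as the same single byte that US-ASCII uses. As a rough check, we can compare the two byte arrays directly; the class and method names here are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiCompatibilityDemo {

    // Returns true when the US-ASCII and UTF-8 encodings of the input are byte-for-byte equal
    static boolean sameBytesInAsciiAndUtf8(String input) {
        byte[] asciiBytes = input.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8Bytes = input.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(asciiBytes, utf8Bytes);
    }

    public static void main(String[] args) {
        // Only ASCII characters: both charsets produce identical bytes
        System.out.println(sameBytesInAsciiAndUtf8("Develop with pleasure")); // true

        // Contains a u-umlaut (written as an escape), which US-ASCII cannot represent
        System.out.println(sameBytesInAsciiAndUtf8("Entwickeln Sie mit Vergn\u00FCgen")); // false
    }
}
```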
Let’s see what happens when we use the UTF-8 encoding.
Let’s start with the core library.
Strings are immutable in Java, which means we can’t change a String’s character encoding in place. To achieve what we want, we need to copy the bytes of the String and then create a new one with the desired encoding.
First, we get the bytes of the String in the desired charset, and then we create a new String from those bytes using the same charset:
String rawString = "Entwickeln Sie mit Vergnügen";
byte[] bytes = rawString.getBytes(StandardCharsets.UTF_8);
String utf8EncodedString = new String(bytes, StandardCharsets.UTF_8);
assertEquals(rawString, utf8EncodedString);
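The round trip above returns an equal String, so the effect of the charset is easiest to observe on the byte array itself. Here is a small sketch that compares encoded lengths; the class name, the encodedLength helper, and the choice of ISO_8859_1 as a comparison charset are assumptions made for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {

    // Hypothetical helper: number of bytes the input occupies in the given charset
    static int encodedLength(String input, Charset charset) {
        return input.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The escape stands for the u-umlaut in "Vergnuegen"
        String rawString = "Entwickeln Sie mit Vergn\u00FCgen";

        System.out.println(rawString.length());                                   // 28 characters
        System.out.println(encodedLength(rawString, StandardCharsets.ISO_8859_1)); // 28 bytes
        System.out.println(encodedLength(rawString, StandardCharsets.UTF_8));      // 29 bytes
    }
}
```

The extra byte in UTF-8 comes from the one non-ASCII character, which needs two bytes there but only one in ISO-8859-1.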
Alternatively, we can use the StandardCharsets class introduced in Java 7 to encode the String.
First, we’ll encode the String into a ByteBuffer of UTF-8 bytes, and second, we’ll decode that buffer back into a String:
String rawString = "Entwickeln Sie mit Vergnügen";
ByteBuffer buffer = StandardCharsets.UTF_8.encode(rawString);
String utf8EncodedString = StandardCharsets.UTF_8.decode(buffer).toString();
assertEquals(rawString, utf8EncodedString);
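If we need a plain byte[] instead of a ByteBuffer, one option is to copy out the buffer’s remaining bytes. This is only a sketch with hypothetical class and method names; it copies via remaining() and get() rather than calling array(), since the backing array may be larger than the encoded content:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ByteBufferToArrayDemo {

    // Copies the readable bytes of the buffer returned by Charset.encode into a new array
    static byte[] encodeToUtf8Bytes(String input) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(input);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // The escape stands for the u-umlaut
        String rawString = "Entwickeln Sie mit Vergn\u00FCgen";
        byte[] bytes = encodeToUtf8Bytes(rawString);

        System.out.println(bytes.length); // 29
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(rawString)); // true
    }
}
```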
Besides core Java, we can use Apache Commons Codec to achieve the same result.
Apache Commons Codec is a handy package containing simple encoders and decoders for various formats.
First, let’s start with the project configuration.
When using Maven, we have to add the commons-codec dependency to our pom.xml:
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.14</version>
</dependency>
For our purposes, the most interesting class is StringUtils from the org.apache.commons.codec.binary package, which provides methods to encode Strings.
Using this class, getting a UTF-8 encoded String is pretty straightforward:
String rawString = "Entwickeln Sie mit Vergnügen";
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);
assertEquals(rawString, utf8EncodedString);
Encoding a String into UTF-8 isn’t difficult, but it’s not that intuitive. This article presents three ways of doing it, using either core Java or Apache Commons Codec.