At first glance it seems obvious what String.replaceAll(String regex, String replacement) does, and most of the time it does exactly what you want. Under some conditions, however, it does not. Let me show you with a few unit tests I wrote recently while fixing a bug in our production system:
@Test
public void regularReplace1() {
    final String input = "this is a user description";
    String result = input.replaceAll("\\{\\[user\\]\\}", "name");
    Assert.assertEquals(input, result);
}
Nothing matches, so the input comes back unchanged, which is what we expected. Let's try an input that does match:
@Test
public void regularReplace2() {
    final String input = "this is a {[user]} description";
    String result = input.replaceAll("\\{\\[user\\]\\}", "name");
    Assert.assertEquals("this is a name description", result);
}
Still fine. But what happens when our replacement string contains a $?
@Test(expected = StringIndexOutOfBoundsException.class)
public void regularReplace3() {
    final String input = "this is a {[user]} description";
    // replaceAll throws here because of the trailing $
    String result = input.replaceAll("\\{\\[user\\]\\}", "name $");
    // Never reached; kept to show the result we were hoping for
    Assert.assertEquals("this is a name $ description", result);
}
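As you can see, the test expects a StringIndexOutOfBoundsException, so the assertion on its last line is never reached. (Newer JDKs, Java 8 and later as far as I know, throw an IllegalArgumentException for a trailing $ instead.) Why an exception at all? We’ll get to that later. Let's try moving the $ to the beginning of the replacement string: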
@Test(expected = IllegalArgumentException.class)
public void regularReplace4() {
    final String input = "this is a {[user]} description";
    // replaceAll throws here: "$n" is not a valid group reference
    String result = input.replaceAll("\\{\\[user\\]\\}", "$name");
    // Never reached; kept to show the result we were hoping for
    Assert.assertEquals("this is a $name description", result);
}
Now replaceAll throws an IllegalArgumentException. If you know how regular expressions work, you are probably starting to figure out what is going on. Let's try another magical character, the backslash:
@Test
public void regularReplace5() {
    final String input = "this is a {[user]} description";
    String result = input.replaceAll("\\{\\[user\\]\\}", "\\ name");
    // We might expect the backslash to survive, but it is consumed as an
    // escape character, leaving only the space that followed it
    Assert.assertEquals("this is a  name description", result);
}
No exception, but not the result we expected either. Let's move the backslash to the end of the replacement string:
@Test(expected = StringIndexOutOfBoundsException.class)
public void regularReplace6() {
    final String input = "this is a {[user]} description";
    // replaceAll throws here: the trailing backslash has nothing to escape
    String result = input.replaceAll("\\{\\[user\\]\\}", "name \\");
    // Never reached; kept to show the result we were hoping for
    Assert.assertEquals("this is a name \\ description", result);
}
OK, now we get a StringIndexOutOfBoundsException again. All of this seems rather strange, but the cause is not hard to find. The Java documentation for String.replaceAll says that a call of the form str.replaceAll(regex, repl) yields exactly the same result as:
Pattern.compile(regex).matcher(str).replaceAll(repl)
Moving on to the documentation for Matcher.replaceAll, we can read the following:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
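To make that paragraph concrete, here is a small sketch of how the replacement string is interpreted. The class and method names (ReplacementSyntaxDemo, swapWords, literalDollar) are made up for this illustration; only String.replaceAll itself is the real API.

```java
public class ReplacementSyntaxDemo {

    // "$1" and "$2" in the replacement refer to the first and second
    // capture group of the regular expression.
    static String swapWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // "\\$" in Java source is the two characters \$ at runtime:
    // an escaped dollar sign, which is inserted literally.
    static String literalDollar(String input) {
        return input.replaceAll("\\{\\[user\\]\\}", "name \\$");
    }

    public static void main(String[] args) {
        System.out.println(swapWords("hello world"));
        System.out.println(literalDollar("this is a {[user]} description"));
    }
}
```

The first call swaps the two words via group references, and the second produces the "name $" text that the plain replaceAll call choked on earlier.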
Really? That is not quite what I expected. I thought the replacement string was just a string that would replace whatever the regular expression matched. It turns out that replaceAll is a bit more powerful than that, and also somewhat dangerous: if the replacement string comes from user input, we must escape the special characters $ and \ before passing it to String.replaceAll. To fix the bug in our system I first implemented a method, safeReplaceAll:
public static String safeReplaceAll(String input, String regex, String replacement) {
    if (input == null) {
        return null;
    }
    // Escape special characters in replacement, then do replace
    return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
}
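If you are curious what Matcher.quoteReplacement actually does to the replacement string, the following sketch prints its output for the replacement strings used in the tests and compares it with a hand-rolled escape. The class QuoteReplacementDemo and the helper escapeByHand are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {

    // Hand-rolled escaping: backslashes first, then dollar signs.
    // String.replace takes both arguments literally, so no regex is involved.
    static String escapeByHand(String replacement) {
        return replacement.replace("\\", "\\\\").replace("$", "\\$");
    }

    public static void main(String[] args) {
        String[] samples = { "name", "name $", "$name", "\\ name", "name \\" };
        for (String s : samples) {
            String quoted = Matcher.quoteReplacement(s);
            System.out.println("[" + s + "] -> [" + quoted + "] matches by-hand escape: "
                    + quoted.equals(escapeByHand(s)));
        }
    }
}
```

The order in escapeByHand matters: escaping the dollar signs first would add backslashes that the second step then doubles. Using quoteReplacement avoids having to think about that.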
Here are the same tests, this time using our safeReplaceAll:
@Test
public void safeReplace1() {
    final String input = "this is a user description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "name");
    Assert.assertEquals(input, result);
}

@Test
public void safeReplace2() {
    final String input = "this is a {[user]} description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "name");
    Assert.assertEquals("this is a name description", result);
}

@Test
public void safeReplace3() {
    final String input = "this is a {[user]} description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "name $");
    Assert.assertEquals("this is a name $ description", result);
}

@Test
public void safeReplace4() {
    final String input = "this is a {[user]} description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "$name");
    Assert.assertEquals("this is a $name description", result);
}

@Test
public void safeReplace5() {
    final String input = "this is a {[user]} description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "\\ name");
    Assert.assertEquals("this is a \\ name description", result);
}

@Test
public void safeReplace6() {
    final String input = "this is a {[user]} description";
    String result = StringUtil.safeReplaceAll(input, "\\{\\[user\\]\\}", "name \\");
    Assert.assertEquals("this is a name \\ description", result);
}
After this I realized that Java 1.5 introduced an overload of String.replace(char oldChar, char newChar):
String.replace(CharSequence target, CharSequence replacement)
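… this method treats both the target and the replacement as literal text, so it works as a drop-in replacement for String.replaceAll whenever the search string is a fixed piece of text rather than a real regular expression, as it was in our case. Sometimes reading the API before coding would save you time …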
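For completeness, here is a minimal sketch of the same substitution done with String.replace(CharSequence, CharSequence). The class LiteralReplaceDemo and the helper fillIn are names invented for this example. Note that the target no longer needs any regex escaping.

```java
public class LiteralReplaceDemo {

    // Hypothetical helper: both the placeholder and the user name
    // are taken literally, so $ and \ need no special treatment.
    static String fillIn(String template, String userName) {
        return template.replace("{[user]}", userName);
    }

    public static void main(String[] args) {
        String template = "this is a {[user]} description";
        System.out.println(fillIn(template, "name $"));
        System.out.println(fillIn(template, "$name"));
        System.out.println(fillIn(template, "name \\"));
    }
}
```

Like replaceAll, this overload replaces every occurrence of the target, not just the first one.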