String Equality and Interning

Strings in Java are objects, but resemble primitives (such as ints or chars) in that Java source code may contain String literals, and Strings may be concatenated using the “+” operator. These are convenient features, but the similarity of Strings to primitives sometimes causes confusion when Strings are compared.

Java provides two basic mechanisms for testing for equality. The “==” operator can be used to test primitive values for equality, and can also be used to determine if two object references point to the same underlying object. For Java objects, the equals(Object) method will return true if the argument is equal to the object on which the method is invoked, where equality is defined by the object’s class semantics.
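As a rough illustration of the two mechanisms, the sketch below compares primitives with == and objects with both == and equals(Object). The Point class is a hypothetical value class invented for this example; it is not part of the original discussion.

```java
public class EqualityBasics {
    // Hypothetical value class, used only to show class-defined equality.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        int a = 7;
        int b = 7;
        System.out.println(a == b);      // true: primitive values are compared

        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        Point r = p;
        System.out.println(p == q);      // false: two distinct objects
        System.out.println(p == r);      // true: both references point to one object
        System.out.println(p.equals(q)); // true: equality as defined by Point
    }
}
```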

Since Strings are objects, and the String class defines equality in terms of content, the equals(Object) method will return true if two Strings have the same contents, i.e., the same characters in the same order. The == operator will only be true if two String references point to the same underlying String object. Hence two Strings representing the same content will be equal when tested by the equals(Object) method, but will only be equal when tested with the == operator if they are actually the same object.
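A minimal sketch of this distinction follows. The class and helper method names are made up for the example; the Strings are built with new String(...) so that each is guaranteed to be a separate object.

```java
public class StringCompare {
    // Content comparison, as defined by String.equals(Object).
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    // Reference comparison: true only if both refer to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = new String("hello");
        String s3 = s1;

        System.out.println(sameContent(s1, s2)); // true
        System.out.println(sameObject(s1, s2));  // false
        System.out.println(sameObject(s1, s3));  // true
    }
}
```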

To save memory (and speed up testing for equality), Java supports “interning” of Strings. When the intern() method is invoked on a String, a lookup is performed on a table of interned Strings. If a String object with the same content is already in the table, a reference to the String in the table is returned. Otherwise, the String is added to the table and a reference to it is returned. The result is that all references returned by intern() for Strings with the same content point to the same object. This saves space, and also allows the interned Strings to be compared using the == operator, which is a single reference comparison and is typically faster than the equals(Object) method, which may have to compare the contents character by character.
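The lookup described above can be modelled with an ordinary map. The sketch below is only a conceptual model, assuming a HashMap-backed table; the class InternTableSketch is hypothetical and is not how the JVM actually implements its intern table.

```java
import java.util.HashMap;
import java.util.Map;

public class InternTableSketch {
    // Maps each distinct content to the one canonical instance holding it.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance for s, adding s if none exists yet.
    public String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        InternTableSketch pool = new InternTableSketch();
        String a = new String("key");
        String b = new String("key");

        String canonicalA = pool.intern(a);
        String canonicalB = pool.intern(b);

        System.out.println(a == b);                   // false: separate objects
        System.out.println(canonicalA == canonicalB); // true: one canonical object
        System.out.println(canonicalA == a);          // true: a was added first

        // The real String.intern() gives the same guarantee.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```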

Confusion can arise because Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values. Code written based on this assumption will fail in a potentially non-obvious way when the == operator is used to compare Strings with equal content but contained in different String instances. Figure 1 illustrates several cases.



/**
 * Example of comparing Strings with and without intern()ing.
 */
public class StringInternExample {
    private static char[] chars = 
        {'A', ' ', 'S', 't', 'r', 'i', 'n', 'g'};

    public static void main(String[] args) {
        // (0) For the base case, we just use a String literal
        String aString = "A String";

        // (1) For the first test case, we construct a String by 
        // concatenating several literals. Note, however, 
        // that all parts of the string are known at compile time.
        String aConcatentatedString = "A" + " " + "String";

        printResults("(1)",
            "aString", aString, 
            "aConcatentatedString", aConcatentatedString);

        // (2) For the second case, construct the same String, but
        // in a way such that its contents cannot be known
        // until runtime.
        String aRuntimeString = new String(chars);

        // Verify that (0) and (2) are the same according 
        // to equals(...), but not ==
        printResults("(2)",
            "aString", aString, 
            "aRuntimeString", aRuntimeString);

        // (3) For the third case, create a String object by
        // invoking the intern() method on (2).
        String anInternedString = aRuntimeString.intern();

        // Verify that (0) and (3) now reference the same
        // object.
        printResults("(3)",
            "aString", aString, 
            "anInternedString", anInternedString);

        // (4) For the fourth case, we explicitly construct
        // a String object around a literal.
        String anExplicitString = new String("A String");

        // Verify that (0) and (4) are different objects.
        // Interning would solve this, but it would be
        // better to simply avoid constructing a new object
        // around a literal.
        printResults("(4)",
            "aString", aString, 
            "anExplicitString", anExplicitString);

        // (5) For a more realistic test, compare (0) to
        // the first argument. This illustrates that unless
        // intern()'d, Strings that originate externally
        // will not be ==, even when they contain the
        // same values.
        if (args.length > 0) {
            String firstArg = args[0];
            printResults("(5)",
                "aString", aString, 
                "firstArg", firstArg);

            // (6) Verify that interning works in this case
            String firstArgInterned = firstArg.intern();
            printResults("(6)",
                "aString", aString, 
                "firstArgInterned", firstArgInterned);
        }
    }

    /**
     * Utility method to print the results of equals(...) and ==
     */
    private static void printResults(String tag,
        String s1Name, String s1, String s2Name, String s2) {
        System.out.println(tag);
        System.out.println("  " +
            s1Name + " == " + s2Name + " : " + (s1 == s2));
        System.out.println("  " +
            s1Name + ".equals(" + s2Name + ") : " + s1.equals(s2));
        System.out.println();
    }

}



Figure 1. A simple class to test String equality.

Figure 2 shows the result of running the test code:


> java StringInternExample "A String"
(1)
  aString == aConcatentatedString : true
  aString.equals(aConcatentatedString) : true

(2)
  aString == aRuntimeString : false
  aString.equals(aRuntimeString) : true

(3)
  aString == anInternedString : true
  aString.equals(anInternedString) : true

(4)
  aString == anExplicitString : false
  aString.equals(anExplicitString) : true

(5)
  aString == firstArg : false
  aString.equals(firstArg) : true

(6)
  aString == firstArgInterned : true
  aString.equals(firstArgInterned) : true



Figure 2. Result of running the StringInternExample class

In this example, we start with a String variable (aString) that we initialize with a String literal. As noted above, Java will automatically intern String literals, so the String is added to the intern table. There are 6 test cases:


  1. In the first case we construct a String by concatenating three String literals. The resulting String can be resolved at compile time, and is also interned. Hence in the output, the Strings are equal according to both == and equals(...).
  2. In the second test we construct a String from an array of characters. This String will not be automatically interned, so the == operator indicates that the two Strings are different objects.
  3. Invoking the intern() method on the String from case (2) returns a reference to the instance of the String from the intern table.
  4. Explicitly constructing a new String object initialized to a literal similarly produces a non-interned String. (Invoking intern() on this String would return a reference to the String from the intern table.)
  5. A String from the argument list will also not be interned.
  6. Explicitly invoking intern() returns a reference to the interned String.
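Case 1 depends on every part of the concatenation being known at compile time. As a hedged sketch of where that boundary lies (the class and method names are invented for this example), concatenating a final local variable initialized with a literal still yields a compile-time constant, while concatenating a non-final variable produces a new String at runtime:

```java
public class CompileTimeConstants {
    // Built only from constants, so the compiler folds it into one literal.
    static String fromConstants() {
        final String constantPart = "A";
        return constantPart + " String";
    }

    // nonConstantPart is not final, so the concatenation happens at runtime.
    static String atRuntime() {
        String nonConstantPart = "A";
        return nonConstantPart + " String";
    }

    public static void main(String[] args) {
        String literal = "A String";

        System.out.println(literal == fromConstants());      // true
        System.out.println(literal == atRuntime());          // false
        System.out.println(literal.equals(atRuntime()));     // true
        System.out.println(literal == atRuntime().intern()); // true
    }
}
```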

The point of this example is to illustrate that the == operator should be used with care when comparing Strings. It is appropriate in cases where you need to check for references to the same object, in cases where you can be certain that both Strings have been automatically interned, or in cases where it is useful to explicitly intern Strings (e.g., if you will be repeatedly comparing the same set of Strings). Otherwise, you should use the equals(...) method to test for String equality.
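To make the last of those cases concrete, here is a small sketch of interning a set of Strings once and then comparing them repeatedly with ==. The class name, the countMatches helper, and the token values are all assumptions made for illustration.

```java
public class InternOnceCompareOften {
    // Counts tokens that are the same object as key. This is only
    // meaningful when both the tokens and the key have been interned.
    static int countMatches(String[] tokens, String key) {
        int count = 0;
        for (String token : tokens) {
            if (token == key) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-ins for Strings that arrive at runtime and are not interned.
        String[] raw = {
            new String("GET"), new String("PUT"), new String("GET")
        };

        // Intern each token once, up front.
        String[] interned = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            interned[i] = raw[i].intern();
        }

        String key = "GET"; // a literal, so already interned

        System.out.println(countMatches(interned, key)); // 2
        System.out.println(countMatches(raw, key));      // 0: == misses equal content
    }
}
```

The second call shows the failure mode described earlier: without interning, == reports no matches even though two of the tokens have the same content as the key.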


Copyright 2003-2007 - Philip Isenhour