System.String

I've been reading some more of Applied Microsoft .NET Framework Programming this weekend.
The book covers a lot about the .NET Framework that I already know,
but it digs deeper into topics such as what happens at the
CLR level and how C# translates to IL. I was reading the chapter about
strings on Saturday and decided to post a summary of what the author
(Jeffrey Richter) covers, since not all developers are aware of what
happens to strings under the hood.

Firstly, just to set the record straight: System.String is a
reference type and not a value type, which means each string is
allocated on the heap and not on the stack. System.String is peculiar in
that it has a shortcut keyword associated with it
(string), a practice otherwise common for primitive value
types such as System.Int32 (int) and System.Char (char). String references
also do not require the new keyword to allocate an instance.
The following line of code allocates space on the heap, fills it
with string data and assigns the reference to the variable:

string aString = "This is a literal string";
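
As a quick check of the points above, here is a minimal sketch. The StringBasics class and its Kind helper are names made up for illustration (they are not from the book); the reflection calls themselves are standard Framework members.

```csharp
using System;

public static class StringBasics
{
    // Reports whether a type is a value type (struct) or a reference type (class).
    public static string Kind(Type t)
    {
        return t.IsValueType ? "value type" : "reference type";
    }

    public static void Main()
    {
        // The C# keywords are aliases for the Framework types.
        Console.WriteLine(typeof(string) == typeof(System.String)); // True
        Console.WriteLine(typeof(int) == typeof(System.Int32));     // True

        Console.WriteLine(Kind(typeof(string))); // reference type
        Console.WriteLine(Kind(typeof(int)));    // value type

        // No 'new' is needed for a literal, but constructors do exist.
        string fromLiteral = "aaa";
        string fromCtor = new string('a', 3);
        Console.WriteLine(fromLiteral == fromCtor);                       // True: same characters
        Console.WriteLine(Object.ReferenceEquals(fromLiteral, fromCtor)); // False: two heap objects
    }
}
```

The last two lines show the difference between comparing string values (==) and comparing references, which matters for the interning discussion further down.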

Strings are immutable, meaning that once an instance of
System.String is created, the contents of the object are read-only
and cannot be changed. Immutable strings enable string references to be
copied throughout an application without the need for thread
synchronization. A large number of string copies in an
application requires minimal overhead because only references to the
original string objects are copied and not the data itself.
Immutable strings come with a price: a string cannot grow in place, so
every concatenation with the + operator allocates a new string. Current
C# compilers turn the single expression below into one String.Concat
call, so it allocates only the result string (plus an array holding the
operands). The same concatenation done one step at a time, for example
with += in a loop, creates a temporary string on the heap at every step.
Using StringBuilder or StringWriter is preferred over repeated string
concatenation.

string a = "a";
string b = "b";
string c = "c";
string d = "d";
string e = "e";
string result = a + b + c + d + e;
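
To compare the two approaches, here is a minimal sketch. ConcatDemo, JoinWithPlus and JoinWithBuilder are hypothetical names chosen for illustration; only StringBuilder, Append and ToString are Framework members.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Every += that joins two non-empty strings allocates a new string
    // and leaves the previous intermediate result behind as garbage.
    public static string JoinWithPlus(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into an internal buffer;
    // ToString() creates the final string once.
    public static string JoinWithBuilder(string[] parts)
    {
        StringBuilder builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string[] parts = { "a", "b", "c", "d", "e" };
        Console.WriteLine(JoinWithPlus(parts));    // abcde
        Console.WriteLine(JoinWithBuilder(parts)); // abcde

        // Immutability: "changing" a string yields a different object.
        string before = "abc";
        string after = before + "d";
        Console.WriteLine(before);                                // abc
        Console.WriteLine(Object.ReferenceEquals(before, after)); // False
    }
}
```

Both methods return the same text. The difference is in how many intermediate strings are created along the way, which only becomes noticeable when many pieces are joined.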

The Framework has a neat way of optimizing literal strings in an
application. Each unique literal string in an application is
stored in an internal hash table. Whenever a literal string is
loaded, the hash table is consulted first to determine whether a reference
is already available. Since strings are immutable, two references to the
same string object can coexist in the application without causing
conflict. Dynamically created strings (e.g. strings that are the result of
concatenation at runtime) are not added to the internal hash table and
thus require a heap allocation to instantiate them. The Framework
provides a static method called String.Intern, which adds a dynamic
string to the hash table if it is not already interned and returns the
interned reference. The String.IsInterned method behaves the same as
String.Intern, except that it never adds the string to the hash table:
it returns a reference to the interned string if one exists, and
otherwise it returns null. Note that once a string is interned it is
left in the hash table for the duration of the default AppDomain;
interned strings are not garbage collected during the life of the
application.

The following code demonstrates string interning:

using System;

namespace StrTest
{
    /// <summary>
    /// Test string class.
    /// </summary>
    class MainApp
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // Strings are immutable (cannot be changed).
            string literal1 = "An immutable string";

            // Strings are reference types; the following code prints True.
            string literal2 = literal1;
            Console.WriteLine("Test1: {0}",
                Object.ReferenceEquals(literal1, literal2));

            // The following code prints False: each int argument is boxed
            // into its own object, so the two references differ.
            int intOne = 1, intTwo = intOne;
            Console.WriteLine("Test2: {0}",
                Object.ReferenceEquals(intOne, intTwo));

            // Literal strings are interned: two identical literals share one reference.
            string anotherLiteral = "An immutable string";

            // The following code prints True.
            Console.WriteLine("Test3: {0}",
                Object.ReferenceEquals(literal1, anotherLiteral));

            // Dynamic strings are not interned.
            // The following code prints False.
            string dynamicString1 = "An";
            string dynamicString2 = dynamicString1 + " immutable string";
            Console.WriteLine("Test4: {0}",
                Object.ReferenceEquals(literal1, dynamicString2));

            // ReferenceEquals is faster than Equals because Equals tests the value of strings.
            // The following code prints True.
            Console.WriteLine("Test5: {0}",
                Object.Equals(literal1, dynamicString2));

            // The C# compiler folds a concat of literal strings into one literal at compile time.
            // The following code prints True because the folded literal is also interned.
            string concatStr = "An" + " " + "immutable" + " " + "string";
            Console.WriteLine("Test6: {0}",
                Object.ReferenceEquals(literal1, concatStr));

            // The following code prints False. Concatenating variables happens at runtime
            // and always creates a new, non-interned string, so repeated concats should be avoided.
            string anStr = "An ";
            string immutableStr = "immutable ";
            string stringStr = "string";
            string badConcat = anStr + immutableStr + stringStr;
            Console.WriteLine("Test7: {0}",
                Object.ReferenceEquals(literal1, badConcat));

            // Strings can be interned; the following code prints True.
            // Intern adds the string to the internal hash table if not already present.
            // Note, interned strings remain for the life of the default AppDomain (not garbage collected).
            string internedStr = String.Intern(dynamicString2);
            Console.WriteLine("Test8: {0}",
                Object.ReferenceEquals(literal1, internedStr));

            // It is possible to check for an interned string without adding it to the hash table.
            // IsInterned returns null here, so the following code prints "null".
            string dynamicString3 = dynamicString2 + " which is not interned";
            Console.WriteLine("Test9: {0}", (null ==
                String.IsInterned(dynamicString3) ? "null" : "non-null"));

            // The following code prints "non-null": this literal is already interned.
            string dynamicString4 = "An immutable string";
            Console.WriteLine("Test10: {0}", (null ==
                String.IsInterned(dynamicString4) ? "null" : "non-null"));

            // Further proof that strings are references...
            // ChangeStr takes the string reference by reference (a reference to a reference).
            // This means the reference variable parStr can be reassigned to another string
            // on the heap. The literal string in the function is a heap object.
            string parStr = "original string";
            ChangeStr(ref parStr);
            Console.WriteLine("Test11: {0}", parStr);
        }

        /// <summary>
        /// Change the passed string value.
        /// </summary>
        /// <param name="par">String reference to reassign.</param>
        private static void ChangeStr(ref string par)
        {
            par = "a new string";
        }
    }
}