String intern pool
From Wikipedia, the free encyclopedia
In some modern object-oriented programming languages, including Java and .NET languages, the string intern pool is a data structure managed internally by the platform or virtual machine to facilitate efficient implementation of certain string processing tasks. The pool contains a single copy (called the intern) of each distinct string that is currently represented by an interned string object in the system. By invoking a method of the string class (for example String.intern() in Java), the programmer has access to this unique string object.
The primary use for the string intern pool is to speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fills up the processor cache, meaning there is less cache available for other needs. However, if the strings being compared are interned, then they are equal if and only if they both refer to the same string object. This check can be done much faster and without reading from the strings themselves.
Another use of the string intern pool is to reduce memory usage if a large number of active objects store a given string created dynamically during the execution of a program and is often repeated (for instance, it is read from the network, or from storage). Such strings may include magic numbers or network protocol information.
[edit] The intern pool in other programming languages
Even if the standard library of a programming language does not support string interning, it is possible to implement an intern pool in user code.
An example of such an implementation follows, in the C++ programming language. Note that this implementation returns pointers to arrays of characters, not String objects.
class StringTable {
char** array;
int* count;
int size;
int numStrings;
public:
StringTable() {
size = 20;
array = new char*[size];
count = new int[size];
numStrings = 0;
}
int addString(const char* ch) { // this method adds a String to the pool and returns its index
if (numStrings == 0) { // if there is no string in the pool, add first
array[0] = new char[strlen(ch)+1];
strcpy(array[0], ch);
numStrings++;
return 0;
}
int i = 0;
for (i; i < numStrings; i++) { // compares, if the String is already in the pool
if (strcmp(ch, array[i]) == 0) {
count[i]++;
return i;
}
}
if (i == size) {
return -1;
}
array[i] = new char[strlen(ch)+1]; // if not, add String to the pool and return its index
strcpy(array[i], ch);
numStrings++;
return numStrings-1;
}
char* getString(int index) const {
return array[index];
}
};
class String {
int length;
char* text;
int index;
public:
static StringTable* table;
String(const char* ch) {
length = strlen(ch);
text = new char[length+1];
strcpy(text,ch);
index = table->addString(text);
}
char* intern() { // return pointer to the String in pool
if (index == -1)
return 0;
return table->getString(index);
}
};
StringTable* String::table = new StringTable();

