We developers often think about time complexity. We are familiar with the fact that time complexity is the main factor, which helps us create really powerful algorithms. When we need to store data requiring random access, we use hash sets or dictionaries. Why do we do this? Because we know, that time complexity is O(1) for that kind of stores.
When we use Dictionary<TKey,TValue> we think that access via key is really fast. We expect O(1) time complexity. We are right when we think about algorithms usually implemented inside dictionaries. The truth is that we are right only when we consider dictionary implementation only. What we are missing sometimes is the time complexity of two operations that must be calculated while accessing the value by the key. Those operations are:
- Hash function of the delivered key – that one is calculated once. Calculated hash can be cached, because keys are read only.
- The second operation is equality function which is executed at least once. The number of calls depends on data distribution in buckets.
Now we should understand the purpose of this article. When TKey is represented by a long sequence of bytes or characters (the string type is a good example), each dictionary access requires comparing sequences of these bytes/characters too. When a sequence is a long one, more bytes/characters must be compared.
When you get millions of items in the dictionary, you still have a benefit of using dictionaries. When string used as a key becomes very long, the time complexity become closer to O(log n) than O(1), because of the cost of calculation of equality each time you want to access dictionary item.
If we need to use the string as a key, there is a workaround. The first option is to use in C# String.Intern method, but this method is not recommended for big set of strings. Therefore I will focus on the second option, a better one, in my opinion. We need to define in our code static readonly string and use this string’s reference everywhere where we used a specific string before. The method String.Equals (see the sources here) uses Object.ReferenceEquals at the beginning. It means that the comparison of the whole characters’ sequence in the string will not be required. NOTE! Be careful with the constant string. Constants can be placed as inline code so there is no guarantee of identical reference.
If you can impact the design of the key definition in dictionaries, always consider using of other types as keys e.g. integer types. Believe me – processors are designed to compare them 😉.