When writing a caching service I recently noticed that the .NET Dictionary uses huge amounts of memory when working with big datasets. With around 10 million objects, it would eat up to 8 GB of memory. Wondering how to reduce that number, I ran some tests and found that simply changing the Dictionary key type from string to a numeric type cut memory usage by a factor of five or more.
I was able to free a considerable amount of memory, but I was now faced with a new problem: I still needed to access the objects by a string key. This turned out to be much easier to solve than I initially expected.
private static SHA256 _sha256 = new SHA256Managed();

public static Int64 GetInt64Hash(string strText)
{
    // Note: the SHA256 provider is not thread safe; synchronize access if shared across threads.
    byte[] hashText = _sha256.ComputeHash(Encoding.Unicode.GetBytes(strText));
    // Fold the 32-byte hash into a single Int64 by XOR-ing three 8-byte segments.
    return BitConverter.ToInt64(hashText, 0) ^ BitConverter.ToInt64(hashText, 8) ^ BitConverter.ToInt64(hashText, 24);
}
This little helper function returns an Int64 hash value of a string. It lets us keep the Dictionary keyed by Int64 while still looking objects up by string: we hash the string key and use the resulting integer as the actual Dictionary key.
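To make the idea concrete, here is a minimal sketch of how the cache might use the helper. The `StringKeyedCache` class name and the string values are my own illustration, not from the original service:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Sketch: a cache that accepts string keys externally but stores
// entries under an Int64 hash internally to save memory.
class StringKeyedCache
{
    private static SHA256 _sha256 = new SHA256Managed();
    private readonly Dictionary<Int64, string> _items = new Dictionary<Int64, string>();

    private static Int64 GetInt64Hash(string strText)
    {
        byte[] hashText = _sha256.ComputeHash(Encoding.Unicode.GetBytes(strText));
        return BitConverter.ToInt64(hashText, 0) ^ BitConverter.ToInt64(hashText, 8) ^ BitConverter.ToInt64(hashText, 24);
    }

    // Store: hash the string key once, use the Int64 as the real key.
    public void Set(string key, string value) => _items[GetInt64Hash(key)] = value;

    // Lookup: hash again and index by the number.
    public bool TryGet(string key, out string value) => _items.TryGetValue(GetInt64Hash(key), out value);
}
```

The calling code never sees the Int64; only the Dictionary's internal key type changes.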
There are two downsides to this solution though:
The first is the added CPU cost of hashing. A single core on my machine can compute around 750,000 hashes per second, which is quite a lot, but if you're working on something truly high-performance it might still not be the best idea.
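Throughput varies a lot by hardware, so it is worth measuring on your own machine. A rough single-threaded timing loop, assuming the GetInt64Hash helper above is in scope, could look like this:

```csharp
using System;
using System.Diagnostics;

// Rough single-threaded throughput check for the GetInt64Hash helper above.
const int iterations = 1000000;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    GetInt64Hash("key-" + i);
}
sw.Stop();
Console.WriteLine($"{iterations / sw.Elapsed.TotalSeconds:N0} hashes/second");
```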
The second is the risk of hash collisions: two different strings could map to the same Int64 key. This being my biggest concern, I once again ran some tests and found no collisions among 100 million hashed GUID strings. An Int64 can hold values up to 9,223,372,036,854,775,807, so the key space is enormous.
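That empirical result lines up with what the birthday problem predicts. For n uniformly distributed values in a space of 2⁶⁴, the probability of any collision is approximately n²/2⁶⁵; this back-of-the-envelope estimate is my addition, not part of the original tests:

```csharp
using System;

// Birthday-problem approximation: P(any collision) ≈ n^2 / 2^65
// for n uniformly distributed 64-bit hash values.
double n = 100000000d;                // 100 million keys, as in the test above
double p = (n * n) / Math.Pow(2, 65); // on the order of 3e-4
Console.WriteLine(p);
```

So even at 100 million keys, the expected odds of a single collision are well under one in a thousand, assuming the hash output is close to uniform.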