Monday, July 5, 2010

Equality in C# – Part 2 (GetHashCode)




In my previous post, Equality in C# – Part 1, I explained how the Equals method behaves in C#. Microsoft recommends that if you provide your own implementation of Equals, you should override GetHashCode as well. In fact, the C# compiler emits a warning (CS0659) if you override Equals(object) without also overriding GetHashCode.

The reason is that System.Collections.Hashtable and System.Collections.Generic.Dictionary<TKey, TValue> require any two objects that are equal to return the same hash code. If you write your own implementation of object equality, you should also override GetHashCode so that two equal objects return the same hash value.
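To make the requirement concrete, here is a minimal sketch of what tends to go wrong when Equals is overridden and GetHashCode is not. The Point type and the MissingHashCodeDemo class are hypothetical names used only for illustration, and the pragma merely silences the CS0659 warning so the sample compiles cleanly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: overrides Equals but deliberately leaves GetHashCode alone.
#pragma warning disable 659
public class Point
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        return other != null && X == other.X && Y == other.Y;
    }
    // No GetHashCode override: Object's identity-based hash code is used.
}
#pragma warning restore 659

public static class MissingHashCodeDemo
{
    public static void Main()
    {
        Dictionary<Point, string> map = new Dictionary<Point, string>();
        map.Add(new Point(1, 2), "first");

        Point sameValue = new Point(1, 2);

        // The two instances compare equal...
        Console.WriteLine(sameValue.Equals(new Point(1, 2)));

        // ...but their hash codes almost certainly differ, so the
        // dictionary looks in the wrong bucket and reports no match.
        Console.WriteLine(map.ContainsKey(sameValue));
    }
}
```

The lookup is expected to fail even though an equal key was added, because the dictionary compares hash codes before it ever calls Equals.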

Basically, when you add a key/value pair to a Hashtable or Dictionary, the hash code of the key object is obtained first. This integer hash code determines which “bucket” the key/value pair is stored in. A lookup repeats the same calculation and searches only the matching bucket, so if the hash code of a key changes after the key has been added, the entry can no longer be found. Let’s see an example:
using System;

public class Example
{
  public static void Main()
  {
    System.Collections.Hashtable table = new System.Collections.Hashtable();
    A obj1 = new A();
    obj1.Name = "Vikas";

    table.Add(obj1, obj1.Name); // The entry is stored in a bucket chosen from obj1's current hash code.

    obj1.Name = "Changed "; // The hash code of obj1 changes because our bad GetHashCode depends on the mutable Name.

    Object name = table[obj1]; // Returns null: the lookup hashes the new Name and does not find the stored entry.
  }
}
public class A : IEquatable<A>
{
  private String _name;
  public String Name
  {
    get
    {
      return _name;
    }
    set
    {
      _name = value;
    }
  }
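  // Deliberately bad: the hash code depends on the mutable Name property.
  // The case-insensitive comparer keeps it consistent with Equals below.
  public override int GetHashCode()
  {
    return StringComparer.OrdinalIgnoreCase.GetHashCode(Name);
  }

  // Hashtable calls Equals(object), so forward it to the typed overload.
  public override bool Equals(object obj)
  {
    return Equals(obj as A);
  }
  #region IEquatable<A> Members

  public bool Equals(A other)
  {
    return other != null
      && String.Equals(Name, other.Name, StringComparison.OrdinalIgnoreCase);
  }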

  #endregion
}

When implementing GetHashCode, try to follow these guidelines:

1) Use an algorithm that gives a good random distribution, for best performance.
2) The fields used in the algorithm should be immutable. They should be initialized in the constructor of the class and should not change during the lifetime of the object.
3) The algorithm should execute very quickly.
4) Objects with the same value must return the same hash code.
5) You can include base.GetHashCode in your calculation when the base class defines value-based equality. Avoid it when the base implementation is Object's, because that one is based on identity, not on value.
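As a rough sketch of how these guidelines might look in practice, here is an immutable counterpart of the class A from the example above. PersonKey and ImmutableKeyDemo are hypothetical names chosen for this illustration; the case-insensitive comparison simply mirrors the Equals method shown earlier.

```csharp
using System;
using System.Collections;

// Hypothetical immutable counterpart of the class A shown earlier.
public sealed class PersonKey : IEquatable<PersonKey>
{
    private readonly string _name;   // set once in the constructor, never changed

    public PersonKey(string name)
    {
        if (name == null) throw new ArgumentNullException("name");
        _name = name;
    }

    public string Name { get { return _name; } }

    public bool Equals(PersonKey other)
    {
        return other != null
            && string.Equals(_name, other._name, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as PersonKey);
    }

    // Uses the same case-insensitive rule as Equals, so equal objects hash alike.
    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(_name);
    }
}

public static class ImmutableKeyDemo
{
    public static void Main()
    {
        Hashtable table = new Hashtable();
        table.Add(new PersonKey("Vikas"), "stored value");

        // A different instance with an equal (case-insensitive) name finds the entry.
        Console.WriteLine(table[new PersonKey("VIKAS")]);   // prints "stored value"
    }
}
```

Because the name cannot change after construction, the hash code stays the same for as long as the key sits in the table.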

The Object class knows nothing about the values of its derived classes, so Object's implementation of GetHashCode returns a number based on the identity of the instance. That number does not change during the object's lifetime, but it is not guaranteed to be unique: two different objects can end up with the same hash code. If you want this identity-based number for an object whose type overrides GetHashCode, you can call System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode, which takes the object as a parameter.
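A small sketch of the difference between a value-based hash code and the identity-based one, using String because it overrides GetHashCode. IdentityHashDemo is a hypothetical name for this illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class IdentityHashDemo
{
    public static void Main()
    {
        string a = new string('x', 3);
        string b = new string('x', 3);   // equal value, separate instance

        // String overrides GetHashCode, so equal values give equal hash codes.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());    // True

        // RuntimeHelpers.GetHashCode ignores the override and hashes by identity.
        int idA = RuntimeHelpers.GetHashCode(a);
        int idB = RuntimeHelpers.GetHashCode(b);

        // Stable for the same instance.
        Console.WriteLine(idA == RuntimeHelpers.GetHashCode(a));  // True

        // Usually different for separate instances, but that is not guaranteed.
        Console.WriteLine(idA == idB);
    }
}
```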

System.ValueType's implementation of GetHashCode can fall back on reflection, and reflection is slow, so it is better to override GetHashCode in your own value types than to rely on the inherited one.
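One way to avoid the inherited implementation is to override Equals and GetHashCode in the struct itself. The sketch below assumes a hypothetical GridCell struct with two integer fields; the multiplier 397 is just a commonly used prime, not something the framework prescribes.

```csharp
using System;

// Hypothetical struct used only for illustration.
public struct GridCell : IEquatable<GridCell>
{
    private readonly int _row;
    private readonly int _column;

    public GridCell(int row, int column)
    {
        _row = row;
        _column = column;
    }

    public bool Equals(GridCell other)
    {
        return _row == other._row && _column == other._column;
    }

    public override bool Equals(object obj)
    {
        return obj is GridCell && Equals((GridCell)obj);
    }

    // Combines both fields directly, with no reflection involved.
    public override int GetHashCode()
    {
        unchecked
        {
            return (_row * 397) ^ _column;
        }
    }
}

public static class GridCellDemo
{
    public static void Main()
    {
        GridCell first = new GridCell(2, 3);
        GridCell second = new GridCell(2, 3);

        Console.WriteLine(first.Equals(second));                        // True
        Console.WriteLine(first.GetHashCode() == second.GetHashCode()); // True
    }
}
```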

One last point: do not store the hash code of an object in your database unless it comes from your own GetHashCode implementation whose results you control. Do not rely on the GetHashCode implementation of the Object class, because it may differ between versions of the CLR.
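If a hash value really has to be persisted, one option is to compute it with a fixed, fully specified algorithm instead of GetHashCode. The sketch below uses 32-bit FNV-1a over the UTF-8 bytes of a string; that choice is an assumption made for this illustration, not something the framework provides, and StableHash and Fnv1a32 are hypothetical names.

```csharp
using System;
using System.Text;

public static class StableHash
{
    // 32-bit FNV-1a over the UTF-8 bytes of the text.
    // The result depends only on the input, not on the CLR version.
    public static uint Fnv1a32(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        unchecked
        {
            uint hash = 2166136261;          // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;
                hash *= 16777619;            // FNV prime
            }
            return hash;
        }
    }
}

public static class StableHashDemo
{
    public static void Main()
    {
        // Well-known FNV-1a test vector for the single character "a".
        Console.WriteLine(StableHash.Fnv1a32("a").ToString("X8"));   // E40C292C

        // Any other input gives the same value on every run and every machine.
        Console.WriteLine(StableHash.Fnv1a32("Vikas").ToString("X8"));
    }
}
```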