Saturday, July 29, 2006

Implementing 'equals()' method is no longer easy

The Object class provides two important methods - equals() and hashCode().

There are number of tutorials and books that suggests the correct way of implementing these methods. Please have a look at the Chapter 3 of the Effective Java book. This book explains with clear examples on implementing equals() method and hashCode() methods, and the common mistakes. In this post, we can look at one possible problem that was not covered in that book or in most other books. But before proceding, please read the chapter 3 of Effective Java Book.


An equals() method implementation said to be correct, if the following conditions are satisfied.
[source: javadocs]


  1. It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  2. It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  3. It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  4. It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  5. For any non-null reference value x, x.equals(null) should return false.
A hashCode() method implementation said to be correct, if the following conditions are satisfied.
[source: javadocs]


  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Read the condition for consistancy one again for the equals method
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

Logically, if the two objects are equal at a particular instant of time, but one object changes its value later, then the two objects should not be equal. This is true as per the specifications. But from an implementation perspective, this will become a source of potential bugs in the system.

Let me explain with an example.
Consider we have a graph or a geographic map. We should mark a point with a text in the graph. Now we have to develope an application that stores the point and the text, like a Server side Java implementation for Google Maps.

Consider the following class.

public final class Location{

private int latitude;
private int longitude;

public Location(){
this(0,0);
}

public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}

public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public void setLatitude(int latitude){this.latitude=latitude; }
public void setLongitude(int longitude){this.longitude=longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;

return latitude== location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instuctions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}



This is a perfect implementation of equals() and hashCode() methods as specified in Java API Docs and by Effective Java.

The equals method satisfy all the 5 conditions. The hashCode() method satisfies all the 3 conditions.

Let us write a simple Java program that stores the map of Location --> Place Name pairs in a HashMap and simulates the following situation.

  1. Add the mapping for the "Madras" city. The Latitude : 13, Longitude : 81. (Approximating to integer values).
  2. Make a small correction in the latitude-longitude pair (13,80).
  3. Change the city name from "Madras" to "Chennai". That is add the new value to the Map.

See the sample program.


import java.util.HashMap;

public class LocationTest{

public static void main(String []args) {

HashMap map = new HashMap();

Location loc = new Location(13,81);
map.add(loc, "Madras");

loc.setLongitude(80);

/* Since the key is already present in the map
* The add method should replace the
* existing entry.
* if you want to explicitly remove the prev mapping,
* uncoment the following line.
*/

// map.remove(loc);

map.add(loc, "Chennai");

System.out.println("Number of elements : "+map.size());

}

}

Guess the output!!!

The expected result is 'Number of elements : 1'. But the actual result is 'Number of elements : 2'

Removing the first location explicity will not change the result. Can you guess where we went wrong?

The problem is the element that is already present in the map is modified.

Let us see this with a small illustration.

When adding the location for the first time [13,81] the hashCode is 23835. Assume that the HashMap implementation simply finds the remainder by 3 and chooses the appropriate hash bucket. So the entry is added to the Hash Bucket 0.

The Hash Bucket looks as follows.




----- ---------
0 --> [13,81]
----- ---------
1
-----
2
-----




Now we are changing the longitude from 81 to 80.





----- ---------
0 --> [13,80]
----- ---------
1
-----
2
-----



The new hashCode is 23834. Hence the this entry should go to bucket 2.
In bucket 2, there is no entry that is equal to [13,80]. Hence a new value is added.





----- ---------
0 --> [13,80]
----- ---------
1
----- ---------
2 --> [13,80]
----- ---------




How to solve this issue?


  • Notify the HashMap to update the hash buckets whenever an entry is changed.

  • Change the Map implementation to check all the elements in the HashMap whenever a new entry is added.

  • Make the keys immutable.


The first two solutions try to patch the problem. It will have too much overhead making it very difficult to implement or very inefficient to use. The third solution identifies the root cause of the problem, so we can avoid the problem from the root.

Never use a mutable object as a key for Maps and Sets.
From this example, you would have understood the need for making the Classes immutable. This is also one of the reason for the classes like String, Integer, Long, Float, Double, etc to be immutable.
Hence the correct implementation is,

public final class Location{
private int latitude;
private int longitude;

public Location(){
this(0,0);
}
public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}
public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;
return latitude == location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instuctions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}

Summary:
  1. Use only immutable objects as Keys for Maps and Sets.
  2. Don't use mutable fields for equals comparison and hashCode comutation. (See how equals and hashCode work for StringBuffer class).
  3. The equals and hashCode methods should be consistent over time, although the Java Specs doesn't mandate them.

2 Comments:

Blogger Shravan said...

Very Good Explanation. It would be great if you can explain a little more about hash buckets(just a couple of lines as to what they are and how the system uses them) that you mentioned in your blog. I'm a java coder since 1 year now and I know that I need to go a long way. Blogs like these would definetely help. Thanks for the good work. Keep going

5:42 AM  
Blogger Shravan said...

http://en.wikipedia.org/wiki/Hash_table

ok,I found the hash bucket explanation myself :-)

5:59 AM  

Post a Comment

<< Home