Skeleton Coder

Saturday, July 29, 2006

Implementing 'equals()' method is no longer easy

The Object class provides two important methods - equals() and hashCode().

There are number of tutorials and books that suggests the correct way of implementing these methods. Please have a look at the Chapter 3 of the Effective Java book. This book explains with clear examples on implementing equals() method and hashCode() methods, and the common mistakes. In this post, we can look at one possible problem that was not covered in that book or in most other books. But before proceding, please read the chapter 3 of Effective Java Book.


An equals() method implementation said to be correct, if the following conditions are satisfied.
[source: javadocs]


  1. It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  2. It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  3. It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  4. It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  5. For any non-null reference value x, x.equals(null) should return false.
A hashCode() method implementation said to be correct, if the following conditions are satisfied.
[source: javadocs]


  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Read the condition for consistancy one again for the equals method
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

Logically, if the two objects are equal at a particular instant of time, but one object changes its value later, then the two objects should not be equal. This is true as per the specifications. But from an implementation perspective, this will become a source of potential bugs in the system.

Let me explain with an example.
Consider we have a graph or a geographic map. We should mark a point with a text in the graph. Now we have to develope an application that stores the point and the text, like a Server side Java implementation for Google Maps.

Consider the following class.

public final class Location{

private int latitude;
private int longitude;

public Location(){
this(0,0);
}

public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}

public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public void setLatitude(int latitude){this.latitude=latitude; }
public void setLongitude(int longitude){this.longitude=longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;

return latitude== location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instuctions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}



This is a perfect implementation of equals() and hashCode() methods as specified in Java API Docs and by Effective Java.

The equals method satisfy all the 5 conditions. The hashCode() method satisfies all the 3 conditions.

Let us write a simple Java program that stores the map of Location --> Place Name pairs in a HashMap and simulates the following situation.

  1. Add the mapping for the "Madras" city. The Latitude : 13, Longitude : 81. (Approximating to integer values).
  2. Make a small correction in the latitude-longitude pair (13,80).
  3. Change the city name from "Madras" to "Chennai". That is add the new value to the Map.

See the sample program.


import java.util.HashMap;

public class LocationTest{

public static void main(String []args) {

HashMap map = new HashMap();

Location loc = new Location(13,81);
map.add(loc, "Madras");

loc.setLongitude(80);

/* Since the key is already present in the map
* The add method should replace the
* existing entry.
* if you want to explicitly remove the prev mapping,
* uncoment the following line.
*/

// map.remove(loc);

map.add(loc, "Chennai");

System.out.println("Number of elements : "+map.size());

}

}

Guess the output!!!

The expected result is 'Number of elements : 1'. But the actual result is 'Number of elements : 2'

Removing the first location explicity will not change the result. Can you guess where we went wrong?

The problem is the element that is already present in the map is modified.

Let us see this with a small illustration.

When adding the location for the first time [13,81] the hashCode is 23835. Assume that the HashMap implementation simply finds the remainder by 3 and chooses the appropriate hash bucket. So the entry is added to the Hash Bucket 0.

The Hash Bucket looks as follows.




----- ---------
0 --> [13,81]
----- ---------
1
-----
2
-----




Now we are changing the longitude from 81 to 80.





----- ---------
0 --> [13,80]
----- ---------
1
-----
2
-----



The new hashCode is 23834. Hence the this entry should go to bucket 2.
In bucket 2, there is no entry that is equal to [13,80]. Hence a new value is added.





----- ---------
0 --> [13,80]
----- ---------
1
----- ---------
2 --> [13,80]
----- ---------




How to solve this issue?


  • Notify the HashMap to update the hash buckets whenever an entry is changed.

  • Change the Map implementation to check all the elements in the HashMap whenever a new entry is added.

  • Make the keys immutable.


The first two solutions try to patch the problem. It will have too much overhead making it very difficult to implement or very inefficient to use. The third solution identifies the root cause of the problem, so we can avoid the problem from the root.

Never use a mutable object as a key for Maps and Sets.
From this example, you would have understood the need for making the Classes immutable. This is also one of the reason for the classes like String, Integer, Long, Float, Double, etc to be immutable.
Hence the correct implementation is,

public final class Location{
private int latitude;
private int longitude;

public Location(){
this(0,0);
}
public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}
public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;
return latitude == location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instuctions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}

Summary:
  1. Use only immutable objects as Keys for Maps and Sets.
  2. Don't use mutable fields for equals comparison and hashCode comutation. (See how equals and hashCode work for StringBuffer class).
  3. The equals and hashCode methods should be consistent over time, although the Java Specs doesn't mandate them.

Saturday, July 08, 2006

Finding Average is no longer easy

Do you think you are good in Programming?
Then solve the following problem.

Write a program to find the average of two integers.


I too thought this is very easy before, till I found the problem. Lets look at the code for finding the average of two numbers.

The detailed specification.

A class Average with a method average() which takes 2 integers as parameters, and returns the average of the two numbers with the decimal part truncated.

Template for this program.

public class Average{
public static int average(int a, int b){
// your code here.
}
}

Sample input and output:

1. { 5, 10 } --> 7 // That is 7.5 truncate to 7
2. { 6, 10 } --> 8
3. { 5, 5 } --> 5
4. {-5, 10 } --> 2
5. {-5, -3 } --> -4

Looks Simple?

Simple Solution (But wrong):


Now, you would have come up with the solution

public class Average{
public static int average(int a, int b){
return (a+b)/2;
}
}
This would be the solution that most programmers (including me) will give.

But there is a big bug in this code.
Yes!

call the method as
Average.average(Integer.MAX_VALUE,1);
or
Average.average(2147483647,1);

Expected Result:
1073741824
Actual Result:
-1073741824

The two numbers are positive. Both the arguments are valid integers. Any mathematician can prove that the average of two integers will be within the range of the smallest number and the largest number.
So I should get only a positive return value.

What went wrong, in our program?

The great culprit is Integer overflow.

Integer.MAX_VALUE = 2147483647 = 0x7FFFFFFF
a = 0x7FFFFFFF
b = 0x00000001
----------------
a+b = 0x80000000 // Value is negative.
----------------
/2 = 0xC0000000 // Since Sign bit is reset to '1' after division

That is the integer overflows before we divide the numbers by 2. This creates the problem.

Solution:

public class Average{
public static int average(int a, int b){
return (a+b) >>> 1;
}
}

That is, in Java we have a bit-wise operator that right shifts along with the sign bit. This will solve the problem. But what if we have to find the average of 3 integers, then for 'n' integers? This is not easy to extend.

Another solution is,

public class Average{
public static int average(int a, int b){
int high = a>b?a:b;
int low = a<=b?a:b; return (low+(high-low)/2);
}
}

This is also not easy to extend, but atleast gives some hope to extend. But surely not easy.
I leave the assignment to find the average of 'n' integers correctly to the readers, to appreciate the difficulty of finding the average.

Binary Search:

If you think why we need an integer average, instead of a float, consider the example of finding the median in an array. To find the middle element, we will use the above specified function. Also, the well known Binary Search algorithm is another example that needs the above function.

In Java, the binary search algorithm is implemented using the initial version of the code.
The java.util.Arrays class implements the binary search algorithm. Have a look at the Java source code.

public static int binarySearch(int[] a, int key) {
int low = 0;
int high = a.length-1;

while (low <= high) {
int mid = (low + high) >> 1;
int midVal = a[mid];

if (midVal <>
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
return mid; // key found
}
return -(low + 1); // key not found.
}

The line mid = (low + high)>>1 contains the bug. It is exactly equivalent to dividing by 2.

A bug is reported in Sun' Bug Database.

This piece of code was writen by Joshua Bloch, the author of Effective Java: Programming Language Guide, one of the must-read books for Java Developers.