In today's session, I show the group a tricky, tricky integer overflow detail that is pervasive in binary search and mergesort implementations. How tricky is this whole thing? According to the Google Research blog, most of the first year CMU Ph.D. students' implementations of the rudimentary algorithms were flawed due to overflow problems. Moreover, various cybersecurity lists name integer overflow as one of the top software flaws. The Google blog also recommends Jon Bentley's classic Programming Pearls (Amazon affiliate link) book which motivates this and other Google-style large scale questions such as sorting billions of integers efficiently limited memory. You can probably snag that book in the local library, but we'll cover these and other popular interview questions. But back to the main point, integer overflow is a popular conceptual interview question in some quarters. The interviewer wants to see if you understand integer overflow and by extension an understanding of how machines work. I didn't have time to go over this in depth today, but here are some more details.
Integer overflow happens because machines (mostly) keep integers in registers during arithmetic operations and these registers are fixed precision and of a fixed width, usually either 32-bits or 64-bits. To see what happens you need to experiment a bit to see what happens when an int gets near the limits of the fixed width. Consider the following Java program:
class Test {
public static void main(String[] args) {
int maxInt = 1 << 31 - 1; // Integer.MAX_VALUE
int overMax = 1 << 31;
System.out.println(new Integer(maxInt));
System.out.println(new Integer(overMax));
System.out.println(new Integer(maxInt+2));
}
}
Recall that 1 << 31 = 2^31. What would you guess the above program prints out? For a signed integer, the maximum representable value is 2^31-1. The left most bit is the sign bit. When this value is exceeded, the integer overflow happens. There are a few ways programming languages resolve integer overflows. The most popular and cheapest way is to essentially do nothing and let the integer wrap to negative values (i.e., the sign bit is set and the rest of the bits represent the magnitude of the negative int). What are these negative values? They are typically represented as two's complement but that is beyond the scope of this discussion. The main point is that in many programming languages, integer overflow results in wrapping of ints to negative values. This can be a problem if you aren't careful. Consider the following code. The while loop looks innocent enough, but this loop won't be terminating when one intuitively thinks, when we add maxInt to i. Upon that addition, i wraps and we stay in negative territory such that i will mostly definitely be < 1000 for a long, long while. In some cases, integer wrapping may lead to unexpected infinite loops.
int i=0, j=0;
int A[] = { 1, 2, 3, maxInt };
while (i < 1000) {
i += A[j % A.length];
A[j % A.length] = 2;
j++;
System.out.printf("i %d j %d\n", i, j);
}
So the moral of the story is to be careful if your code can conceivably hit integer overflow and you are in an environment where integer overflow is handled by wrapping. It is a tricky interview question, but now you are aware of it. For extra credit, how can you detect that integer overflow happened when ints wrap?
One of you pointed out that in Python, integer overflows don't apply, which is mostly true since Python 3 (and Ruby) defaults to arbitrary-precision integers ("big ints") and Python 2 automatically promotes integers to arbitrary-precision as needed, but if one were to use numpy or pandas, those packages use native fixed-precision integers where wrapping will happen. In the Python 2 case, one would still have integer overflow, but Python deals with the overflow by coercing the integer to a big int. Java, C/C++, and C# deal with integer overflow by silently wrap to negative or 0. C# does have a checked decoration which can be used to have the program generate an exception upon overflow.
Comments
Post a Comment