Unposted drafts – Algorithm optimization: indexation
September 21, 2011 at 12:58 pm Leave a comment
Many a times in programming you’ll use the magic words For or While, or your language equivalents, but sometimes a little too much. Any loop you’ll use will slow down your algorithm by n times t. The moment you add a millisecond of computing to 1 iteration, you’ll automatically end up with n times that millisecond as extra time spent.
In the area of looking up or finding an element in an array of n elements, a lot of times we won’t bother to optimize or perhaps we can’t imagine an optimization to a simple variable comparison. Sure we can cheat our way out with a break or exit from the loop when we found our needle, but that’s as far as it goes.
The moment where we again put our find method inside another loop, the time spent with our lookup is going to matter if we value our users time.
Indexation is a common concept of lookup optimization, but why, and how do we do it?
The basic idea of indexation, in programming that is, is that given a space of addressable memory, and a lot of possible objects to find, we can find out if that object exists – by already knowing where it’s supposed to be.
It seems like the world upside down; instead of trying find the object we’re looking for, we’re supposed to know already know, and perhaps find out if we’re right.
We can’t of course stick our hand in the mud randomly and hope for the best. We need to find a way first to make the objects we’re going to look up later, are either where their supposed to be, or nowhere.
If we look at our indexation masterminds: databases, we can find a complicated amount of indexation techniques that work pretty well. Our fancy scripting languages may have a little bit of that in their non-default datastructures and functions. But I want to focus on the simplest form of indexation, well, second simplest.
About arrays
In the world of programming, the standard array is going to be your template for your algorithm. The array works in a way that you arrange memory at the position of variable x, starting from index 0, counting to n, and you assign or requests values in that array by supplying p as in x[p]. Internally this works as a function resulting in another address, f(x,p) = x + p * c. In this formula, c is the size of your datatype; if your type is a byte then c = 1, if your type is a 32bit integer it’s 4, etc.
There’s no magic lookup in regular arrays, its just calculating the start of the requested fragment in a larger fragment of memory, and this works pretty fast.
Simplification to its core when going back to the indexation aspect; e.g. given the array x = {0,1,2,3,4,5}, we’ll know that the number 4 will reside in x[4]. But of course, why look up something you already know.
Textbook example: IntToHex
Integers to hexadecimal text is a textbook example of indexation (unless the writer is a complete nitwit). Given your base ‘hex’, 16 possible values, represented by the array x={0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}. We can “find” our character by fetching the n-th value in array x, where n is the number we want to translate.
Since in reality there’s a slim chance we only want to translate numbers under 16, we’ll have to find out a way to deduce our translation to 1 or more simple array indexations.
With the hexadecimal system being close friends to the binary system, we know we can physically divide our byte into 2 x 4bit codes, and 4 bits can form 16 combinations.
Easy enough we can thus make a 2 character string of our given bytevalue p in 1 long line;
str = hex[(p & 0xf0) >> 4] + hex[(p & 0x0f)];
Magic
Real life applications might not be so simple or magical as hexing, but the principle is the same:
- determine a way to calculate a predetermined and (unique if non-tree) index
- put the items in the array with your calculated indices on startup and on modifications
- locate your item by calculating the possible position and see if it’s there
Unposted drafts feature; here I stopped writing for some reason, and I’m not going to finish it…
Entry filed under: /roll. Tags: .
Trackback this post | Subscribe to the comments via RSS Feed