Unposted drafts – Algorithm optimization: indexation

Many a times in programming you’ll use the magic words For or While, or your language equivalents, but sometimes a little too much. Any loop you’ll use will slow down your algorithm by n times t. The moment you add a millisecond of computing to 1 iteration, you’ll automatically end up with n times that millisecond as extra time spent.

In the area of looking up or finding an element in an array of n elements, a lot of times we won’t bother to optimize or perhaps we can’t imagine an optimization to a simple variable comparison. Sure we can cheat our way out with a break or exit from the loop when we found our needle, but that’s as far as it goes.

The moment where we again put our find method inside another loop, the time spent with our lookup is going to matter if we value our users time.

Indexation is a common concept of lookup optimization, but why, and how do we do it?

The basic idea of indexation, in programming that is, is that given a space of addressable memory, and a lot of possible objects to find, we can find out if that object exists – by already knowing where it’s supposed to be.

It seems like the world upside down; instead of trying find the object we’re looking for, we’re supposed to know already know, and perhaps find out if we’re right.

We can’t of course stick our hand in the mud randomly and hope for the best. We need to find a way first to make the objects we’re going to look up later, are either where their supposed to be, or nowhere.

If we look at our indexation masterminds: databases, we can find a complicated amount of indexation techniques that work pretty well. Our fancy scripting languages may have a little bit of that in their non-default datastructures and functions. But I want to focus on the simplest form of indexation, well, second simplest.

About arrays

In the world of programming, the standard array is going to be your template for your algorithm. The array works in a way that you arrange memory at the position of variable x, starting from index 0, counting to n, and you assign or requests values in that array by supplying p as in x[p]. Internally this works as a function resulting in another address, f(x,p) = x + p * c. In this formula, c is the size of your datatype; if your type is a byte then c = 1, if your type is a 32bit integer it’s 4, etc.

There’s no magic lookup in regular arrays, its just calculating the start of the requested fragment in a larger fragment of memory, and this works pretty fast.

Simplification to its core when going back to the indexation aspect; e.g. given the array x = {0,1,2,3,4,5}, we’ll know that the number 4 will reside in x[4]. But of course, why look up something you already know.

Textbook example: IntToHex

Integers to hexadecimal text is a textbook example of indexation (unless the writer is a complete nitwit). Given your base ‘hex’, 16 possible values, represented by the array x={0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}. We can “find” our character by fetching the n-th value in array x, where n is the number we want to translate.

Since in reality there’s a slim chance we only want to translate numbers under 16, we’ll have to find out a way to deduce our translation to 1 or more simple array indexations.

With the hexadecimal system being close friends to the binary system, we know we can physically divide our byte into 2 x 4bit codes, and 4 bits can form 16 combinations.

Easy enough we can thus make a 2 character string of our given bytevalue p in 1 long line;

str = hex[(p & 0xf0) >> 4] + hex[(p & 0x0f)];

Magic

Real life applications might not be so simple or magical as hexing, but the principle is the same:

  • determine a way to calculate a predetermined and (unique if non-tree) index
  • put the items in the array with your calculated indices on startup and on modifications
  • locate your item by calculating the possible position and see if it’s there

 

Unposted drafts feature; here I stopped writing for some reason, and I’m not going to finish it…

 

September 21, 2011 at 12:58 pm Leave a comment

color profile madness

I have a Cintiq 21ux since forever it seems now, and I always had issues with the colours being “off” on the normal iMac screen. Obviously the reason lies in the ICC profiles… I thought. They were set the right way however, and embedding the right profile in the picture didn’t help either.

After a lot of fondling around with the profiles on both screens, I decided to do a restore factory settings on the Cintiq. Yet the problem remained. Until I found out that the Default color setting was wrong.

The temperature of the colours was at 6500k, but naturally that doesn’t match the ICC profile at all. The setting “Direct” however did the trick. I have to fix all the colouring now in my pictures, but at least I know they’re going to be the same almost everywhere.

February 8, 2010 at 1:50 am Leave a comment

OSPatch Part 2

I know I’ve kept a bit silent and sketchy about it all, I haven’t had much inspiration and been busy with things and whatnot.

But I finally sat down tonight and tried to just sketch whatever came to mind, erase, sketch again, etc. And after a few tries I came up with [this]. It’s not your typical website, it’s a little risky and new I guess. But it has I think some nice quirky things in it that makes it fun while being a bit minimalistic (I wanted to get rid of sidebars so patches and other texts have free reign over the rest of the page)

And actually, the reason why I did the main menu like this, is because I couldn’t get buttons or or gradients and what not to work in my limited knowledge of Pixelmator. It’s not as fancy as photoshop, which I cannot afford. I’ve already drawn the close/logout button, and it’s rather fun, metallic and weird. Ofcourse it has nothing to do with sourcecode, but that’s where the actual functionality of the website is going to be.

I also thought it to be nice to have a little sliding thing coming out of windows, because Mac OS X has those things about and it looks rather fun. Am hoping no patent is on that *cross fingers*, though there might be.

The actual “patch” is a bit of a joke, the site is about patches, but that patch represent patching the webpage with other content. It’s ridiculous when you think about it ofcourse, different patches to apply to your website sourcecode to create another page?

Anyway, I’m going to draw this up, and since I’m hopeless at it, I’m actually going to draw it by hand brushstroke by brushstroke. It’s going to take ages, but ah well…

Before I got the sketching phase I did some coding around the website and plugged in an OpenID login (which wasn’t easy, Google OpenID was being annoying, but I got it in the end). I’m all done with having to remember another set of credentials and filling in the same forms, and I don’t really need all kinds of information. Just the email and a name would suffice, which I can store and link various website related information to it.

Feel free to say the design sucks!

December 7, 2009 at 12:29 am Leave a comment

Open Source Project

So the project itself is a rather simple little thing. It’s a framework for making websites with complete separation of view and data, which I’m already using for e.g. http://saikosoft.inaspro.nl/ (not that you can notice anything)

Since I figured it might be of value to someone else, I wanted to make it Open Source and let it grow over time into something better, perhaps with help of others.

There is however a slight hitch in my plan, for I dislike the concept of Mailinglists that has been generally accepted to be a Open Source Thing. To me it seems a cumbersome thing that takes up way too much time to go through and pick up the patches, but I don’t want to just give anyone commit access to the repository.

So I had this image in my head where there was a website where people could just upload their patches, put a few comments under it explaining what it does and why, and other people can comment on the same thing, perhaps attach a revision of the patch, and give the thumbs up for specific patches. Once enough thumbs up have been given, the committers can check the list of patches and commit those that have been generally accepted as a good patch (given the committer likes it too of course). (which might be another open source project I might send out if this works)

But images in your head never translate to actual computer graphics in the real world when I’m trying to do the website. Sure I can handle code, but design?

Perhaps more updates and links over here soon…

ASCII Preview: http://www.ospatch.com/

November 12, 2009 at 2:49 pm 9 comments

IP Lookup tool

This tool is to retreive ip addresses for a given hostname. This isn’t very new, but I noticed that if you need to refresh ip without wanting or able to call ipconfig /flushdns, you can still refresh the ip’s for the given hostname.

It also has an option to return the ip address that is used to indicate a hostname is either invalid or could not be found.

getip.exe

(free, freeware, no spyware, use for whatever you want, unlicensed, win32 console exe optimized for i586)

September 13, 2009 at 8:00 pm Leave a comment

hobby multi-isp serving routing scheme revisited

After my last two posts about multi-isp serving over 2 routers, some time has went by and one of my isp’s dsl-modem slash router changed and went ballistic with fixed settings. They decided that nobody ever wants a different subnet from 192.168.2.0/24, and therefor dump everything else to the outside interface. This didn’t only make a router inbetween very much needed (since I didn’t want to change my local network besides a /16 mask), it also made my network a lot more complicated.

Since the router would ditch everything going to 192.168.0.0, I decided to connect the secondary network interfaces (that happened to be configured to 192.168.2.0 as well) to the network as the serving interface. However, as usual this doesn’t matter when it comes to outgoing traffic. Anything it considers outside its network is send to the default interface instead, eventhough the incoming packets came from the other interface.

At some moment of clarity, I found that I could split up 192.168.2.0 into 2 networks as well, namely in this case (since the isp’s dsl-router has its local ip set fixed to 192.168.2.254) into networks 192.168.2.128/25 and 192.168.2.0/25. This allowed everything beneath 192.168.2.128 to be sent to the internal network, and everything above, including the dsl-router, to belong to the other side.

Basically like this:

multiisproutes

The trivial stuff

interface FastEthernet0/0
 ip address 192.168.2.130 255.255.255.128
 ip access-group defaultallin in
 ip mask-reply
 ip nat inside
 ip route-cache flow
 duplex auto
 speed auto
 no cdp enable
!
interface FastEthernet0/1
 ip address 192.168.2.101 255.255.255.128
 ip access-group defaultallout out
 ip mask-reply
 no ip redirects
 ip nat outside
 ip route-cache flow
 duplex auto
 speed auto
!
ip classless
ip route 0.0.0.0 0.0.0.0 FastEthernet0/0 192.168.2.254
ip route 192.168.0.0 255.255.255.0 FastEthernet0/1
ip route 192.168.2.0 255.255.255.128 FastEthernet0/1
!
ip access-list extended defaultallin
 permit ip any any
ip access-list extended defaultallout
 permit ip any any

Note: default interface setting “no ip proxy-arp” is negated and thus on.

The headache stuff, yet in the end not that hard

The normal way of NAT is to e.g. “ip nat inside source list 1 interface FastEthernet0/0 overload”, to replace the source ip, with the ip of interface fa0/0, if the packet is permitted by list 1.
Where list 1 is often casually defined as “access-list 1 permit any”, allowing everything to be translated.

A line like “ip nat inside source static tcp 192.168.0.10 80 interface FastEthernet0/0 80″ is often used to change the destination ip/port to 192.168.0.10 if a packet arrives at the external interface fa0/0 on port 80.

In this particular case, we still want a “normal” NAT source ip translation from the inside to the outside (because we need to force our server to not send traffic to the default gateway), but not All traffic, since we still want to use the other ISP for regular Internet traffic when we fancy such a thing. Unfortunately a standard access-list doesn’t allow us to be very specific unless we want to define the source. Since the source ip could be anything (because the traffic is coming from the Internet), we can’t use a standard ACL.

We can however use a Extended ACL, which allows to be specific about the Destination IP and portnumber, which goes as followed:

ip nat inside source list ourspecialnat interface FastEthernet0/0 overload
!
ip access-list extended ourspecialnat
 permit tcp any host 192.168.2.2 eq www

And there you have it. A specific permission for destination 192.168.2.2 port=80, to be translated. Now all we need to do is to set a portforward from our ISP’s dsl-router from port 80 to 192.168.2.2. It routes traffic inside because the ip matches its 192.168.2.0/24 network, and our special nat router makes sure the packets are sent and received via  the same route due to obscuring the source ip.

PS. Perhaps it should be noted that this is just a funny exploit in search of better solutions. Most of the time you will want to know the actual IP address a visitor of your website is using, and not get a local IP instead. Chances are however that setting up either your server or your network to remember which route they should be going (with an IPv4 NAT-ted network) is considerably more difficult. When no traffic is translated in the first place (like for example when you have a large number of IPv4 addresses or a couple of IPv6 addresses), there wouldn’t be an issue of rejecting source/destination mismatches, because they would always be the same.

September 8, 2009 at 9:45 pm Leave a comment

db component chain

If you want to connect visual components like grids or editboxes to a dataset, you’d expect things to be a bit straightforward, like for example:

DBGrid -> DataSource -> SQLQuery -> DBConnection

Unfortunately, when it comes to db components, this chain will result in a “Operation not allowed on a unidirectional dataset” error. Which would make a tiny bit of sense if you want to make the Grid editable, but even when you switch off every possible property off to edit the grid, the error will still come up.

If you’re like me, you’d probably just put a StringGrid on your form and loop over the SQLQuery to show all the data. That does however take a lot more writing code for something that should be quite easy to do.

Aparantly, this is the way to do chain the components to be able to link your things together:

dbX component chain

July 19, 2009 at 4:25 pm Leave a comment

Short: Happy undocumented quirks

Description of AdjustWindowRectEx() on MSDN:

The AdjustWindowRect function calculates the required size of the window rectangle, based on the desired client-rectangle size. The window rectangle can then be passed to the CreateWindow function to create a window whose client area is the desired size.

Description of GetWindowRect() on MSDN:

The GetClientRect function retrieves the coordinates of a window’s client area. The client coordinates specify the upper-left and lower-right corners of the client area. Because client coordinates are relative to the upper-left corner of a window’s client area, the coordinates of the upper-left corner are (0,0).

You would think that these functions are eachothers counterpart, you give a clientrectangle to AdjustWindowRectEx(), say (200×200) and you’ll get something like 214×234 if you have the Vista look active on your window. When you set your window to those dimensions with for example SetWindowPos(), and afterwards you check the size with GetClientRect(), you’ll get 212×232. This works the same with other themes, and if you copy-paste a screenshot into MS Paint, you’ll be able to measure the same 212×232 while the outside of the window is the exact given 214×234 – properly set by SetWindowPos().

The dirty ugly solution of course is to add 2 pixels to whatever you get back from AdjustWindowRectEx() – but surely someone at Microsoft is supposed to Test these functions?

RECT size;
size.left	= myWindowObj->x;
size.top	= myWindowObj->y;
size.right	= size.left + myWindowObj->w;
size.bottom	= size.top + myWindowObj->h;

AdjustWindowRectEx( &size, GetWindowLong(hWND, GWL_STYLE), FALSE, GetWindowLong(hWND, GWL_EXSTYLE) );
SetWindowPos( hWND, NULL, myWindowObj->x, myWindowObj->y, size.right - size.left + 2, size.bottom - size.top + 2, SWP_NOZORDER | SWP_NOACTIVATE );

July 5, 2009 at 12:31 pm Leave a comment

Probability

*** I’m not a mathematician, the explanation here is probably crap, so have a look on the internet on factorials and sets.

A friend of mine asked me today if I had a script lying around to produce anagrams. After saying no and secretly checking what the heck an anagram was again, I started thinking about what it reminded me of.

My thoughts diverged on the symbol “!”, which is used in probability to calculate what your chance might be when you eliminate your previous results from the equation. So for example if you have a set of [1,2,3,4,5,6] – after rolling 3 with a dice, the set will become [1,2,4,5,6] so you can’t roll 3 again.

Anagrams are a combination of a set of predefined characters, and then mixed around. And while you might think you’ll have 6^6 possibilities, you actually have less, because you can’t use more of the same characters that are available.

Say for example you have a word of “abc”, you can make “acb”, but you can’t make “aab”, because you only have 1 “a”. Thus in the example of [1,2,3,4,5,6] you actually have 6! possibilities. We can write 6! as 6*5*4*3*2*1 = 720 possibilities.

And the rewrite of 6! is exactly what made me think of this simple algorithm:

procedure factorWord( const sBeginning: string; const sWord: string; const lstPermutations: TStrings );
var
  i, c: integer;
begin
  c := Length( sWord );

  if c = 0 then
  begin
    lstPermutations.Add( sBeginning );
  end;

  for i := 1 to c do
  begin
    factorWord( sBeginning + sWord[i], Copy( sWord, 1, i - 1 ) + Copy( sWord, i + 1 ), lstPermutations );
  end;
end;

 

You’ll initially call it for example with factorWord( ”, Edit1.Text, Memo1.Lines ); and you’ll end up with all the possible anagrams. The function works recursively on 2 sets, one that defines the entire set (Edit1.Text), and one thats starts out empty. With every recursion the possibilities of new characters to add gets smaller, while the anagram is filled with the same amount of characters in a different order, and is added to the stringlist when the recursionpath ends up with 0 remaining characters.

The biggest problem however of making anagrams is that there are simply too much possibilities when you go over about 6 characters. The worst of it, and this is after some code rewrites to append the words to a file occasionally to avoid an out-of-memory error, is that a word with 14 characters will make your textfile about the size of 2.4GB until a file streaming error occurs…

14! = 87178291200 lines, multiplied by “14 byte anagrams plus 2 bytes for the crlf”, you’ll end up with 1394852659200 bytes which amounts to 1299GB of data.

Have fun storing that on your desktop pc…

May 26, 2009 at 12:07 am Leave a comment

Search cmp to 0

Search: optimization “cmp eax,0″

You don’t have to compare EAX to 0 to know it’s 0, just make sure the item in EAX is edited last, with any opcode that you see in your cpu manual with the appendage that it modifies the ZF. That’s a fancy flag that says a value you just put in a register is zero.

Any mathematical, bitwise or just plain MOV into a register, modifies the Zero flag. Skip the CMP and just use either JZ (jump if zero) or JNZ (jump if not zero) to see if the zero flag is set and jump somewhere accordingly.

May 7, 2009 at 9:40 am Leave a comment

Older Posts


 

January 2012
S M T W T F S
« Sep    
1234567
891011121314
15161718192021
22232425262728
293031  

Categories


Follow

Get every new post delivered to your Inbox.