Archive for June, 2008

IPv6 readiness

Supporting IPv6 can be a little tricky for the average developer. I have stumbled on a lot of issues, mainly because I was so used to IPv4 and have few ways to test IPv6. The Internet router attached to my ISP’s modem doesn’t support IPv6, even though the store where I bought my Cisco 851 said it did. So the only way to reach the Internet over IPv6 is to set up a 6to4 tunnel from within my network and connect to a fancy external provider.

I could still do that, but luckily I found a way to set up a mini-network between my main development PC and a server. First I had to assign IPv6 addresses to the machines, and since I didn’t want to bother with DHCPv6 yet, I used static ones. I simply borrowed the sample address from a website (since my router doesn’t support IPv6 anyway, that shouldn’t be a problem). The prefix it uses, 2001:db8::/32, is reserved for documentation, so it won’t clash with anyone’s real addresses.

So I typed this into my SSH console: /sbin/ifconfig eth0 inet6 add 2001:0db8:0:f101::1/64
And on Vista I entered 2001:0db8:0:f101::2 with a prefix length of 64 in the IPv6 protocol configuration screen.

And, like magic, ping now works.

The next step was to add an AAAA record to the local DNS server I’m using, which wasn’t that hard.

Now, the first thing you’ll notice is that the POSIX function gethostbyname() doesn’t return any results when you try to look up your AAAA record. At least, it doesn’t under Vista; I haven’t tested it anywhere else yet. You have to use the getaddrinfo() function (also POSIX) instead.
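For reference, here is a minimal sketch of an IPv6-only lookup with getaddrinfo(), assuming a POSIX system. The function name resolve_ipv6() is hypothetical. Setting ai_family to AF_INET6 in the hints restricts the results to IPv6 addresses (AAAA records or IPv6 literals), and getnameinfo() with NI_NUMERICHOST turns the first result back into text. On Windows the same calls live in <ws2tcpip.h> and need WSAStartup() first.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* Hypothetical helper: resolve `host` to its first IPv6 address and write
   the numeric text form into `out`. Returns 0 on success, otherwise a
   getaddrinfo()/getnameinfo() error code (see gai_strerror()). */
int resolve_ipv6(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;      /* only IPv6 results, i.e. AAAA records */
    hints.ai_socktype = SOCK_STREAM; /* avoid one duplicate entry per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* getaddrinfo() guarantees at least one result on success */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen, out, (socklen_t)outlen,
                     NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

Calling resolve_ipv6("::1", buf, sizeof buf) yields "::1"; passing the host name that carries your AAAA record (for example a hypothetical "myserver.lan") exercises the actual DNS lookup.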

The second problem, which is highly annoying, is converting between address structures and address strings. With IPv4 you can use the handy function inet_ntoa(), but of course it doesn’t support IPv6. You have to switch to inet_pton() / inet_ntop(). But that’s exactly where Microsoft’s POSIX compliance ended: you have to improvise those functions with getnameinfo() and getaddrinfo() and some special flags.
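A sketch of what that improvisation can look like, assuming a POSIX-style socket API: the flags that do the trick are AI_NUMERICHOST (parse the literal, never query DNS) and NI_NUMERICHOST (format the address, never do a reverse lookup). The names str_to_in6() and in6_to_str() are hypothetical stand-ins for inet_pton() and inet_ntop().

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* String -> in6_addr without inet_pton(). AI_NUMERICHOST makes getaddrinfo()
   only parse the literal. Returns 0 on success, -1 on invalid input. */
int str_to_in6(const char *text, struct in6_addr *out)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;
    hints.ai_flags = AI_NUMERICHOST;
    if (getaddrinfo(text, NULL, &hints, &res) != 0)
        return -1;
    *out = ((const struct sockaddr_in6 *)res->ai_addr)->sin6_addr;
    freeaddrinfo(res);
    return 0;
}

/* in6_addr -> string without inet_ntop(). The address is wrapped in a
   sockaddr_in6 so getnameinfo() can format it. Returns 0 or -1. */
int in6_to_str(const struct in6_addr *addr, char *out, size_t outlen)
{
    struct sockaddr_in6 sa;

    memset(&sa, 0, sizeof sa);
    sa.sin6_family = AF_INET6;
    sa.sin6_addr = *addr;
    return getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                       out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST) == 0 ? 0 : -1;
}
```

A round trip of "2001:0db8:0:f101::1" comes back in its compressed form, "2001:db8:0:f101::1".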

Looking back at all the struggles with these functions, I don’t really get why I didn’t write a custom implementation. All it has to do is translate an ASCII string in a fixed pattern into a structure with all the numbers mixed up. (Seriously, the person who came up with network byte order was insane.)
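Such a custom implementation could look roughly like the sketch below. parse_ipv6() is a hypothetical name, and the parser deliberately skips two features of the full text format: an embedded IPv4 tail ("::ffff:1.2.3.4") and a zone index ("%eth0"). The "mixed up" part is just network byte order: each 16-bit group is stored most significant byte first, so "2001" becomes the bytes 0x20, 0x01.

```c
#include <stdint.h>
#include <ctype.h>

/* Hypothetical hand-rolled parser: IPv6 text -> 16 bytes in network order.
   Returns 0 on success, -1 on malformed input. */
int parse_ipv6(const char *s, uint8_t out[16])
{
    uint16_t head[8], tail[8];      /* groups before and after "::" */
    int nhead = 0, ntail = 0, gap = 0;

    if (s[0] == ':') {              /* a leading ':' must start "::" */
        if (s[1] != ':')
            return -1;
        gap = 1;
        s += 2;
    }
    while (*s != '\0') {
        unsigned value = 0;
        int digits = 0;

        while (isxdigit((unsigned char)*s)) {   /* one group: 1-4 hex digits */
            if (++digits > 4)
                return -1;
            value = value * 16 + (unsigned)(isdigit((unsigned char)*s)
                        ? *s - '0'
                        : tolower((unsigned char)*s) - 'a' + 10);
            s++;
        }
        if (digits == 0 || nhead + ntail == 8)
            return -1;              /* empty group or more than 8 groups */
        if (gap)
            tail[ntail++] = (uint16_t)value;
        else
            head[nhead++] = (uint16_t)value;

        if (*s == '\0')
            break;
        if (*s != ':')
            return -1;
        s++;
        if (*s == ':') {            /* "::" may appear only once */
            if (gap)
                return -1;
            gap = 1;
            s++;
        } else if (*s == '\0') {
            return -1;              /* trailing single ':' */
        }
    }
    /* "::" stands for at least one zero group; without it we need all 8 */
    if (gap ? nhead + ntail > 7 : nhead != 8)
        return -1;

    uint16_t groups[8] = {0};
    for (int i = 0; i < nhead; i++)
        groups[i] = head[i];
    for (int i = 0; i < ntail; i++)
        groups[8 - ntail + i] = tail[i];
    for (int i = 0; i < 8; i++) {   /* network byte order: high byte first */
        out[2 * i] = (uint8_t)(groups[i] >> 8);
        out[2 * i + 1] = (uint8_t)(groups[i] & 0xFF);
    }
    return 0;
}
```

Because struct in6_addr is just these 16 bytes, the result can be copied straight into its s6_addr member. Only multi-byte integers, such as the port number, need htons()/ntohs().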

Microsoft obviously got the message at a very late stage, and added a new Vista-and-later function to ntdll.dll named RtlIpv6StringToAddress(). I don’t get why no one thought of that earlier.

June 23, 2008

Manual Code Optimizations – Loop edition

This is one of those optimizations that a compiler can do perfectly well by itself. Still, I want to explain something about a famous optimization that is more often assumed than understood.

June 6, 2008

Manual Code Optimizations – part 1

During my many years of programming, I’ve had a lot of fun with code optimization (for both speed and size). Not through compiler optimization, but by doing it by hand. Specifically, optimizations that work on any kind of x86 descendant, without relying on the fancy new technologies.

I hope to turn this into a series, but here’s the first one.


June 3, 2008

Unicode sans the Uni

When you pick up your first programming book, you probably have some idea of what you’re going to learn: how to add 1 and 1 together in code, or how to display the text “Hello, World!” on the screen. And you will. You’ll learn that your average text string consists of a few bytes, where every character takes up one byte, or 8 bits. Heck, you may even memorize that the character ‘0’ is 48 in decimal and a space is 0x20 in hexadecimal. Not to mention you’ll be forced to recognize \r\n as 13 and 10 in decimal, or 0x0D, 0x0A.
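A quick way to see those byte values for yourself is to dump a string as hex. This is a small sketch with a hypothetical helper named hex_dump(); it assumes an ASCII-compatible execution character set, which every mainstream platform uses.

```c
#include <stdio.h>

/* Hypothetical helper: write the bytes of `s` as space-separated hex into
   `out`, which must hold at least 3 * strlen(s) + 1 chars. */
void hex_dump(const char *s, char *out)
{
    char *p = out;

    *p = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i > 0)
            *p++ = ' ';             /* overwrites the previous terminator */
        p += sprintf(p, "%02X", (unsigned)(unsigned char)s[i]);
    }
}
```

hex_dump("0 \r\n", buf) produces "30 20 0D 0A": ‘0’ is 0x30 (48), the space is 0x20, and \r\n is 0x0D 0x0A.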

Eventually, however, you’ll come across a fancy word: internationalization, or “i18n”. If you need to support a foreign language that doesn’t use our a-z alphabet, you will need to look it up. If you started surfing the Internet in the Windows 95 era, you probably ran into several websites explaining code pages. A code page tells the OS that the string you’re going to display shouldn’t be interpreted with the normal ASCII mapping for your bytes, but with some other odd character set. Code pages still exist, but most of the world has stopped using them in favor of something else.

That something else is Unicode. And this is where the fun starts, because you’ll go mad when you see it. Well, I did. Unicode is a wonderful idea that puts all the characters in the world into one big, oddly structured table. The problem with this table is that it holds more characters than you can squeeze into 8 bits, and that leads to the great debate that makes you pull out your hair and switch to a profession where you don’t have to use your brain.

Unicode is one big mess across programming languages and operating systems. Windows uses 16-bit wchar_t’s and separate “W” versions of its API functions; compile a Linux program with that same wchar_t and you get 32-bit characters. Linux itself uses UTF-8 by default, Mac OS X uses “canonically decomposed UTF-8”, and Java uses its own 16-bit Unicode strings. And we thought we were talking about one standard called Unicode…

Now, I wonder whether you’ll be surprised when you read about this April Fools’ prank.

Anyway, I’m not sure what my point was… Oh right. Normally, when you have a UTF-8 string and want to turn it into a wide string, you would find the POSIX function mbstowcs() (multibyte string to wide character string). Alongside it you need setlocale() to tell the system that your strings are UTF-8, which can be treated as ‘just another code page’. On Windows, however, you’ll first read that UTF-8 is code page 65001, and only after testing will you find this small print on MSDN: “If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL.” So on Windows you have to use MultiByteToWideChar() instead, which is of course Windows-only.
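On a POSIX system the setlocale() plus mbstowcs() route looks roughly like this. It is a sketch that assumes a UTF-8 locale is installed under the name "C.UTF-8" or "en_US.UTF-8" (locale names vary between systems), and the helper name utf8_to_wide() is hypothetical. Note that setlocale() changes the process-wide LC_CTYPE setting. On Windows the equivalent is MultiByteToWideChar() with CP_UTF8.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert UTF-8 to a wide string via the C library.
   Returns the number of wide characters written (not counting the
   terminator), or (size_t)-1 if no UTF-8 locale is available or the
   input is not valid UTF-8. */
size_t utf8_to_wide(const char *utf8, wchar_t *out, size_t outlen)
{
    /* try common names for a UTF-8 locale; affects the whole process */
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL &&
        setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
        return (size_t)-1;
    return mbstowcs(out, utf8, outlen);
}
```

With glibc, where wchar_t is 32 bits, "h\xc3\xa9llo" converts to 5 wide characters and the ‘é’ lands in a single element as U+00E9.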

So what’s the big deal about all these flavors of Unicode? Well, you currently need 4 bytes to represent every character in the Unicode table, since code points run up to U+10FFFF. However, you can stick to 2 bytes if you only want the majority of characters, the ones most used in the world (the Basic Multilingual Plane). And of course someone got really upset, wanted to keep ASCII compatibility, and came up with UTF-8, which uses 1 to 4 bytes per character.
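To see where the “1 to 4 bytes” comes from, here is a small UTF-8 encoder sketch (utf8_encode() is a hypothetical name). The number of payload bits grows with the code point: 7 bits fit in one byte, 11 in two, 16 in three and 21 in four, with the leading bits of each byte marking its role.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into `out`.
   Returns the number of bytes written (1-4), or 0 for surrogates
   (U+D800..U+DFFF) and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* reserved for UTF-16 surrogates */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, ‘A’ (U+0041) stays the single byte 0x41, ‘é’ (U+00E9) becomes C3 A9, ‘€’ (U+20AC) becomes E2 82 AC, and U+1F600 needs all four bytes: F0 9F 98 80.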

The fun thing about UTF-8 is that unless you’re using unusual characters, it looks no different from a normal ASCII string, and as a bonus you can still find the end of the string by looking for the 0 byte.
UTF-16 and UTF-32, on the other hand, are fixed width in most cases (UTF-32 always; UTF-16 except for characters outside the Basic Multilingual Plane, which take a pair of 16-bit units), and they mark the end of a string with a 2-byte or 4-byte zero respectively.
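The difference in terminators and lengths can be sketched in a few lines. Both helpers are hypothetical: utf8_codepoints() counts characters by skipping UTF-8 continuation bytes (10xxxxxx), and utf16_units() scans for a whole 16-bit zero, because strlen() would stop at the zero high byte of ASCII characters such as 0x0041 in UTF-16.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8 keeps C's single 0-byte terminator, so strlen() still finds the
   end, but it counts bytes. Counting code points means ignoring the
   continuation bytes, which all match the bit pattern 10xxxxxx. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* UTF-16 strings end with a whole 16-bit zero unit (two zero bytes).
   The result counts code units, so a surrogate pair counts as 2. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}
```

For "h\xc3\xa9llo" (“héllo”), strlen() reports 6 bytes while utf8_codepoints() reports 5 characters; in UTF-16, U+1F600 occupies the two units 0xD83D 0xDE00, which is why UTF-16 is only “mostly” fixed width.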

I really thought I had something interesting to talk about when I started this post… I guess not…

June 2, 2008

