
How often would this actually be worth it? My hunch is that the computational time involved in packing and unpacking IP addresses into integers is more valuable than the space saved by storing them as integers.


Your hunch would be wrong.

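  #include <stdio.h>

  int
  main (void)
  {
    /* 192.168.1.1 as a 32-bit value, most significant octet first */
    unsigned int ip = 3232235777u;

    /* Mask and shift out each octet; %u because the operands are unsigned */
    printf ("%u.%u.%u.%u\n",
            (ip & 0xFF000000) >> 24,
            (ip & 0x00FF0000) >> 16,
            (ip & 0x0000FF00) >> 8,
            (ip & 0x000000FF));

    return 0;
  }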


What does this code demonstrate, other than that you're unconcerned with endianness?

Converting "192.168.1.1" to an integer in Ruby involved creating multiple array, string, and integer objects, not to mention several multiply-indirected method calls.


This code is not endian-dependent. You can only observe endianness when you address the same memory as two different types:

    int x = 5;
    char *chp = (char*)&x;
    printf("This value is endian-dependent: %hhd\n", *chp);


The code isn't endian-dependent because it doesn't do anything. The only thing you can do with "3232235777" on a little-endian machine is compare it to another number to see if it's also "3232235777".

If you're going to store IP addresses as 32 bit integers, or work with them that way in your C code, 192.168.1.1 should be "16885952", so you can do > and <.

But your point is well taken, and audiguy, I'm sorry for being such an asshole in my comment.
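For reference, the two numbers in this subthread are the same four bytes (192, 168, 1, 1) read in opposite byte orders. A minimal sketch, with hypothetical helper names, that builds both values using only shifts, so it gives the same result on any host:

```c
#include <stdint.h>

/* Read four address bytes as a big-endian (network order) 32-bit value. */
uint32_t ipv4_as_big_endian(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Read the same four bytes as a little-endian 32-bit value. */
uint32_t ipv4_as_little_endian(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

With the bytes {192, 168, 1, 1}, the first function gives 3232235777 and the second gives 16885952.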


ip addresses are always stored in network-byte-order


Your ALU doesn't care what the RFC says. What's the point of storing addresses in binary if you can't do math on them? There is no point, is the point.


The computational time to pack and unpack an IP to an integer is vanishingly small. My old MacBook Pro runs the corresponding PHP function 500,000 times per second.

The difference between a four-byte integer and a fifteen-byte string is eleven bytes. Our database has a few hundred-million-row tables (barely considered big by today's standards) that store IPs. Storing IPs as integers saves us about a GB per hundred million rows, in addition to a substantial index size reduction.

Your application may not need to store that much data, but it's my experience that tables with IPs are the ones that tend to get big. :)
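To give a sense of how little work the packing and unpacking is, here is a rough C sketch of both directions. The names `ipv4_pack` and `ipv4_unpack` are made up for illustration, and this is not the PHP function mentioned above (presumably something like `ip2long`, but that is a guess). It is not hardened against every malformed input.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse "a.b.c.d" into a 32-bit value, most significant octet first.
   Returns 0 on success, -1 on malformed input. */
int ipv4_pack(const char *text, uint32_t *out)
{
    unsigned int a, b, c, d;
    char extra;

    /* The trailing %c catches junk after the fourth octet. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
    return 0;
}

/* Format a 32-bit value back into dotted-quad text.
   buf must hold 16 bytes: 15 characters plus the terminating NUL. */
void ipv4_unpack(uint32_t ip, char buf[16])
{
    snprintf(buf, 16, "%u.%u.%u.%u",
             (unsigned int)((ip >> 24) & 0xFFu),
             (unsigned int)((ip >> 16) & 0xFFu),
             (unsigned int)((ip >> 8) & 0xFFu),
             (unsigned int)(ip & 0xFFu));
}
```

Packing "192.168.1.1" this way yields 3232235777, and unpacking that value gives the original string back.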


Storage is cheap - the primary win here is computation time.

if(ip1 == ip2) is a lot faster as ints than strings.
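Roughly, the two comparisons look like this in C. The function names are hypothetical, and how large the gap is in practice depends on the compiler and the libc's `strcmp`:

```c
#include <stdint.h>
#include <string.h>

/* Integer form: a single 32-bit compare. */
int ipv4_equal_int(uint32_t a, uint32_t b)
{
    return a == b;
}

/* Text form: strcmp scans both strings until a mismatch or the NUL. */
int ipv4_equal_str(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```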


Seeing as how the largest dotted-quad IP address fits inside rax:rdx on a modern CPU, and that two of them fit in a single cache line, I'm guessing x == y, while faster, is not "much" faster with integers than with strings.

I wouldn't populate an address trie with strings, but I also wouldn't give a second thought to passing them around a random C program as charstars either.


Most string libs aren't that smart though - string comparisons are still byte by byte. You're now comparing something 15 times instead of 1. You can do an int32/int64 (depending on architecture) compare in a single op.

The point I guess is, you can keep 'em around as charstars, but eventually you'll have to do this cast to compare them...


All memcmp's are this smart. But that's kind of beside the point, right? 1 time, 14 times, if we're talking about L1 cache, we're really epsilon from pure reg/reg ALU operations, implementing effectively constant-time algorithms.

I agree, int32 is faster. Like I said, it's just not "much" faster.


But what if I want to count all the 10.x requests, not sure I can bitshift in an SQL query.


  select count(*) from ipTable
  where ip >= 167772160 and ip < 184549376

An IPv4 address is a four-digit number in base 256. You are looking for IPs whose first digit is 10, so the boundaries are: 167772160 = 256 * 256 * 256 * 10 and 184549376 = 256 * 256 * 256 * 11.
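The same boundary arithmetic as a small C sketch, for any first octet. The function names are hypothetical, and it assumes the address is stored most significant octet first, as in the query above:

```c
#include <stdint.h>

/* Lowest address whose first octet is `octet`, e.g. 10 -> 10.0.0.0. */
uint32_t first_octet_lower_bound(uint8_t octet)
{
    return (uint32_t)octet << 24;   /* same as octet * 256 * 256 * 256 */
}

/* One past the highest address with that first octet (exclusive bound).
   Returned as 64-bit so octet 255 gives 2^32 instead of wrapping to 0. */
uint64_t first_octet_upper_bound(uint8_t octet)
{
    return ((uint64_t)octet + 1) << 24;
}
```

For octet 10 these return 167772160 and 184549376, the two constants in the where clause.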


i've written ip address database management systems that did everything in integers not necessarily because of the size benefits, but just because it's easier to sort ips stored as integers, do addition/subtraction easily when they cross network boundaries (10.10.10.254 + 6 is what?), and do cidr calculations.

if nothing else, storing ips in a sql database as integers will make searches on their indexes faster.
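A short C sketch of the kind of arithmetic described here, assuming addresses are held most significant octet first. The helper names are invented for illustration and are not from any particular IP management system:

```c
#include <stdint.h>

/* Step an address forward by n; the carry crosses octet boundaries for free. */
uint32_t ipv4_add(uint32_t ip, uint32_t n)
{
    return ip + n;
}

/* Network mask for a prefix length 0..32, e.g. 24 -> 255.255.255.0. */
uint32_t cidr_mask(unsigned int prefix_len)
{
    /* Shifting a 32-bit value by 32 is undefined in C, so handle /0 here. */
    if (prefix_len == 0)
        return 0u;
    return (uint32_t)(0xFFFFFFFFu << (32 - prefix_len));
}

/* Nonzero if ip falls inside network/prefix_len. */
int cidr_contains(uint32_t network, unsigned int prefix_len, uint32_t ip)
{
    uint32_t mask = cidr_mask(prefix_len);
    return (ip & mask) == (network & mask);
}
```

With 10.10.10.254 written as 0x0A0A0AFE, adding 6 gives 0x0A0A0B04, which is 10.10.11.4, and that result is no longer inside 10.10.10.0/24.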



