Wednesday, January 11, 2012

Building a protocol

Introduction

When I started to rewrite the save-file algorithm for djon time tracker, I looked for the best way to do it. A lot of people said XML was the way to go, but every time I tried it the files got very large and reading and writing them took a long time. That's why I started to look at binary formats.

Binary formats are simply files where you store bytes instead of chars. That's a straightforward definition, but what does it mean in practice? How do I know where a string starts, or that the next bytes are an integer, when all I see while reading the file is hex values? This is where protocols are useful.

From Wikipedia: "A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications. A protocol may have a formal description."

When you write your own protocol, you define a unique way to write and read every single piece of data, and then you follow those rules every time you write or read it.

Let's build our own protocol.

Let's say you're going to save a customer's data, or transmit it over a wire:

Customer

  • Name
  • Last Name
  • Birth Date
  • Salary

First, we need to define the type of each piece of data:

Customer

  • Name: chars
  • Last Name: chars
  • Birth Date: date
  • Salary: integer

Now we define a unique set of rules to write a Customer:

Data Order: the fields always follow the same order (Name, Last Name, Birth Date, Salary).
Labels? If we have a fixed set of data (like the example above), naming each piece is useless, so we leave the labels out and save space.

Now we need to define how to save each type. Let's start with the easy one.

Integer

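In C, the size of an int depends on the compiler and platform: the standard only guarantees at least 2 bytes, and most current platforms use 4. To keep the example simple, assume an unsigned 2-byte value, which means 2 chars to store. For example, a salary of 65000 is FDE8 in hex: 2 chars, FD and E8, which as chars are 253 and 232 (253 × 256 + 232 = 65000).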

Let's write some code here, and save an integer in the simplest way:


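#include <stdio.h>

// Writes the raw bytes of a, in the machine's native order.
void saveInt(int a) {
    FILE* f = fopen("test.dat", "wb");
    if (f == NULL) return;
    fwrite(&a, 1, sizeof(a), f);
    fclose(f);
}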

This code works and it's very straightforward, but it has a big problem: it writes the bytes of the int exactly as they are laid out in memory, and that layout depends on the machine's architecture. For the example above, the file could contain E8 FD or FD E8 (and on a platform with 4-byte ints, it writes 4 bytes). If the file is read on a machine with a different byte order, you get a very different value. This is known as the endianness problem (little-endian vs. big-endian). To fix it, we make sure the bytes are written in the same order every time:


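#include <stdio.h>

// Writes the low byte first, then the high byte (2 bytes in total).
void writeInt(int a) {
    FILE* f = fopen("test.dat", "wb");
    if (f == NULL) return;
    unsigned char c = (a & 255);          // low byte
    fwrite(&c, 1, 1, f);
    unsigned char c2 = ((a >> 8) & 255);  // high byte
    fwrite(&c2, 1, 1, f);

    fclose(f);
}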

This code writes the bytes in the same order every time, no matter what architecture it runs on. Let's break down these instructions:


    unsigned char c = (a & 255);

For an integer of 65000 (FDE8), it does an "and" operation with 00FF, which clears the higher byte:


    FDE8
And 00FF
    ====
    00E8

The next instruction does a similar operation: it shifts the value 8 bits to the right, which moves the high byte into the low position, and then clears everything above it:


unsigned char c2 = ((a >> 8) & 255);

FDE8 >> 8   = XXFD
XXFD & 00FF = 00FD

With this simple method (writing the lowest byte first, known as little-endian order) the bytes are always written in the same order, so reading them back is easy:


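#include <stdio.h>

// Reads the 2 bytes written by writeInt and rebuilds the value.
int readInt() {
    FILE* f = fopen("test.dat", "rb");
    if (f == NULL) return -1;
    unsigned char c, c2;
    if (fread(&c, 1, 1, f) != 1 || fread(&c2, 1, 1, f) != 1) {
        fclose(f);
        return -1;                // file too short
    }
    fclose(f);

    int res = c | (c2 << 8);      // low byte | (high byte moved up)
    return res;
}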

c2 will be FD and c will contain E8. The "<< 8" moves FD into the high byte (FD00), and combining it with c using a bitwise OR (or an addition, since the two bytes don't overlap) gives FDE8, the original number.
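Rebuilding the value needs a bitwise OR (or an addition, which gives the same result here because the two bytes occupy different bit positions). A bitwise AND of c and c2 << 8 does not work: c only has bits in the low byte and c2 << 8 only in the high byte, so their AND is always 0. A minimal check, using two hypothetical helper names:

```c
// Hypothetical helper: rebuilds a 16-bit value from its low byte (c)
// and its high byte (c2) with a bitwise OR.
unsigned int combineWithOr(unsigned char low, unsigned char high) {
    return low | ((unsigned int)high << 8);
}

// For comparison only: low has bits only in positions 0-7 and
// high << 8 only in positions 8-15, so this always returns 0.
unsigned int combineWithAnd(unsigned char low, unsigned char high) {
    return low & ((unsigned int)high << 8);
}
```

With the bytes from the example, combineWithOr(0xE8, 0xFD) is 65000, while combineWithAnd(0xE8, 0xFD) is 0.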

Now that the big issue is solved, the other types are easier.
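Since int is 4 bytes on most current platforms, here is the same technique for 32-bit values, side by side with the naive approach. It is a sketch that works on memory buffers rather than test.dat; it assumes uint32_t from <stdint.h> so the width is fixed, and the three function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper. Portable: stores value least significant byte
// first (little-endian), whatever the host's own byte order is.
void putUint32LE(unsigned char* buf, uint32_t value) {
    buf[0] = (unsigned char)(value & 0xFF);
    buf[1] = (unsigned char)((value >> 8) & 0xFF);
    buf[2] = (unsigned char)((value >> 16) & 0xFF);
    buf[3] = (unsigned char)((value >> 24) & 0xFF);
}

// Hypothetical helper. Rebuilds a value stored by putUint32LE; each byte
// is widened to uint32_t before shifting, then combined with OR.
uint32_t getUint32LE(const unsigned char* buf) {
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

// Hypothetical helper. Native: copies the bytes in whatever order this
// host keeps them in memory, which is what fwrite(&a, 1, sizeof(a), f)
// would put in the file.
void putUint32Native(unsigned char* buf, uint32_t value) {
    memcpy(buf, &value, sizeof value);
}
```

For 65000, putUint32LE always produces E8 FD 00 00. putUint32Native produces the same bytes on a little-endian machine (x86, most ARM systems) but 00 00 FD E8 on a big-endian one, which is exactly the problem the fixed-order version avoids.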

Strings

One of the main issues with strings is knowing where they end. One possible solution is to put a fixed char at the end of the string and read until you reach that character.

Strings solution 1


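#include <stdio.h>
#include <stdlib.h>

void writeString(const char* c, int len) {
   FILE* f = fopen("test.dat", "wb");
   if (f == NULL) return;
   fwrite(c, 1, len, f);      // the characters
   char end = '*';
   fwrite(&end, 1, 1, f);     // the terminator
   fclose(f);
}

char* readString() {
   FILE *f = fopen("test.dat", "rb");
   if (f == NULL) return NULL;
   // Allocate on the heap: returning a local array would leave the
   // caller with a dangling pointer.
   char* buffer = malloc(256);
   if (buffer == NULL) { fclose(f); return NULL; }
   int pos = 0;
   int c;   // int, not char, so EOF can be detected
   // Stop when the buffer is full, at end of file, or at the terminator.
   while (pos < 255 && (c = fgetc(f)) != EOF && c != '*') {
      buffer[pos] = (char)c;
      pos++;
   }
   buffer[pos] = '\0'; // null-terminated string
   fclose(f);
   return buffer;
}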

This solution works pretty well, and it can be improved (using stringstreams in C++, for example), but it has a big problem: what if the original string contains the character '*'? Change it to another char? What are the odds that this character shows up too? This is easily fixed by writing the length of the string first and then its contents, and reading it back the same way: first the length, then the contents.
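The problem is easy to reproduce in memory. This sketch, with hypothetical function names and a byte buffer standing in for test.dat, encodes a string with the '*' terminator and decodes it again; a name that contains '*' comes back truncated.

```c
#include <string.h>

// Hypothetical helper: copies the characters of s to out and appends '*'.
// Returns the number of bytes written.
size_t encodeDelimited(const char* s, unsigned char* out) {
    size_t len = strlen(s);
    memcpy(out, s, len);
    out[len] = '*';
    return len + 1;
}

// Hypothetical helper: copies bytes to dest until the first '*' or until
// size bytes are consumed, then null-terminates dest.
// dest must have room for size + 1 chars.
void decodeDelimited(const unsigned char* in, size_t size, char* dest) {
    size_t pos = 0;
    while (pos < size && in[pos] != '*') {
        dest[pos] = (char)in[pos];
        pos++;
    }
    dest[pos] = '\0';
}
```

"Smith" survives the round trip, but "A*B Corp" comes back as "A": the decoder cannot tell a '*' inside the data from the one that marks the end.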

Strings solution 2


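#include <stdio.h>
#include <stdlib.h>

// Writes the length (2 bytes, low byte first, the same layout as
// writeInt) and then the characters, through one FILE*: calling writeInt
// and then reopening with "wb" would truncate the file and lose the length.
// No terminator is needed any more.
void writeString(const char* c, int len) {
   FILE* f = fopen("test.dat", "wb");
   if (f == NULL) return;
   unsigned char lenBytes[2] = { (unsigned char)(len & 255),
                                 (unsigned char)((len >> 8) & 255) };
   fwrite(lenBytes, 1, 2, f);
   fwrite(c, 1, len, f);
   fclose(f);
}

// Reads the length first, then exactly that many characters.
char* readString() {
   FILE *f = fopen("test.dat", "rb");
   if (f == NULL) return NULL;
   unsigned char lenBytes[2];
   if (fread(lenBytes, 1, 2, f) != 2) { fclose(f); return NULL; }
   int len = lenBytes[0] | (lenBytes[1] << 8);
   char* result = malloc(len + 1);
   if (result == NULL || fread(result, 1, len, f) != (size_t)len) {
      free(result);   // free(NULL) is a no-op
      fclose(f);
      return NULL;
   }
   result[len] = '\0'; // null-terminated string
   fclose(f);
   return result;
}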

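Solved! Of course, in real code you would open the file once, do all the operations, and then close it; these functions open test.dat on every call only to keep the examples simple. Keep in mind that opening with "wb" truncates the file, so as written each call overwrites whatever the previous one stored.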
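A minimal sketch of that restructuring, assuming every function receives an already-open FILE* instead of opening test.dat itself. The FILE*-taking signatures and the names writeInt16 and readInt16 are hypothetical, not the original code; strings use the 2-byte length prefix from solution 2, and tmpfile() keeps the example away from test.dat.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: writes a value in 0..65535 as 2 bytes, low byte first.
int writeInt16(FILE* f, unsigned int a) {
    unsigned char b[2] = { (unsigned char)(a & 255),
                           (unsigned char)((a >> 8) & 255) };
    return fwrite(b, 1, 2, f) == 2;
}

// Hypothetical helper: reads 2 bytes written by writeInt16,
// or returns -1 on a short read.
long readInt16(FILE* f) {
    unsigned char b[2];
    if (fread(b, 1, 2, f) != 2) return -1;
    return (long)b[0] | ((long)b[1] << 8);
}

// Length first, then the characters.
int writeString(FILE* f, const char* s) {
    size_t len = strlen(s);
    if (len > 65535) return 0;   // the length must fit in 2 bytes
    return writeInt16(f, (unsigned int)len) && fwrite(s, 1, len, f) == len;
}

// Returns a malloc'ed, null-terminated copy, or NULL on error.
char* readString(FILE* f) {
    long len = readInt16(f);
    if (len < 0) return NULL;
    char* s = malloc((size_t)len + 1);
    if (s == NULL) return NULL;
    if (fread(s, 1, (size_t)len, f) != (size_t)len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

Because nothing reopens or truncates the file between calls, two strings and an integer written in sequence land one after another, and after rewind(f) they are read back in the same order.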

Now, the main code:


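// Pseudocode: Customer and its accessors are hypothetical, and all the
// helpers are assumed to share one open file instead of reopening it.
void write(const Customer& c) {
    writeString(c.name());
    writeString(c.lastName());
    writeDate(c.birthDate()); // left as an exercise for the reader
    writeInt(c.salary());
}

Customer read() {
    Customer c;
    // Read the fields back in exactly the order they were written.
    c.setName(readString());
    c.setLastName(readString());
    c.setBirthDate(readDate());
    c.setSalary(readInt());
    return c;
}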
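For comparison, the main code above can be written in plain C. This is a sketch under several assumptions that are not in the post: Customer is a struct with fixed-size char arrays, the birth date is stored as a 2-byte year followed by a 1-byte month and a 1-byte day (the post leaves the date format to the reader), the salary fits in 2 bytes as in the integer section, and every function shares one open FILE*. All names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical in-memory representation of a customer.
typedef struct {
    char name[64];
    char lastName[64];
    unsigned int year, month, day;   // birth date (assumed encoding below)
    unsigned int salary;             // must fit in 2 bytes
} Customer;

static int putU16(FILE* f, unsigned int v) {
    unsigned char b[2] = { (unsigned char)(v & 255), (unsigned char)((v >> 8) & 255) };
    return fwrite(b, 1, 2, f) == 2;
}

static int getU16(FILE* f, unsigned int* v) {
    unsigned char b[2];
    if (fread(b, 1, 2, f) != 2) return 0;
    *v = b[0] | ((unsigned int)b[1] << 8);
    return 1;
}

static int putStr(FILE* f, const char* s) {
    size_t len = strlen(s);
    return putU16(f, (unsigned int)len) && fwrite(s, 1, len, f) == len;
}

// Reads a length-prefixed string into dest (capacity cap, including '\0').
static int getStr(FILE* f, char* dest, size_t cap) {
    unsigned int len;
    if (!getU16(f, &len) || len >= cap) return 0;
    if (fread(dest, 1, len, f) != len) return 0;
    dest[len] = '\0';
    return 1;
}

// Fields in the fixed protocol order: Name, Last Name, Birth Date, Salary.
int writeCustomer(FILE* f, const Customer* c) {
    unsigned char md[2] = { (unsigned char)c->month, (unsigned char)c->day };
    return putStr(f, c->name) && putStr(f, c->lastName)
        && putU16(f, c->year) && fwrite(md, 1, 2, f) == 2
        && putU16(f, c->salary);
}

int readCustomer(FILE* f, Customer* c) {
    unsigned char md[2];
    if (!getStr(f, c->name, sizeof c->name)) return 0;
    if (!getStr(f, c->lastName, sizeof c->lastName)) return 0;
    if (!getU16(f, &c->year) || fread(md, 1, 2, f) != 2) return 0;
    c->month = md[0];
    c->day = md[1];
    return getU16(f, &c->salary);
}
```

Because there are no labels, a customer named Ada Lovelace takes 21 bytes: 2 + 3 for the name, 2 + 8 for the last name, 4 for the date and 2 for the salary.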

The same approach works for network transmission, files, or any other stream of bytes, and it translates directly to other languages.
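For a network connection you would typically build the message in a memory buffer and hand that buffer to whatever sends it. A sketch, with hypothetical function names, that encodes a length-prefixed string into a caller-supplied buffer and decodes it again:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: encodes s as a 2-byte little-endian length plus
// its characters. Returns the number of bytes used, or 0 if out is too small.
size_t encodeString(const char* s, unsigned char* out, size_t cap) {
    size_t len = strlen(s);
    if (len > 65535 || len + 2 > cap) return 0;
    out[0] = (unsigned char)(len & 255);
    out[1] = (unsigned char)((len >> 8) & 255);
    memcpy(out + 2, s, len);
    return len + 2;
}

// Hypothetical helper: decodes a string produced by encodeString from the
// first size bytes of in, writing it null-terminated into dest (capacity
// destCap). Returns the bytes consumed, or 0 if the data is incomplete.
size_t decodeString(const unsigned char* in, size_t size, char* dest, size_t destCap) {
    if (size < 2) return 0;
    size_t len = in[0] | ((size_t)in[1] << 8);
    if (len + 2 > size || len >= destCap) return 0;
    memcpy(dest, in + 2, len);
    dest[len] = '\0';
    return len + 2;
}
```

Returning 0 for incomplete data matters on a network, where one message can arrive split across several reads: the receiver keeps buffering until the whole string is there. A string such as "A*B Corp" now survives intact, because the length, not a special character, marks the end.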
