Sunday, October 23, 2011

My own NoSQL DB, progress


The last few weeks have been good... hard, but good. I finally implemented some new layers in the DB. Some of them were harder than others, and I realized that I don't know some bare-metal components, like the network layer, as well as I thought I did. Anyway, I managed to get through all of this and the DB is now running smoothly. This post writes down some of the progress made and explains some of the problems I had implementing these new layers.

Command Layer 

This component is really one of the cores of the communication with the user. It processes the commands sent by an external client and figures out what the request is, how to handle it, and what should be returned to the client.

This component was not really a problem, because I already knew how to handle the network protocol. I wrote a writer and a reader and that's it, the command is alive!
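To give an idea of what a command writer and reader pair can look like, here is a minimal Python sketch. It is not the DB's actual code: the wire format (a 1-byte opcode followed by a 4-byte big-endian payload length), the opcode values and the function names are all assumptions made up for illustration.

```python
import io
import struct

# Hypothetical opcodes, invented for this example.
OP_INSERT = 1
OP_FIND = 2

# Assumed header layout: 1-byte opcode + 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def write_command(stream, opcode, payload):
    """Serialize one command as header + payload onto a binary stream."""
    stream.write(HEADER.pack(opcode, len(payload)))
    stream.write(payload)


def read_command(stream):
    """Read one command back and return (opcode, payload)."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise EOFError("truncated command header")
    opcode, length = HEADER.unpack(header)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated command payload")
    return opcode, payload
```

With an in-memory stream such as io.BytesIO, writing a command and then reading it back returns the same opcode and payload, which makes the pair easy to test before any socket is involved.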

Network Layer

This component was a really hard one. I didn't know that synchronizing the sends with the reads would be that hard. It drove me crazy for some days, until I realized that the network layer does not sign a contract with you to deliver exactly what you put on the pipe in one piece, so a single read is not supposed to return the expected length at the other end. Let me put it simply:

Client A writes 1000 bytes on the pipe.

Server expects 1000 bytes...

But... that does not happen. The data actually arrives in chunks: 100 bytes at first, 500 later, 200... etc. So the server has to expect to read this mess the same way, and keep reading until it has the whole message. Now that I have read some literature about it (that's why it took me 1 week to implement this layer) I know that it's easy... but this layer was a real pain (if you have questions about this, just let me know and I will try to explain how to create reliable server/client communication). I still have to make some improvements to get better performance, but anyway it's working right now and I don't want further delays in this project.
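The usual way out of this is to tell the receiver how many bytes to expect and then loop on the read until they have all arrived. Here is a minimal Python sketch of that idea, assuming a 4-byte big-endian length prefix in front of every message. The prefix format and the function names are my assumptions for the example, not necessarily what the DB does.

```python
import socket
import struct


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have been collected."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send a 4-byte length prefix (assumed format) followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_message(sock):
    """Read the length prefix, then read exactly that many payload bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

On the sending side, sendall() is used instead of send() for the same reason: send() is allowed to write only part of the buffer.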

Index implementation

Here's the tricky part of databases: figuring out an efficient way to sort the data in order to retrieve it later. In this part I tried to avoid the conventional methods and figure it out from my own knowledge... big mistake!... hahahaha. Anyway, these tryouts were good for understanding why people came up with fancy solutions like B+Trees, BTrees, etc. Here's the BUT... I read the documentation on B+Trees and they didn't fit well with what I think the separation between data and indexes should be, so I had to create my own solution to this problem, of course... based on B+Tree knowledge and some of my tryouts.
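To illustrate what keeping data and indexes separate can mean, here is a small Python sketch. It is not my actual index and it is not a B+Tree: it uses a plain sorted list with binary search as a stand-in, and the class names DataFile and SortedIndex are hypothetical. The point is only that the index stores keys and positions, while the records live somewhere else.

```python
import bisect


class DataFile:
    """Append-only record store; a record is addressed by its position."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1

    def read(self, position):
        return self._records[position]


class SortedIndex:
    """Sorted keys, each mapped to a position in the data file."""

    def __init__(self):
        self._keys = []
        self._positions = []

    def put(self, key, position):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._positions[i] = position  # existing key: repoint it
        else:
            self._keys.insert(i, key)
            self._positions.insert(i, position)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._positions[i]
        return None

    def range(self, low, high):
        """Return (key, position) pairs with low <= key <= high, in key order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._positions[lo:hi]))
```

A find by key is then two steps: ask the index for the position, then read the record from the data file. Lookups in the sorted list are O(log n), but inserts are O(n) because of the list shifting, which is one of the reasons tree structures exist.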

Now the tests run very well... the DB gets the following throughput:

20000 inserts per second and 29978 find-by-key operations per second.

The network layer is not going that well, but I will fix this later. With the network layer on, the throughput is 8695 ops, close to what MongoDB achieves, which makes me very happy for now. I know that when the time comes I will fix the network issues and that will be a nice boost for my server.

Later I will write some posts about my learning process and the problems I had to solve in order to keep going with this project.

The project keeps looking good, and I will keep going.