I’ve been hearing about node.js for a while and have been wanting to play with it. Node is all about server-side JavaScript and async I/O. It’s built around Google’s V8 JavaScript engine and has gained quite a following.
I found it pretty easy to get started with node — the docs are decent, there’s lots of sample code, and a good number of getting-started blog posts. I initially got going with some pre-built Cygwin binaries on Windows — they worked, but didn’t really provide the full experience. So, I picked up a cheap Rackspace instance running Ubuntu and moved my experiments there (a great deal, btw: $10.95/month for a 256MB instance).
I’ve always found working on something real is the best way to learn a new technology. I tried to think of a first project that was non-trivial but not too huge, made appropriate use of node, and was something that I might actually use sometime. In the end, I decided to build a node module that turns mysql into a json store with a custom query language. I called the project myjsdb.
I was already pretty familiar with JavaScript, and the core libraries/modules in node are pretty straightforward, so I found the biggest challenge was getting comfortable with a 100% async style of coding. There’s really no cheating in node — there’s simply no way to do blocking I/O — so any program flow that involves I/O makes extensive use of callbacks. The hardest part was coming up with patterns for those callbacks that led to manageable, readable code.
In simple cases, just using anonymous callbacks works fine. For example:
this.getDocumentId = function(name, fn){
  this.client.query('select id from ' + this.name +
    '_json_doc where name=?',
    [name],
    function(err, res){
      fn(err, res ? res[0].id : null);
    }
  );
}
In more complex cases, though, that can lead to deeply nested code, which I find pretty unreadable. In those cases, I found that something like a state machine was a better model. For example:
this.putDocument = function(name, obj, fn){
  var store = this;
  var docid = 0;
  var s1 = function(){
    store.client.query('insert into ' + store.name + '_json_doc (name, last_modified)' +
      " values(?, now())" +
      " on duplicate key update last_modified = now()", [name],
      function(err, info){
        if (err)
          return fn(err);
        if ((docid = info.insertId) == 0)
          s2();
        else
          s3();
      }
    );
  };
  var s2 = function(){
    store.getDocumentId(name, function(err, id){
      if (err)
        return fn(err);
      docid = id;
      s3();
    });
  };
  var s3 = function(){
    store.clearDocument(docid, function(err){
      if (err)
        return fn(err);
      s4();
    });
  };
  var s4 = function(){
    var stmts = new Json2SqlHelper(store, docid, obj).getStatements();
    store.client.query('insert into ' + store.name + '_json values ' + stmts.join(', '), function(err){
      fn(err);
    });
  };
  s1();
}
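An alternative to hand-rolling the s1..s4 state functions each time is a small, generic step runner. Here's a minimal sketch (this series helper is hypothetical, not part of node or myjsdb, but it captures the pattern):

```javascript
// Hypothetical helper: run async steps in order, stopping at the first
// error. Each step receives a next(err) callback; passing an error to
// next short-circuits straight to the final fn.
function series(steps, fn) {
  var i = 0;
  function next(err) {
    if (err) return fn(err);                  // first error wins
    if (i === steps.length) return fn(null);  // all steps completed
    steps[i++](next);                         // run the next step
  }
  next();
}
```

With something like this, the body of putDocument could collapse to roughly series([insertRow, resolveDocId, clearDoc, insertValues], fn), with each step a named function as before.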
I imagine an event handler model would work well too (it just seemed a little heavyweight in my case to use an emitter for purely local flow).
I also found that when I had some underlying object doing some queuing, I could use a model like this:
var Store = require('./myjsdb').Store;
var store = new Store('test', {user:'root', password:'xxx', database:'testdb'}),
    doc = store.getDocument(),
    person = doc.getObject({age:Number, name:'Geoff', knows:Object}),
    p2 = doc.getObject({});
person.age.gt(25);
person.knows.eq(p2);
store.open();
store.create();
store.putDocument('doc1', {name:'geoff', age:44, knows:{name:'derrish'}});
store.query({age:person.age, name:person.name}, function(err, res){
  console.log(res);
});
store.remove();
store.close();
In that case, I don’t provide callbacks to many of the methods called on the store object (though they accept them). This works because the underlying mysql driver I’m using queues up operations. The code reads as if the various operations are synchronous, though in fact they’re not. What really happens is that all of the code shown executes, queuing up operations against the mysql store (well, the first mysql call is issued immediately, but the rest are queued). Only once the code shown has finished executing is the thread free to process the I/O returned by mysql for the first call. Once that I/O is processed, the next queued command is issued, and so on. I guess this is a pretty special case, since it relies on having a single I/O processor that queues its operations.
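To make the queuing idea concrete, here's a toy client that behaves the way I'm assuming the mysql driver does: commands can be submitted at any time, but only one runs at once, and the next starts only when the previous one's callback fires. This is a sketch of the idea, not the driver's actual implementation:

```javascript
// Toy client: accepts operations at any time, runs them one at a time.
function QueuedClient() {
  this.queue = [];
  this.busy = false;
}
// op is a function taking a callback(err, res); fn is the user's callback.
QueuedClient.prototype.exec = function(op, fn) {
  this.queue.push({op: op, fn: fn});
  this._next();
};
QueuedClient.prototype._next = function() {
  if (this.busy || this.queue.length === 0) return;
  this.busy = true;
  var self = this, item = this.queue.shift();
  item.op(function(err, res){          // run the operation...
    self.busy = false;
    if (item.fn) item.fn(err, res);    // ...report its result...
    self._next();                      // ...then start the next one
  });
};
```

Code written against a client like this can look sequential, exactly as in the example above, because each exec call just enqueues work while the event loop is busy.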
All in all, I’ve found node pretty nice to work with. I do wonder whether a job queueing/thread pooling model wouldn’t be better than the single threaded model used, but perhaps there are technical reasons related to V8 that make that impractical.