Node Buffers

I’ve been doing some more work on my json-in-mysql module for node.js. One of the areas I’ve been looking at is performance. As is often the case with these things, I’ve found that when things take longer than I’d like, it’s not for the reasons I first expect.

I wrote a simple performance test that stored 1000 auto-generated JSON documents and then queried against them. The document writes took a reasonable amount of time, a little over 1ms per document. The queries, on the other hand, weren’t so blazing, and took longer the more results they returned. At first I blamed MySQL, but found that the underlying SQL queries were rarely taking longer than 1ms. I then used the node.js/v8 profiling capabilities to discover that most of my query time wasn’t spent in my code, or in the node-mysql driver I’m using, or in node, but in v8 internals. After some further digging I discovered that node Buffers, used extensively by node-mysql, have some heavier-than-expected costs.

I ended up creating a test case like this while investigating the performance issues:

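var dt = new Date;

for (var i = 0; i < 10000; i++) {
    var buf = new Buffer(1000);
    for (var j = 0; j < 100; j++) {
        // Fast pattern: decode a range of the parent buffer directly.
        // var s2 = buf.toString('utf-8', j*10, j*10 + 10);
        // Slow pattern: slice first, then decode the slice.
        var s2 = buf.slice(j*10, j*10 + 10).toString('utf-8');
    }
}
// Elapsed time in milliseconds.
console.log(+new Date - dt);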

I found that the pattern buffer.slice(start, end).toString(encoding) was more than 10 times slower than the pattern buffer.toString(encoding, start, end). In theory a slice is cheap since it doesn’t allocate any new buffer memory — it just creates a new Buffer object that refers to the memory in the parent Buffer. In practice, there seems to be some considerable overhead somewhere.
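To make the comparison concrete, here is a minimal sketch showing that the two patterns decode the same bytes, and that a slice really is a view onto the parent's memory rather than a copy. The sample text and variable names are made up for illustration, and it assumes a node version recent enough to have Buffer.from.

```javascript
// Sample data is illustrative only; Buffer.from assumes a newer node version.
var buf = Buffer.from('hello node buffers', 'utf-8');

// Pattern 1: slice first, then decode the slice.
var viaSlice = buf.slice(6, 10).toString('utf-8');

// Pattern 2: decode a range of the parent buffer directly.
var viaRange = buf.toString('utf-8', 6, 10);

console.log(viaSlice, viaRange); // both are 'node'

// A slice shares memory with its parent, so a write through
// the slice is visible when reading the parent.
var view = buf.slice(0, 5);
view[0] = 0x48; // ASCII 'H'
var parentStart = buf.toString('utf-8', 0, 5);
console.log(parentStart); // 'Hello'
```

Since both patterns return the same string, swapping one for the other is safe wherever the slice itself isn't needed afterwards.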

I’m not too familiar yet with node or v8 internals, but I did take a look around to see if I could figure out what’s going on. I found that the node Buffer class was calling a method in the v8 API called SetIndexedPropertiesToExternalArrayData. I gather this is an optimization that tells v8 that an object’s memory buffer will be managed externally, while still allowing fast indexed access to it. I’m guessing (and this really is all a guess at this point given my limited knowledge) that this has a cost in that it forces v8 to modify the generated class for the object. For big buffers with lots of data access, the cost of incrementally compiling the object is far outweighed by the reduced access costs. For small buffers, though, that cost becomes a significant overhead.
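One rough way to poke at this guess is to time both patterns while varying the chunk size. If there is a fixed per-slice cost, I'd expect the gap to matter most for small chunks. This is only a sketch: timePattern is a hypothetical helper of my own, the sizes and iteration counts are arbitrary, and it uses Buffer.alloc, which assumes a newer node version. Timings will vary a lot between node and v8 releases.

```javascript
// Hypothetical helper: time one pattern over a buffer cut into equal chunks.
// useSlice = true  -> buf.slice(start, end).toString(encoding)
// useSlice = false -> buf.toString(encoding, start, end)
function timePattern(useSlice, bufSize, chunkSize, iterations) {
    var buf = Buffer.alloc(bufSize); // zero-filled; assumes a newer node version
    var chunks = Math.floor(bufSize / chunkSize);
    var decoded = 0; // total characters decoded, so the work isn't optimized away
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        for (var j = 0; j < chunks; j++) {
            var from = j * chunkSize;
            var s = useSlice
                ? buf.slice(from, from + chunkSize).toString('utf-8')
                : buf.toString('utf-8', from, from + chunkSize);
            decoded += s.length;
        }
    }
    return { ms: Date.now() - start, decoded: decoded };
}

// Arbitrary chunk sizes: many small slices vs. a few large ones.
[10, 100, 1000].forEach(function (chunkSize) {
    var sliced = timePattern(true, 1000, chunkSize, 1000);
    var ranged = timePattern(false, 1000, chunkSize, 1000);
    console.log(chunkSize, 'slice:', sliced.ms + 'ms', 'range:', ranged.ms + 'ms');
});
```

Both runs decode the same number of bytes for a given chunk size, so any difference in time comes from the pattern itself rather than from the amount of data.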

If that’s all true, I’m not too sure what the solution is... possibly direct support for binary buffers within v8 itself?

About geochap

I'm a software developer living in Belfast, Maine