Vstr design

Requirements

The Vstr string library is designed to be as fast as possible for doing IO, specifically scalable Network IO -- and even more specifically when you are doing that non-blocking IO for large numbers of network connections.

This meant that I wanted an API that would:

I'm not the only person who thinks these things, for instance in Jeff Darcy's notes on high performance network server IO design his first and third points are basically the same as my first three requirements.

Implementation

To make the append/delete case very fast, the internal design uses an ordered set of nodes. Each node has some data and a length. Each string is associated with a configuration, which stores the size of the modifiable data nodes. Thus to append data to the string, you can either use the remaining amount of space in the last data node (assuming it's writeable) or append a new writable data node. The configuration also has a set of spare data nodes.

This makes the worst case cost, for append, to be allocation for enough nodes of storage to cover the amount needed to be added to the string. The worst case cost, for delete from the begining, to be a movement from the begining of the set of the string to the spare set in the configuration. However after a delete from a string using a configuration, another string using that same configuration will have spare data and so not need to do allocation for that amount, so the worst case behaviour for append tends to towards never happening.

To deal with mmap() areas, a node can be of a non-writeable type. The two main ones being a pointer to some external data and a reference to some external data (the reference has a cleanup function called on it, when the reference count reaches zero).

To deal with out of bound data, a node can be of a non-data type. This means that there is a specified length of data, but no pointer to said data.

visual data

There is a simple diagram of the internal structure discussed above.

There is a size analysis of a Vstr string as data is added to it, note that this assumes only writeable nodes.


James Antill
Last modified: Sun Jul 31 00:34:33 EDT 2005