A tutorial of the costs of dealing with data using Vstr

This page is an advanced tutorial looking at the costs of the different Vstr calls, given different data. For a simpler starting tutorial see this page.

Adding data to a Vstr_base object

There are four basic methods of adding data to a Vstr:

Each adds a specific type of node in the VSTR_TYPE_NODE prefix, vstr_add_buf adding _BUF nodes vstr_add_ptr adding _PTR nodes etc. Nodes of different type cannot be merged, so if you have 512 byte sized _BUF nodes and write code like...

/* assuming vstr_num(s1, 1, s1->len) == 0 right now... */

 /* add some data via. copying it */
 vstr_add_cstr_buf(s1, s1->len, "abcd");

/* now: vstr_num(s1, 1, s1->len) == 1 */

 /* add some data via. pointing to it */
 vstr_add_cstr_ptr(s1, s1->len, "abcd");

 /* add some data via. copying it */
 vstr_add_cstr_buf(s1, s1->len, "abcd");

 /* add some data via. pointing to it */
 vstr_add_cstr_ptr(s1, s1->len, "abcd");

/* now: vstr_num(s1, 1, s1->len) == 4 */

...you will be wasting 508 bytes of space in each of the _BUF nodes. However, if after the above you do...

/* assuming vstr_num(s1, 1, s1->len) == 4 right now, from above code... */

 /* add some more data via. copying it */
 vstr_add_cstr_buf(s1, 4, "abcd");

/* now, still: vstr_num(s1, 1, s1->len) == 4 */

...no extra _BUF node storage will have been allocated, as it's used some of the spare storage from the first _BUF no ... and so "only" 504 bytes are wasted in the first node, and still 508 in the third (second _BUF) node.

Note that there are many functions that are built on these four primatives, from the simple like vstr_add_cstr_buf() which just automatically works out the length, to vstr_add_rep_chr() which works "as though" you called vstr_add_buf() after calling memset() on a buffer, without having to double copy and then there are the more complicated functions like vstr_add_iovec_buf_beg() and vstr_add_iovec_buf_end() which work as though you called vstr_add_buf but they use an iovec scatter gather list.

Substituting data in a Vstr_base object

Each of the addition functions has a coresponding substitution function by ust replacing the _add_ name with _sub_:

These functions can all be thought of as doing a delete, followed by the coresponding add function. However they are much faster, in many conditions, and they are atomic by default. Specifically when substituting via. vstr_sub_buf() into a _BUF node the original data is overwritten with the new data, note that _BUF is the only writeable node all other nodes are read-only. And when using either vstr_sub_ptr() or vstr_sub_ref() to replace an entire node then the nodes are just swapped in place. For instance:

Adding data and then memcpy()ing over it with new data

 /* add some data... */
 vstr_add_cstr_buf(s1, s1->len, "abcd");

 /* now this will just call memcpy() over the original */
 vstr_sub_cstr_buf(s1, s1->len - 3, 4, "abcd");

Adding data and then swapping the node with new data


 /* add some data, in a new _BUF node... */
 vstr_add_cstr_ptr(s1, s1->len, "abcd");
 vstr_add_cstr_buf(s1, s1->len, "abcd");

/* this makes sure the iovec cache is valid */
 vstr_export_iovec_ptr_all(s1, NULL, NULL);

 /* now this will just swap the pointers for the original _BUF node with the new
  * _PTR node -- and the iovec cache will still be valid */
 vstr_sub_cstr_ptr(s1, s1->len - 3, 4, "abcd");

Adding data and then swapping the contents of the node with new data


 /* add some data, in a new _PTR node... */
 vstr_add_cstr_buf(s1, s1->len, "WXYZ");

 /* now this will just swap the pointer in the original _PTR node with the new
  * pointer ... so only the original _PTR node needs to be allocated */
 vstr_sub_cstr_ptr(s1, s1->len - 3, 4, "abcd");

Moving data between Vstr_base objects

There are five basic methods of moving data from one Vstr to another, the first four of which need to be followed by a delete to remove the data from the source. The last one, however, does this in one operation:

Other methods can be built upon those foundations such as...

 /* Complicated way to move "len" data from begining of "src" to end of "dst" */

 if (!vstr_iter_fwd_beg(src, 1, len, iter)) abort();

 movlen = 0;
 do
 {
   movlen += iter->len;
   addlen  = iter->len;
 } while (vstr_iter_fwd_nxt(iter));
 movlen -= addlen;

 /* move all complete nodes -- note that this still has to copy the first
 * node if it is a _BUF node and has some of the deleted "used" */
 vstr_mov(dst, dst->len, src, 1, movlen);

 /* add/del the data in the last node */
 vstr_add_vstr(dst, dst->len, src, 1, addlen, VSTR_TYPE_ADD_DEF);
 vstr_del(src, 1, addlen);

Deleting data from a Vstr_base object

There is only a single call to delete data from a Vstr, vstr_del(), however it has different speed characteristics depending on what it needs to delete. Much like vstr_mov() the worst case is dealing with data that doesn't go to the end of a _BUF node, with a single exception. This requires a single memmove() of all the later data in the node to the start of the deletion in that node. All node types can delete data from the end of the node by just changing a variable, nodes can be entirely removed by changing pointers (although removing pointers from the middle of the Vstr will destroy the iovec cache -- but that shouldn't be a large hit). All nodes, apart from _BUF, can delete data from the begining of the node ... also there is an optimisation so that deleting from the begining of _BUF nodes located at the begining of the Vstr requires only a "used" variable alteration (this is a very common operation for IO, where you add at the end and delete from the begining).

Deleteing from the middle of a non _BUF node is a somewhat faster way of doing a delete to the end of the node and then doing an add of the data at the end of th node . That means that this kind of delete requires an allocation, which is why vstr_del() returns a SUCCESS or FAILURE. This also applies to _BUF nodes when the VSTR_CNTL_CONF_GET_FLAG_DEL_SPLIT attribute is set in the configuration.


James Antill
Last modified: Sun Jul 31 00:36:26 EDT 2005