Vstr documentation -- overview |
Download Information |
Current version is 1.0.15
|
About |
Vstr is a string library designed so that you can work optimally with readv()/writev() for input/output. This means that, for instance, you can readv() data to the end of the string and writev() data from the beginning of the string without having to allocate or move memory. It also means that the library is completely happy with data that has multiple zero bytes in it. Because of this design constraint, and unlike most string libraries, Vstr doesn't have an internal representation where everything can be accessed from a single (char *) pointer in C. Instead, the internal representation is of multiple "blocks" or nodes, each carrying some of the data for the string. This model also means that as a string gets bigger the Vstr memory usage only goes up linearly and has no inherent copying. Other string libraries typically grow the string via realloc(), so their memory usage can be triple the required size and growing can require a complete copy of the string.
It also means that adding, substituting or moving data anywhere in
the string can be optimized a lot, to require O(1) copying instead
of O(n).
Speaking of O(1), it's worth remembering that if you have a
Vstr string with caching it is O(1) to get all the data to the
writev() system call (the cat example below shows an example of
this, the write call is
The other unusual aspect of the Vstr string library is that it attaches the notion of a locale to the string configuration rather than making it global (as POSIX, and pretty much everything else, does). This means that you can do network I/O in the C locale and user I/O in the user's locale. For a look at the internal design of the Vstr string library, you can read this. For a look at the main security problems I wanted to solve you can read this. |
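To make the readv()/writev() point concrete, here is a minimal sketch of the general technique using only POSIX calls. It does not use the Vstr API: struct demo_block, DEMO_MAX_BLOCKS and every demo_* function are hypothetical names invented for this illustration, and the block type is not Vstr's internal layout.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for readv()/writev() under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical block type for this sketch; NOT Vstr's internal layout. */
struct demo_block {
  char *data;  /* bytes held by this block, zero bytes allowed */
  size_t len;  /* number of bytes in this block */
};

enum { DEMO_MAX_BLOCKS = 8 };

/* Fill an iovec array from the blocks; returns the count, or -1 if too many. */
static int demo_fill_iov(struct iovec *iov, const struct demo_block *b,
                         size_t num)
{
  size_t i;

  if (num > DEMO_MAX_BLOCKS)
    return -1;
  for (i = 0; i < num; ++i) {
    iov[i].iov_base = b[i].data;
    iov[i].iov_len = b[i].len;
  }
  return (int)num;
}

/* One writev() call sends every block in order, with no intermediate copy. */
ssize_t demo_write_blocks(int fd, const struct demo_block *b, size_t num)
{
  struct iovec iov[DEMO_MAX_BLOCKS];
  int cnt = demo_fill_iov(iov, b, num);

  return (cnt <= 0) ? cnt : writev(fd, iov, cnt);
}

/* One readv() call fills the blocks in order, again with no extra copy. */
ssize_t demo_read_blocks(int fd, const struct demo_block *b, size_t num)
{
  struct iovec iov[DEMO_MAX_BLOCKS];
  int cnt = demo_fill_iov(iov, b, num);

  return (cnt <= 0) ? cnt : readv(fd, iov, cnt);
}

/* Round-trip through a pipe; returns 0 if the bytes survive, -1 otherwise. */
int demo_pipe_roundtrip(void)
{
  char out1[] = { 'a', 'b', '\0', 'c' };  /* embedded zero byte */
  char out2[] = { 'd', 'e' };
  char in1[3], in2[3];
  struct demo_block out[2] = { { out1, sizeof out1 }, { out2, sizeof out2 } };
  struct demo_block in[2] = { { in1, sizeof in1 }, { in2, sizeof in2 } };
  int fds[2];
  int ok;

  if (pipe(fds) == -1)
    return -1;
  ok = demo_write_blocks(fds[1], out, 2) == 6 &&
       demo_read_blocks(fds[0], in, 2) == 6 &&
       memcmp(in1, "ab", 3) == 0 &&  /* 'a', 'b', '\0' */
       memcmp(in2, "cde", 3) == 0;
  close(fds[0]);
  close(fds[1]);
  return ok ? 0 : -1;
}
```

Calling demo_pipe_roundtrip() writes two blocks with one system call and reads them back into two differently sized blocks with another. A real caller would also have to handle short reads and writes, which this sketch only reports through the return value.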
Testing |
The Vstr string
library comes with a "make check" test suite with almost
The test suite has at least one test for each function call, and at least one usage of each constant. This is automatically checked using the "scripts/tst_coverage_diff.sh" script included in the distribution (note that you need to compile without inline support, or inline functions won't be seen to be part of the test suite). Using the coverage analysis available with gcc, the test suite has coverage of 100% of the source, i.e. every single line of code in the library is run by at least one unit test. Note that this still doesn't mean that there are no bugs in the code (for that you would need a test for every code path). |
Debugging Vstr APIs |
While I've tried to make the API simple enough that you don't have to do anything complicated to get things done, there might still be times when you do a bunch of calls that you aren't sure are OK, or you get some memory management wrong and pass invalid/NULL pointers to the Vstr API functions. The easiest way to find out what is going wrong is to compile without inline support and with debug support (e.g. the --enable-tst-noinline and --enable-debug options to ./configure). Then, as you call the functions, almost all calls check input values for validity, and all calls that modify a Vstr will check the Vstr both before and after their operations. Finally, if you call vstr_exit(), the number of memory allocations/deallocations and mmap()/munmap() operations will be counted, and assertions will fail if data hasn't been freed.
NOTE: If you are using rpms, then there are already rpms for the debug build, and they should be accepted as "newer" than the normal rpms, so you can just install them while you develop.
As well as that, in all builds gcc attribute support is checked for the following attributes: nonnull, pure, const, format and malloc. The const and pure attributes let the optimiser do some things that can be surprising if you are trying to debug something (for example, if you call vstr_cmp() and don't use the return value, gcc will never even do the call in the first place), so you need to watch for that. The nonnull attribute should catch errors if you obviously pass NULL pointers to functions that don't take them, and the format attribute will catch errors in the calls to printf() like functions. However, you may want to temporarily disable attributes due to the optimising problems (if so, define VSTR_COMPILE_ATTRIBUTES to be 0 before including the vstr.h header). |
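As a rough illustration of how such attributes are usually wired up, the sketch below wraps three of them in macros that can be compiled away. It is not taken from vstr.h: DEMO_COMPILE_ATTRIBUTES, the DEMO_ATTR_* macros, demo_cmp() and demo_fmt() are all hypothetical names, and the switch only imitates the idea behind VSTR_COMPILE_ATTRIBUTES.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch for this sketch, modelled on the idea of
 * VSTR_COMPILE_ATTRIBUTES: define it to 0 to compile the attributes away. */
#ifndef DEMO_COMPILE_ATTRIBUTES
# define DEMO_COMPILE_ATTRIBUTES 1
#endif

#if DEMO_COMPILE_ATTRIBUTES && defined(__GNUC__)
# define DEMO_ATTR_PURE       __attribute__((pure))
# define DEMO_ATTR_NONNULL    __attribute__((nonnull))
# define DEMO_ATTR_FMT(f, a)  __attribute__((format(printf, f, a)))
#else
# define DEMO_ATTR_PURE
# define DEMO_ATTR_NONNULL
# define DEMO_ATTR_FMT(f, a)
#endif

/* pure: the result depends only on the arguments and on readable memory,
 * so gcc is allowed to drop a call whose result is never used. */
DEMO_ATTR_PURE DEMO_ATTR_NONNULL
int demo_cmp(const char *a, size_t alen, const char *b, size_t blen);

/* format: gcc checks the variadic arguments against the format string. */
DEMO_ATTR_FMT(3, 4) DEMO_ATTR_NONNULL
int demo_fmt(char *buf, size_t size, const char *fmt, ...);

/* Compare two length-delimited byte ranges, shorter range sorts first. */
int demo_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
  size_t min = (alen < blen) ? alen : blen;
  int ret = memcmp(a, b, min);

  if (ret != 0)
    return ret;
  return (alen > blen) - (alen < blen);
}

/* Thin wrapper around vsnprintf(), so the format attribute has a target. */
int demo_fmt(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int ret;

  va_start(ap, fmt);
  ret = vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  return ret;
}
```

With the attributes enabled, a statement such as demo_cmp(a, 1, b, 1); on its own may be removed by the optimiser, so a breakpoint inside demo_cmp() might never be hit. Building with -DDEMO_COMPILE_ATTRIBUTES=0 turns the macros into nothing, which is the same kind of escape hatch described above.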
API Reference Documentation |
All exported interfaces are documented. Anything that isn't documented isn't guaranteed by the API or the ABI of the library, so don't use it. The script "scripts/diff_symbols.sh", included in the distribution, checks this. So it isn't that I have just forgotten: if something isn't documented, it's undefined what it does, and it might change type signature or disappear completely.
|
Mental model to the layout of the API |
At first glance the Vstr API looks huge, as there are over 280 functions. However, the API was designed so that you can mentally build function names from an API template, so instead of having to remember 280 functions you just need to remember 10 to 20 pieces of the API template.
Vstr functions try to obey a template where each part alternates between an
object name and an action like...
...which is much less information to remember. It is also consistent in that the same object names are used everywhere, and they are prefixed by cstr_ when their length is assumed by looking for a NUL terminator. |
Threads and signals |
All operations are local to the object(s) they are manipulating, and no locking is done inside the library. Synchronization belongs above simple data type primitives like strings. That said, if you want to use the Vstr string library from multiple threads, then everything should mostly just work if you have a separate Vstr configuration for each thread and operate on strings created by those configurations local to that thread. Using vstr_conf_swap() you could have a pool of objects using Vstr strings and then localize them to a thread's configuration as you want to operate on those objects. For all data that you wish to move between two Vstr strings that are "owned" by different threads, you will need to do some higher level locking around the copying. One caveat is that if you have a Vstr_ref node inside a Vstr string and then copy that to a string owned by another thread (or do a VSTR_TYPE_ADD_BUF_REF or VSTR_TYPE_ADD_ALL_REF copy of any data), there will be unlocked reference counting on the Vstr_ref, so basically you can't do that unless you really know what you are doing.
For Vstr string operations you wish to do from a signal handler, life is more complicated, unless you're using a malloc() implementation that is guaranteed to be reentrant safe (this is generally not the case, and is not the same as a thread-safe malloc(), as you can be inside malloc() when you get a signal). The obvious way to get around this is to pre-allocate enough storage in the Vstr configuration to be used in the signal handler, i.e. call vstr_make_spare_nodes(). If you absolutely need to use, in a signal handler, a Vstr string that is also used outside a signal handler, you would need to block the signals it could be accessed in around each manipulation of it (or around each access to it, if you manipulate it inside a signal handler). Yes, this will be slow; the solution is to not do that.
For most sane uses of signals, the only time you want to do things with strings in the handler is from the SIGSEGV handler, so you can create some debugging information etc. At which point you can probably just do it. |
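Blocking a signal around each manipulation can be sketched with plain POSIX calls as below. Nothing here is Vstr API: demo_with_signal_blocked() and the other demo_* names are hypothetical, and the counter only stands in for a string that a handler would touch. The sketch assumes a single-threaded process; in a threaded program pthread_sigmask() would be the call to use instead of sigprocmask().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigprocmask() under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>

/* Hypothetical shared state standing in for a string touched by a handler. */
static volatile sig_atomic_t demo_handler_runs = 0;

static void demo_handler(int signo)
{
  (void)signo;
  ++demo_handler_runs;
}

/* Run critical(arg) with signo blocked, then restore the previous mask. */
int demo_with_signal_blocked(int signo, void (*critical)(void *), void *arg)
{
  sigset_t block, saved;

  if (sigemptyset(&block) == -1 || sigaddset(&block, signo) == -1)
    return -1;
  if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
    return -1;
  critical(arg);  /* the handler for signo cannot run in here */
  return sigprocmask(SIG_SETMASK, &saved, NULL);
}

/* Demo critical section: raise the signal, record how often the handler ran. */
static void demo_critical(void *arg)
{
  int *runs_seen_inside = arg;

  raise(SIGUSR1);  /* stays pending while SIGUSR1 is blocked */
  *runs_seen_inside = (int)demo_handler_runs;
}

/* Returns 0 if the handler was deferred until the mask was restored. */
int demo_signal_selfcheck(void)
{
  struct sigaction sa;
  int inside = -1;

  demo_handler_runs = 0;
  sa.sa_handler = demo_handler;
  sa.sa_flags = 0;
  if (sigemptyset(&sa.sa_mask) == -1 || sigaction(SIGUSR1, &sa, NULL) == -1)
    return -1;
  if (demo_with_signal_blocked(SIGUSR1, demo_critical, &inside) == -1)
    return -1;
  return (inside == 0 && demo_handler_runs == 1) ? 0 : -1;
}
```

Each protected manipulation costs two extra system calls for the mask changes, which is the slowness the text above warns about.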
Custom formatters |
If you want to write a number to a string in C, you would normally write code such as...

    sprintf(my_str, "%d", num);

...and to append the same to a Vstr string it's a simple API change to...

    vstr_add_fmt(my_vstr, my_vstr->len, "%d", num);

...however if you want to write an IPv4 address, a Vstr string or any other type that isn't in ISO 9899:1999 to a string, you have to resort to doing it by hand. And if you want to format that output you have to either convert it to a C style string and use the "%s" option to the *printf() like function, or do all the formatting yourself. This is all pretty ugly, often unreliable, slow and takes significant programmer resources. This is where custom formatters can help and give you back code clarity, reliability, speed and ease of use. Assuming you want to print an IPv4 address, you can initialize the Vstr configuration like so...

    vstr_sc_fmt_add_all(my_vstr->conf);
    vstr_cntl_conf(my_vstr->conf, VSTR_CNTL_CONF_SET_FMT_CHAR_ESC, '%');

...you can then write...

    struct sockaddr_in sa;
    struct in_addr ipv4;
    vstr_add_fmt(my_vstr, my_vstr->len, "%-20{ipv4.p}", (void *)&ipv4);
    vstr_add_fmt(my_vstr, my_vstr->len, "%*{ipv4.p}", 20, (void *)&sa.sin_addr.s_addr);

...and to add the Vstr string you do...

    vstr_add_fmt(my_vstr, my_vstr->len, "%*.*{vstr}", 50, 50, (void *)my_vstr, 1, my_vstr->len);

...all normal printf() like formatting options work as you would expect them to, including being able to use i18n format specifiers to easily change the order of output for different locales. However, if you try the above, you'll note that all of the calls to vstr_add_fmt() will produce warnings with gcc, because "%{" isn't the start of a valid formatting character under gcc's static printf() parsing rules. This deficiency makes custom formatters as used above mostly useless, as you have to either turn warnings off for format strings (which is basically insanity in C) or see at least one warning for every usage of a custom formatter.
To deal with this, the Vstr custom formatter code allows you to work around the static checkers by using the following initialization code...

    vstr_sc_fmt_add_all(my_vstr->conf);
    vstr_cntl_conf(my_vstr->conf, VSTR_CNTL_CONF_SET_FMT_CHAR_ESC, '$');

...you can then call the custom formatters, using code like...

    struct sockaddr_in sa;
    struct in_addr ipv4;
    vstr_add_fmt(my_vstr, my_vstr->len, "$-20{ipv4.p:%p}", (void *)&ipv4);
    vstr_add_fmt(my_vstr, my_vstr->len, "$*{ipv4.p:%d%p}", 20, (void *)&sa.sin_addr.s_addr);
    vstr_add_fmt(my_vstr, my_vstr->len, "$*.*{vstr:%d%d%p%zu%zu%u}", 50, 50, (void *)my_vstr, (size_t)1, my_vstr->len, VSTR_TYPE_ADD_DEF);

...which, although it isn't quite as nice as true support for custom formatting in static analyzers like gcc, does make sure that custom formatters will not do anything obviously stupid (without producing spurious warnings) and provides complete protection for non-custom formatter calls. One final note is that in all sane environments you don't need the cast to (void *); however, it is "in theory" required for conforming ISO 9899:1999 C. You may also want to look at the tutorial section on creating custom formatters. |
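For contrast, the by-hand route that this section argues against (convert to a C style string, then format with "%s") looks roughly like the sketch below. It uses only standard POSIX and ISO C calls, not the Vstr API; demo_fmt_ipv4() and demo_ipv4_selfcheck() are hypothetical names, and 192.0.2.1 is just a documentation address chosen for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for inet_ntop() under strict -std= modes */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format an IPv4 address left-justified in a field of `width` columns.
 * Step 1 converts the address to a temporary C string; step 2 formats that
 * string. Returns what snprintf() returns, or -1 if the conversion fails. */
int demo_fmt_ipv4(char *buf, size_t size, int width,
                  const struct in_addr *ipv4)
{
  char tmp[INET_ADDRSTRLEN];

  if (inet_ntop(AF_INET, ipv4, tmp, sizeof tmp) == NULL)
    return -1;
  return snprintf(buf, size, "%-*s", width, tmp);
}

/* Returns 0 if 192.0.2.1 comes out padded with spaces to 20 columns. */
int demo_ipv4_selfcheck(void)
{
  struct in_addr ipv4;
  char buf[32];

  if (inet_pton(AF_INET, "192.0.2.1", &ipv4) != 1)
    return -1;
  if (demo_fmt_ipv4(buf, sizeof buf, 20, &ipv4) != 20)
    return -1;
  return (strncmp(buf, "192.0.2.1", 9) == 0 &&
          buf[9] == ' ' && buf[19] == ' ' && buf[20] == '\0') ? 0 : -1;
}
```

Every new type needs its own temporary buffer and its own wrapper like this one, which is the clutter that a single "$-20{ipv4.p:%p}" format is meant to replace.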
Simple and heavily commented examples |
Note that some of these are explained in much more detail in the tutorial. To get a rough overview of how to use the library you can see the following heavily commented examples:
To get a better understanding, there are other example programs which aren't as heavily commented but should show how you can solve certain problems. They are:
All of the examples can be seen HERE. For the truly adventurous, the "make check" test suite root is HERE. NOTE: the test suite is written to try to break the Vstr string library, so although it uses all of the APIs it may not be code you want to copy and paste into your programs/libraries. However, given that everything in the test suite works, you know that those uses do work. Do note that a couple of the tests use undocumented members of structs etc., and you still shouldn't use those. There is also a "port" of the vsftpd FTP server to use the Vstr string library. It can be found here. This was mainly an experiment in how well/easily Vstr would work inside an application designed for a traditional string API model. |