String library comparison

This page is a rough comparison between different string libraries (and some APIs embedded in other libraries/programs) that I've seen. The comparison deals with: how the library stores the data in the string, how the library deals with IO on the strings (strings have to come from, and go to, places), how much of the problem of searching/comparing and parsing strings is dealt with for you, and how much testing the library does. This page also only lists code for which I can get the source code, where that source code can be altered and can be used from another program (it's no good telling you about how great something is if you can't use it with your code -- note, however, that not all licenses are compatible; for instance, as far as I know the postfix code isn't compatible with the GPL).

One of the biggest advantages of using a string library in C is the better security that it can provide. However, I've tried not to talk about that directly below, as it is somewhat subjective and made up of a few factors. On this page I give an outline of which things you should look for, to form an informed opinion about the security of all of the libraries.

It's probably worth noting that I wrote two of the string libraries, Vstr and Ustr. However I have a section for links to replies if you feel that you want to say something I've failed to, or just refute what I've said. For a comparison of printf() like functions see this page.





Name: Vstr string library
Language: C
Type: String library
Model: Complicated (list of nodes containing length of data of different types, including inline data, pointers to data and references to data)
License: LGPL (Lesser)
Namespace: Self contained namespace of "vstr_" on everything but members of structs, internal symbols are hidden if that is available. Typedef's can be removed with a feature macro. The few macro functions can be removed with a feature macro.
IO: IO is easy anywhere into or out of a string, appending and writing from the beginning are O(1) operations. Non-blocking IO functions are provided. Deals with any binary data. Has netstring functions.
Testing: Significant amount of user testing. Testsuite covers every line of code in the default build, 100% coverage. (83% size ratio between implementation and testsuite -- sloccount).
Version: 1.0.15
Pro: Heavily recommended for IO, is as good as it gets when talking to many network connections with non-blocking IO. Due to the API, operations on lots of small parts of a big string can be done by just referring to the original with the offset and length of the substring.
Con: Using it directly for lots of different small strings can cause a lot of memory overhead, due to the API this can be worked around ... but that requires some extra work.
Notes:

It is designed for network communication. Its design uses blocks of ptr+length data, so adding, substituting, and deleting data are all fast operations. It has a full API of all the usual string tasks: searching, comparing, splitting, substitution, converting between upper and lower case, and parsing numbers and strings. Has a full POSIX and ISO 9899:1999 compliant printf() implementation including gcc warning compatible custom formatters. For more detail see the other pages on this site.

Name: Ustr micro string library
Language: C
Type: String library
Model: Pointer, length and size. Uses compact storage (expansion of length/pointer is done to the nearest half power of 2).
License: MIT, new BSD or LGPL (Lesser)
Namespace: Self contained namespace of "ustr_" on everything but members of structs, internal symbols are hidden if that is available and using the shared library. Typedef's can be removed with a feature macro.
IO: IO functions allow simple O(1) appending of data from a FILE * IO source. Only blocking IO functions are provided. Deals with any binary data.
Testing: Moderate amount of user testing. Testsuite currently has over 99% coverage. (45% size ratio between implementation and testsuite -- sloccount).
Version: 0.99.3
Pro: Heavily recommended for when you want lots of small strings, due to the significant space savings over normal ptr + length models. Also very useful if you have a "dependency problem" and don't want to link against a "non-std." string API. Also covers lots of corner cases well, like adding string data to yourself. Contains some APIs for utf8 usage.
Con: As with most APIs using the simple ptr + length model, doing IO in chunks can be painfully slow, because removing some of the data from the beginning of a string requires moving the remaining data up. Ditto for any kind of non-equal substitution. Documentation isn't fully done yet.
Notes:

It was designed to be a good complement to Vstr, or for people who are reluctant to use any string API.
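The chunked-IO cost described in the Con above can be sketched with a minimal pointer/length/size type. The struct and function names here are hypothetical, not Ustr's API or layout; the point is only that dropping bytes from the front means moving everything that remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal "pointer, length and size" string (not Ustr's layout). */
struct pls_str {
    char  *ptr;   /* start of the allocation */
    size_t len;   /* bytes currently in use */
    size_t size;  /* bytes allocated */
};

/* Drop the first n bytes, e.g. after a partial write() consumed them.
 * The remaining len - n bytes all have to be moved down, so each call
 * costs O(len) rather than O(1). */
void pls_del_front(struct pls_str *s, size_t n)
{
    assert(n <= s->len);
    memmove(s->ptr, s->ptr + n, s->len - n);
    s->len -= n;
}
```

Writing a large string out in small chunks this way repeats the move after every chunk, which is where the slowdown comes from.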

Name: C++ STL rope
Language: C++ (std::string compat. API)
Type: String API
Model: Complicated (list of nodes containing data, "data producing functions" or references to rope strings -- evolution of the cord design)
License: GPL (with exceptions to make it LGPL like, I think)
Namespace: Uses standard C++ namespaces, and no #define's.
IO: Can be used as both a consumer and a producer in the C++ IO model. Deals with any binary data. No netstring support.
Testing: Included in libstdc++-v3, so possibly good user testing. No testsuite.
Version: unknown (libstdc++-v3 cvs from 2003-02-24)
Pro: Recommended for C++ programs implementing simple text editors, possibly also a good C++ solution for IO. Comes with g++.
Con: Although the interface is the same as C++ std::string, the implementation is very different from what existing code may have been written to assume. Using it directly for lots of small strings can cause a lot of memory overhead.
Notes:

This library tries to do something similar to what Vstr does. The major differences are that:

  • ropes can have a function as a string generator whereas Vstr doesn't allow that (error cases are a nightmare here), however the major use of this is file IO, and Vstr has functions to help there.
  • Vstr substitution is much faster than in ropes.
  • Vstr easily allows you to add references to data, so that you can add mmap() memory to the string etc. It's not obvious if it's possible to do this with a rope, so that you can make sure it isn't copied and/or is cleaned up properly when no string references the data (this may be possible with inheritance).

Also, due to conforming to the std::string C++ API, it isn't possible to act with a substring of a Rope, you can only make a copy of a substring using the substr call (however that substring should make internal references to all data). Vstr doesn't have a true distinction between the entire Vstr and a portion thereof.

Name: GLib
Language: C
Type: General purpose Library
Model: Pointer, length and size (expansion of length/pointer is done to the nearest power of 2).
License: LGPL (Lesser)
Namespace: Needs entire glib API, so mostly self contained namespace of "g_" on everything but members of structs (a couple of exceptions are made for things like the MAX() macro).
IO: Has request/truncate functions, so appending reads aren't bad. But writing non-blocking is non-trivial. Can add/delete any binary data, but only has an equal function so no generic comparison/search functions. No netstring functions.
Testing: Included in glib, so very good user testing. Testsuite tests 11 functions, implementation has 28 functions (31% size ratio between implementation and testsuite -- sloccount).
Version: 2.2.1
Pro: Comes with glib, so no extra dependencies for glib/gtk/gnome/etc. applications. Fairly efficient for lots of small strings. Small implementation and massive userbase basically guarantees no errors (although you should note the portability guarantees that the printf functions provide).
Con: Very small provided API functionality, can't printf() a GString without doing a conversion to a C-style string, substitution isn't provided and adding string data to yourself will silently die. API is overly large for what it does. Failure to allocate memory calls abort().
Notes:

This is probably the most used C string library, and comes with the glib utility library. This works on a simple start pointer and length model. This makes it much more memory efficient for small strings. This feature also makes it pretty much impossible to do IO into the strings, share data between strings and kills performance on substitutions.

There is no substitution API in glib, probably because you can't share data so you just do a memcpy() and an overwrite (but it's far too easy to get this wrong). It's also worth noting that your program may crash if you try and add data in a GString to itself (there is no safe glib API to do this).
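The self-append hazard is not specific to glib: any pointer/length/size string has it, because growing the buffer can move it while the source pointer still points into the old block. A minimal sketch of a safe append, assuming a hypothetical `dstr` type (this is not GString's code or API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer/length/size string. */
struct dstr { char *ptr; size_t len; size_t size; };

/* Append len bytes from data. Returns 0 on success, -1 if realloc() fails.
 * Works even when data points inside s itself. Size overflow is ignored
 * to keep the sketch short. */
int dstr_add(struct dstr *s, const char *data, size_t len)
{
    if (len == 0)
        return 0;

    /* Detect aliasing BEFORE realloc() can move the buffer. */
    uintptr_t base = (uintptr_t)s->ptr, p = (uintptr_t)data;
    int aliased = s->ptr != NULL && p >= base && p < base + s->len;
    size_t off = aliased ? (size_t)(p - base) : 0;
    assert(!aliased || off + len <= s->len);

    if (s->len + len > s->size) {
        size_t nsize = s->size ? s->size : 16;
        while (nsize < s->len + len)
            nsize *= 2;
        char *np = realloc(s->ptr, nsize);
        if (np == NULL)
            return -1;
        s->ptr = np;
        s->size = nsize;
    }

    if (aliased)
        data = s->ptr + off;   /* re-derive from the possibly moved buffer */
    memcpy(s->ptr + s->len, data, len);
    s->len += len;
    return 0;
}
```

An append that skips the offset bookkeeping and copies from the caller's original pointer reads freed memory whenever the buffer moved.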

Note that vstr_split_chrs()-like functions are available in glib, as part of the "C string" helper functions (Eg. g_strsplit() in the case of vstr_split_chrs()). However this means that although a GString can contain a NIL these helper functions will silently truncate at the embedded NIL. There is also limited support in glib for doing things in ASCII regardless of the current user locale.
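To make the truncation concrete, here is a small sketch of how much of a counted string a C-string helper would actually see. The type and function are hypothetical and do not use any glib API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string that may hold any binary data. */
struct counted { const char *ptr; size_t len; };

/* Number of bytes visible to a helper that treats ptr as a C string:
 * everything up to the first NIL, or the whole string if there is none. */
size_t cstr_view_len(const struct counted *s)
{
    assert(s->ptr != NULL);
    const char *nul = memchr(s->ptr, '\0', s->len);
    return nul ? (size_t)(nul - s->ptr) : s->len;
}
```

For the five bytes `ab\0cd` the stored length is 5, but a C-string helper only ever works on the first 2.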

Name: Qt QString
Language: C++ (std::string compat. API)
Type: General purpose Library
Model: Pointer, length, size and reference count (shares entire strings only)
License: GPL or QPL (commercial license specifically offered)
Namespace: Needs entire Qt API, so mostly self contained namespace of "Q".
IO: Can be used as both a consumer and a producer in the C++ IO model. Deals with any binary data. No netstring support.
Testing: Included in Qt, so very good user testing. No public testsuite, (I've not seen trolltech's internal testsuite, just spoken to someone who has).
Version: 3.0.5
Pro: Comes with Qt, so no extra dependencies for Qt/KDE applications. Provides printf() like call directly to QString, so you can mix C++ style and C style.
Con: Although entire strings are referenced instead of copied, identical sub-strings are copied ... so lots of small strings, or lots of references to a large piece of data can cause a lot of memory overhead.
Notes:

Again, C++ ... uses a pointer and length model but allows reference counting on entire QString objects. This means that an assignment of an entire string from a to b will share most of the storage, but a substring or altering any part of either object will nullify all sharing. The printf like function has an internal implementation for parsing the format string (which doesn't allow i18n argument number specifiers -- or even l ll h hh size modifiers), but it also calls out to the host sprintf() implementation for numbers, pointers and doubles.
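The "share entire strings, copy on any change" behaviour can be sketched in C as a reference-counted buffer with copy-on-write. This is only an illustration of the model; the names are hypothetical and it is not Qt's data layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer with a reference count. */
struct rc_buf { size_t refs; size_t len; char data[]; };
struct rc_str { struct rc_buf *buf; };

int rc_init(struct rc_str *s, const char *src, size_t len)
{
    struct rc_buf *b = malloc(sizeof *b + len);
    if (b == NULL)
        return -1;
    b->refs = 1;
    b->len = len;
    memcpy(b->data, src, len);
    s->buf = b;
    return 0;
}

/* Whole-string assignment: both objects share one buffer, O(1). */
void rc_assign(struct rc_str *dst, const struct rc_str *src)
{
    dst->buf = src->buf;
    dst->buf->refs++;
}

void rc_free(struct rc_str *s)
{
    if (--s->buf->refs == 0)
        free(s->buf);
    s->buf = NULL;
}

/* Any modification first detaches from the other owners, which means
 * copying the entire buffer: O(len), and the sharing is gone. */
int rc_set_byte(struct rc_str *s, size_t pos, char c)
{
    assert(pos < s->buf->len);
    if (s->buf->refs > 1) {
        struct rc_str copy;
        if (rc_init(&copy, s->buf->data, s->buf->len) != 0)
            return -1;
        s->buf->refs--;
        s->buf = copy.buf;
    }
    s->buf->data[pos] = c;
    return 0;
}
```

A substring has no way to point into the middle of a shared buffer in this model, so it has to be a fresh copy as well.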

Name: SafeStr
Language: C
Type: String Library
License: Custom (new BSD-ish, I think)
Namespace: Self contained namespace of "safestr_".
IO: Has request/truncate functions, so appending reads aren't bad ... however the "request" function "safestr_truncate" memset()'s the requested extra space (for safety), in theory you can call the internal function safestr_resize(). Writing non-blocking is non-trivial. Some nice helper functions for blocking IO. Also includes a function to read a password from the terminal.
Testing: Testsuite tests 30 functions, implementation has 50 functions (26% size ratio between implementation and testsuite -- sloccount).
Version: 0.9.6
Pro: Has idea of "trusted" and "untrusted" strings. Gives errors for all uses of %n in its printf() function. Basic type can be cast directly to a (char *), so this library is probably the best for working with APIs that expect C-style strings.
Con: Gives errors for all uses of %n in its printf() function. Errors are dealt with by using another library to provide "exceptions" in C. Due to embedding the metadata in with the string, you need to pass pointers to your string pointer to all the allocating functions ... this does mean that if you want to have a pointer to the string in two or more places and have them update when either is updated, you have to store a pointer to the pointer. However if you just want the value, you can take a "reference" to the string which will keep that string valid.
Notes:

This string library seems partly a project to get a good "security conscious" (because there aren't enough, obviously) string library for C, and partly a reference for the authors' book on security. You can take references to entire strings, and there are quite a few utility functions.

The strict focus on security also means that the API has a notion of "trusted" and "untrusted" strings, as perl does. However, this is only really useful if there are functions that happen/don't happen depending on the trustedness of the string (Ie. popen/system/etc.). Currently there are no such functions, even in the library, so in my opinion it doesn't buy you much. It also has a cookie in the string header, so that it can tell if the string has been misused.
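The "metadata embedded with the string" design, and why allocating functions need a pointer to your pointer, can be sketched as follows. The header layout, cookie value and `estr_` names are all assumptions made for this illustration; they are not SafeStr's real structures or API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EHDR_COOKIE 0x5AFE57A1UL   /* arbitrary value picked for this sketch */

/* Hypothetical header stored immediately before the string bytes. */
struct ehdr { unsigned long cookie; size_t len; size_t size; };

typedef char *estr;   /* points at the bytes, so it can be used as a (char *) */

#define EHDR(s) ((struct ehdr *)(void *)((s) - sizeof(struct ehdr)))

/* A pointer that was not produced by estr_new() will very likely fail this. */
int estr_valid(estr s) { return EHDR(s)->cookie == EHDR_COOKIE; }

estr estr_new(const char *src)
{
    size_t len = strlen(src);
    struct ehdr *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->cookie = EHDR_COOKIE;
    h->len = len;
    h->size = len + 1;
    memcpy(h + 1, src, len + 1);
    return (char *)(h + 1);
}

/* Takes estr * because realloc() may move header and bytes together;
 * any other copy of the old pointer is stale afterwards. */
int estr_append(estr *sp, const char *src)
{
    assert(estr_valid(*sp));
    struct ehdr *h = EHDR(*sp);
    size_t add = strlen(src);
    size_t need = h->len + add + 1;
    if (need > h->size) {
        struct ehdr *nh = realloc(h, sizeof *nh + need);
        if (nh == NULL)
            return -1;
        h = nh;
        h->size = need;
        *sp = (char *)(h + 1);
    }
    memcpy(*sp + h->len, src, add + 1);
    h->len += add;
    return 0;
}

void estr_free(estr s) { free(EHDR(s)); }
```

The result can be handed straight to functions such as strcmp(), which is the interoperability advantage; the cost is that every holder of the pointer must be updated after a resize.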

Name: bstring
Response: Review of string libraries, by author
Language: C
Type: String Library
Model: Pointer, length and size (reallocates in powers of 2) -- uses a negative size to denote read-only strings.
License: Custom (new BSD-ish, I think)
Namespace: Self contained namespace of "b", however there are a few somewhat gratuitous exceptions.
IO: Has request/truncate functions, so appending reads aren't bad. But writing non-blocking is non-trivial. Some nice helper functions for blocking IO. Deals with any binary data. No netstring support.
Testing: Some testing via the C++ wrapper.
Version: 06222003
Pro: Pretty good assortment of comparison/searching functions. Allows you to create read-only strings fairly easily.
Con: The library isn't 1.0 yet, and still seems to be evolving ... for instance the bvformat function was removed in the last version, but is still in the header file. The function names are mostly very small, but not all of them are unique to 6 characters ... so the reason is a mystery.
Notes:

This is a good string library, on the pointer, length and size model. The API names could use some work IMO, and the similarity will cause problems ... however the underlying implementation seems good, and the only serious things missing are: 1) non-blocking IO support. 2) substitution/replacement of data. 3) a non-host printf(). The library lets you make constant/read-only strings, which is somewhat unique. However the author has his description here along with comparisons of the library to others, including Vstr.
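The "negative size means read-only" trick from the Model line can be sketched like this. The struct and function names are hypothetical and are not bstring's actual API; the sketch only shows how one signed field can double as a writability flag.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pointer/length/size string with a signed size. */
struct ro_str { int size; int len; char *ptr; };

/* Wrap constant data without copying it; size < 0 marks it read-only. */
struct ro_str ro_wrap_const(const char *lit)
{
    /* The cast drops const, so the size check below is the only guard. */
    struct ro_str s = { -1, (int)strlen(lit), (char *)lit };
    return s;
}

/* Overwrite one byte. Returns -1 for read-only strings or bad positions. */
int ro_set_byte(struct ro_str *s, int pos, char c)
{
    if (s->size < 0)
        return -1;
    if (pos < 0 || pos >= s->len)
        return -1;
    s->ptr[pos] = c;
    return 0;
}
```

Every mutating function has to perform the same sign check, which is cheap but easy to forget when extending such an API.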

Name: sz
Language: C
Type: String Library
Model: Complicated (A tree of nodes to make up the string, all of which have a pointer and length).
License: Custom
Namespace: Somewhat confined to namespace of "sz" however there are a few leaks for "mem2" and "str2" importer functions which are in the system namespace (also has a str_decode() function which is in the system namespace).
IO: Has functions to read and write to ISO C FILE objects. It's not obvious, to me, if it has a usable request function ... although it does have a truncate call. No netstring functions.
Testing: Testsuite tests 30 functions, implementation has 48 functions (9% size ratio between implementation and testsuite -- sloccount).
Version: 0.9.2
Pro: Is a very ISO C compliant library, apart from some minor namespace violations. All functions adhere to the old 6 character limits. Implementation is comparatively small given the complexity of design. It seems that lots of read only copies should be very space efficient.
Con: 6 character limits will probably hurt readability until you get very familiar with the code. Non-blocking IO seems like it would be very hard to do. No printf(). The license says it's "BSD or artistic, at your option. It may not be distributed under other terms or licenses without prior written agreement with the author." ... which implies a lot more limits than BSD imposes. The use of (void *) in a lot of places may lead to errors.
Notes:

An interesting library, it uses an opaque type for the string which is suitably non simplistic internally to allow quite a few optimisations. The function names all obey the 6 character C89 identifier limit (a limitation Vstr completely ignores so as to be more consistent, and hopefully more readable). It's not obvious if it would use more or less memory than Vstr in general ... I'm sure there are cases where either is more, or less.

It uses (void *) in most places and takes either a C string or an (sz *) [the internal opaque type]. It distinguishes between these by a 2 character magic constant, so if you try and use a C string with that constant life becomes interesting. There is no printf like function.
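A sketch of how a (void *) argument might be told apart by a two byte magic constant. The magic values, struct and function below are invented for illustration and are not taken from the sz source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two byte tag; the real library's constant is not shown here. */
#define MAGIC0 '\x01'
#define MAGIC1 '\x02'

/* Hypothetical opaque string type whose first bytes are the tag. */
struct tagged { char magic[2]; size_t len; const char *data; };

/* Accepts either a C string or a struct tagged *, as a (void *). */
size_t any_len(const void *p)
{
    const char *c = p;
    /* c[1] is only read when c[0] is non-zero, so an empty C string is safe. */
    if (c[0] == MAGIC0 && c[1] == MAGIC1)
        return ((const struct tagged *)p)->len;
    return strlen(c);
}
```

A plain C string that happens to begin with those two bytes would be misread as a `struct tagged`, which is the "life becomes interesting" case; nothing in the type system can catch it.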

Name: vsftpd
Language: C
Type: Program
Model: Pointer, length and size (expansion of length/pointer is done to exact size needed).
License: GPL
Namespace: Self contained namespace of "str_" on all functions and "mystr" and "str_" for structure tag, structure members are in private namespace of "PRIVATE_HANDS_OFF_".
IO: Has request/truncate functions, so appending reads aren't bad. But writing non-blocking is non-trivial. Internally the program uses blocking IO.
Testing: User testing as part of the application.
Version: 1.0.0 of vsftpd
Notes:

This has a fairly well abstracted namespace, esp. considering it is only bundled with the vsftpd ftp server. It works on the start pointer and length model, does dynamic resizing of strings and has quite a few utility functions. The only missing piece is a printf like function (the vsftpd code itself just calls snprintf() and then only for extremely simple cases). Failure to allocate memory calls abort().

It is somewhat amusing that even though this isn't a string library, it is much better than most of the other string libraries here.

Name: postfix
Language: C
Type: Program
Model: Pointers, length, and a bunch of members (allocation policy can be changed via function pointers)
License: IBM Public License
Namespace: There are at least 3 parts you might want, and each has a separate namespace for everything but members of structs.
IO: The design is around IO, blocking IO is obviously supported. With a small amount of work non-blocking IO could be done for both reading appends and writing from the beginning. Deals with any binary data. Has netstring APIs (and they are used by postfix), although they may mean doing extra copies due to their API.
Testing: User testing as part of the application, possibly other regression tests (it's obviously meant to be moved to other projects easily via copying the source files).
Version: 1.7.1 of postfix
Pro: Should be easy to just copy into another project, so no extra dependancies are needed. Good APIs for IO.
Con: The stream design will be hard to use for any non-IO related string operations (Eg. comparisons/searches). The string structure is heavy weight for lots of string instances. There is also no way to share string data.
Notes:

This is a set of functions used in the postfix MTA daemon but obviously well abstracted so that they can be easily used in other applications. It works on a pointer and length model, although it also has "end pointer" and "amount left" variables. The abstraction seems somewhat weird to me, as the underlying objects want to look and act like (FILE *) and so have IO error flags ... and then on top of this are built Objects that are variable length strings and they'll never do any IO.

It is up to the user of the library whether you have a fixed or dynamically sized buffer, and I think you can return failure if memory isn't available but the vstream.c and vstring.c implementations just assume this can't happen. The functions to act on the buffer are just copied APIs of ISO C (FILE *) manipulators, str* and mem* (with the addition of memcat()). Importantly there are no interfaces for removing data or substituting data in the string (you could probably do remove from the end of the string easily by playing with the pointers and counters, but you'd have to write your own function for it). There is no way to access anything but the entire string, using the API, or add data anywhere but the end of the string.

There is an interface for using netstrings, but instead of the simple begin and end semantics in Vstr the interface overloads the string interface ... so you have to say netstring_memcpy( ... ) which will copy data and encapsulate it as a netstring. It's also worth noting that the counters are of type "int", and the negative bit is used in the code ... so it's not possible to have a string bigger than INT_MAX.
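For readers who haven't met them, a netstring is just the decimal length, a colon, the raw bytes and a trailing comma. A minimal encoder sketch, where the function name and fixed output buffer are assumptions of this example (it is neither the postfix nor the Vstr interface):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode data as a netstring, "<decimal length>:<bytes>,", into out.
 * Returns the number of bytes written, or 0 if out is too small.
 * The payload may contain any bytes, including NIL. */
size_t netstr_encode(char *out, size_t outsz, const void *data, size_t len)
{
    char hdr[32];
    int hlen = snprintf(hdr, sizeof hdr, "%zu:", len);
    if (hlen < 0 || (size_t)hlen >= sizeof hdr)
        return 0;

    size_t total = (size_t)hlen + len + 1;   /* header + payload + ',' */
    if (total > outsz)
        return 0;

    memcpy(out, hdr, (size_t)hlen);
    memcpy(out + hlen, data, len);
    out[(size_t)hlen + len] = ',';
    return total;
}
```

So the five bytes `hello` become `5:hello,`. Note that this style of API copies the payload, which is the extra copy a begin/end style interface can avoid.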

The printf like functions are implemented by parsing the format string and then passing known good formats through to the host implementation sprintf() (after requesting enough space to hold them). It doesn't accept long long or long double types, i18n argument number specifiers or thousand separator modifiers.

Name: DJB string APIs
Language: C
Type: String APIs
Model: Pointer, length and size
License: Marked as "Public Domain" inside daemontools
Namespace: Each bit is in a separate namespace (most of the dynamic string API is in the "stralloc_" namespace)
IO: The DJB substdio APIs are missing from daemontools.
Testing: This is DJB code, so you can just assume it works.
Version: 0.9.8
Pro: Infinitely better implementation than the one in libowfat. Interesting to look at, for a view of the world where each function really does do only one thing (highly recommended to read).
Con: DJB style tends to be somewhat hard for other people to use, it's also impossible to do things like i18n and keep your sanity ... due to printf() being a bunch of small functions. There are more than a few useful functions that are in qmail that aren't in daemontools ... so aren't available to use (and the headers haven't been updated).
Notes:

As with all DJB code though, the API is written as a set of small atomic operations. For instance printf like functionality is implemented over 12 different functions named fmt_* (which don't check for overflows, but some are also reimplemented as stralloc functions). This design makes using the API much more clumsy, for a minor speedup, makes doing i18n almost impossible and goes directly against "premature optimization is bad".

It's worth noting that although the stralloc functions deal with dynamic memory, a lot of the other functions ignore bounds checking and/or assume things are terminated with a '\0' character.

Name: OSSP str
Language: C
Type: String Library
Model: C Style strings (so embedded NIL characters aren't allowed)
License: Custom (new BSD-ish, I think)
Namespace: Mostly self contained namespace of "str_" on everything but members of structs, which is a system namespace in C (exceptions are TRUE and FALSE etc.).
IO: No direct support
Testing: Small amount of user testing. Testsuite tests 7 functions, implementation has 15 functions ... at least 4 constants are also untested (31% size ratio between implementation and testsuite -- sloccount).
Version: 0.9.8
Pro: None
Con: Buffer overflows, no real type, no IO.
Notes:

This library is a slightly saner version of the ISO C str* functions, with a few extensions. It works with (char *) as the native type, and doesn't do automatic allocation ... so buffer overflows are still a concern.

The printf implementation is internal and based on the Apache snprintf() function. '\'' (thousand modifiers), 'a', 'F', 'Lf', 'lld', 'td', 'zd', 'hhd', etc. and i18n format parameter modifiers are all completely missing. Unspecified precision is broken, as are corner cases for octal etc.; infinity/nan output is also not correct with regard to case. Buffer overflows are possible in the integer formatting paths. You can have custom modifiers, but only triggered on the system '%' character ... so gcc will currently spam warnings. It also looks like the ISO C std. is completely ignored for certain corner cases. Also note that because the strings cannot be resized by the library, the printf implementation uses an snprintf() interface; this means that data can be lost using the interface if the programmer isn't careful.
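The data-loss risk with any snprintf() style interface is that truncation is silent unless the caller compares the return value with the buffer size. A small sketch using the host snprintf(), assuming ISO 9899:1999 return value semantics; the helper name is hypothetical and this is not OSSP str's API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format value into buf. Returns 1 if the whole result fitted, 0 if it
 * was truncated or an encoding error occurred. Relies on C99 snprintf()
 * returning the length the full output would have had. */
int fmt_int_checked(char *buf, size_t bufsz, int value)
{
    int need = snprintf(buf, bufsz, "%d", value);
    return need >= 0 && (size_t)need < bufsz;
}
```

Code that ignores the result simply carries on with the shortened string, which is the "data can be lost" case.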

Name: c2lib
Language: C
Type: General purpose Library
Model: C Style strings (so embedded NIL characters aren't allowed)
License: LGPL (Library)
Namespace: Very bad, many namespace violations in the APIs you'll use directly ... also has other sections in the library that are in other namespaces (that are equally violated)
IO: No direct support
Testing: Testsuite tests all 10 functions (27% size ratio between implementation and testsuite -- sloccount).
Version: 1.4.1
Pro: Interoperation with C-style strings is very good, the only minor problem being that you need to call a special free function for all c2lib strings. Has pcre short cut functions builtin.
Con: Terrible namespace. No IO. Requires pcre to be installed to build/run.
Notes:

This library works with (char *) as the base type, although all allocator functions are also passed a "pool" that the (char *) comes from, and so resizing can be done by the library.

By version 1.4.1 there is a printf like function, it uses the system asprintf() call if it is available and fails the Linux test. If asprintf doesn't exist it assumes a snprintf() conforming to the ISO 9899:1999 return value semantics -- also this code path probably won't work on Sparc or other weird platforms. It does declare a "vector" type that is roughly equivalent to a Vstr_sects type however it only contains a "ptr" to the data ... so doing a split on a string involves at least doing a memdup() (it actually does a strndup()).
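The usual way to emulate asprintf() on top of a snprintf() with ISO 9899:1999 return value semantics is a measuring pass followed by a formatting pass. The sketch below shows that general technique; it is not c2lib's code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly malloc()ed buffer; the caller free()s it.
 * Returns NULL on a formatting error or allocation failure.
 * Depends on vsnprintf(NULL, 0, ...) returning the required length,
 * which pre-C99 implementations do not all do. */
char *fmt_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf = NULL;

    va_start(ap, fmt);
    va_copy(ap2, ap);                 /* the list is consumed by each pass */
    int need = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    if (need >= 0 && (buf = malloc((size_t)need + 1)) != NULL)
        vsnprintf(buf, (size_t)need + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

On a host whose snprintf() returns -1 on truncation instead of the needed length, the measuring pass fails and this returns NULL, which is why the conformance assumption matters.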

Major namespace corruption, for example by using the string function you'll get definitions for "pool" and "vector". Also the constructor functions tend to have names like "new_pool", "new_subpool" and "new_vector" as well as "pool_register_malloc" and "vector_push_back" etc. There are also lots of uses of macro functions in lower case, for instance vector_push_back() is a macro function calling _vector_push_back() (which also violates ISO 9899:1999 7.1.3/1 "All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.").

Name: my_string
Language: C
Type: String Library
Model: C Style strings (so embedded NIL characters aren't allowed)
License: GPL
Namespace: No real namespace, a few functions are in the "Str" system namespace, and the rest are named randomly (presumably the same as the php versions).
IO: No direct support
Testing: Testsuite tests 4 functions, implementation has 13 functions (7% size ratio between implementation and testsuite -- sloccount).
Version: 1.0.0
Pro: If you have used php a lot, and C very little ... you might appreciate this.
Con: Buffer overflows, no real type, no IO.
Notes:

This is a bunch of add on functions to the std. C string functions, inspired by PHP.

Name: libtext
Language: C
Type: String Library
Model: C Style strings (so embedded NIL characters aren't allowed)
License: GPL
Namespace: Mostly self contained namespaces of "t_" and "Text" on everything but members of structs, however it also uses "_Text" which is a system namespace in C and a couple starting "__" which is also a system namespace (exceptions are SHOWBUFINFO and PrintTinfo which are for debugging the library and will be removed).
IO: Has truncate function ... but no obvious request function, so non-blocking appending reads are non-trivial. Writing non-blocking is also non-trivial.
Testing: Testsuite tests 15 functions, implementation has 67 functions (8% size ratio between implementation and testsuite -- sloccount).
Version: 0.0.0-beta1
Pro: Does dynamic allocation and uses malloc/free directly.
Con: Very little help with IO. API has weird names.
Notes:

This library doesn't allow "binary" (!isgraph && !isspace -- so changes depending on global locale) characters. Printf like function calls the host implementation. Has a large API for add, find, delete and substitute. A couple of other APIs for reverse, uppercasing and lowercasing. Allows you to specify a max size for the string.

Name: Cords, part of Boehm GC
Language: C
Type: String API
Model: Complicated (tree of nodes containing data, "data producing functions" or references to cord strings - data is all read only).
License: Custom (new BSD-ish)
Namespace: Mostly self contained namespace of "CORD" on everything except structure members (which are all private -- exceptions are MAX_DEPTH and FUNCTION_BUF_SZ).
IO: Has functions to read and write to ISO C FILE objects. No netstring support.
Testing: Probably good user testing. No apparent testsuite.
Version: not obvious
Pro: Useful if you are using the Boehm Garbage Collector in your C code.
Con: Probably useless otherwise.
Notes:

This is the string implementation that comes with the Boehm Garbage Collector (and so is included in gcc etc.). It pretty much requires a GC as you can't "alter" a string, only make a new string with the alterations in it. Sharing data is a main point of this implementation, however again it isn't possible to do things directly on a "substring" you must first create that substring as a first class string.

Although the basic APIs are there, add/del/sub/etc. there are few added functions to help you deal with the strings (although it does provide something equivalent to vstr_sc_read_len_file() but it uses stdio, and there isn't any good way to deal with IO errors).

Also note that the printf implementation just calls the host implementation of sprintf/asprintf/etc. directly for anything that isn't one of the 's', 'r', 'c' or 'n' format specifiers. The custom format specifier of 'r' is the only one possible and will make gcc barf warnings if you use it. It also doesn't allow i18n argument number specifiers. This is an "old" implementation though, with the last copyright from 1994 so some of these problems probably stem from that.

Name: libretto
Language: C
Type: General purpose Library
Model: Pointer, length, size
License: LGPL (Library)
Namespace: Needs entire libretto API, so terrible ... each subsystem takes its own namespace, with some other random stuff.
IO:
Testing: Testsuite tests 32 autostr and 32 autobuf functions, implementation has 53 autostr functions and 36 autobuf functions (86% size ratio between implementation and testsuite -- sloccount).
Version: 2.1
Pro:
Con: Glib is a lot more common, is maintained and has a real namespace ... use that instead.
Notes:

Similar implementation to glib, however there are a lot more utility functions for finding data and comparing strings. There are really two string APIs, one for strings (Autostr) and one for "dynamic buffers" (Autobuf) ... these types would be interchangeable apart from the fact that the members are ordered differently in their structure definitions. Autobuf's can handle data with NIL bytes in it, while Autostr calls the ISO C str*() functions and silently fails. Both string types share features, however the Autostr API has a few more functions. This library is now unmaintained, according to the original author.

Like glib, the printf() like function calls the host separately for each % token after calculating the max possible size of that token's output. It doesn't include '\'' (thousand modifiers), 'A', 'a', 'td', 'zd' and i18n format parameter modifiers. However they will be equally unportable everywhere.

Note that there are static extensions to the printf() like function so that you can print the Autostr string type.

Name: cfl
Language: C
Type: General purpose Library
Model: Dynamic allocation with pointer, length, and size and "C style strings".
License: LGPL (Lesser)
Namespace: Mostly confined to namespace of "c_", however types are in the POSIX namespace of *_t and there are preprocessing symbols in the system namespace __*. Has exceptions for `TRUE', `FALSE', `NUL', and `CRLF'
IO: Has a few different IO functions, all designed for sync. c_buffer can be used for request/truncate, but it is very simple and you can't convert that into something more useful. C_dstring_load can overflow memory.
Testing: No public test suite.
Version: 1.2
Pro: Has at least 4, string like APIs. c_dstring, c_string, c_strbuffer and c_buffer. Pretty good documentation.
Con: Has at least 4, string like APIs. Some of which differ mostly by their allocation policies, and there seems no way to move data between them (without just copying it). c_string and c_strbuffer call strncpy() which is grossly inefficient.
Notes:

It has functions for slightly saner versions of the ISO C str* functions. And also has dynamic allocation APIs. The library has some vector functions, but it uses (char **) as the vector type and alters the original string in its version of split. As a general observation the library seems like it should split out some of the code, as it has low level code like strings, trees, linked lists ... and high level things like http server, and fortune file loader. Bad direct uses of ISO C str* functions in other parts of API (specifically strncpy() in httpserver allows non-terminated strings).
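The two strncpy() problems mentioned for this library come straight from its ISO C definition: when the source is at least as long as the limit no terminator is written, and when it is shorter the rest of the destination is padded with '\0' bytes. A minimal sketch of a copy that avoids both follows; copy_terminated() is a hypothetical helper of mine, not a cfl function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dstsz - 1 bytes and always terminate.
 * Returns 1 if src fitted completely, 0 if it was truncated (or if there
 * was no room at all). */
static int copy_terminated(char *dst, size_t dstsz, const char *src)
{
    size_t len;
    int fit = 1;

    assert(dst != NULL && src != NULL);
    if (dstsz == 0)
        return 0;

    len = strlen(src);
    if (len >= dstsz) {
        len = dstsz - 1;   /* leave room for the terminator */
        fit = 0;
    }
    memcpy(dst, src, len); /* copies only what is needed, no padding */
    dst[len] = '\0';
    return fit;
}
```

The return value lets the caller notice truncation, which plain strncpy() gives you no way to do.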

Has nice texinfo documentation.

Name: Mimelib DwString (part of kdenetwork)
Language: C++ (std::string compat. API)
Type: Mime Library, which includes a string API
Model: Pointer, length, size and reference count (shares any substring)
IO: Can be used as both a consumer and a producer in the C++ IO model. Deals with any binary data. No netstring support.
Namespace: Needs entire MimeLib API, so mostly self contained namespace of "dw".
Testing: Included in MimeLib, so good user testing. Testsuite doesn't test entire string API (EG. DwStrcasecmp isn't tested), however being C++ it is non-trivial to tell automatically how much is (this is made worse by the fact that there are not specific string tests, just entire mimelib tests).
Version: 3.0.0 of kdenetwork
Pro: If you are writing for KDE, and need/want sub-string auto sharing of data then it's easily available.
Con: Qt has to be available to every application that uses this, and the Qt version will have more users ... so it's the better choice if you don't need the sub-string sharing. Also doesn't have a printf() call, and has support to compile using just std::string (but I assume they'd only do that if the std::string did sub-string sharing as well). Project pages seem to have disappeared, so it's possible it's unmaintained.
Notes:

C++ ... uses a pointer and length model but allows reference counting the DwString objects. This means that an assignment of an entire string, or a substring from a to b will share most of the storage, but altering any part of either object will nullify all sharing. Apart from that it is much like QString in the Qt library.
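The sharing described above can be sketched roughly as follows. This is written in C to match the other examples on this page, and every name in it (struct rep, struct shstr, the shstr_*() functions) is hypothetical; it shows the general reference count plus copy-on-write idea, not the actual DwString implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared storage block with a reference count. */
struct rep { size_t refs; size_t size; char *data; };

/* Hypothetical string: a window (start, len) onto a shared rep. */
struct shstr { struct rep *rep; size_t start; size_t len; };

static int shstr_init(struct shstr *s, const char *bytes, size_t len)
{
    struct rep *r = malloc(sizeof(*r));

    if (r == NULL)
        return 0;
    r->data = malloc(len ? len : 1);
    if (r->data == NULL) {
        free(r);
        return 0;
    }
    if (len)
        memcpy(r->data, bytes, len);
    r->refs = 1;
    r->size = len;
    s->rep = r;
    s->start = 0;
    s->len = len;
    return 1;
}

/* Substring: nothing is copied, we just take another reference. */
static struct shstr shstr_sub(const struct shstr *s, size_t pos, size_t len)
{
    struct shstr out;

    assert(pos <= s->len && len <= s->len - pos);
    out.rep = s->rep;
    out.start = s->start + pos;
    out.len = len;
    s->rep->refs++;
    return out;
}

static void shstr_free(struct shstr *s)
{
    if (--s->rep->refs == 0) {
        free(s->rep->data);
        free(s->rep);
    }
    s->rep = NULL;
}

/* Altering: if the storage is shared, take a private copy first. */
static int shstr_set_byte(struct shstr *s, size_t pos, char c)
{
    assert(pos < s->len);
    if (s->rep->refs > 1) {
        struct shstr copy;

        if (!shstr_init(&copy, s->rep->data + s->start, s->len))
            return 0;
        s->rep->refs--;   /* drop our share of the old storage */
        *s = copy;
    }
    s->rep->data[s->start + pos] = c;
    return 1;
}
```

Taking a substring costs one counter increment, but the first write to either object pays for a full copy of that object's data.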

Name: libast (formerly libmej)
Response: Response by author
Language: C
Type: General purpose Library
Model: Pointer, length and size
License: Custom (new BSD-ish)
Namespace: Needs entire libast API, so terrible ... namespace is all over the place for large parts of the library, although strings tend to stay in spif_str_*() ... but there is no way to get just those functions.
IO: Has request/truncate functions, so appending reads aren't bad -- however as of 0.5 the set_len() function which is the only way of truncating in the API prints a message on stderr saying you are a moron. Writing non-blocking is non-trivial. Blocking read functions are in the API for file descriptors and ISO C FILE objects. Can deal with binary data as input, but a lot of the utility functions assume C-style strings.
Testing: Testsuite tests 40 APIs, implementation has 55 APIs (46% size ratio between implementation and testsuite -- simple line count)
Version: 0.5
Pro: Has some nice utility functions, including things like turning an int into a string ... which are uncommon.
Con: Hidden assumption of C-style strings. Stupid error message when you use the truncate API for a non-blocking read(). All the extra baggage, and the namespace destruction, you get.
Notes:

Has a start pointer and length model, however it does grow the strings itself ... and calls abort() if the allocation fails. However, note that although the library includes a bad snprintf() implementation (see the printf comparison page) it doesn't have a sprintf() call to write into the "spif string". typedefs and macros appear to be used just to make the code less readable. It has an "interesting" set of APIs, mainly due to the overhead of adding data to a string or getting a substring. For instance you can "splice" part of a string and another string, but you can't substitute data inside a string without copying it multiple times.

The fact that almost all searching/comparing APIs map onto C library APIs means that embedded NIL characters silently fail -- even though there are APIs to initialise strings from a file descriptor.

It has terrible abuse of the namespace, outside of the str.c file (and even in the string.c file it exports a function called "join") however it looks like you could separate the string.c code out without a lot of work -- at which point the namespace is well contained (but it also is built assuming that you'll be using the "spif" object model -- however this doesn't seem to be a requirement).
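For anyone unsure what "request/truncate" means in the IO entries on this page, here is a minimal sketch of the pattern as I use the term: ask the string for extra space at the end, let read() fill some of it, then truncate the string back to what really arrived. The struct dstr type and the dstr_*() names are hypothetical, they are not libast's spif_str API; read() is the POSIX call.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical pointer, length and size string. */
struct dstr { char *ptr; size_t len; size_t size; };

/* "Request": grow the string by `want` bytes and return a pointer to the
 * new, uninitialised, region.  Returns NULL on allocation failure. */
static char *dstr_request(struct dstr *s, size_t want)
{
    size_t old = s->len;

    assert(want > 0);
    if (want > (size_t)-1 - old)
        return NULL;                 /* length would overflow */
    if (old + want > s->size) {
        char *tmp = realloc(s->ptr, old + want);

        if (tmp == NULL)
            return NULL;
        s->ptr = tmp;
        s->size = old + want;
    }
    s->len = old + want;
    return s->ptr + old;
}

/* "Truncate": cut the string back to `len` bytes. */
static void dstr_truncate(struct dstr *s, size_t len)
{
    assert(len <= s->len);
    s->len = len;
}

/* An appending read built from the two: no intermediate buffer, no copy.
 * Returns what read() returned, or -1 if space could not be allocated. */
static long dstr_append_read(struct dstr *s, int fd, size_t want)
{
    size_t old = s->len;
    char *space = dstr_request(s, want);
    ssize_t got;

    if (space == NULL)
        return -1;
    got = read(fd, space, want);
    dstr_truncate(s, old + (got > 0 ? (size_t)got : 0));
    return (long)got;
}
```

The same two primitives work with a non-blocking descriptor, as a failed or short read() just truncates back to however much data arrived.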

Name: librock text processing
Language: C
Type: General purpose Library
Model: C Style strings (so embedded NIL characters aren't allowed)
License: Custom (new BSD-ish, I think)
Namespace: Self contained namespace of "librock_".
IO: Has request/truncate functions, so appending reads aren't bad. But writing non-blocking is non-trivial. Some nice helper functions for blocking IO. Binary data with NIL values will screw up the representation of the string.
Testing: No public testsuite, private testsuite doesn't cover everything (I've not seen it, just spoken to the author).
Version: unknown (2003-02-25)
Pro: Has some nice functions to use if you need to keep compatibility with C-style strings elsewhere in your code above all things. Allows different allocators.
Con: Requires a lot of baggage. Weird API names, and large function prefix.
Notes:

This library has allocating versions of most of the std. modifying str*() library functions. It has a specialised version of strspn() for matching "C identifier" like names. It also has a couple of utility functions for reading IO ... however blocking is mandatory, and speed may be a problem due to numerous realloc() calls for large datasets.

The printf() like function just calls the host implementation. Note that before 2003-02-25 the failure path was broken, and would crash.

Name: knetstring
Language: C
Type: String Library
License: GPL
Namespace: Self contained namespace of "kns_".
IO: Implementing the request/truncate functions, so appending reads aren't bad, isn't too hard ... and within the API. But writing non-blocking is non-trivial. Some nice helper functions for blocking IO.
Testing:
Version: 1.0.4
Pro: If you want to just add netstring APIs, this isn't too bad.
Con: Needs another string library for anything but dealing with netstrings (and some IO).
Notes:

Has a start pointer and length model. Only has APIs specific to doing IO using netstrings. Recovering from temporary IO errors is next to impossible.
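For reference, a netstring is the data length in decimal, a ':', the data itself and then a ','. So "hello" is encoded as "5:hello,". Below is a minimal sketch of a parser that tells "not enough data yet" apart from "malformed", which is the distinction you need to be able to retry after a short read. The enum and the ns_parse() function are hypothetical, they are not the kns_ API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ns_result { NS_OK, NS_INCOMPLETE, NS_INVALID };

/* Parse one netstring from buf[0 .. buflen).  On NS_OK, *data and *len
 * describe the payload (which may contain any bytes) and *used is the
 * number of input bytes consumed, including the trailing ','. */
static enum ns_result ns_parse(const char *buf, size_t buflen,
                               const char **data, size_t *len, size_t *used)
{
    size_t i = 0;
    size_t n = 0;

    assert(buf != NULL && data != NULL && len != NULL && used != NULL);
    if (buflen == 0)
        return NS_INCOMPLETE;
    if (buf[0] < '0' || buf[0] > '9')
        return NS_INVALID;
    /* No leading zeros, unless the length is exactly "0". */
    if (buf[0] == '0' && buflen > 1 && buf[1] >= '0' && buf[1] <= '9')
        return NS_INVALID;

    while (i < buflen && buf[i] >= '0' && buf[i] <= '9') {
        if (n > (SIZE_MAX - 9) / 10)
            return NS_INVALID;       /* length would overflow size_t */
        n = n * 10 + (size_t)(buf[i] - '0');
        i++;
    }
    if (i == buflen)
        return NS_INCOMPLETE;        /* still inside the length digits */
    if (buf[i] != ':')
        return NS_INVALID;
    i++;

    if (buflen - i <= n)
        return NS_INCOMPLETE;        /* need n data bytes plus the ',' */
    if (buf[i + n] != ',')
        return NS_INVALID;

    *data = buf + i;
    *len = n;
    *used = i + n + 1;
    return NS_OK;
}
```

On NS_INCOMPLETE the caller can keep the buffer, read more, and call the function again from the start.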

Name: toolbox
Language: C
Type: General purpose Library (although not much more than just the string APIs)
Model: Pointer and size
License: Custom
Namespace: Self contained namespace of "tb_".
IO: Just strcpy() etc. anything else you'll have to do yourself.
Testing:
Version: 2003-03-10
Pro: It returns (char *) types that, in theory, you can treat as if you allocated them yourself with "malloc" ... however this isn't true.
Con: The theory doesn't match up with reality. The price for the theory that doesn't really work is massive overhead for each function call. SafeStr is about as good as you are going to get, just don't mix up where your data came from and call the correct free function.
Notes:

This library was the basis of an "article" in CUJ, however try not to judge CUJ that badly based on that ... it is often ok.

The idea is that everything the library deals with looks like a generic (char *), so you can forget about having to maintain state about which of your strings is from strdup() and which from the library. However you can't pass a malloc()'d string to the library (it'll call abort()), and if you free()/realloc() a pointer returned from the library then you'll corrupt its internal state on a subsequent allocation.

However to keep this illusion that you could do those things the library has to do a lookup for the metadata, which is kept separately, on each call into the library. Currently this is done via a sequential scan through an unsorted array (O(n) time, about n/2 comparisons on average for a string that is found).
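To make the cost concrete, here is a minimal sketch of that kind of side table lookup. The types and the meta_find() function are hypothetical, they are not the tb_ API, and the steps counter is only there so that the cost can be seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metadata record, kept apart from the (char *) handed out. */
struct str_meta { const char *ptr; size_t size; };

/* Hypothetical unsorted table of every string the library has created. */
struct meta_table { struct str_meta *slots; size_t used; };

/* Sequential scan: a pointer that is present is found after examining,
 * on average, half of the entries; one that is absent costs a look at
 * all of them.  *steps is set to the number of entries examined. */
static struct str_meta *meta_find(struct meta_table *t, const char *p,
                                  size_t *steps)
{
    size_t i;

    assert(t != NULL && steps != NULL);
    *steps = 0;
    for (i = 0; i < t->used; i++) {
        (*steps)++;
        if (t->slots[i].ptr == p)
            return &t->slots[i];
    }
    return NULL;
}
```

Every call that takes a string has to pay for this scan before it can do any real work, and the price goes up with each string the program has live.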

In theory you might be able to reduce the errors to not calling abort() on invalid strings when you should, by doing an extra lookup at allocation/reallocation time. Although this would reduce the speed even further, and is not strictly conforming C.

Name: firestring
Response: Response by author
Language: C
Type: String Library
Model: Pointer, length and size (expansion of length/pointer is done to exact size needed). Some functions will dynamically allocate, some will just return an error if the size is exceeded. Also contains C-style string wrappers (although note that firestring_strcasecmp("a", "ab") isn't valid).
License: GPL
Namespace: Mostly confined to namespace of "firestring_", however types are in the POSIX namespace of *_t and there are two "ESTR_" namespaced macros.
IO: Although it doesn't have a direct API for request/truncate functions, it seems like you should be able to alter the structure manually. Also it has a read function that is just as efficient as a request/truncate API.
Testing: Testsuite tests 20 functions, implementation has 68 functions (15% size ratio between implementation and testsuite -- sloccount).
Version: 0.9.1
Pro: Has integrated encode/decode functions for a few things and a config. parser.
Con: There is a function that purports to be a printf like function however about the only commonality is that "%d" prints an int and "%s" prints a NIL terminated character array. This will almost certainly cause confusion and problems. Due to the usage of restrict it isn't possible to compare a string with itself. Failure to allocate memory calls abort().
Notes:

It provides a function to do a read, which is nice. Also has a simple to use "parse configuration file" function. It provides functions to operate on C-style strings, with the advantage that they are broken everywhere that honours restrict. There are functions which operate on real dynamically sized strings, and ones that take the same type but just limit the length of strings. This may well lead to confusion, for instance there is a function called "firestring_estr_chug()" which would corrupt data when used with a dynamically allocated firestring (and could lead to a remote security vulnerability) but would just be confusing with stack allocated limited data (note that at least "firestring_estr_trim()" calls this function).

Gratuitous usage of "restrict" in the API makes a lot of the functions severely suspect, and some just obviously broken. Note that the test suite "tests" this case, so it's likely that the default compiler is ignoring these obvious errors.

Name: libowfat
Language: C
Type: General purpose Library
License: GPL
Version: 0.13
Pro:
Con: Most of the original string functions that DJB wrote, are now released as "public domain" ... just see those.
Notes:

This is a reimplementation of the functions in qmail, but under a GPL license. It does have a stralloc set of calls that operate on a start pointer and length model, they do dynamically reallocate memory and pass memory failure back to the caller. As with all DJB code though, the API is written as a set of small atomic operations. For instance printf like functionality is implemented over 12 different functions named fmt_* (which don't check for overflows, but some are also reimplemented as stralloc functions). This design makes using the API much more clumsy, for a minor speedup, makes doing i18n almost impossible and goes directly against "premature optimization is bad".
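To give a feel for what "a set of small atomic operations" means for the caller, here is a minimal sketch in the style of the fmt_* functions. The sketch_fmt_*() names are hypothetical and this is not libowfat's code. I'm assuming the usual convention for this style: each function writes its output with no terminator and no bounds check, returns the number of bytes, and only counts when given a NULL destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write u in decimal at dest (no '\0', no bounds check); return the
 * number of digits.  With dest == NULL nothing is written. */
static size_t sketch_fmt_ulong(char *dest, unsigned long u)
{
    size_t len = 1;
    unsigned long q = u;

    while (q > 9) {
        len++;
        q /= 10;
    }
    if (dest != NULL) {
        char *p = dest + len;

        do {
            *--p = (char)('0' + u % 10);
            u /= 10;
        } while (u != 0);
    }
    return len;
}

/* Same convention for a C-style string. */
static size_t sketch_fmt_str(char *dest, const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (dest != NULL)
        memcpy(dest, s, len);
    return len;
}

/* "<label>=<n>" takes one call per piece, where a printf() like
 * function would take a single "%s=%lu" format. */
static size_t sketch_fmt_pair(char *dest, const char *label, unsigned long n)
{
    size_t i = 0;

    i += sketch_fmt_str(dest ? dest + i : NULL, label);
    i += sketch_fmt_str(dest ? dest + i : NULL, "=");
    i += sketch_fmt_ulong(dest ? dest + i : NULL, n);
    return i;
}
```

The caller has to run the whole sequence once with NULL to size the buffer and then again to fill it, and the order of the pieces is fixed in code rather than in a format string that a translator could reorder.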

It's worth noting that although the stralloc functions deal with dynamic memory a lot of the other functions ignore bounds checking and/or assume things are terminated with a '\0' character. Even more so than the DJB functions, although this is probably just bad implementation rather than deliberate -- but then why would you think you can write good code with an interface when the implementor can't (for instance the scan_long and scan_8long functions are almost completely broken in libowfat ... but fine in qmail).

Name: xstring
Language: C
Type: String Library
Model: Pointer, length and size
Pro:
Con: There are much better implementations of the design.
Notes:

This library works on a simple start pointer and length model. It is mostly a subset of the glib functionality, and like glib also calls abort() on memory errors. One of the obvious missing pieces is a printf() like function.
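For contrast with calling abort(), here is a minimal sketch of the same pointer, length and size model where a memory failure is passed back to the caller instead. The struct pls_str type and pls_append() are hypothetical, they are not the xstring or glib API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer, length and size string. */
struct pls_str { char *ptr; size_t len; size_t size; };

/* Append n bytes.  Returns 1 on success and 0 if memory could not be
 * obtained, in which case the string is left exactly as it was. */
static int pls_append(struct pls_str *s, const void *src, size_t n)
{
    assert(s != NULL);
    if (n == 0)
        return 1;
    if (n > (size_t)-1 - s->len)
        return 0;                        /* length would overflow */

    if (s->len + n > s->size) {
        size_t nsize = s->size ? s->size : 16;
        char *tmp;

        /* Double the size until the data fits, so that repeated small
         * appends don't each cause a realloc(). */
        while (nsize < s->len + n) {
            if (nsize > (size_t)-1 / 2) {
                nsize = s->len + n;
                break;
            }
            nsize *= 2;
        }
        tmp = realloc(s->ptr, nsize);
        if (tmp == NULL)
            return 0;                    /* caller decides what to do */
        s->ptr = tmp;
        s->size = nsize;
    }
    memcpy(s->ptr + s->len, src, n);
    s->len += n;
    return 1;
}
```

A server can then drop one connection when this returns 0, where a library that calls abort() would take down the whole process.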

The namespace isn't terrible, apart from the fact that the identifiers "xmalloc", "xrealloc" and "xfree" are all defined and exported from the library (which is compiled shared, but given a static library ".a" suffix).

Name: VSTRING
Language: C++
Type: String Library
Pro:
Con: It would be better to use the std::string class.
Notes:

This is currently just a C++ wrapper class around a (char *), with a couple of extra functions not available in std. C. It does do automatic resizing. It also looks like the implementation could be changed to fix a lot of problems with this (Eg. the length is calculated using strlen(), so embedded '\0' characters corrupt data) without affecting source compatibility. The printf like function calls the host implementation.

Name: MString
Language: C++
Type: String Library
Pro: If you need MFC compatibility, this might be your only choice.
Con: It would be better to use the std::string class.
Notes:

This is a C++ library designed to be compatible with the string library that comes with the Microsoft Foundation Classes. Each character in the string is actually a class itself. This could probably be fixed without changing source compatibility.

There are a few other string APIs that, for one reason or another, I haven't got around to looking at. Feel free to have a look at them, or check back every now and again when I might have looked at them.

If you see any errors or omissions in the above, feel free to contact me at the address below.


James Antill
Last modified: Sun Jun 24 19:10:09 EDT 2007