Vstr is a string library designed to work easily and efficiently in network applications. This means that it is designed to perform IO operations in a non-blocking fashion, and to allow the programmer to take data from various places and join them in an IO operation without having to do any copies.
Often IO primitives are assumed to be blocking (i.e. all possible data will be transferred before the IO operation completes). This does make "Hello World" type applications somewhat simpler, but networked applications have different requirements. Don't be deterred, though: functions will be introduced for those simple applications to make them more readable.
Also note that all error checking is included in every example. The examples might be somewhat easier to read without it, but including error checking is what the code must look like in a real application.
Here is about the simplest Vstr application: a single function that prints "Hello World" on a single line in a POSIX environment...
/* hello world - Self contained, using a single piece of data at
 *               initialisation time (no data copying) */

#define VSTR_COMPILE_INCLUDE 1
#include <vstr.h>

#include <errno.h>  /* errno variable */
#include <err.h>    /* BSD/Linux header see: man errx */
#include <unistd.h> /* for STDOUT_FILENO */

int main(void)
{
  Vstr_base *s1 = NULL;

  if (!vstr_init()) /* initialize the library */
    err(EXIT_FAILURE, "init");

  /* create a string with data */
  if (!(s1 = vstr_dup_cstr_buf(NULL, "Hello World\n")))
    err(EXIT_FAILURE, "Create string");

  /* output the data to the user --
   *    assumes POSIX, assumes blocking IO but should work without */
  while (s1->len)
    if (!vstr_sc_write_fd(s1, 1, s1->len, STDOUT_FILENO, NULL))
    {
      if ((errno != EAGAIN) && (errno != EINTR))
        err(EXIT_FAILURE, "write");
    }

  /* cleanup allocated resources */
  vstr_free_base(s1);

  vstr_exit();

  exit (EXIT_SUCCESS);
}

Last modified: Wed Jan 14 08:19:40 2004
...however this example is somewhat too simplistic because, normally, a Vstr string contains multiple nodes of information that are treated internally as a single entity.
So first we'll clean up the above example to move all the header inclusion into one file and create some helper functions to simplify the actual "hello world" code. Here's a quick overview of the functions moved into the header file...
This example is available here (2.07KB, Last modified: Mon Sep 22 15:07:00 2003).
Now using the above header file, we can re-write the initial example in a much more readable form...
/* hello world - Using a single piece of data (no data copying) */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();

  vstr_add_cstr_buf(s1, s1->len, "Hello World\n");

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}

Last modified: Mon Sep 22 14:42:56 2003
We can now alter the above to still print "Hello World" on a line, but have the data come from multiple sources, which is then stored internally in multiple nodes (remember, there is still no copying of data). However, we can still treat all of this as a single string from our point of view. Although this example is obviously contrived, it is much more representative of what a networked application looks like...
/* hello world - multiple sources of data (still no data copying) */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();

  vstr_add_cstr_ptr(s1, s1->len, "Hello");
  vstr_add_cstr_ptr(s1, s1->len, " ");
  vstr_add_cstr_ptr(s1, s1->len, "World\n");

  /* we are checking whether any of the above three functions failed here */
  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  /* loop until all data is output... */
  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}

Last modified: Mon Sep 22 14:44:56 2003
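The general idea of handing several separate pieces of data to one IO operation, without first joining them in a single buffer, can also be tried without Vstr. The following is only a sketch using the POSIX writev() call; it is not a claim about how io_put() or Vstr work internally, and the function name put_hello_world() is made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h> /* writev(), struct iovec */
#include <unistd.h>

/* Hypothetical helper, not part of Vstr: output "Hello", " " and "World\n"
 * with a single writev() call, without copying them into one buffer first.
 * Returns the number of bytes written, or -1 with errno set. */
static ssize_t put_hello_world(int fd)
{
  static const char p1[] = "Hello";
  static const char p2[] = " ";
  static const char p3[] = "World\n";
  struct iovec iov[3];

  assert(fd >= 0);

  /* iov_base is a non-const pointer, hence the casts; writev() only reads */
  iov[0].iov_base = (void *)p1; iov[0].iov_len = strlen(p1);
  iov[1].iov_base = (void *)p2; iov[1].iov_len = strlen(p2);
  iov[2].iov_base = (void *)p3; iov[2].iov_len = strlen(p3);

  return writev(fd, iov, 3);
}
```

Calling put_hello_world(STDOUT_FILENO) prints the usual line. Like write(), writev() may output only part of the data, so a real application still needs a loop around it.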
This is the final Hello World example, and the first one that actually copies some of the data for the string. It also shows how you can add data at any point in the string, and substitute data within the string. Note that when giving the position to add data to the string, you give the position before the one where you want the data to start. When giving the position/length for a section (or sub-string), the position given is included within the section.
For people familiar with C++, this works out to be the same way that C++ std::string encodes positions for adding data (i.e. insert()), but not for getting sections. C++ often explains it as having a 0-indexed string where data is added before the point given. Vstr uses position 1 for the first byte, as that means appending data is always done at the current length, and a position of 0 can be used as an invalid value for searching etc.
Description of operation | Position | Length |
Prepend data to X | 0 (Zero) | N/A |
Append data to X | X->len (Length of Vstr string) | N/A |
Last byte | X->len (Length of Vstr string) | 1 (One) |
First byte | 1 (One) | 1 (One) |
Entire Vstr string | 1 (One) | X->len (Length of Vstr string) |
All of X, but the first and last bytes | 2 (Two) | X->len - 2 (Length of Vstr string minus Two) |
/* hello world - multiple pieces of data, includes substitution and
 *               inserting data into the middle of a string */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();
  Vstr_ref *ref = NULL;

  vstr_add_cstr_ptr(s1, s1->len, "Hello");
  vstr_add_rep_chr(s1, s1->len, 'W', 5); /* add "WWWWW" */

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  /* substitute an 'o' for a 'W' */
  if (!vstr_sub_rep_chr(s1, strlen("HelloWW"), 1, 'o', 1))
    errno = ENOMEM, err(EXIT_FAILURE, "Substitute string data");

  /* substitute "rld\n" for the remaining "WWW" */
  if (!vstr_sub_cstr_buf(s1, strlen("HelloWoW"), strlen("WWW"), "rld\n"))
    errno = ENOMEM, err(EXIT_FAILURE, "Substitute string data");

  if (!(ref = vstr_ref_make_ptr((char *)"XYZ ", vstr_ref_cb_free_ref)))
    errno = ENOMEM, err(EXIT_FAILURE, "Create data reference");
  /* now ref->ptr is "XYZ " */

  /* add space after "Hello", by skipping "XYZ" in reference */
  vstr_add_ref(s1, strlen("Hello"), ref, strlen("XYZ"), strlen(" "));

  vstr_ref_del(ref); /* delete our reference to the Vstr_ref */

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}

Last modified: Mon Sep 22 17:12:18 2003
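If the position rules from the table above seem odd, it can help to play with them on a plain fixed-size buffer. The following is a stand-in written only for this purpose: the toy_str type and the toy_add() and toy_sub() functions are hypothetical, they copy data, and they have nothing to do with how Vstr stores its nodes. They only follow the same numbering: adding happens after the given position (0 prepends, the length appends), and a section starts at a 1-based position that is included in it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for illustration only; this is not Vstr's data structure. */
struct toy_str
{
  char data[64];
  size_t len;
};

/* Add n bytes after position pos: 0 prepends, s->len appends.
 * Returns 1 on success, 0 if pos is invalid or there is no room. */
static int toy_add(struct toy_str *s, size_t pos, const char *buf, size_t n)
{
  assert(s && buf);

  if ((pos > s->len) || (n > (sizeof(s->data) - s->len)))
    return 0;

  memmove(s->data + pos + n, s->data + pos, s->len - pos);
  memcpy(s->data + pos, buf, n);
  s->len += n;
  return 1;
}

/* Replace the section (pos, len) with n bytes from buf. The position is
 * 1-based and is itself part of the section. Returns 1 on success. */
static int toy_sub(struct toy_str *s, size_t pos, size_t len,
                   const char *buf, size_t n)
{
  assert(s && buf);

  if ((pos < 1) || (len > s->len) || ((pos - 1) > (s->len - len)))
    return 0;
  if (n > (sizeof(s->data) - (s->len - len)))
    return 0;

  /* delete the section, then add the new data where it used to start */
  memmove(s->data + pos - 1, s->data + pos - 1 + len,
          s->len - (pos - 1 + len));
  s->len -= len;
  return toy_add(s, pos - 1, buf, n);
}
```

Running the same steps as the example above (append "Hello", append five 'W' characters, substitute at position 7 with length 1, substitute at position 8 with length 3, then add a space after position 5) leaves "Hello World\n" in the buffer.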
This is a full header file needed to do simple non-blocking IO operations. It also puts the common init and exit sections into functions. This will be used by all of the following examples. The Simple GETOPT implementation isn't used for a while, so you can ignore that for now. A quick overview of the changes from the ex_hello_world.h header file is...
And a quick review of the additions...
This example is available here (12.14KB, Last modified: Tue May 17 20:04:00 2005).
This is the unix cat command, implemented with the help of the functions in the above ex_utils.h header file. This uses the same Vstr string for both input and output and uses non-blocking IO, both of which are done for efficiency.
If you have seen something like the "simple" cat in LAD, this version looks much bigger. However, one of the main reasons for this is that the LAD version has many bugs. The main problem is the lack of checking on the IO calls, which is most easily demonstrated by running it like so...
perl -e 'use Fcntl; fcntl(STDIN, F_SETFL, O_NONBLOCK); exec(@ARGV);' ./cat
...will cause the LAD version to exit immediately due to an EAGAIN being returned from read(). This problem also affects the LAD version when it is used with a blocking pipe, because write() isn't required to write all of its data.
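The EAGAIN behaviour is easy to reproduce in C as well. This is a minimal sketch using only POSIX calls, not code from ex_utils.h: set_nonblock() does roughly what the perl one-liner does to stdin, and read_retry() shows one way to treat EAGAIN and EINTR as "try again later" instead of as a fatal error. Both function names are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd, keeping its other flags.
 * Returns 0 on success, -1 on error. */
static int set_nonblock(int fd)
{
  int flags = fcntl(fd, F_GETFL);

  if (flags == -1)
    return -1;
  return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Read from fd, waiting with poll() when no data is available yet.
 * Returns the number of bytes read, 0 at end of file, -1 on a real error. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
  assert(buf != NULL);

  for (;;)
  {
    struct pollfd p;
    ssize_t r = read(fd, buf, len);

    if (r >= 0)
      return r;
    if (errno == EINTR)
      continue; /* interrupted by a signal, just try again */
    if ((errno != EAGAIN) && (errno != EWOULDBLOCK))
      return -1;

    /* nothing to read right now: sleep until the fd is readable */
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    if ((poll(&p, 1, -1) == -1) && (errno != EINTR))
      return -1;
  }
}
```

A program that calls read() directly and exits on any -1 return will stop as soon as the descriptor is non-blocking and has no data ready, which is the failure described above.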
The LAD version also doesn't open() any files, which is significant functionality. So after fixing those bugs we get something that is much closer to the Vstr version, and it still suffers from performance problems due to the need to block on input and output separately. It is possible to create a version using read, write and poll that would perform the same as the Vstr version. However, even the simplest method would have to implement its own ring buffer, which is very prone to error and would almost certainly make it bigger than ex_cat.c and ex_utils.h combined.
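To give an idea of what fixing the write() side involves, here is a small sketch of a loop that copes with short writes, EINTR and EAGAIN. It uses only POSIX calls and is an assumption about what a minimal fix could look like, not the code from ex_utils.h; the name write_all() is made up. Note that it blocks in poll() until everything is written, so it does not give the overlapped input/output that the Vstr version gets.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, even if write() takes only some of
 * them at a time. Returns 0 on success, -1 on error with errno set. */
static int write_all(int fd, const char *buf, size_t len)
{
  assert(buf != NULL);

  while (len > 0)
  {
    struct pollfd p;
    ssize_t w = write(fd, buf, len);

    if (w > 0)
    { /* short writes are normal: skip what was accepted and go on */
      buf += w;
      len -= (size_t)w;
      continue;
    }
    if ((w == -1) && (errno == EINTR))
      continue;
    if ((w != -1) || ((errno != EAGAIN) && (errno != EWOULDBLOCK)))
      return -1; /* real error, or an unexpected zero length write */

    /* non-blocking fd that is currently full: wait until it is writable */
    p.fd = fd;
    p.events = POLLOUT;
    p.revents = 0;
    if ((poll(&p, 1, -1) == -1) && (errno != EINTR))
      return -1;
  }

  return 0;
}
```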
/* This is a _simple_ cat program.
 * Reads from stdin if no args are given.
 *
 * This shows how to use the Vstr library at its simplest,
 * for easy and fast IO. Note however that all needed error detection is
 * included.
 *
 * This file is more commented than normal code, so as to make it easy to follow
 * while knowing almost nothing about Vstr or Linux IO programming.
 */
#include "ex_utils.h"

#define CONF_USE_MMAP_DEF FALSE

/* Keep reading on the file descriptor until there is no more data (IO_EOF)
 * abort if there is an error reading or writing */
static void ex_cat_read_fd_write_stdout(Vstr_base *s1, int fd)
{
  while (TRUE)
  {
    int io_w_state = IO_OK;
    int io_r_state = io_get(s1, fd);

    if (io_r_state == IO_EOF)
      break;

    io_w_state = io_put(s1, STDOUT_FILENO);

    io_limit(io_r_state, fd, io_w_state, STDOUT_FILENO, s1);
  }
}

static void ex_cat_limit(Vstr_base *s1)
{
  while ((s1->len >= EX_MAX_W_DATA_INCORE) ||
         (s1->len >= EX_MAX_R_DATA_INCORE))
  {
    if (io_put(s1, STDOUT_FILENO) == IO_BLOCK)
      io_block(-1, STDOUT_FILENO);
  }
}

/* This is "cat", using non-blocking IO and Vstr for buffer space */
int main(int argc, char *argv[])
{
  Vstr_base *s1 = ex_init(NULL); /* init the library etc. */
  int count = 1; /* skip the program name */
  int use_mmap = CONF_USE_MMAP_DEF;

  /* parse command line arguments... */
  while (count < argc)
  { /* quick hack getopt_long */
    if (!strcmp("--", argv[count]))
    {
      ++count;
      break;
    }
    else if (!strcmp("--mmap", argv[count])) /* toggle use of mmap */
      use_mmap = !use_mmap;
    else if (!strcmp("--version", argv[count]))
    { /* print version and exit */
      vstr_add_fmt(s1, 0, "%s", "\
jcat 1.0.0\n\
Written by James Antill\n\
\n\
Uses Vstr string library.\n\
");
      goto out;
    }
    else if (!strcmp("--help", argv[count]))
    { /* print help and exit */
      vstr_add_fmt(s1, 0, "%s", "\
Usage: jcat [FILENAME]...\n\
or: jcat OPTION\n\
Output filenames.\n\
\n\
--help Display this help and exit\n\
--version Output version information and exit\n\
--mmap Toggle use of mmap() to load input files\n\
-- Treat rest of cmd line as input filenames\n\
\n\
Report bugs to James Antill <james@and.org>.\n\
");
      goto out;
    }
    else
      break;
    ++count;
  }

  /* if no arguments are given just do stdin to stdout */
  if (count >= argc)
  {
    io_fd_set_o_nonblock(STDIN_FILENO);
    ex_cat_read_fd_write_stdout(s1, STDIN_FILENO);
  }

  /* loop through all arguments, open the file specified
   * and do the read/write loop */
  while (count < argc)
  {
    unsigned int ern = 0;

    if (use_mmap)
      vstr_sc_mmap_file(s1, s1->len, argv[count], 0, 0, &ern);

    if (!use_mmap ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_FSTAT_ERRNO) ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_MMAP_ERRNO) ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_TOO_LARGE))
    { /* if mmap didn't work ... do a read/alter/write loop */
      int fd = io_open(argv[count]);

      ex_cat_read_fd_write_stdout(s1, fd);

      if (close(fd) == -1)
        warn("close(%s)", argv[count]);
    }
    else if (ern && (ern != VSTR_TYPE_SC_MMAP_FILE_ERR_CLOSE_ERRNO))
      err(EXIT_FAILURE, "add");
    else /* mmap worked */
      ex_cat_limit(s1);

    ++count;
  }

  /* output all remaining data */
 out:
  io_put_all(s1, STDOUT_FILENO);

  exit (ex_exit(s1, NULL));
}

Last modified: Thu Oct 21 20:48:09 2004
This is somewhat like the "nl" unix command, and it is implemented in much the same way as the cat command. However, the data has to have something added to the start of each line before it can be output, so we now have two string objects: one for input and one for output. Note that as the data is "moved" from the input to the output string object, it isn't copied. Instead, a reference is created and shared between the two strings.
This example is available here (6.07KB, Last modified: Fri Oct 15 16:21:07 2004).
This is somewhat like the "hexdump" unix command, and it also uses the same simple IO model used in the cat and nl commands. However, the data is now output twice: once as hex values, and a second time as characters (converting unprintable characters into '.' characters). So again we have two string objects: one for input and one for output. To get two copies of the data, we initially export the data to a buffer and then convert that to hex via vstr_add_fmt() (the printf like function). We then convert the data that is still in the string object so it is printable, and move it from the input to the output string object. Note that as the data is "moved" from the input to the output string object, it is always copied, even if it was a reference on input (i.e. mmap()ed).
One other thing that is new in the hexdump command is the use of the VSTR_FLAGXX() macro function. This is a convenience feature for when you need to specify multiple flags at once.
This example is available here (5.36KB, Last modified: Mon Mar 14 23:46:21 2005).
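The "output twice" idea can be shown on a single line of data with nothing but the standard C library. This is only a sketch of the transformation, not the code of the example above: hex_line() is a hypothetical function, it works on a plain buffer instead of Vstr strings, and its output layout is an assumption that probably differs from what the real example prints.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Format n bytes of data as hex values, followed by the same bytes as
 * characters with unprintable ones shown as '.'.
 * Returns 1 on success, 0 if out is too small. */
static int hex_line(char *out, size_t out_len,
                    const unsigned char *data, size_t n)
{
  size_t used = 0;
  size_t i;

  assert(out && data);

  for (i = 0; i < n; ++i)
  { /* first copy of the data: two hex digits and a space per byte */
    int r = snprintf(out + used, out_len - used, "%02X ", data[i]);

    if ((r < 0) || ((size_t)r >= (out_len - used)))
      return 0;
    used += (size_t)r;
  }

  if ((used + n + 2) > out_len) /* separator + n characters + NUL */
    return 0;

  out[used++] = ' ';
  for (i = 0; i < n; ++i) /* second copy: printable characters or '.' */
    out[used++] = isprint(data[i]) ? (char)data[i] : '.';
  out[used] = '\0';

  return 1;
}
```

For the three bytes 'H', 'i' and a newline, this produces the hex values 48 69 0A followed by "Hi." in the default C locale.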
Custom formatters are an extremely useful feature of the Vstr string library, allowing you to safely implement one or more methods of printing a pointer to an arbitrary object. The simplest approach is to use the builtin custom formatters. This example shows how you enable the IPv4 and Vstr custom formatters, and then how you can use them.
This example also includes vstr_sc_basename() which acts in a similar way to the POSIX basename function.
This example is available here (2.56KB, Last modified: Mon Feb 2 08:45:42 2004).
So, printing IPv4 addresses is nice, but the big benefit of custom formatters comes with using them on types that you have to deal with a lot. This usually means types that you've defined yourself, or that are defined in a library you are using. So this example will show you how to create your own custom formatters. I'll use the GNU multiple precision arithmetic library, which is a well used library for creating arbitrary precision numbers (i.e. numbers that can represent any value). One of the annoying features of using this library is that it is non-trivial to turn these numbers into strings and/or output them as you would with a normal int/long/size_t/intmax_t/etc.
The GMP library has a set of printf like functions, and while you can create a limited length string, newly allocated string or create output to a file using these functions they fail the "Easy" test for a number of reasons...
It is not possible to use the grouping format specifier with the GNU MP variables. This means that when running under Linux/glibc, while...
int d;
char *ret = NULL;
/* ... */
gmp_asprintf(&ret, "%s is an num %'d.\n", "int", d);
...creates a C style string for the number in a readable format for the locale, the GNU MP variable equivalent...
mpz_t z;
char *ret = NULL;
/* ... */
gmp_asprintf(&ret, "%s is an bignum %'Zd.\n", "mpz_t", z);
...doesn't do anything other than create the number.
Any static printf like function format specifier checking has to be disabled for these functions, due to the way the hard coded custom formatters are implemented. This means that although when you do...
int d;
char *ret = NULL;
/* ... */
asprintf(&ret, "%.*s is an num %d.\n", "int", d);
...gcc will tell you there is an error, when you do...
mpz_t z;
char *ret = NULL;
/* ... */
gmp_asprintf(&ret, "%.*s is an bignum %Zd.\n", "mpz_t", z);
...the code will happily compile, and then almost certainly crash when you run it.
You also lose all of the great benefits of using vstr_add_fmt(). The biggest problem here is that the gmp printf like functions pass all non gmp types directly to the underlying snprintf/etc. call. The extra speed/memory benefits of Vstr are nice too :).
Even if all of these problems were fixed in some future version, this still only solves the problem for GMP types. So if you have one or more other custom types that you need to format, you'd need yet another function.
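For readers who haven't seen how the static checking mentioned in the list above is normally switched on: with gcc it is done by marking a printf like function with the format attribute. The following sketch shows this for a made up wrapper function, my_fmt(); it is not taken from GMP or Vstr, and the FMT_CHECK macro name is also just for this illustration. A function that accepts extra conversions gcc doesn't know about (like %Zd) can't be marked this way without getting warnings for every valid use, which is why the checking has to go.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
/* ask the compiler to check the arguments against the format string */
# define FMT_CHECK(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
# define FMT_CHECK(fmt_idx, arg_idx)
#endif

/* hypothetical printf like wrapper: argument 3 is the format string and
 * the variable arguments start at argument 4 */
static int my_fmt(char *out, size_t out_len, const char *fmt, ...)
    FMT_CHECK(3, 4);

static int my_fmt(char *out, size_t out_len, const char *fmt, ...)
{
  va_list ap;
  int r;

  assert(fmt != NULL);

  va_start(ap, fmt);
  r = vsnprintf(out, out_len, fmt, ap);
  va_end(ap);

  return r;
}
```

With this in place, a call like my_fmt(buf, sizeof(buf), "%.*s is a num %d.", "int", 7) is flagged at compile time when format warnings are enabled (for gcc, -Wformat, which is part of -Wall), because the int required by the '*' is missing.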
The custom formatter for mpz_t is about 25 lines of code in this example. It could be a little less if you removed some features (i.e. supporting positive or negative values), or always got libgmp to allocate storage and then free it (or if some libgmp APIs were defined in a more user friendly manner).
However the actual complexity is pretty small, and this not only fully implements everything that gmp_printf() can do for that variable (safely), but also implements grouping.
This example is available here (7.13KB, Last modified: Fri Mar 5 17:46:26 2004).
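To give a feel for the grouping part on its own, here is a sketch that works on a plain decimal string, such as one you might get from mpz_get_str() with base 10. It is not the code from the example above: group_digits() is a hypothetical function, it always uses groups of three digits with a separator chosen by the caller, whereas real locale grouping rules can be more involved.

```c
#include <assert.h>
#include <string.h>

/* Copy the decimal number in num (with an optional leading '-') into out,
 * adding sep between each group of three digits, counted from the right.
 * Returns 1 on success, 0 if out is too small. */
static int group_digits(char *out, size_t out_len, const char *num, char sep)
{
  size_t len;
  size_t i;
  size_t o = 0;

  assert(out && num);

  if (*num == '-')
  { /* keep the sign in front, it is not part of any group */
    if (out_len < 2)
      return 0;
    out[o++] = '-';
    ++num;
  }

  len = strlen(num);
  for (i = 0; i < len; ++i)
  {
    if (i && (((len - i) % 3) == 0))
    { /* a multiple of three digits is left: start a new group */
      if ((o + 1) >= out_len)
        return 0;
      out[o++] = sep;
    }
    if ((o + 1) >= out_len)
      return 0;
    out[o++] = num[i];
  }

  if (o >= out_len)
    return 0;
  out[o] = '\0';
  return 1;
}
```

For example, group_digits(out, sizeof(out), "1234567", ',') stores "1,234,567" in out, and "-1000" becomes "-1,000".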
The above implements the equivalent of "%d", however with a couple of 1 line changes "%x" and "%o" can be done (and they can be signed, if desired). For instance, here is just the custom formatter callback code to implement all three...
This example is available here (9.88KB, Last modified: Tue Jul 26 04:48:22 2005).
This is a mapping from std. C functions operating on C-style strings to Vstr string functions...
Common C string functions | Vstr string functions |
memcpy | vstr_add_buf |
memmove | vstr_add_vstr |
memmove | vstr_sub_vstr |
memmove | vstr_mov |
strcpy | vstr_add_cstr_buf |
strncpy | vstr_add_cstr_buf |
strcat | vstr_add_cstr_buf |
strncat | vstr_add_cstr_buf |
memcmp | vstr_cmp_buf |
strcmp | vstr_cmp_cstr_buf |
strcmp | vstr_cmp |
strncmp | vstr_cmp_buf |
strncmp | vstr_cmp |
strcasecmp | vstr_cmp_case_cstr_buf |
strcasecmp | vstr_cmp_case |
strncasecmp | vstr_cmp_case_buf |
strncasecmp | vstr_cmp_case |
strncmp | vstr_cmp_cstr_buf |
strcoll | N/A |
strxfrm | N/A |
memchr | vstr_srch_chr_fwd |
strchr | vstr_srch_chr_fwd |
strnchr | vstr_srch_chr_fwd |
strrchr | vstr_srch_chr_rev |
strtok | vstr_split_chrs |
memset | vstr_add_rep_chr |
memmem | vstr_srch_buf_fwd |
strstr | vstr_srch_cstr_buf_fwd |
strspn | vstr_spn_chrs_buf_fwd |
strcspn | vstr_cspn_chrs_buf_fwd |
sprintf | vstr_add_sysfmt |
strlen | ->len (member variable, also always passed to functions) |
This is a mapping from C++ std::string functions to Vstr string functions...
Common C++ string functions | Vstr string functions |
append | vstr_add_buf |
append | vstr_add_rep_chr |
append | vstr_add_vstr |
insert | vstr_add_buf |
insert | vstr_add_rep_chr |
insert | vstr_add_vstr |
replace | vstr_sub_buf |
replace | vstr_sub_rep_chr |
replace | vstr_sub_vstr |
substr | Fundamental part of Vstr |
find | vstr_srch_cstr_buf_fwd |
find | vstr_srch_buf_fwd |
find | vstr_srch_chr_fwd |
find | vstr_srch_vstr_fwd |
find_first_of | vstr_srch_cstr_chrs_fwd |
find_last_of | vstr_srch_cstr_chrs_rev |
find_first_not_of | vstr_csrch_cstr_chrs_fwd |
find_last_not_of | vstr_csrch_cstr_chrs_rev |
rfind | vstr_srch_cstr_buf_rev |
rfind | vstr_srch_buf_rev |
rfind | vstr_srch_chr_rev |
rfind | vstr_srch_vstr_rev |
erase | vstr_del |
resize | vstr_sc_reduce |
compare | vstr_cmp |
compare | vstr_cmp_cstr |
swap | vstr_mov |
== | vstr_cmp_eq |
== | vstr_cmp_cstr_eq |
c_str | vstr_export_cstr_ptr |
data | vstr_export_cstr_ref |
reserve | vstr_make_spare_nodes |
assign | vstr_sub_vstr |
copy | vstr_export_cstr_buf |
copy | vstr_export_cstr_ptr |
copy | vstr_export_cstr_malloc |
copy | vstr_dup_vstr |
length | ->len (member variable, also always passed to functions) |