Tutorial

Vstr is a string library designed to work easily and efficiently in network applications. This means that it is designed to perform IO operations in a non-blocking fashion, and to allow the programmer to take data from various places and join them in a single IO operation without having to do any copies.

IO primitives are often assumed to be blocking (i.e. all of the data will have been transferred before the IO operation completes). This does make "Hello World" type applications somewhat simpler, but networked applications have different requirements. Don't be deterred, though: functions will be introduced to make those simple applications more readable.

Also note that all error checking is included in every example. The examples might be somewhat easier to read without it, but including error checking is what the code must look like in a real application.

Simplest Hello World

Here is about the simplest Vstr application: a single function that prints "Hello World" on a single line in a POSIX environment...

/* hello world - Self contained, using a single piece of data at
 *               initialisation time (no data copying) */ 

#define VSTR_COMPILE_INCLUDE 1
#include <vstr.h>
#include <errno.h>  /* errno variable */
#include <err.h>    /* BSD/Linux header see: man errx */
#include <unistd.h> /* for STDOUT_FILENO */

int main(void)
{
  Vstr_base *s1 = NULL;

  if (!vstr_init()) /* initialize the library */
    err(EXIT_FAILURE, "init");

  /* create a string with data */
  if (!(s1 = vstr_dup_cstr_buf(NULL, "Hello World\n")))
    err(EXIT_FAILURE, "Create string");

  /* output the data to the user --
   *    assumes POSIX, assumes blocking IO but should work without */
  while (s1->len)
    if (!vstr_sc_write_fd(s1, 1, s1->len, STDOUT_FILENO, NULL))
    {
      if ((errno != EAGAIN) && (errno != EINTR))
        err(EXIT_FAILURE, "write");
    }

  /* cleanup allocated resources */
  vstr_free_base(s1);

  vstr_exit();

  exit (EXIT_SUCCESS);
}
Last modified: Wed Jan 14 08:19:40 2004

...however this example is somewhat too simplistic because, normally, a Vstr string contains multiple nodes of information that are treated internally as a single entity.

Hello World Header

So first we'll clean up the above example by moving all the header inclusion into one file and creating some helper functions to simplify the actual "hello world" code. Here's a quick overview of the functions moved into the header file...

This example is available here (2.07KB, Last modified: Mon Sep 22 15:07:00 2003).

Simple Hello World, using the helper functions

Now using the above header file, we can re-write the initial example in a much more readable form...

/* hello world - Using a single piece of data (no data copying) */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();

  vstr_add_cstr_buf(s1, s1->len, "Hello World\n");

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}
Last modified: Mon Sep 22 14:42:56 2003

Hello World, using multiple sources of data

We can now alter the above to still print "Hello World" on a line, but have the data come from multiple sources, which are then stored internally in multiple nodes (remember that there is still no copying of data). From our point of view, however, we can treat all of this as a single string. Although this example is obviously contrived, it is much more representative of what a networked application looks like...

/* hello world - multiple sources of data (still no data copying) */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();

  vstr_add_cstr_ptr(s1, s1->len, "Hello");
  vstr_add_cstr_ptr(s1, s1->len, " ");
  vstr_add_cstr_ptr(s1, s1->len, "World\n");

  /* we are checking whether any of the above three functions failed here */
  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  /* loop until all data is output... */
  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}
Last modified: Mon Sep 22 14:44:56 2003
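
The idea of sending several separate pieces of data as one string, without first copying them into a joined buffer, can also be illustrated with plain POSIX calls. The following is only a minimal sketch of that idea using writev(); it is not the Vstr API and makes no claim about how Vstr is implemented, and the helper name gather_write is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */
#include <sys/uio.h>   /* writev, struct iovec */
#include <unistd.h>    /* STDOUT_FILENO, for callers */

/* Hypothetical helper (not part of Vstr): write three C strings with a
 * single writev() call. The bytes are never copied into a joined buffer,
 * the kernel gathers them straight from their original locations.
 * Returns what writev() returns: the number of bytes written (possibly
 * fewer than the total), or -1 with errno set. */
ssize_t gather_write(int fd, const char *a, const char *b, const char *c)
{
  struct iovec vec[3];

  assert(a && b && c);

  vec[0].iov_base = (void *)a; vec[0].iov_len = strlen(a);
  vec[1].iov_base = (void *)b; vec[1].iov_len = strlen(b);
  vec[2].iov_base = (void *)c; vec[2].iov_len = strlen(c);

  return writev(fd, vec, 3);
}
```

Calling gather_write(STDOUT_FILENO, "Hello", " ", "World\n") would normally output the whole line in one system call. A real application still has to handle a short count or an EAGAIN/EINTR error, as the examples above do.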

Complicated Hello World

This is the final Hello World example, and the first one that actually copies some of the data for the string. It also shows how you can add data at any point in the string, and substitute data within the string. Note that when giving the position to add data at, you give the position just before the one where you want the new data to start; when giving the position/length for a section (or sub-string), the byte at the given position is included within the section.

For people familiar with C++, this works out to be the same way that C++ std::string encodes positions for adding data (i.e. insert()), but not for getting sections. C++ often explains it as having a 0-indexed string where data is added before the point given. Vstr uses position 1 for the first byte, as that means that appending data is always done at the current length, and a position of 0 can be used as an invalid value when searching etc.

Description of operation                 Position                         Length
---------------------------------------  -------------------------------  --------------------------------------------
Prepend data to X                        0 (Zero)                         N/A
Append data to X                         X->len (Length of Vstr string)   N/A
Last byte                                X->len (Length of Vstr string)   1 (One)
First byte                               1 (One)                          1 (One)
Entire Vstr string                       1 (One)                          X->len (Length of Vstr string)
All of X, but the first and last bytes   2 (Two)                          X->len - 2 (Length of Vstr string minus Two)

/* hello world - multiple pieces of data, includes substitution and
 *               inserting data into the middle of a string */

#include "ex_hello_world.h" /* helper functions */

int main(void)
{
  Vstr_base *s1 = hw_init();
  Vstr_ref *ref = NULL;

  vstr_add_cstr_ptr(s1, s1->len, "Hello");

  vstr_add_rep_chr(s1, s1->len, 'W', 5); /* add "WWWWW" */

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  /* substitute an 'o' for a 'W' */
  if (!vstr_sub_rep_chr(s1, strlen("HelloWW"), 1, 'o', 1))
    errno = ENOMEM, err(EXIT_FAILURE, "Substitute string data");

  /* substitute "rld\n" for the "WWW" */
  if (!vstr_sub_cstr_buf(s1, strlen("HelloWoW"), strlen("WWW"), "rld\n"))
    errno = ENOMEM, err(EXIT_FAILURE, "Substitute string data");

  if (!(ref = vstr_ref_make_ptr((char *)"XYZ ", vstr_ref_cb_free_ref)))
    errno = ENOMEM, err(EXIT_FAILURE, "Create data reference");
  /* now ref->ptr is "XYZ " */

  /* add space after "Hello", by skipping "XYZ" in reference */
  vstr_add_ref(s1, strlen("Hello"), ref, strlen("XYZ"), strlen(" "));

  vstr_ref_del(ref); /* delete our reference to the Vstr_ref */

  if (s1->conf->malloc_bad)
    errno = ENOMEM, err(EXIT_FAILURE, "Add string data");

  while (io_put(s1, STDOUT_FILENO) != IO_NONE) {}

  exit (hw_exit(s1));
}
Last modified: Mon Sep 22 17:12:18 2003
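
To make the position rules easier to follow, the steps of the example above can be replayed on a plain flat buffer. This is only an illustrative sketch: struct buf and the buf_*() helpers are hypothetical and not part of Vstr, and unlike Vstr they copy every byte. What they do share with the example is the position convention (add after a position, where 0 means prepend; sections start at a 1-based position that is included in the section).

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 64

/* Hypothetical stand-in for a string object: a flat buffer and a length. */
struct buf { char data[BUF_MAX]; size_t len; };

/* Add n bytes so that they start just after position pos.
 * pos == 0 prepends, pos == b->len appends. */
void buf_add(struct buf *b, size_t pos, const char *src, size_t n)
{
  assert(pos <= b->len && b->len + n <= BUF_MAX);
  memmove(b->data + pos + n, b->data + pos, b->len - pos);
  memcpy(b->data + pos, src, n);
  b->len += n;
}

/* Delete the section of len bytes that starts at 1-based position pos
 * (the byte at pos is part of the section). */
void buf_del(struct buf *b, size_t pos, size_t len)
{
  assert(pos >= 1 && len <= b->len && pos - 1 <= b->len - len);
  memmove(b->data + pos - 1, b->data + pos - 1 + len,
          b->len - (pos - 1) - len);
  b->len -= len;
}

/* Substitute n new bytes for the section pos/len. */
void buf_sub(struct buf *b, size_t pos, size_t len,
             const char *src, size_t n)
{
  buf_del(b, pos, len);
  buf_add(b, pos - 1, src, n); /* add after the byte before the section */
}

/* Replay the steps of the example above. */
void buf_hello(struct buf *b)
{
  b->len = 0;
  buf_add(b, b->len, "Hello", 5);                /* "Hello"         */
  buf_add(b, b->len, "WWWWW", 5);                /* "HelloWWWWW"    */
  buf_sub(b, strlen("HelloWW"), 1, "o", 1);      /* "HelloWoWWW"    */
  buf_sub(b, strlen("HelloWoW"), 3, "rld\n", 4); /* "HelloWorld\n"  */
  buf_add(b, strlen("Hello"), " ", 1);           /* "Hello World\n" */
}
```

Note how strlen("HelloWW") is 7, which is the 1-based position of the second 'W', and how strlen("Hello") as an add position places the space directly after the 'o'.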

IO header file

This is the full header file needed to do simple non-blocking IO operations; it also puts the common init and exit sections into functions. It will be used by all of the following examples. The Simple GETOPT implementation isn't used for a while, so you can ignore that for now. Here is a quick overview of the changes from the hello_world.h header file...

And a quick review of the additions...

This example is available here (12.14KB, Last modified: Tue May 17 20:04:00 2005).

Unix cat command

This is the unix cat command, implemented with the help of the functions in the above ex_utils.h header file. This uses the same Vstr string for both input and output and uses non-blocking IO, both of which are done for efficiency.

If you have seen something like the "simple" cat in LAD, this version looks much bigger. However, one of the main reasons for this is that the LAD version has many bugs. The main problem is the lack of checking on the IO calls, which is most easily demonstrated by running it like so...

perl -e 'use Fcntl; fcntl(STDIN, F_SETFL, O_NONBLOCK); exec(@ARGV);' ./cat

...which will cause the LAD version to exit immediately due to an EAGAIN being returned from read(). This problem also affects the LAD version when it is used in a blocking pipe, because write() isn't required to write all of its data.

The LAD version also doesn't open() any files, which is significant functionality. So after fixing those bugs we get something that is much closer to the Vstr version, and it still suffers from performance problems due to the need to block on input and output separately. It is possible to create a version using read, write and poll that would perform the same as the Vstr version. However, even the simplest method would have to implement its own ring buffer, which is very prone to error and would almost certainly make it bigger than ex_cat.c and ex_utils.h combined.

/* This is a _simple_ cat program.
 * Reads from stdin if no args are given.
 *
 * This shows how to use the Vstr library at its simplest,
 * for easy and fast IO. Note however that all needed error detection is
 * included.
 *
 * This file is more commented than normal code, so as to make it easy to follow
 * while knowing almost nothing about Vstr or Linux IO programming.
 */
#include "ex_utils.h"

#define CONF_USE_MMAP_DEF FALSE

/*  Keep reading on the file descriptor until there is no more data (IO_EOF)
 * abort if there is an error reading or writing */
static void ex_cat_read_fd_write_stdout(Vstr_base *s1, int fd)
{
  while (TRUE)
  {
    int io_w_state = IO_OK;
    int io_r_state = io_get(s1, fd);

    if (io_r_state == IO_EOF)
      break;
    
    io_w_state = io_put(s1, STDOUT_FILENO);

    io_limit(io_r_state, fd, io_w_state, STDOUT_FILENO, s1);    
  }
}

static void ex_cat_limit(Vstr_base *s1)
{
  while ((s1->len >= EX_MAX_W_DATA_INCORE) || (s1->len >= EX_MAX_R_DATA_INCORE))
  {
    if (io_put(s1, STDOUT_FILENO) == IO_BLOCK)
      io_block(-1, STDOUT_FILENO);
  }
}

/* This is "cat", using non-blocking IO and Vstr for buffer space */
int main(int argc, char *argv[])
{
  Vstr_base *s1 = ex_init(NULL); /* init the library etc. */
  int count = 1; /* skip the program name */
  int use_mmap = CONF_USE_MMAP_DEF;  
  
  /* parse command line arguments... */
  while (count < argc)
  { /* quick hack getopt_long */
    if (!strcmp("--", argv[count]))
    {
      ++count;
      break;
    }
    else if (!strcmp("--mmap", argv[count])) /* toggle use of mmap */
      use_mmap = !use_mmap;
    else if (!strcmp("--version", argv[count]))
    { /* print version and exit */
      vstr_add_fmt(s1, 0, "%s", "\
jcat 1.0.0\n\
Written by James Antill\n\
\n\
Uses Vstr string library.\n\
");
      goto out;
    }
    else if (!strcmp("--help", argv[count]))
    { /* print help and exit */
      vstr_add_fmt(s1, 0, "%s", "\
Usage: jcat [FILENAME]...\n\
   or: jcat OPTION\n\
Output filenames.\n\
\n\
      --help     Display this help and exit\n\
      --version  Output version information and exit\n\
      --mmap     Toggle use of mmap() to load input files\n\
      --         Treat rest of cmd line as input filenames\n\
\n\
Report bugs to James Antill <james@and.org>.\n\
");
      goto out;
    }
    else
      break;
    ++count;
  }
  
  /* if no arguments are given just do stdin to stdout */
  if (count >= argc)
  {
    io_fd_set_o_nonblock(STDIN_FILENO);
    ex_cat_read_fd_write_stdout(s1, STDIN_FILENO);
  }
  
  /* loop through all arguments, open the file specified
   * and do the read/write loop */
  while (count < argc)
  {
    unsigned int ern = 0;

    if (use_mmap)
      vstr_sc_mmap_file(s1, s1->len, argv[count], 0, 0, &ern);

    if (!use_mmap ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_FSTAT_ERRNO) ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_MMAP_ERRNO) ||
        (ern == VSTR_TYPE_SC_MMAP_FILE_ERR_TOO_LARGE))
    { /* if mmap didn't work ... do a read/alter/write loop */
      int fd = io_open(argv[count]);

      ex_cat_read_fd_write_stdout(s1, fd);

      if (close(fd) == -1)
        warn("close(%s)", argv[count]);
    }
    else if (ern && (ern != VSTR_TYPE_SC_MMAP_FILE_ERR_CLOSE_ERRNO))
      err(EXIT_FAILURE, "add");
    else /* mmap worked */
      ex_cat_limit(s1);

    ++count;
  }

  /* output all remaining data */
 out:
  io_put_all(s1, STDOUT_FILENO);

  exit (ex_exit(s1, NULL));
}
Last modified: Thu Oct 21 20:48:09 2004
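
The IO checking that the discussion of the LAD version says is missing (short writes, EAGAIN on a non-blocking descriptor, EINTR) can be sketched with plain POSIX calls. This is a minimal illustration under those assumptions, not the code from ex_utils.h; the helper name write_all is hypothetical. It also blocks on output alone, which is exactly the limitation the text says a ring buffer (or Vstr) is needed to avoid.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not from ex_utils.h): write all len bytes of buf
 * to fd. Returns 0 on success, or -1 on a real error with errno set. */
int write_all(int fd, const char *buf, size_t len)
{
  assert(buf || !len);

  while (len)
  {
    ssize_t ret = write(fd, buf, len);

    if (ret >= 0)
    { /* possibly a short write: skip what was accepted and go round again */
      buf += ret;
      len -= (size_t)ret;
    }
    else if ((errno == EAGAIN) || (errno == EWOULDBLOCK))
    { /* non-blocking fd is full: sleep until it becomes writable */
      struct pollfd pfd;

      pfd.fd      = fd;
      pfd.events  = POLLOUT;
      pfd.revents = 0;

      if ((poll(&pfd, 1, -1) == -1) && (errno != EINTR))
        return -1;
    }
    else if (errno != EINTR) /* EINTR just means "try again" */
      return -1;
  }

  return 0;
}
```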

Unix nl command

This is somewhat like the "nl" unix command, and is implemented in much the same way as the cat command. However, the data has to have something added to the start of each line before it can be output, so we now have two string objects: one for input and one for output. Note that when the data is "moved" from the input to the output string object it isn't copied; instead a reference is created and shared between the two strings.

This example is available here (6.07KB, Last modified: Fri Oct 15 16:21:07 2004).
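
The general idea of "moving" data by sharing a reference instead of copying bytes can be sketched with a small reference-counted chunk. Everything here (struct chunk, struct slot and the functions) is hypothetical and only illustrates the technique; Vstr's own reference type and node handling are not shown and may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted chunk of bytes (not Vstr's type). */
struct chunk { unsigned int refs; size_t len; char data[]; };

/* Create a chunk holding a copy of src, owned by one holder. */
struct chunk *chunk_make(const char *src, size_t len)
{
  struct chunk *c = malloc(sizeof(*c) + len);

  if (!c)
    return NULL;

  c->refs = 1;
  c->len  = len;
  memcpy(c->data, src, len);
  return c;
}

/* Register one more holder of the chunk. */
struct chunk *chunk_ref(struct chunk *c)
{
  assert(c && c->refs);
  ++c->refs;
  return c;
}

/* Drop one holder. Returns the number of holders left, 0 means freed. */
unsigned int chunk_unref(struct chunk *c)
{
  unsigned int left;

  assert(c && c->refs);
  left = --c->refs;
  if (!left)
    free(c);
  return left;
}

/* A "string" here is just one slot that may hold a chunk. */
struct slot { struct chunk *c; };

/* "Move" the data from in to out: out takes a reference and in drops
 * its own, so the bytes themselves are never copied. */
void slot_move(struct slot *out, struct slot *in)
{
  assert(in->c && !out->c);
  out->c = chunk_ref(in->c);
  chunk_unref(in->c);
  in->c = NULL;
}
```

After slot_move() the output slot points at the very same bytes the input slot held, which is the property the nl example relies on.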

Unix simple hexdump command

This is somewhat like the "hexdump" unix command, and it uses the same simple IO model as the cat and nl commands. However, the data is now output twice: once as hex values, and a second time as characters (converting unprintable characters into '.' characters). So again we have two string objects: one for input and one for output. To get two copies of the data, we initially export the data to a buffer and then convert that to hex via vstr_add_fmt() (the printf like function). We then convert the data that is still in the string object so that it is printable, and move it from the input to the output string object. Note that when the data is "moved" from the input to the output string object here, it is always copied, even if it was a reference on input (i.e. mmap()ed).

One other thing that is new in the hexdump command is the use of the VSTR_FLAGXX() macro function, which is a convenience feature for when you need to specify multiple flags at once.

This example is available here (5.36KB, Last modified: Mon Mar 14 23:46:21 2005).
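
The "output twice" step can be sketched without Vstr as a function that formats one line of a dump: first the bytes as hex values, then the same bytes as characters with '.' standing in for anything unprintable. The helper name hex_line and the exact layout are assumptions made for illustration; the real example's output format may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format up to 16 input bytes as one dump line,
 * "<hex pairs>  <characters>", into out (NUL terminated).
 * Returns the number of characters written, not counting the NUL. */
size_t hex_line(const unsigned char *in, size_t len,
                char *out, size_t out_sz)
{
  size_t used = 0;
  size_t i;

  /* 3 chars per hex value, 1 separator, 1 char per byte, 1 NUL */
  assert(len <= 16 && out_sz >= (len * 3) + 1 + len + 1);

  for (i = 0; i < len; ++i) /* first copy of the data: hex values */
    used += (size_t)snprintf(out + used, out_sz - used, "%02X ", in[i]);

  out[used++] = ' ';

  for (i = 0; i < len; ++i) /* second copy: characters, '.' if unprintable */
    out[used++] = isprint(in[i]) ? (char)in[i] : '.';

  out[used] = 0;
  return used;
}
```

For the three bytes of "Hi\n" this produces the line "48 69 0A  Hi." in the default C locale.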

Lookup a hostname and print the IP, a simple custom formatter example

Custom formatters are an extremely useful feature of the Vstr string library, allowing you to safely implement one or more methods of printing a pointer to an arbitrary object. The simplest use is the builtin custom formatters. This example shows how you enable the IPv4 and Vstr custom formatters, and then how you can use them.

This example also includes vstr_sc_basename() which acts in a similar way to the POSIX basename function.

This example is available here (2.56KB, Last modified: Mon Feb 2 08:45:42 2004).

Factorials, GMP MPZ variables with custom formatters

So, printing IPv4 addresses is nice, but the big benefit of custom formatters comes from using them on types that you have to deal with a lot. This usually means types that you've defined yourself, or that are defined in a library you are using. So this example will show you how to create your own custom formatters. I'll use the GNU multiple precision arithmetic library, which is a well used library for creating arbitrary precision numbers (i.e. numbers that can represent any value). One of the annoying features of using this library is that it is non-trivial to turn these numbers into strings and/or output them as you would with a normal int/long/size_t/intmax_t/etc.

The GMP library has a set of printf like functions, and while you can use them to create a limited length string, create a newly allocated string or output to a file, they fail the "Easy" test for a number of reasons...

The custom formatter for mpz_t is about 25 lines of code in this example, it could be a little less if you removed some features (Ie. supporting positive or negative values) or always got libgmp to allocate storage and then free it (or if some libgmp APIs were defined in a more user friendly manner).

However the actual complexity is pretty small, and this not only fully implements everything that gmp_printf() can do for that variable (safely), but also implements grouping.

This example is available here (7.13KB, Last modified: Fri Mar 5 17:46:26 2004).

The above implements the equivalent of "%d", however with a couple of 1 line changes "%x" and "%o" can be done (and they can be signed, if desired). For instance, here is just the custom formatter callback code to implement all three...

This example is available here (9.88KB, Last modified: Tue Jul 26 04:48:22 2005).
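
The "grouping" mentioned earlier means inserting a separator between groups of digits, which matters for numbers as long as large factorials. As a rough sketch of what that involves, here is a plain C helper that groups a decimal digit string in threes. The name group_digits is hypothetical, the group size and separator are fixed assumptions (a real formatter would take them from the locale), and this is not the code used by the custom formatter in the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the decimal digit string num into out,
 * inserting sep between groups of three digits counted from the right.
 * A leading '-' or '+' sign is copied through untouched.
 * Returns the length of the result, not counting the NUL. */
size_t group_digits(const char *num, char sep, char *out, size_t out_sz)
{
  size_t sign, digits, seps, i;
  size_t used = 0;

  assert(num && out);

  sign   = ((*num == '-') || (*num == '+'));
  digits = strlen(num) - sign;
  seps   = digits ? (digits - 1) / 3 : 0;

  assert(out_sz >= sign + digits + seps + 1);

  if (sign)
    out[used++] = *num++;

  for (i = 0; i < digits; ++i)
  {
    /* a separator goes wherever a multiple of 3 digits remains */
    if (i && (((digits - i) % 3) == 0))
      out[used++] = sep;
    out[used++] = num[i];
  }

  out[used] = 0;
  return used;
}
```

For example, the decimal digits of 20! ("2432902008176640000") come out as "2,432,902,008,176,640,000" when sep is ','.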

Conversion table for C functions to Vstr functions

This is a mapping from std. C functions operating on C-style strings to Vstr string functions...

Common C string functions   Vstr string functions
-------------------------   ---------------------
memcpy                      vstr_add_buf
memmove                     vstr_add_vstr
memmove                     vstr_sub_vstr
memmove                     vstr_mov
strcpy                      vstr_add_cstr_buf
strncpy                     vstr_add_cstr_buf
strcat                      vstr_add_cstr_buf
strncat                     vstr_add_cstr_buf
memcmp                      vstr_cmp_buf
strcmp                      vstr_cmp_cstr_buf
strcmp                      vstr_cmp
strncmp                     vstr_cmp_buf
strncmp                     vstr_cmp
strcasecmp                  vstr_cmp_case_cstr_buf
strcasecmp                  vstr_cmp_case
strncasecmp                 vstr_cmp_case_buf
strncasecmp                 vstr_cmp_case
strncmp                     vstr_cmp_cstr_buf
strcoll                     N/A
strxfrm                     N/A
memchr                      vstr_srch_chr_fwd
strchr                      vstr_srch_chr_fwd
strnchr                     vstr_srch_chr_fwd
strrchr                     vstr_srch_chr_rev
strtok                      vstr_split_chrs
memset                      vstr_add_rep_chr
memmem                      vstr_srch_buf_fwd
strstr                      vstr_srch_cstr_buf_fwd
strspn                      vstr_spn_chrs_buf_fwd
strcspn                     vstr_cspn_chrs_buf_fwd
sprintf                     vstr_add_sysfmt
strlen                      ->len (member variable, also always passed to functions)

Conversion table for C++ std::string functions to Vstr functions

This is a mapping from C++ std::string functions to Vstr string functions...

Common C++ string functions   Vstr string functions
---------------------------   ---------------------
append                        vstr_add_buf
append                        vstr_add_rep_chr
append                        vstr_add_vstr
insert                        vstr_add_buf
insert                        vstr_add_rep_chr
insert                        vstr_add_vstr
replace                       vstr_sub_buf
replace                       vstr_sub_rep_chr
replace                       vstr_sub_vstr
substr                        Fundamental part of Vstr
find                          vstr_srch_cstr_buf_fwd
find                          vstr_srch_buf_fwd
find                          vstr_srch_chr_fwd
find                          vstr_srch_vstr_fwd
find_first_of                 vstr_srch_cstr_chrs_fwd
find_last_of                  vstr_srch_cstr_chrs_rev
find_first_not_of             vstr_csrch_cstr_chrs_fwd
find_last_not_of              vstr_csrch_cstr_chrs_rev
rfind                         vstr_srch_cstr_buf_rev
rfind                         vstr_srch_buf_rev
rfind                         vstr_srch_chr_rev
rfind                         vstr_srch_vstr_rev
erase                         vstr_del
resize                        vstr_sc_reduce
compare                       vstr_cmp
compare                       vstr_cmp_cstr
swap                          vstr_mov
==                            vstr_cmp_eq
==                            vstr_cmp_cstr_eq
c_str                         vstr_export_cstr_ptr
data                          vstr_export_cstr_ref
reserve                       vstr_make_spare_nodes
assign                        vstr_sub_vstr
copy                          vstr_export_cstr_buf
copy                          vstr_export_cstr_ptr
copy                          vstr_export_cstr_malloc
copy                          vstr_dup_vstr
length                        ->len (member variable, also always passed to functions)

James Antill
Last modified: Sun Jul 31 00:37:45 EDT 2005
Last regenerated: Sun Jul 31 00:38:00 EDT 2005