Security with string APIs

If you are writing a program in C without a a real string API then it is likely your program will end up on a list like this, and all programs need to care about security and/or reliability. As a somewhat graphic example of this, consider the following two groups of programs found on many Linux machines...

vsftpd wu-ftpd
postfix sendmail
djbdns bind
And-httpdLPRng
cups
nfs
vixie-cron
squid
apache-httpd
openssh

...the left column all have had zero remote buffer overflow attacks, three of the four have had "security guarantees" where the authors were so sure they were secure money was offered for anyone who could prove otherwise. They all consistently use a dynamic string API.

While the right column have all had at least one remote buffer overflow and try to use just the standard C-style string.h functions, and possibly a couple of helpers in the vein of strdup() ... this is far from enough.

The "common" reasons for not using a string library include: 1) You "just have to be careful" and you'll be alright. 2) It is too slow to use one.

The first argument just ignores reality, the squid maintainers are good but ... they still missed one, the apache team are careful and have code inspections, but ... they still missed one, and in neither case was it "obvious" by looking at the code that there would be a problem. For a list of recent security bugs against Red Hat Linux, and an indication of how many wouldn't exist with a string API see this page.

Humans just don't do that well with repeated tasks, in fact I'd say that given a formula of "do X marginally difficult thing Y times" X has to get smaller exponentially as Y gets bigger ... and when dealing with string types in code Y gets very big. The only tenable solution is to make X be 0. The way to do that is to have all of your string operations go through an API that guarantees you can't overflow buffers. This means that for the API X is comparatively large but Y, the problem of not overflowing a buffer, is reduced to 1. Whereas for each call to the API (which is a lot so Y is big), X is zero.

So given that you want a good string API, then the quickest/safest way to get one is just to reuse one that is known to do what you want. There is a good list on this page. However there are certain things that you should look for when evaluating which library to choose (or possibly writing your own, although I'd heavily recommend against that). Note that, as I said on the string library API comparison page security is somewhat subjective so you must think about your situation. It's possible that IO is much less important to you than having a simple sprintf that always aborts() on '%n'. Or that sprintf won't be used, and so that important to you.

The speed argument is discounted pretty well by the information here.

Security relevant things to look for in a string library API

The standard C-style string functions, and their many problems

The standard C string functions have a number of problems, some of which you'll see above as things you should watch for when evaluating other string APIs and some of which are somewhat "unique" to the standard C functions.

Problem 1: Copying inside a string.

ISO 9899:1999 says this in the last sentence about memcpy() (at 7.21.2.1)...

If copying takes place between objects that overlap, the behavior is undefined.

...where "undefined" is std. speak for broken. The safe way to do this is seen in the last sentence about memmove() (at 7.21.2.2)...

Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.

...both of which most C programmers are at least vaguely aware of, due to the two functions doing the same thing apart from that single difference. However a lot of C programmers aren't aware that strcpy(), strncpy(), strcat(), strncat() and sprintf() all follow the memcpy() behavior for overlapping objects. So the following code is broken...


/* broken example ... common mistake */
char *s1;
...
strcpy(s1, s1 + 1); /* chop the first character from the C string */

...as is the following...


/* broken example ... fairly common mistake */
char *s1;
int X;
...
sprintf(s1, "%s:%d", s1, X); /* append a number */

...it also means that there is no way to copy n bytes within a string using the std. string functions. You have to resort to calculating the right lengths and calling memmove().

If that wasn't enough, strncpy() the std. function that most C programmers use to safely copy no more than N characters to a string is all but useless in all conditions. Consider...


#define SZ 4
char s1[SZ];

strncpy(s1, "abcd",   SZ); /* 1 */
strncpy(s1, "abc",    SZ); /* 2 */
strncpy(s1, "a",      SZ); /* 3 */

...in case 1 strncpy() copies {'a', 'b', 'c', 'd'}, which means that the string isn't terminated and will almost certainly cause problems very soon, even strncat() doesn't work in this case and that's described on the next page. In case 2 strncpy() copies {'a', 'b', 'c', '\0'}, which is exactly what memcpy() would have done ... but it does it slower. In case 4 strncpy() copies {'a', '\0', '\0', '\0'} which is a nice 100% overhead, that is useless to pretty much everybody. Of course normally the array have a much bigger size than 3, so the overhead can be 25,500% (a 16 byte filename into a PATH_MAX array).

The problem in case 1 means that it's common to see code calling strncpy() do either...


char s1[SZ];
...
strncpy(s1, external_data, SZ - 1);
s1[SZ - 1] = 0;

...or...


char s1[SZ + 1];
...
strncpy(s1, external_data, SZ);
s1[SZ] = 0;

...or...


static char s1[SZ + 1];
...
strncpy(s1, external_data, SZ);

All of which are a source of bugs, when one of the values is even slightly wrong. The second is particularly sad, as the reason it's done that way is to improve performance (using an interface that can cause over 25,000% overhead, in the common case).

Note that the last example listed above...


static char s1[SZ + 1];
...
strncpy(s1, external_data, SZ);

...works by "getting rid" of the problem (not remembering to write NIL terminator) by "knowing" that the last byte will always be a NIL and so not overwriting that. However this is a very big trap for those that don't intimately know what is happening, due to...


/* broken example */
char s1[SZ + 1];
...
strncpy(s1, external_data, SZ);

...or...


/* broken example */
char *s1 = malloc(SZ + 1);
...
strncpy(s1, external_data, SZ);

...being errors.

It's also worth noting that two very recently introduced interfaces have been included in a number of systems: strlcpy() and strlcat(). Neither of these are specified to fix the copying inside a string problem, and both can cause different problems as they assume that all input is NIL terminated. However they aren't specified by anything more than their source, which is different for a few of the implementations. This has already led to the Solaris version acting differently than the original OpenBSD version, and a broken implementation in an O'Reilly article.

Problem 2: sprintf() and snprintf() etc.

The original specification for the C run time only specified the function sprintf() as an easy way to put formatted data into a C-style string. The obvious problem was that you always have a limited amount of space allocated, so to make sure you don't overflow the buffer the "extension" snprintf() was added ... or I should say at least 3 different implementations of an extension called snprintf() was added. The differences centered around what the function should return. The choices being: 1) The value sprintf() would return, Ie. the space needed. 2) The value sprintf() would return or the limit, whichever is smaller. 3) The value sprintf() would return if it is smaller than the limit, and -1 if it is not. All of these different return values have to be dealt with by a portable program, or you can have buffer overflows (esp. in the second case). This is especially a problem due to the fact that a lot of uses of snprintf() are so that people can implement a function that allocates the correct amount of storage for sprintf() and then call that, because doing blanket limiting is a bad idea. The ISO 9899:199 standard chose the first choice (the value sprintf() would return), so now we just wait 5 years for most systems to have the standard semantics.

This led to implementations offering an extension called "asprintf()" which will allocate the correct amount of space and sprintf() into it. However the interface had to be created to be able to return an error separate from the obvious "could not allocate memory" (mainly due to wchar_t conversions, which hardly anyone cares about) defying all previous experience with snprintf() and all expectations of the average programmer the version of the interface shipped with most Linux distributions is both different to every other implementation and is the only interface that returns a pointer to memory that you cannot use (this has been changed in a CVS commit on 2004-06-06). For instance the following code works everywhere but on those glibc based Linux systems, where it is a security problem...


/* broken code, on Linux only */
char *s1;
int off;

asprintf(&s1, "%d:%n%s,", strlen(data), &off, data);
if (!s1) goto error;

...
if (strlen(other_data) == strlen(data))
  strcpy(s1 + off, other_data);
else
{
  free(s1);
  asprintf(&s1, "%d:%s,", strlen(other_data), other_data);
}

...due to a strcpy() buffer overflow and an invalid value passed to free().

i18n causes more joy with sprintf() functions

So here's a quick quiz, what does "ret" contain in the following code...


char buf[4];
int ret = snprintf(buf, sizeof(buf), "%.*s", 14, "\x80\x80\x80\x80 123456789");

...of course after the discussion about what the return value should be for overflow you'll have said depends on the implementation. But let's assume that in the following code "ret" is 14 (Fourteen)...


char buf[4];
int ret = snprintf(buf, sizeof(buf), "%s", "\x80\x80\x80\x80 123456789");

Ah, you say "it follows the ISO 9899:1999 behaviour" so the value of ret must be 14 (Fourteen). However, you'd be wrong, the value returned is -1 (Minus one) on Linux if you are in a UTF-8 locale due to the multibyte "issue". More generally ISO 9899:1999 says:

7.19.6.5/3
The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n.

Also note that the std. declares that mixing use and non use of i18n parameter numbers is "undefined" in std. C ... which is great for the libc developers, but not so much fun from a security point of view.

You also have to remember that there is no way to tell what version of a sprintf() routine you are going to be using, or if it is as bad a quality as say the dietlibc implementation. So you have all of the problems that other string APIs have with re-implementing a printf like function, but no way to tell which you are using (and test against them) when you write it.


James Antill
Last modified: Fri Oct 21 19:26:20 EDT 2005