2011-01-21

How to implement the C printf function in C++ in a typesafe way?

This blog post explains how to implement the C printf() function in C++ in a typesafe and convenient way.

The most important advantage of the ANSI standard C printf() function are its compactness and expressive power. For example: printf("The %s answer is %05d.\n", "best", 42); means: take the next argument, treat it as a null-terminated string (const char*), and insert it. The format specifier %05d means: take the next argument as a signed integer, pad it with leading zeros to at least 5 digits, and insert it. Please note that printf() is not typesafe: the format specifier has to match the type of the argument. For example, %d wouldn't work for a long long (but %lld would be needed), and %d or %s wouldn't work for a double (but %g would be needed, and %Lg would be needed for a long double). On a mismatch, some compilers would indicate a compile error, others would generate misbehaving (i.e. with undefined behavior) code.

There is no equally expressive message formatting functionality in C++ with a similarly compact syntax. Of course, printf() as it is can be used in C++, but it isn't integrated conveniently with the C++ standard library: e.g. it's not possible to insert an std::string, it's not possible to write to an std::ostream, and it's not possible to append to an existing std::string. C++ streams (e.g. cout << "The " << "best" << " answer is " << 42 << ".\n";) bring type safety (e.g. no matter which integer, floating point or string type 42 and other arguments have, they will be inserted properly), but they don't provide compact and powerful formatting capabilities (e.g. how to specify to pad 42 with zeros to at lest 5 digits).

Would it be possible to combine the compactness and expressive power of printf() with the typesafe behavior of C++ streams? It isn't too hard to create a class named Print and implement an overloaded operator<< so that Print("The %s answer is %05d.\n") << "best" << 42 would work in a typesafe way (raising a run-time error reliably if the type of the argument doesn't match the format specifier, and converting the the size and type of the argument if needed, e.g. %d would work for both int and long long). But is it possible to do it while keeping the original syntax, like this: Printf("The %s answer is %05d.\n", "best", 42);?

Yes, that's possible, using two tricks: overloading operator, (the comma operator), and using a GCC feature of macro expansion with any number of arguments. The macro Printf would expand so that Printf("The %s answer is %05d.\n", "best", 42); expands to (C("The %s answer is %05d.\n"), "best", 42);, which is equivalent to operator,(operator,(C("The %s answer is %05d.\n", "best"), 42);. Here C is a helper class, who has a constructor C(const char* fmt) for the format string, and the relevant const C& operator,(const C&, const char*) and const C& operator,(const C&, int) comma operators are overloaded to do the actual printing, consuming a few more characters in the meantime from the format string, and using that format specifier to format the argument. Type safety is provided by the built-in overloading features of the C++ language (i.e. a different operator, is called based on the type of the argument).

One more trick is needed to get an compile-time error when printing of values of unsupported type is attempted, e.g. class Bad {}; Printf("%s", Bad());. By default, since there is no operator, overridden for the class Bad, the default comma operator is used silently, which is quite useless (compute the first argument, discard it, compute the second argument, return it), and effectively prevents the rest of the arguments from being printed. The solution here is to define template<typename T>void operator,(const C&, T t) { ... }, which would match all types (which are not matched by the non-template version of operator,), and if we write ... so that it produces a type error, then the problem is solved.

The source code of such a typesafe printf() implementation for C++ is available at http://code.google.com/p/pts-mini-gpl/source/browse/#svn/trunk/pts-printf. For the convenience of the programmer, it implements all these:

int Printf(const char* fmt, ...);  // To stdout.
int Printf(FILE*, const char* fmt, ...);
int Printf(std::ostream&, const char* fmt, ...);
int Printf(std::ostringstream&, const char* fmt, ...);
int Printf(std::ofstream&, const char* fmt, ...);
int Printf(std::string*, const char* fmt, ...);  // Append to string.
std::string SPrintf(const char* fmt, ...);

Please note that the implementation is not optimized for speed. Most probably it's not possible to implement a typesafe version of printf() which is also as fast.

3 comments:

Unknown said...

Did you know about http://boost.org/doc/libs/1_45_0/libs/format?

pts said...

@Timo Savola: Thanks for mentioning the Boost Format Library. The advantage of the design in my blog post over the Boost Format Library is that my design uses the original C printf() syntax.

Balázs Szabó said...

How it is typesafe if it does not check that the parameter type actually matches the defined type in the string?