Little wonders of C++ (7)

C++ is transforming into a better language, one that is even more powerful than before and in which robust code is much easier to write.

Six posts have already been published, so what could possibly be left? This issue covers well-known and probably lesser-known (or at least less used) features of the STL. The STL itself is already a C++ wonder: it contains very well-performing code that is as flexible as it could be.

With the recent updates to the C++ language, most notably C++11, we got additional functionality and performance boosts. Our programs have become more reliable and faster thanks to move constructors, shared pointers and iterators. The C++14 release brings even more features, and with the upcoming versions we expect another set of highlights, such as an improved standardized IO API.

This time we will focus exclusively on STL features. We discuss regular expressions, formatting with the STL, complex and rational numbers, as well as the improved time facilities.

STL

The STL itself is quite remarkable. C++ was slowly gaining momentum, when the STL was released. This was kind of a killer argument for using C++. It is also a good point in favor of standard libraries in general, since such libraries not only give functionality, but also express a standard style and usage for a language.

Following a standardized way of writing code is always a good thing. It encourages exchange between programmers and makes communication easier. A standard library proposes a standard way. It tells authors what naming should look like and how APIs could be used.

The STL consists almost exclusively of templates, and it offers a lot of typedefs to simplify their usage. Why should we always write basic_string<char> if this is the variant we (nearly) always want? A direct string typedef is therefore useful. On the other hand it is really advantageous to have a flexible template instead of one fixed type. After all, real characters do not always fit into 8 bits; a code unit may need up to 32 bits. If we want to use a certain encoding, we had better have a matching string variant, and the STL already offers one through its templated container. Why shouldn't we write our own string variant? Performance is key here, and creating a string type that matches the speed of the one offered by the STL is hard. One common optimization stores the characters inside the string object itself, avoiding a heap allocation, if the string is shorter than a certain number of characters.

There are many reasons to use STL types. Are there reasons to avoid them? The only one I can think of (besides requiring a special API / structure or having very special needs) is cross-compiler compatibility. Some compilers (on some platforms) may ship with a different STL, at least feature-wise. An example would be the Intel compiler: it (still) does not offer a shared_ptr<T> type, even though the C++11 STL standardizes it. However, even in such cases I would not exclude the STL variant.

// Instead of #include <memory>
#include "shared_ptr.h"

/* ... */

With the contents of shared_ptr.h being as follows:

#pragma once
#ifdef __INTEL_COMPILER
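// The Intel compiler does not have an implementation ready.
// Caution: adding declarations to namespace std is formally undefined
// behavior, so this is a pragmatic workaround rather than portable code.
namespace std {
    /* Our implementation of a shared_ptr */
}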
#else
// Otherwise (clang, GCC) we can just use the memory header.
#include <memory>
#endif

This way we use the very well-tested, highly performing STL variant if we do not encounter a compiler that does not know about the shared_ptr template.

Regular expressions

Ah, regular expressions! The formal specification of regular expressions should be familiar to every programmer. Why? Regular expressions are super useful for searching or replacing text snippets in one file or in many. They may also be used for filtering, e.g., to select files depending on their name without specifying a name or a list of names.

It is also a good exercise to implement a regular expression evaluator. Nevertheless, the preferred way is to use an already existing, well-tested and flexible solution. Many frameworks contain such a solution. In the .NET-Framework we have the Regex class. Boost offers us the basic_regex<T> template, which comes with predefined typedefs such as regex (which uses char as its character type).

Now C++11 also specifies a regular expression evaluation engine contained in the STL. As with many other new types offered in the STL, the Boost variant has been considered as a role model. Let's have a look at an example:

#include <iostream>
#include <iterator>
#include <string>
#include <regex>

using namespace std;
 
int main() {
    string s = "Some people, when confronted with a problem, think "
        "\"I know, I'll use regular expressions.\" "
        "Now they have two problems.";
 
    regex self_regex("REGULAR EXPRESSIONS", regex_constants::ECMAScript | regex_constants::icase);

    if (regex_search(s, self_regex)) {
        cout << "Text contains the phrase 'regular expressions'\n";
    }
 
    regex word_regex("(\\S+)");
    auto words_begin = sregex_iterator(s.begin(), s.end(), word_regex);
    auto words_end = sregex_iterator();
 
    cout << "Found " << distance(words_begin, words_end) << " words\n";
 
    constexpr size_t N = 6; // unsigned, to match the type of string::size()
    cout << "Words longer than " << N << " characters:\n";

    for (sregex_iterator i = words_begin; i != words_end; ++i) {
        smatch match = *i;
        string match_str = match.str();

        if (match_str.size() > N)
            cout << " " << match_str << endl;
    }
 
    regex long_word_regex("(\\w{7,})");
    string new_s = regex_replace(s, long_word_regex, string("[$&]"));
    cout << new_s << '\n';
}

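The explicit iterator loop in the previous example is shown for completeness. Where a container is available we would usually prefer the C++11 range-based for loop; a pair of sregex_iterator objects, however, offers no begin() and end(), so it needs the explicit loop. The new regular expression library contains everything from basic match iterators to direct search and replace functions.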

Formatting

In C everything seemed easy. One calls the printf() function and passes a string that contains special characters devoted to formatting. If we actually want to print these characters, we need another special character just to escape them. That already does not sound so delightful, especially since the string needs to be inspected at runtime, which makes printf() slower than more direct variants such as puts(). The downside of the latter is that it always appends a newline character.

There are two things we are missing from printf(). We miss flexibility and we miss robustness. What if we output a string, that has been generated from user input? We have to be very careful in a situation like that. The best way is to never directly output user input, but only via a formatting flag, i.e. we have:

printf(user_string); // no, no!
printf("%s", user_string); // that's the way

In C++ everything got better. A unified stream model has been developed, which has two special kinds of streams for console interaction: std::cout for writing to the console and std::cin for reading from the console. Now we just use the bit-shift operator to pipe things in the right direction (from or to our stream). But one thing seems to be missing ... formatting!

With printf() it was pretty easy to specify what type should be used with how many digits, leading and trailing whitespace and a little bit more. But, and that is important, the possibilities were limited to the ones specified by the compiler / standard library.

With C++ we can specify our own formatters. There are also many more formatters out of the box, and they pretty much document themselves by using obvious names. These formatters are just elements that take in a value (or all incoming values) on the right side, modify it and forward it to the stream.

Let's see an example of such a formatter:

#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    cout << setw(10) << "ten" << "four" << "four" << endl;
}

The above sample will print "       tenfourfour": the word ten is right-aligned in a field of 10 characters (seven leading spaces), and the two other strings follow unpadded. The formatter setw is applied before passing in "ten", and the other strings are not modified. Therefore setw only modifies the next output.

There are other formatters, such as setprecision. This one changes the precision of all floating-point values that follow. An example:

#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    // Numeric values, not string literals: setprecision only affects floating-point output.
    cout << setprecision(3) << 2.71828 << ' ' << 1.412 << endl;
}

Now we receive 2.72 1.41, i.e. 3 significant digits (with rounding). Fine - but how could we write our own formatters? It is actually not very hard. Let's see an illustrative example of a formatter that puts quotes around the next incoming element.

#include <iostream>
#include <string>

using namespace std;

struct quoting_proxy {
public:
    explicit quoting_proxy(ostream& os)
        : _os(os) {
    }

    template<typename T>
    friend std::ostream& operator<<(quoting_proxy const& q, T const& rhs) {
        return q._os << rhs;
    }

    friend std::ostream& operator<<(quoting_proxy const& q, string const& rhs) {
        return q._os << "'" << rhs << "'";
    }

    friend std::ostream& operator<<(quoting_proxy const& q, char const* rhs) {
        return q._os << "'" << rhs << "'";
    }

private:
    ostream& _os;
};

struct quoting_creator { } quote;

quoting_proxy operator<<(ostream & os, quoting_creator) {
    return quoting_proxy(os);
}

int main() {
    cout << quote << "hello" << endl;
}

That is not too hard to understand. We prepare a proxy to do the actual magic. This proxy needs to hold a reference to the stream object. Additionally we create an instance called quote of a dummy structure, for which we define an operator<< that triggers the creation of our proxy. That's it!

Complex and rational numbers

A great thing about the STL is the number of algorithms and data structures provided. Not only algorithmically, but also numerically this makes sense. For instance the complex<T> template already contains everything we require to deal with complex numbers.

Here it makes sense again to use a typedef.

#include <iostream>
#include <complex>
typedef std::complex<double> cmplx;

int main() {
	using std::cout;
	using std::endl;
	cmplx a(1.0, 0.0);
	cmplx b(0.0, 1.0);

	cout << "a + b = " << (a + b) << endl;
	cout << "a - b = " << (a - b) << endl;
	cout << "a * b = " << (a * b) << endl;
	cout << "a / b = " << (a / b) << endl;
}

Complex numbers are useful in many situations. But C++ offers even more: with the C++11 STL we also got rational numbers, at least as a compile-time computed quantity.

The magic lies in a templated type that computes its numerator and denominator at compile-time, yielding constants that cost nothing at runtime. Let's see a very simple example:

#include <iostream>
#include <ratio>
typedef std::ratio<2, 3> two_third;
typedef std::ratio<1, 6> one_sixth;

int main() {
    using std::cout;
    using std::endl;
    typedef std::ratio_add<two_third, one_sixth> sum;

    cout << "2/3 + 1/6 = " << sum::num << '/' << sum::den << endl;
}

Of course we could debate how useful a compile-time constant is. This discussion could be combined with the one about the std::array<T, N> container. Nevertheless, it's better to have a feature and not use it than to need it and not have it.

The STL itself also uses the ratio type a lot. Some standard SI prefixes, such as giga, have been implemented using our rational type. The definition of giga reads:

typedef std::ratio<1000000000, 1> giga;

It is also used in the chrono header, which contains the STL's time types.

Time

The STL offers the chrono header, which contains a flexible collection of types that track time with varying degrees of precision. As already mentioned, the degree of precision is set via ratio types. Most notably we have three kinds of types in there:

Time points
Durations
Clocks

There are some typedefs and more, but we will explore the available options by writing a simple time measurement class. Our design will be close to the Stopwatch class known from the .NET-Framework. It should be noted that chrono also gives us access to the C-style date and time structures.

#pragma once
#include <chrono>

template<typename TElapsed>
class Stopwatch final {
    typedef std::chrono::high_resolution_clock ClockType;
#ifdef __INTEL_COMPILER
    typedef std::chrono::microseconds DurationType;
#else
    typedef std::chrono::nanoseconds DurationType;
#endif
    
public:
    Stopwatch() :
        _started(false),
        _offset() { 
    }

    // The elapsed time is a duration, not a point in time.
    DurationType elapsed() {
        if (_started)
            _end = ClockType::now();

        return std::chrono::duration_cast<DurationType>(_end - _start + _offset);
    }

    TElapsed elapsed_nanoseconds() {
        if (_started)
            _end = ClockType::now();

        const auto delta = _end - _start + _offset;
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(delta);
        return static_cast<TElapsed>(ns.count());
    }

    TElapsed elapsed_milliseconds() {
        if (_started)
            _end = ClockType::now();

        const auto delta = _end - _start + _offset;
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(delta);
        return static_cast<TElapsed>(ms.count());
    }

    TElapsed elapsed_seconds() {
        if (_started)
            _end = ClockType::now();

        const auto delta = _end - _start + _offset;
        const auto s = std::chrono::duration_cast<std::chrono::seconds>(delta);
        return static_cast<TElapsed>(s.count());
    }

    bool is_running() const {
        return _started;
    }

    void start() {
        if (!_started) {
            _started = true;

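            // Explicit cast, in case the clock ticks finer than DurationType.
            if (_start != _end)
                _offset += std::chrono::duration_cast<DurationType>(_end - _start);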

            _start = ClockType::now();
        }
    }

    void stop() {
        if (_started) {
            _started = false;
            _end = ClockType::now();
        }
    }

    void reset() {
        _end = ClockType::now();
        _start = _end;
        _offset = DurationType { };
    }

    void restart() {
        reset();
        _started = true;
    }

    static Stopwatch start_new() {
        Stopwatch sw { };
        sw.start();
        return sw;
    }

private:
    bool _started;
    ClockType::time_point _start;
    ClockType::time_point _end;
    DurationType _offset;
};

Not much magic going on here. Unfortunately we need to include a compiler switch again. Intel's compiler seems to have a bug (?!): we cannot compile this code when the std::chrono::nanoseconds type is involved. According to the compiler we are way below the clock's resolution. In a way that is true; however, since GCC and others are able to compile this code, I wonder who is to blame. Why introduce a resolution (std::chrono::nanoseconds) when it can't be handled anyway?

Hopefully such annoying behaviors will be eliminated in the near future. The previous example also shows the most interesting properties of the chrono header in action. By default we would instantiate Stopwatch<T> with an integer type, so that the elapsed getters return only whole seconds, milliseconds or nanoseconds.

Created 9/19/2014 1:57:43 PM +00:00. Last updated 10/7/2014 8:59:56 AM +00:00.
