Little wonders of C++ (6)

C++ is transforming into a better language that is even more powerful than before, much easier to write and more robust.

The sixth edition of my series about (novel) C++ features starts with some classic features. We will explore how multiple inheritance can be tamed and why it is so important in C++. We will also discuss operator overloading and why methods should be marked noexcept.

Finally, we will also have a short look at the new random number generators (sources and distributions), as well as at static assertions.

Multiple inheritance

Inheritance is one of the pillars of object-oriented programming. Similarity between classes is often expressed through inheritance. Many programming languages, however, forbid inheriting from multiple classes. Instead they provide a concept called interfaces, which are implementation-free classes. C++ does not have a dedicated interface concept; it only knows classes and abstract (pure virtual) methods. The latter turn normal classes into abstract classes. A class that only contains abstract methods, i.e. no fields or other implementation-specific details, is therefore an interface from C++'s point of view.

Hence it is obvious why C++ must allow multiple inheritance: it is the only way for a class to implement several interfaces. As usual, this feature does not only solve a problem, but also creates a responsibility for us programmers. It is easy to abuse multiple inheritance, which will usually lead to very interesting bugs and undefined behavior.
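As a minimal sketch of this idea, the following code declares two interface-like classes (only pure virtual methods and a virtual destructor) and one class that implements both via multiple inheritance. The names Printable, Measurable, Square, describe and measure are hypothetical and only serve as illustration.

```cpp
#include <string>

// Interface-like classes (hypothetical names): no fields, only pure
// virtual methods, plus a virtual destructor for safe deletion.
class Printable {
public:
    virtual ~Printable() = default;
    virtual std::string name() const = 0;
};

class Measurable {
public:
    virtual ~Measurable() = default;
    virtual double area() const = 0;
};

// Multiple inheritance: one class implements both interfaces.
class Square : public Printable, public Measurable {
public:
    explicit Square(double side) : side_(side) {}

    std::string name() const override { return "square"; }
    double area() const override { return side_ * side_; }

private:
    double side_;
};

// Hypothetical helpers that only know about one interface each.
std::string describe(const Printable& item) { return item.name(); }
double measure(const Measurable& item) { return item.area(); }
```

Functions such as describe() depend on a single interface only, so they accept any class that implements it.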

Let's consider a simple setup consisting of 4 classes, which leads to the so-called diamond problem. Here two classes derive from the same base class, and a fourth class derives from both of them, so the base ends up being inherited twice. The problem is the potential doubling of the fields and methods of the root (A).

class A {
};
class B : public A {
};
class C : public A {
};
class D : public B, public C {
};
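To make the doubling visible, here is a sketch that gives A a single field (the member value and the helper set_and_read are assumptions for illustration). Without virtual inheritance, D contains two independent A subobjects, so d.value alone would be ambiguous and each copy has to be addressed through B or C.

```cpp
// Classic (non-virtual) diamond; the field value is a hypothetical addition.
struct A {
    int value = 0;
};
struct B : public A {};
struct C : public A {};
struct D : public B, public C {};

// d.value alone would be ambiguous, so each copy is addressed explicitly.
int set_and_read(D& d) {
    d.B::value = 1;  // the A subobject inside B
    d.C::value = 2;  // a second, independent A subobject inside C
    return d.B::value;
}
```

Writing through C does not affect the copy reached through B, which is exactly the duplication described above.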

How can this be circumvented? Well, the obvious answer is to use a better architecture. This, however, is not always possible. The discussion of interfaces already tells us that we sometimes actually want architectures like this, even though we may then face the diamond problem in a similar fashion.

Nevertheless, C++ also offers a solution within the language: a special kind of inheritance, called virtual inheritance. As the keyword virtual suggests, the mechanism resembles polymorphic methods. With virtual inheritance, D contains only a single A subobject, which is shared by its B and C parts. Since the position of this shared A within the object depends on the most derived type, B and C typically locate it indirectly, e.g. via an offset stored in the virtual table. The shared A is constructed exactly once, by the most derived class D, before B and C are constructed.

class B : virtual public A {
};
class C : virtual public A {
};
class D : public B, public C {
};
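The following sketch, again with a hypothetical field value and a hypothetical helper shares_single_a, checks that virtual inheritance leaves exactly one A inside D: writing through B and reading through C touches the same object, and the unqualified d.value is no longer ambiguous.

```cpp
// Virtual diamond; the field value is a hypothetical addition.
struct A {
    int value = 0;
};
struct B : virtual public A {};
struct C : virtual public A {};
struct D : public B, public C {};

// Hypothetical check: writes via B and C land in the same A subobject.
bool shares_single_a(D& d) {
    static_cast<B&>(d).value = 1;
    static_cast<C&>(d).value = 2;  // overwrites the value written via B
    return d.value == 2 &&
           &static_cast<B&>(d).value == &static_cast<C&>(d).value;
}
```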

When does this make sense? In practice only if we anticipate such a problem. Therefore, if we want to program defensively, we should use it for interface-like classes. Otherwise the drawbacks of virtual inheritance are a good reason to avoid it: the most derived class is always responsible for initializing all virtual bases. For example, if A is a virtual base of B, and C derives from B, then C has to initialize A itself, and any initialization of A in B's constructor is ignored when a C is created. Additionally, downcasting from a virtual base (e.g. from A* to B*) is not possible with a static_cast; we have to use the more expensive dynamic_cast instead, which also requires A to be polymorphic.
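Both drawbacks can be seen in a small sketch, assuming a virtual base A without a default constructor (the member origin and the helper as_b are illustrative names, not from the text). The constructor of C must initialize A directly, B's initializer for A only takes effect when B itself is the most derived class, and the downcast from A to B goes through dynamic_cast.

```cpp
#include <string>
#include <utility>

// A virtual base without a default constructor (hypothetical example);
// the virtual destructor makes it polymorphic, so dynamic_cast works.
struct A {
    explicit A(std::string from) : origin(std::move(from)) {}
    virtual ~A() = default;
    std::string origin;
};

struct B : virtual public A {
    B() : A("B") {}  // only takes effect when B is the most derived class
};

// C derives from B, so C has to initialize the virtual base A itself.
struct C : public B {
    C() : A("C") {}  // omitting A(...) here would not compile
};

// Downcasting from a virtual base needs dynamic_cast;
// static_cast<B*>(a) would be rejected by the compiler.
B* as_b(A* a) {
    return dynamic_cast<B*>(a);
}
```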

Operator overloading

Overloading standard operators, such as +, - or *, is possible in some programming languages. C# allows a subset of all operators to be overloaded, as long as one of the operands is of the declaring type. C++ makes things more flexible (and unfortunately more complicated). We can overload almost all operators (exceptions include ::, ., .* and ?:); however, we cannot overload operators for elementary types only: at least one of the operands has to be a class or enumeration type.

There are amazing as well as annoying features in the operator overloading approach taken by C++. For instance, we can overload the function call operator, (). In C# this is not possible, even though delegates implicitly follow this path. On the other hand, the index operator, [], accepts exactly one argument. There is no option for multiple arguments (at least until C++23, which lifts this restriction). Here we are required to fall back to the function call operator.
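A minimal sketch of this fallback is a matrix class whose element access takes two indices via operator(). The class Matrix and its row-major layout are assumptions for illustration. A non-const and a const overload are provided, so elements can be assigned on mutable matrices and read on constant ones.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: operator() takes two indices.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Non-const overload: returns a reference, so elements can be assigned.
    double& operator()(std::size_t row, std::size_t col) {
        return data_[row * cols_ + col];
    }

    // Const overload: read-only access on constant matrices.
    double operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

Usage then reads m(1, 2) = 5.0; where other languages would write m[1, 2].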

Additionally, there is the assignment operator, =. Even though there are situations where overloading this operator seems natural and desired, we should be very cautious. The assignment operator is an integral part of every type, and doing something nasty in its implementation is a sure way into undefined behavior or program crashes.

There are two kinds of operators: operators that act on an existing instance (usually provided as methods) and operators that take instances as arguments and produce a new value (usually provided as free functions). The first category contains operators such as +=, -=, or (), as well as []. The assignment operator can also be found in this group; =, (), [] and -> even have to be methods. The latter category contains the traditional operators such as *, + or -.

Depending on the type of operator we require a slightly different style. Let's assume we want to provide a custom += operator implementation.

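class Complex {
public:
    Complex(double re, double im) :
        _re(re),
        _im(im) {
    }

    double re() const {
        return _re;
    }

    double im() const {
        return _im;
    }

    // returns a reference to the modified instance, like the built-in +=
    Complex& operator+=(const Complex& other) {
        _re += other._re;
        _im += other._im;
        return *this;
    }

    // declarations of the operators defined further below
    double operator()() const;
    Complex& operator++();
    Complex operator++(int);
    operator int() const;

private:
    double _re;
    double _im;
};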

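Here we modify the current instance and return a reference to it, which follows the convention of the built-in types and allows chaining such as (a += b) += c. It should be said that returning *this is actually not required; C++ lets us choose any return type. For operators that can also be used on constant instances, such as [], we might additionally provide a const overload with a different implementation. Let's have a look at other operators: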

// Method (of Complex), requires <cmath>
double Complex::operator()() const {
    return std::sqrt(_re * _re + _im * _im);
}

// Function
Complex operator+(const Complex& a, const Complex& b) {
    return Complex(a.re() + b.re(), a.im() + b.im());
}

We should realize that these operators do not work directly with pointers. Here we need to dereference the pointer first, e.g. (*p)(), since operators cannot be overloaded for pointer types alone. Nevertheless, we can also call these operators explicitly by name, e.g. p->operator()(), which can make sense when using a pointer type. In fact, C++ itself translates the operator syntax into such explicit calls. Therefore the following snippet,

Complex a(1.0, 0.0);
Complex b(0.0, 1.0);
auto c = a + b;

will be translated to the following code:

// ...
auto c = operator+(a, b);

The function call operator could be used with the variable c to compute the absolute value, i.e.

auto norm = c();
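Putting the pieces together, the following self-contained sketch repeats a minimal Complex (only the members needed here) and shows the three ways of invoking the function call operator: on an object, on a dereferenced pointer, and as an explicit member call. The helper norm_of is hypothetical.

```cpp
#include <cmath>

// Minimal Complex with only the members needed here.
class Complex {
public:
    Complex(double re, double im) : _re(re), _im(im) {}
    double re() const { return _re; }
    double im() const { return _im; }

    // Function call operator: the absolute value (modulus).
    double operator()() const { return std::sqrt(_re * _re + _im * _im); }

private:
    double _re;
    double _im;
};

Complex operator+(const Complex& a, const Complex& b) {
    return Complex(a.re() + b.re(), a.im() + b.im());
}

// Hypothetical helper: explicit call of the operator through a pointer.
double norm_of(const Complex* p) {
    return p->operator()();
}
```

For 3 + 4i the modulus is exactly 5, so c(), (*p)() and norm_of(p) all agree.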

Operators need to be used wisely, as they might not always improve the code's readability. Some operators should be overloaded to provide code that follows standard conventions. A prominent example is overloading the left-shift operator, <<, for stream output. For example:

// requires <ostream>
std::ostream& operator<<(std::ostream& out, const Complex& c) {
    out << "(" << c.re() << ", " << c.im() << ")";
    return out;
}
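Since the operator accepts any std::ostream, it also works with string streams. The sketch below assumes a minimal Complex and a hypothetical helper format_complex that turns a value into a std::string.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Minimal Complex with only the members needed here.
class Complex {
public:
    Complex(double re, double im) : _re(re), _im(im) {}
    double re() const { return _re; }
    double im() const { return _im; }

private:
    double _re;
    double _im;
};

std::ostream& operator<<(std::ostream& out, const Complex& c) {
    out << "(" << c.re() << ", " << c.im() << ")";
    return out;
}

// Hypothetical helper: any std::ostream works, here a string stream.
std::string format_complex(const Complex& c) {
    std::ostringstream stream;
    stream << c;
    return stream.str();
}
```

With the default stream formatting, format_complex(Complex(1.5, -2.0)) yields "(1.5, -2)".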

We won't have a look at the assignment operator, as actual use cases are rare and special. However, what is interesting is the differentiation between the pre- and post-increment (and, similarly, decrement) operators.

// prefix
Complex& Complex::operator++() {
    ++_re;
    return *this;
}
 
// postfix
Complex Complex::operator++(int unused) {
    Complex result = *this;
    ++_re;
    return result;
}

Both are methods; however, the latter has a different signature. Not only does it return a copy of the original, it also has an additional int parameter, which is usually unused and only serves to differentiate the two versions.
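The difference in behavior is easiest to see in use. The following sketch repeats the two operators in a minimal Complex; as a common design choice (not taken from the snippet above), the postfix version reuses the prefix one, so the increment logic lives in one place.

```cpp
// Minimal Complex with only the members needed here.
class Complex {
public:
    Complex(double re, double im) : _re(re), _im(im) {}
    double re() const { return _re; }
    double im() const { return _im; }

    // prefix: increment first, then return the instance itself
    Complex& operator++() {
        ++_re;
        return *this;
    }

    // postfix: the int parameter only selects this overload
    Complex operator++(int) {
        Complex result = *this;  // keep the old state
        ++*this;                 // reuse the prefix version
        return result;
    }

private:
    double _re;
    double _im;
};
```

After Complex b = a++; the variable b holds the old value while a has been incremented; ++a returns a reference to a itself.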

Last but not least, we may also define conversion operators to other, usually elementary, types. While conversions to the current type are defined via (non-explicit) constructors, conversions from the current type can be defined with these operators.

Let's see some code:

// no return type here, since the target type is part of the operator's name
Complex::operator int() const {
    return static_cast<int>((*this)());
}
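Since C++11, a conversion operator can also be marked explicit, so that the conversion only happens when we ask for it, e.g. with static_cast. The sketch below is a variant of the operator above under that assumption, again with a minimal Complex; whether an implicit or an explicit conversion is preferable depends on the type.

```cpp
#include <cmath>
#include <type_traits>

// Minimal Complex with only the members needed here.
class Complex {
public:
    Complex(double re, double im) : _re(re), _im(im) {}

    double operator()() const { return std::sqrt(_re * _re + _im * _im); }

    // explicit (C++11): only used when we ask for it, e.g. via static_cast
    explicit operator int() const {
        return static_cast<int>((*this)());
    }

private:
    double _re;
    double _im;
};

// No implicit conversion from Complex to int exists.
static_assert(!std::is_convertible<Complex, int>::value,
              "conversion to int must be explicit");
```

A silent conversion such as int n = c; no longer compiles, while static_cast<int>(Complex(3.0, 4.0)) yields 5.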

Random numbers

Random numbers are important for a lot of interesting algorithms. Having a good random number generator (RNG) is therefore an important piece for every programmer. Traditionally neither C nor C++ had a state-of-the-art RNG. Instead we usually used custom implementations provided by libraries such as Boost, the GSL or other mathematically oriented frameworks.

The architecture that supplies us with a nice RNG is flexible. Instead of following a fixed path, such as having a class (e.g. called Random) with a set of methods (e.g. Next, NextDouble, ...), we find a clear separation between the random number source (the engine) and the random number distribution. This way we can use various RNG implementations without having to rewrite the standard distributions (such as Gaussian, Exponential or Uniform) for each of them.
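To illustrate the separation, the sketch below uses one distribution with any engine type. The function roll_die is hypothetical; std::mt19937, std::minstd_rand and std::ranlux24_base are predefined engines from <random>.

```cpp
#include <random>

// Hypothetical helper: the engine type is a template parameter, so the
// same distribution works with any random number source.
template <class Engine>
int roll_die(Engine& engine) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}
```

Calling roll_die with a Mersenne Twister, a linear congruential engine or a subtract-with-carry engine always yields values between 1 and 6; only the underlying source differs.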

The most important distributions are directly included in the STL. Additionally we find several implementations of RNGs. If we do not care about the specific implementation, we may just use the provided Mersenne-Twister implementation. This should be sufficient for most cases.

#include <random>

// ...

std::mt19937 rng;  // default seed: the same sequence on every run
std::uniform_real_distribution<double> dist(0.0, 1.0);
auto number = dist(rng);  // uniformly distributed in [0, 1)
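A default-constructed engine always starts from the same seed, which makes runs reproducible but not unpredictable. A minimal sketch of both choices follows; the helper names are hypothetical, and std::random_device is assumed to provide a non-deterministic seed on the target platform (the standard does not guarantee this).

```cpp
#include <random>

// Hypothetical helper: a fixed seed gives a reproducible sequence.
std::mt19937 make_reproducible(std::mt19937::result_type seed) {
    return std::mt19937(seed);
}

// Hypothetical helper: seed from std::random_device, which is usually
// (but not guaranteed to be) non-deterministic.
std::mt19937 make_unpredictable() {
    std::random_device device;
    return std::mt19937(device());
}
```

Two engines constructed with the same seed produce identical sequences. The standard even fixes the 10000th value of a default-constructed std::mt19937: 4123659995.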

The STL provides the following random number engines. They are class templates, and specific specializations with well-chosen parameters are predefined as typedefs:

linear_congruential_engine,
subtract_with_carry_engine, and
mersenne_twister_engine (e.g. mt19937)

Additionally, a set of distributions is available. Most of them are class templates parameterized by the result type of the random variable (with int or double as the default); bernoulli_distribution, which always yields bool, is the exception. Caution is required, as integer (discrete) distributions must not be instantiated with floating point types (and vice versa). The call operator is overloaded, such that obtaining a random number is just a matter of calling the distribution with the engine instance. The complete list of distributions reads:

uniform_int_distribution,
uniform_real_distribution,
bernoulli_distribution,
binomial_distribution,
geometric_distribution,
negative_binomial_distribution,
poisson_distribution,
exponential_distribution,
gamma_distribution,
weibull_distribution,
extreme_value_distribution,
normal_distribution,
lognormal_distribution,
chi_squared_distribution,
cauchy_distribution,
fisher_f_distribution,
student_t_distribution,
discrete_distribution,
piecewise_constant_distribution and
piecewise_linear_distribution.
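A short sketch drawing from three of these distributions with one engine follows. The struct Samples and the function draw are hypothetical, and the parameters (a fair coin, heights with mean 175 and standard deviation 10, weights 1, 0 and 3) are chosen arbitrarily.

```cpp
#include <random>

// Hypothetical result type holding one draw of each distribution.
struct Samples {
    bool coin;
    double height;
    int face;
};

Samples draw(std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);                 // not a template: yields bool
    std::normal_distribution<double> height(175.0, 10.0);  // mean, standard deviation
    std::discrete_distribution<int> face{1.0, 0.0, 3.0};   // weights of 0, 1 and 2
    return Samples{coin(rng), height(rng), face(rng)};
}
```

Since index 1 has weight 0, the discrete distribution only ever returns 0 or 2 (the latter three times as often).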

One more important reason to prefer the new random header over the traditional rand() is that it is now very clear how to make random number generation thread-safe: either provide each thread with its own thread-local engine, seeded individually, or synchronize access to the shared engine.
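The first option can be sketched with a thread_local engine inside a hypothetical helper roll_die: every thread creates and seeds its own engine on first use, so no synchronization is needed. The helper roll_in_threads is equally hypothetical, and linking std::thread may require -pthread on some platforms.

```cpp
#include <random>
#include <thread>
#include <vector>

// Each thread gets its own engine, created and seeded on first use.
int roll_die() {
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> die(1, 6);
    return die(engine);
}

// Rolls once per thread; no locking is needed since no engine is shared
// and every thread writes to its own slot of the result vector.
std::vector<int> roll_in_threads(int thread_count) {
    std::vector<int> results(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&results, i] { results[i] = roll_die(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return results;
}
```

The alternative, one shared engine guarded by a std::mutex, avoids creating an engine per thread but serializes all calls.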

Noexcept

Throwing exceptions when something goes wrong is recommended. The compiler has to anticipate these situations; however, usually there is just not enough information to rule such cases out. If an external library is called, the compiler cannot detect whether an exception could be thrown in the external function. Even though the performance impact is usually small, we can still improve the optimization capabilities of the compiler by annotating particular functions in such a way that the compiler knows no exception will be raised.

Let's see how this looks in code:

void foo() noexcept {
    // No exception in this code, or any called code
}

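Usually the problem is that some external function is called which does not carry the noexcept keyword, and which may in fact throw exceptions at some points. We can still mark the calling function noexcept, but the compiler does not verify this promise: if an exception escapes a noexcept function at runtime, std::terminate is called and the program ends. Hence we should only use the keyword if all called code either cannot throw or has its exceptions handled inside our function.

This is also where the analogy to const variables or methods ends. If a const variable would indeed be changed (or if a const method would indeed modify the internal state), the compiler raises an error, which indicates that we actually missed something in the code. A noexcept function that throws, on the other hand, compiles fine (some compilers warn about an obvious throw) and only fails at runtime. What we obtain is not an additional compile-time check, but a contract that callers and the compiler can rely on.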
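If a noexcept function has to call code that may throw, one option is to handle every exception inside the function, so the promise is actually kept. A minimal sketch with the hypothetical helper parse_or_default, which wraps std::stoi (the latter throws std::invalid_argument or std::out_of_range on bad input):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: std::stoi may throw, so every exception is caught
// inside the function and the noexcept promise holds.
int parse_or_default(const std::string& text, int fallback) noexcept {
    try {
        return std::stoi(text);
    } catch (...) {
        return fallback;
    }
}
```

parse_or_default("42", 0) returns 42, while parse_or_default("abc", -1) returns the fallback -1 instead of terminating the program.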

Additionally, the noexcept operator performs a compile-time check that yields true if an expression is declared not to throw any exceptions. It can be used within a function template's noexcept specifier to declare that the function may throw exceptions for some types but not for others.

noexcept(foo());//true
noexcept(foo_without_noexcept());//false
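A sketch of such a conditional specification, using the hypothetical function template exchange_values and the hypothetical type Risky: the outer noexcept is the specifier, the inner ones are the operator, evaluated for the concrete T.

```cpp
#include <utility>

// Hypothetical swap-like helper. The outer noexcept is the specifier,
// the inner noexcept(...) expressions are the operator, evaluated for T.
template <class T>
void exchange_values(T& a, T& b)
    noexcept(noexcept(T(std::move(a))) && noexcept(a = std::move(b))) {
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}

// Hypothetical type whose move operations are not marked noexcept.
struct Risky {
    Risky() = default;
    Risky(Risky&&) {}
    Risky& operator=(Risky&&) { return *this; }
};
```

For int the function is noexcept; for Risky it is not, because Risky's move operations are not marked noexcept.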

We should use noexcept when it's obvious that the function will never throw. Even though we cannot realistically expect to observe a large performance improvement after using noexcept, the increased freedom given to the compiler to safely apply certain kinds of optimizations should be enough motivation. Most compilers follow a zero-cost exception model, in which no penalty is paid as long as no exception is thrown. Therefore not much would change at the machine code level, although the binary size might be reduced by removing the handling code.

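Using noexcept on move constructors and move assignment operators will likely yield the biggest improvements. Destructors need no annotation, as they are already implicitly noexcept by default. It should be noted that noexcept checks are common in template code such as the standard library types. For instance, when std::vector reallocates, it does not use the element type's move constructor unless that constructor is noexcept (which the compiler deduces automatically for implicitly generated move constructors) or the type cannot be copied at all; otherwise it copies the elements to keep its strong exception guarantee.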
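The std::vector behavior can be observed with a small instrumented type. In the sketch below, Counters, Tracked and append_elements are hypothetical helpers; the only difference between the two instantiations is whether the move constructor is noexcept. When the vector grows, it moves the existing elements in the first case and copies them in the second.

```cpp
#include <vector>

// Hypothetical counters for copies and moves.
struct Counters {
    int copies = 0;
    int moves = 0;
};

// Hypothetical instrumented element type; NoexceptMove controls
// whether the move constructor is declared noexcept.
template <bool NoexceptMove>
struct Tracked {
    Counters* counters;

    explicit Tracked(Counters* c) : counters(c) {}
    Tracked(const Tracked& other) : counters(other.counters) { ++counters->copies; }
    Tracked(Tracked&& other) noexcept(NoexceptMove) : counters(other.counters) {
        ++counters->moves;
    }
};

// Appends n elements; the vector reallocates several times on the way.
template <bool NoexceptMove>
Counters append_elements(int n) {
    Counters counters;
    std::vector<Tracked<NoexceptMove>> items;
    for (int i = 0; i < n; ++i) {
        items.emplace_back(&counters);  // constructed in place, no copy or move
    }
    return counters;
}
```

Every copy or move counted here stems from a reallocation, so the counters directly reflect the choice std::vector makes.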

Static assert

Sometimes we want to impose conditions that should be checked by the compiler. The most obvious conditions are type constraints. Here the compiler ensures that objects at runtime will be of a certain type or of a type derived from it. This works as long as we do not try to impose type constraints on template arguments. Before C++11 this was not possible out of the box.

Nevertheless, we could construct a workaround that involved creating a pseudo object of the template argument. The pseudo object is then cast to a given type, explicitly imposing the type constraint. The compiler then checks these constraints once the function template is instantiated with concrete template arguments.
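One common variant of this workaround, sketched in pre-C++11 style, converts a null pointer of the template type to a pointer to the required base class. The names Task, PrintTask and execute are hypothetical. If T does not derive from Task, the conversion fails to compile, albeit with an error message that says little about the actual intent.

```cpp
// Hypothetical base class that accepted types must derive from.
struct Task {
    virtual ~Task() {}
    virtual int run() = 0;
};

// Pre-C++11 style constraint: converting a null T* to Task* only
// compiles if T derives (publicly and unambiguously) from Task.
template <class T>
int execute(T& item) {
    Task* constraint_check = static_cast<T*>(0);
    (void)constraint_check;  // silence "unused variable" warnings
    return item.run();
}

// Hypothetical task type satisfying the constraint.
struct PrintTask : Task {
    int run() { return 1; }
};
```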

There are some problems with this approach. Some are obvious, others may seem hidden at first. However, these problems just tell us that the approach is not really ideal. What is missing is a reliable way of doing a static assertion. A static assertion checks (any) condition during compilation. A popular use is putting additional type constraints on templates. This may seem odd at first, as templates can be considered to use duck typing; in reality, however, it is quite handy to only allow types that derive from a specific base class.

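In order to use a static assertion we only need to write static_assert(). It is not a function but a declaration that takes two arguments: the first is a constant expression (e.g. the result of a constexpr function or a type trait) that is converted to a boolean value, and the second is a string literal that is displayed if the condition is false. Since C++17 the message may be omitted.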

Let's see an example that improves a template swap() function:

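#include <type_traits>

// own namespace, so that calls are not ambiguous with std::swap
namespace util {

template <class T>
void swap(T& a, T& b) {
    static_assert(std::is_copy_constructible<T>::value, "Swap requires copying");
    static_assert(std::is_copy_assignable<T>::value, "Swap requires copy assignment");
    auto c = b;
    b = a;
    a = c;
}

}  // namespace util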

It is also ideal to combine static assertions with the nothrow type traits, which check the noexcept specification of a type's operations. Let's have a look:

// inside swap(): both traits inspect the noexcept specification of T's move operations
static_assert(std::is_nothrow_move_constructible<T>::value && std::is_nothrow_move_assignable<T>::value, "Swap may throw");
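Combined into a complete function, a sketch could look as follows. It swaps by moving instead of copying and is itself declared noexcept, which is only honest because the static assertion rejects types whose moves may throw. The namespace util is an assumption to avoid clashing with std::swap.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace util {

// Move-based swap that refuses to compile for types whose moves may throw.
template <class T>
void swap(T& a, T& b) noexcept {
    static_assert(std::is_nothrow_move_constructible<T>::value &&
                      std::is_nothrow_move_assignable<T>::value,
                  "Swap may throw");
    T c = std::move(b);
    b = std::move(a);
    a = std::move(c);
}

}  // namespace util
```

Types such as int or std::string pass the check; a type with a throwing move constructor triggers the "Swap may throw" message at compile time.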

Finally, we look at another example, which makes use of the is_default_constructible type trait. Again we may solve the problem differently; however, this solution is certainly much more elegant. If the provided type has a default constructor, the condition is true, i.e. the assertion holds and compilation continues. Otherwise compilation fails with our message, informing us about a possible error in our code.

template <class T>
struct data_structure {
    static_assert(std::is_default_constructible<T>::value, "Data Structure requires default-constructible elements");
};
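A slightly fuller sketch shows why such a constraint can be useful: a hypothetical fixed_buffer stores its elements in a plain array and therefore default-constructs every one of them. With the static assertion, an unsuitable element type such as the hypothetical NoDefault is rejected with a readable message as soon as the template is instantiated.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-size container that default-constructs its elements.
template <class T, std::size_t N>
struct fixed_buffer {
    static_assert(std::is_default_constructible<T>::value,
                  "fixed_buffer requires default-constructible elements");
    T items[N];  // every element is default-constructed
    std::size_t size() const { return N; }
};

// Hypothetical type without a default constructor.
struct NoDefault {
    explicit NoDefault(int) {}
};

// fixed_buffer<NoDefault, 3> would fail to compile with the message above.
```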

Static assertions are quite powerful and provide a simple mechanism for including (and validating) constraints in C++. We should always emphasize correct typing. In the end the question is: Should possible errors be reported by the compiler, or do we want to risk unexpected runtime behavior and problems?

Created 9/6/2014 4:24:35 PM +00:00.