Little wonders of C++ (8)

C++ is transforming into a better language, one that is even more powerful than before and much easier and more robust to write.

We are coming close to the end of this blog post series. Last time we started with the STL, which has of course been a good companion in every part. The STL contains smart pointers, vectors and lists, and much more. Language features such as initializer lists and range-based loops are built upon structures originally presented in the STL. There will be two more wonders before I discuss C++'s greatest blunders.

It should therefore be no surprise that types that are introduced via the STL will be discussed in this part. We will see that the STL now has a lot to offer in terms of multi-threading. Additionally C++ includes some nifty concepts that make programming more enjoyable in general.

We will therefore start with bitsets, which are super useful in a lot of applications. Then we will have a closer look at streams, C++'s answer to the old question of abstracting away common I/O capabilities. Finally we will look at the threading facilities that have been introduced with C++11.

Bitsets

Storing or manipulating raw bits can be tedious. Of course one could work with bit-shift or general bit-wise operators; however, most often an array-like representation feels the most natural. In the end a working algorithm can always be improved. Luckily for us the STL already offers such a specialized data structure, and it is quite efficient: the std::bitset<N> type. The N is the number of bits to represent and has to be a compile-time constant, so we benefit from compile-time optimizations.

#include <iostream>
#include <bitset>

int main () {
  std::bitset<8> one_empty_byte;
  std::bitset<16> two_filled_bytes(0xfa2);

  std::cout << "Empty byte: " << one_empty_byte << std::endl;
  std::cout << "Filled w/c: " << two_filled_bytes << std::endl;
}

As we would expect, this type offers a lot of methods for manipulating its content. Additionally, array-like access and conversion methods such as to_string() or to_ulong() have been included. The implementation of a bitset is fundamentally different from, e.g., using a std::vector<bool> instance. Usually the bitset will easily outperform the vector; however, if we want to deal with really big bitsets, we might look for a different solution.

The bitset type has been created to deal with small to medium bit-wise representations of data. Since C++ is close to the hardware, we expect C++ to outperform Java / C# bitsets by far. C++ achieves this by storing the bit-array inline, i.e., on the stack for local variables, instead of allocating it on the heap.

Streams

C# (and of course Java) introduced the concept of a stream to a huge number of programmers. Most people are using them without realizing how important this concept is. However, C++ had the idea long before Java or C# were born. And even though C++ was not the first language to introduce them, it certainly was responsible for making streams a successful abstraction.

Why are streams so useful? First of all, we could go back to C, where we would realize that puts() and printf() are not really suited for printing text on any device other than the terminal. There are variants that take more parameters, but these standard functions are always tied to stdin and stdout. What's so cool about std::cin and std::cout? They are built upon streams, and we can easily change their assigned device. This is particularly useful if our application uses some other code which, e.g., uses std::cout for printing something. We could omit that output or redirect it, e.g., to a textbox or some other useful sink, depending on our application.

Also, a function that takes a std::istream or std::ostream as its input will work with more specialized streams as well. That is particularly useful when we do not want to be constrained to a certain type of stream. In general this makes it possible to use a file stream (e.g. std::ofstream) for storing data and then to use the same function again for printing the content to the console (or sending it over the network). Even though streams may offer some functionality such as seeking, some instances / specializations may not support such behavior.

The most basic stream class in C++ is called std::ios_base. This type is not templated and alone not very useful. A much better base is provided by the templated derived version std::basic_ios<T>. It is also the base for templates such as std::basic_istream<T> and std::basic_ostream<T>.

Our usual terminal streams are based on those. For instance std::cin is just an object of type std::basic_istream<char> (available under the alias std::istream). It corresponds to the C stream stdin. However, it offers all methods that are specified for input streams, such as getline():

#include <iostream>

int main () {
  char name[256];

  std::cout << "Enter your name: ";
  std::cin.getline(name, 256);
}

Since the bit-shift operator is usually also used for passing objects to a stream (either in an input stream >>, or to an output stream <<), we should definitely provide this operator for our types.

template <class T>
std::ostream& operator<<(std::ostream& os, const vec4<T>& v) {
  for (size_t i = 0; i < 4; ++i)
    os << v[i] << std::endl;

  return os;
}

Here we just define an operator to stringify our type called vec4<T>. The code can only compile if the object that is returned from the array access can be handled in a similar fashion. If that object is an elementary value, such as bool, int or double, an operator has already been defined.

This way C++ does not need the object tree that is available in, e.g., C#, where every type inherits a ToString method. However, while ToString is a virtual function, providing polymorphism with late binding, the operator is resolved statically (early binding). We should never forget this subtle difference!

Threading

Threading is essential for unleashing the computing capabilities of modern CPUs. Threads are units of work maintained by the operating system. By dividing our code into various functions that can be separated into threads, we allow the responsible OS to schedule work in our application across different cores. Besides the obvious scaling benefits we get a more responsive program. The reason is simple: finally we can put work on a different thread than the main (UI) thread.

C++11 includes an abstraction on top of the OS threading API. It is useful to have a common API in C++ that does not depend on the specific OS. We can forget pthreads or similar approaches, which are limited to certain systems by their nature. Even better, C++11 also brings useful types to tame multi-threading. With the included std::mutex we are able to synchronize access to resources. In a later section we will also discuss futures.

Let's see how easy it is to spawn additional threads in C++. We also care about destroying (ending) running threads.

#include <iostream>
#include <thread>

void hello() {
  std::cout << "Hello from another thread!" << std::endl;
}

int main() {
  std::thread t(hello);
  t.join();
}

We find all the core utilities for threading in the header thread. In the previous example we also call the join() function. Calling it forces the current (calling) thread to wait for the thread represented by the object, i.e., the thread that was given the function. So in this case, the main thread has to wait for the thread stored in t to finish. Each thread has a unique id, allowing us to distinguish different threads. The std::thread class has a get_id() function returning this id as a std::thread::id object, which can be printed to a stream.

The following example shows the usage of threads by spawning them from an anonymous function, provided in the form of a lambda expression.

#include <iostream>
#include <thread>
#include <vector>

int main() {
  std::vector<std::thread> threads;

  for (int i = 0; i < 5; ++i) {
    threads.push_back(std::thread([]() {
        std::cout << "Hello from thread " << std::this_thread::get_id() << std::endl;
    }));
  }

  for (auto& thread : threads)
    thread.join();
}

Together with mutexes, RAII and other strategies, we can easily conquer multi-threading in C++. It was about time for a really good way to do cross-platform multi-threading with C++!

Futures

Now that we have threads, we may be interested in certain patterns that make threads more useful. A particularly useful one is the future / promise pattern. Here we pack a running computation into an object that knows the current state of the thread with respect to its result, if it has one. Additionally we may glue such objects together. For instance, when one future finishes we may want to start another.

A promise in C++ is an object that can store a value of a specified type to be retrieved by a future object. This future object may live in another thread. Therefore a promise offers a synchronization point. The promise object is an asynchronous provider and is expected to set a value for a shared state at some point. This shared state can be associated with a future object by calling the method get_future(). After the call, both objects share the same state: hence we have an asynchronous return object in the form of a future that can retrieve the value of the shared state, waiting for it to become ready if necessary.

Obviously we have two types for the same thing, however, this separation has a good reason. We want to provide a read-only view to all other threads (consumers). The only way to modify the result is by the creator (producer). Here we have a clear separation of concerns, that is similar to other constructs in C++ such as the std::weak_ptr<T> type.

#include <iostream>
#include <thread>
#include <future>

int main() {
  auto promise = std::promise<std::string>();

  auto producer = std::thread([&]() {
    promise.set_value("Hello World");
  });

  auto future = promise.get_future();

  auto consumer = std::thread([&]() {
    std::cout << future.get();
  });

  producer.join();
  consumer.join();
}

There is also an abstraction called std::async, which (depending on the launch policy) spawns a new thread or defers the given function, and returns the result as a future. Therefore we could also write:

#include <iostream>
#include <thread>
#include <future>

int main() {
  auto future = std::async([&]() {
    return "Hello World";
  });

  auto consumer = std::thread([&]() {
    std::cout << future.get();
  });

  consumer.join();
}

The whole thing is really neat and provides a lot of possibilities. It does not make asynchronous programming quite as enjoyable as C#'s async / await; however, it is a good start and still close to the machine. There is no state-machine magic going on here, just good design with elementary mechanisms.

Thread-local storage

Threading may lead to problems such as race conditions. A race condition happens when two or more parts of the code try to access the same resource concurrently. If accessing the resource is not protected by some mechanism, such as a std::mutex lock, we face undefined behavior at runtime. Even though a collision might be as unlikely as 1 in a billion, a modern CPU performs on the order of 10^9 cycles per second. If the code runs for 10 seconds and the critical instruction is one of ten in the execution loop, a collision is practically guaranteed.

Maybe there is a simpler solution to the problem than adding the overhead of mutual exclusion. Either we can live without global state, making every global variable immutable, or global state needs to be mutable per thread only. Maybe each thread can have its own global state, which only needs to be accessed and modified from the currently executing thread. If this is the case, then we need so-called thread-local storage.

Even before C++11 we could introduce thread-local storage. However, as usual, the only way to do this was by using compiler-specific keywords. For instance, Visual C++ used a special form of the __declspec keyword to perform this magic: we had to write __declspec(thread) in front of a variable declaration (GCC offered __thread for the same purpose). These days can be considered over!

C++11 introduces a new keyword called thread_local. It can be used to declare global (or at least namespace-scope) or static (function or member) variables as thread-local.

#include <iostream>
#include <string>
#include <thread>
#include <mutex>
 
thread_local int counter = 1;
std::mutex key;
 
void test_counter(const std::string& name) {
  ++counter;
  std::lock_guard<std::mutex> lock(key);
  std::cout << "Counter for " << name << ": " << counter << std::endl;
}
 
int main() {
  std::thread a(test_counter, "a"), b(test_counter, "b");

  {
    std::lock_guard<std::mutex> lock(key);
    std::cout << "Counter for main: " << counter << std::endl;
  }

  a.join();
  b.join();
}

No matter how often we run this code, we will always see that the output from the two threads, denoted by a and b, is 2. The main thread, however, will output 1. We are therefore using three threads with three different versions of the variable counter. This is why we say the variable is thread-local: each thread has its own copy, modifications only affect the local copy, and there are no race conditions for this resource.

It should be said that the implementation of TLS is compiler-specific (as usual). Some compilers offer very fast TLS implementations, but sometimes a fallback is used when a system does not seem to have a proper management unit for scheduling the TLS. Such a fallback will usually result in at least three system calls: we need to get the system-specific thread id, then acquire a handle (semaphore) for that thread id, and finally release this handle again in the end. These calls are (again: this is nothing new) quite expensive. Unfortunately it is not uncommon to see the slow version of the TLS being compiled.
