Fun with Flynn

For my latest project I needed an architecture that lets me scale out without compromise.

Currently, there are a number of big topics in the IT industry. All of these topics, however, share one common problem: how to scale? Of course, we do our homework by splitting the application into several parts (services). We also make intelligent choices regarding our database systems so that replication, scaling, and similar mechanisms become available. We use an efficient cache such as Redis. It seems that we have figured it all out, but even the most optimized code is still missing something...

What is actually needed is a mechanism to scale up any service. This usually works by placing individual services in dedicated containers, which can easily be duplicated without interfering with each other. A central mechanism for service discovery (and for load balancing across these service instances) is also required.
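To make the idea of service discovery plus load balancing concrete, here is a minimal sketch in Python. It is a toy in-memory registry with round-robin selection, not how Flynn or any real discovery service is implemented; the class name `ServiceRegistry` and the addresses are made up for illustration.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry: maps a service name to its running instances."""

    def __init__(self):
        self._instances = defaultdict(list)  # service name -> list of addresses
        self._cursors = {}                   # service name -> round-robin position

    def register(self, service, address):
        """Record a new instance (e.g. a freshly started container)."""
        self._instances[service].append(address)

    def deregister(self, service, address):
        """Forget an instance that was stopped or failed a health check."""
        self._instances[service].remove(address)

    def pick(self, service):
        """Return the next instance of a service in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        position = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = position + 1
        return instances[position]


if __name__ == "__main__":
    registry = ServiceRegistry()
    # Hypothetical addresses of two duplicated "web" containers.
    registry.register("web", "10.0.0.1:8080")
    registry.register("web", "10.0.0.2:8080")
    for _ in range(3):
        print(registry.pick("web"))
```

Duplicating a container then amounts to one more `register` call; callers keep asking for "web" and never need to know how many instances exist.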

All in all, this sounds like a lot of work. Not only do we need to write our application as efficiently as possible, we also need to supply auxiliary functionality to make scaling possible and efficient. How can we solve this problem?

Luckily, some great software answers this question: there are software stacks that follow a PaaS approach without delivering the platform itself. After having a look at some of these solutions I ended up using Flynn, which tries to be unique by tackling the state problem. Indeed, Flynn makes having databases (document store, blobs, relational, cache, spatial, ...) in the application really simple. Furthermore, the provided stack supports incremental deployments, which are staged via git.

Flynn operates by using applications that communicate with each other on different layers. A central discovery service provides internal DNS names for the services (making it easy to reach a particular instance, a leader, or any instance) and checks their health. A logging mechanism aggregates standard output and standard error from all services (found via the discovery mechanism). A router can be used to create routes for external requests reaching the infrastructure.
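From an application's point of view, using the internal DNS names could look roughly like the sketch below. The naming scheme (a `.discoverd` suffix and a `leader.` prefix) is an assumption based on my recollection of Flynn's discovery service and should be checked against the cluster at hand; the function names are made up for illustration.

```python
import socket

# Assumption: discovery names end in ".discoverd" and the leader is
# reachable under a "leader." prefix. Verify this against your own cluster.
DISCOVERY_SUFFIX = "discoverd"


def discovery_name(service, leader=False):
    """Build the internal DNS name for a service (or for its leader)."""
    name = f"{service}.{DISCOVERY_SUFFIX}"
    return f"leader.{name}" if leader else name


def resolve_instances(service, port, leader=False, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses behind a service name.

    The resolver is injectable so the function can be exercised
    outside of a cluster with a stand-in for socket.getaddrinfo.
    """
    host = discovery_name(service, leader=leader)
    records = resolver(host, port, proto=socket.IPPROTO_TCP)
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({record[4][0] for record in records})
```

Inside a cluster, `resolve_instances("postgres", 5432, leader=True)` would then ask for the current leader of a hypothetical "postgres" service, while omitting `leader=True` returns every instance the DNS answer contains.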

Officially, Flynn supports only a handful of languages, such as Node.js, Go, and Java; .NET is not among them. However, the system provisions applications via buildpacks, i.e., we have a standardized way of supplying an installation process. Luckily, a buildpack for .NET Core applications already exists. Unfortunately, it does not work out of the box. The reason is that the multi-buildpack (used implicitly when a .buildpacks file is present) cannot resolve git submodules. Either we change the buildpack to include the missing submodules directly (in this case the compile extensions), or we set the buildpack explicitly via the BUILDPACK_URL environment variable.

Apart from the problems described, everything is working fine for now. The system scales as expected. I decided to host my Flynn cluster on Azure. It is also possible to start with a local installation (e.g., on a Raspberry Pi) and then scale out using some cloud provider. I found that starting directly with a cloud provider certainly has some benefits. In the end we no longer care which specific resources are available, as Flynn takes care of distributing all the service instances and infrastructure across the resources it is granted. Running multiple virtual machines (in different data centers) is a scenario that can be covered with just a few commands.

Is the problem of scale now solved? Answering this question with a plain yes would be a bit naive. I think we are one step closer to a good solution, but the main issues may now be found in my own code again: optimizing algorithms and designing good data paths.

