Auto-vectorization in VS 2012

Visual Studio 2012 offers C/C++ programmers new performance benefits by automatically vectorizing loops where possible.

Visual Studio 2012 might not look as good as Visual Studio 2010 (though that may just be a matter of taste and acclimatization), but it comes with some really advanced features. One of those features is the new C++ compiler, which not only supports more of the C++11 standard, but also takes some really important optimization steps for you.

A long time ago I found out that the Intel compiler is the one to use if one cares about performance (on Intel machines and also in general). The Microsoft compiler was certainly more advanced in areas like optimizing algorithms and avoiding memory leaks, but could not match the speed for elementary operations. This issue is now gone with the newly introduced auto-vectorization. This feature makes use of the SIMD instruction sets of the CPU (SSE2 and more advanced extensions), which process several data elements with a single instruction and can result in great performance benefits. If we wanted to use these possibilities before, we either had to use a really advanced compiler (like the one from Intel) or make use of compiler intrinsics.
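To make this concrete, here is a minimal sketch of the kind of loop an auto-vectorizer can typically handle. The function names and the choice of `float` are assumptions for illustration and are not taken from the benchmark code. In Visual Studio 2012 the auto-vectorizer is active under `/O2`, and `/Qvec-report:1` should list the loops that were vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain scalar loop: c[i] = a[i] + b[i].
// Unit stride, no dependency between iterations, simple trip count:
// a typical candidate for being turned into packed SIMD additions.
void add_arrays(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Convenience wrapper (hypothetical, for illustration only).
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    add_arrays(a.data(), b.data(), c.data(), a.size());
    return c;
}
```

Nothing in the source code mentions SIMD; whether the loop is actually vectorized depends on the compiler and its settings.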

Those compiler intrinsics are used like inline functions. The main difference is that they are built into the compiler, which knows exactly which CPU instructions they stand for. The compiler can therefore still optimize the surrounding code (it decides on register allocation and instruction scheduling), which it cannot do with a block of inline assembly.
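For comparison, a hand-written version of an element-wise addition with SSE intrinsics could look roughly like the following sketch. The function name is hypothetical, and the use of `float` with unaligned loads is an assumption; the benchmark code may use a different data type or aligned memory.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// c[i] = a[i] + b[i], processing four floats per iteration.
void add_arrays_sse(const float* a, const float* b, float* c, std::size_t n)
{
    assert(a != nullptr && b != nullptr && c != nullptr);

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           // load 4 floats from a
        __m128 vb = _mm_loadu_ps(b + i);           // load 4 floats from b
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  // add and store 4 results
    }

    // Scalar tail for the elements that do not fill a full SSE register.
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

This is considerably more code than the plain loop, and it ties the source to x86, which is why a compiler that vectorizes on its own is so attractive.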

The code that has been used for this benchmark can be downloaded. It calls a subroutine three times, with array sizes chosen so that the data should fit into the L1 cache (the lowest level), the L2 cache and RAM, respectively. The function executes some elementary operations with a growing number of arithmetic operations per element. After each operation has finished, the same one is executed again using compiler intrinsics.
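The general structure of such a benchmark can be sketched as follows. This is not the downloadable code: the function names, the element counts and the use of `std::chrono` are all assumptions chosen for illustration, and the real working-set sizes depend on the cache sizes of the CPU under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Kernel under test: c = a^2 - b^2, element-wise.
void square_diff(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] * a[i] - b[i] * b[i];
}

// Runs the kernel `repeats` times over three arrays of `n` floats and
// returns the elapsed wall-clock time in milliseconds.
double time_kernel(std::size_t n, int repeats)
{
    assert(n > 0 && repeats > 0);
    std::vector<float> a(n, 3.0f), b(n, 2.0f), c(n, 0.0f);

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        square_diff(a.data(), b.data(), c.data(), n);
    auto stop = std::chrono::steady_clock::now();

    // Read one result so the compiler cannot discard the work entirely.
    volatile float sink = c[n - 1];
    (void)sink;

    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical element counts; three float arrays of n elements need 12*n bytes.
const std::size_t kL1Elements  = 2 * 1024;         // 24 KB working set
const std::size_t kL2Elements  = 16 * 1024;        // 192 KB working set
const std::size_t kRamElements = 8 * 1024 * 1024;  // 96 MB working set
```

Calling `time_kernel` once per size, and once more with an intrinsics version of the kernel, gives one plain and one intrinsics timing per row of the kind listed in the table.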

Let's have a look at the data first:

	Intel i3 VS2010 (Normal)	Intel i3 VS2010 (Intrinsics)	AMD Athlon64 VS2012 (Normal)	AMD Athlon64 VS2012 (Intrinsics)	Intel i7 VS2012 (Normal)	Intel i7 VS2012 (Intrinsics)
L1: c = a + b	12.105	3.822	22.745	18.081	1.024	0.992
L1: c = a² - b²	13.556	3.963	31.496	30.732	1.293	1.265
L1: c = a⁴ - b⁴	40.279	4.96	57.642	65.132	2.218	1.986
L1: c = a⁸ - b⁸	93.585	11.606	109.248	133.366	4.65	4.63
L2: c = a + b	12.511	5.71	25.163	19.703	1.68	1.594
L2: c = a² - b²	15.054	5.631	33.166	33.088	1.719	1.603
L2: c = a⁴ - b⁴	39.234	5.257	58.142	65.396	2.419	2.186
L2: c = a⁸ - b⁸	92.68	11.154	109.809	134.333	4.626	4.586
RAM: c = a + b	17.082	15.444	27.488	23.322	7.327	7.475
RAM: c = a² - b²	18.174	15.32	34.18	35.256	7.974	8.016
RAM: c = a⁴ - b⁴	38.891	15.195	59.327	68.344	8.461	8.148
RAM: c = a⁸ - b⁸	92.399	17.144	110.885	144.207	9.035	9.103

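With the old compiler (Visual Studio 2010, measured on the i3) the speedup from using intrinsics was quite large, with a factor of 8.3 as the maximum. With the new compiler (included in Visual Studio 2012) the plain code gains a lot: the speedup from intrinsics is always around 1, at most about 1.1 on the new i7 and about 1.28 on the old AMD. In other words, the automatically vectorized code is roughly as fast as the hand-written intrinsics. On the AMD, the intrinsics version is even slower than the plain code for the higher powers.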
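The speedup here is simply the plain measurement divided by the intrinsics measurement. As a small sketch (the function and constant names are made up for illustration), a few rows from the table work out as follows:

```cpp
#include <cassert>

// Speedup of the intrinsics version over the plain version.
// Values above 1 mean the intrinsics were faster.
double speedup(double normal, double intrinsics)
{
    assert(intrinsics > 0.0);
    return normal / intrinsics;
}

// Rows taken from the table above.
const double vs2010_i3_l2_pow8   = speedup(92.68, 11.154);    // about 8.3
const double vs2012_i7_l1_pow4   = speedup(2.218, 1.986);     // about 1.1
const double vs2012_amd_l2_add   = speedup(25.163, 19.703);   // about 1.28
const double vs2012_amd_ram_pow8 = speedup(110.885, 144.207); // about 0.77
```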

Additionally, we see the performance benefit of using an i7 compared to a really old AMD Athlon64 3200+ running at 2 GHz. Larger L1 and L2 caches, faster RAM and an extended set of registers all work in favor of the modern CPU.

Download main.cpp (6 kB)

Created 9/5/2012 3:43:05 PM +00:00. Last updated 9/6/2012 8:35:05 PM +00:00.

