Visual Studio 2012 might not look as good as Visual Studio 2010 (though that may just be a matter of taste and acclimatization), but it comes with some really advanced features. One of those features is the new C++ compiler, which not only supports a large part of the C++11 standard, but also takes some really important optimization steps off your hands.
A long time ago I found out that the Intel compiler was the one to use if one cared about performance (on Intel machines and also in general). The Microsoft compiler was certainly more advanced in areas like optimizing algorithms and avoiding memory leaks, but could not match the speed for elementary operations. This issue is now gone with the newly introduced auto-vectorization. This feature makes use of SIMD instruction sets such as SSE2 and other more advanced CPU capabilities, which are widely available and result in great performance benefits. If we wanted to use these possibilities before, we either had to use a really advanced compiler (like the one from Intel) or make use of compiler intrinsics.
Those compiler intrinsics look like ordinary function calls, but the compiler knows them by name and maps each one to specific machine instructions. The main difference from inline assembly is that intrinsics stay visible to the optimizer (the compiler decides how to schedule the instructions and which registers to use), while an inline assembly block is emitted as written.
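To make the difference concrete, here is a minimal sketch of the same element-wise addition written once as a plain loop (the kind of code the auto-vectorizer targets) and once with SSE intrinsics. The function names `add_plain` and `add_sse` and the choice of `float` data are assumptions for illustration; the downloadable benchmark may be structured differently.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Plain loop: c = a + b. An auto-vectorizing compiler may turn this into
// SIMD instructions on its own.
void add_plain(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hand-vectorized version (hypothetical helper): four floats per iteration.
void add_sse(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  // add and store 4 results
    }
    for (; i < n; ++i)                             // scalar tail for leftovers
        c[i] = a[i] + b[i];
}
```

Both functions produce the same results; the only question the benchmark asks is whether the hand-written version is still faster once the compiler vectorizes the plain loop by itself.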
The code that has been used for this benchmark can be downloaded. It calls a subroutine three times, with array sizes chosen so that the data should fit into the L1 cache (the lowest level), the L2 cache and RAM, respectively. The function executes some elementary operations with a growing number of operations per element. After each plain version has finished, the same operation is executed again with compiler intrinsics.
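The structure of such a benchmark can be sketched roughly as follows. This is not the downloadable code: the helper `time_kernel`, the kernel `add_kernel`, the element counts and the use of `float` are all assumptions chosen for illustration, and the sizes would have to be adapted to the caches of the actual machine.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical element counts per array (three float arrays are in use):
const std::size_t kSizeL1  = 2 * 1024;         // 3 * 8 KB  = 24 KB
const std::size_t kSizeL2  = 16 * 1024;        // 3 * 64 KB = 192 KB
const std::size_t kSizeRAM = 8 * 1024 * 1024;  // 3 * 32 MB = 96 MB

// Example kernel: c = a + b as a plain loop.
void add_kernel(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Runs `repeats` passes of kernel(a, b, c, n) over arrays of n floats and
// returns the elapsed wall-clock time in milliseconds.
template <typename Kernel>
double time_kernel(Kernel kernel, std::size_t n, int repeats) {
    std::vector<float> a(n, 1.5f), b(n, 0.5f), c(n, 0.0f);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        kernel(a.data(), b.data(), c.data(), n);
    auto stop = std::chrono::steady_clock::now();
    volatile float sink = c[0];  // read a result so the work is not unused
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `time_kernel(add_kernel, kSizeL1, 1000)` would then be repeated for each size and for each variant (plain and intrinsics) of every operation.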
Let's have a look at the data first:
| Test | Intel i3 VS2010 (Normal) | Intel i3 VS2010 (Intrinsics) | AMD Athlon64 VS2012 (Normal) | AMD Athlon64 VS2012 (Intrinsics) | Intel i7 VS2012 (Normal) | Intel i7 VS2012 (Intrinsics) |
|---|---|---|---|---|---|---|
| L1: c = a + b | 12.105 | 3.822 | 22.745 | 18.081 | 1.024 | 0.992 |
| L1: c = a^2 - b^2 | 13.556 | 3.963 | 31.496 | 30.732 | 1.293 | 1.265 |
| L1: c = a^4 - b^4 | 40.279 | 4.96 | 57.642 | 65.132 | 2.218 | 1.986 |
| L1: c = a^8 - b^8 | 93.585 | 11.606 | 109.248 | 133.366 | 4.65 | 4.63 |
| L2: c = a + b | 12.511 | 5.71 | 25.163 | 19.703 | 1.68 | 1.594 |
| L2: c = a^2 - b^2 | 15.054 | 5.631 | 33.166 | 33.088 | 1.719 | 1.603 |
| L2: c = a^4 - b^4 | 39.234 | 5.257 | 58.142 | 65.396 | 2.419 | 2.186 |
| L2: c = a^8 - b^8 | 92.68 | 11.154 | 109.809 | 134.333 | 4.626 | 4.586 |
| RAM: c = a + b | 17.082 | 15.444 | 27.488 | 23.322 | 7.327 | 7.475 |
| RAM: c = a^2 - b^2 | 18.174 | 15.32 | 34.18 | 35.256 | 7.974 | 8.016 |
| RAM: c = a^4 - b^4 | 38.891 | 15.195 | 59.327 | 68.344 | 8.461 | 8.148 |
| RAM: c = a^8 - b^8 | 92.399 | 17.144 | 110.885 | 144.207 | 9.035 | 9.103 |
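We see that with the new compiler (included in Visual Studio 2012) hand-written intrinsics gain us very little: the speedup of the intrinsics version over the normal version is always around 1, at most 1.1 on the new i7 and 1.27 on the old AMD, and for the longer operations on the AMD the intrinsics version is even slower. With the old compiler, in contrast, the speedup from intrinsics was quite large, with a maximum factor of 8.3.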
Additionally, we see the performance benefit of using an i7 compared to a really old AMD Athlon64 3200+ running at 2 GHz. Here larger L1 and L2 caches, faster RAM and an extended set of registers work in favor of the modern CPU.