.NET Struct Performance

On this page I attempt to measure how .NET deals with simple data structures that one might employ in numerical algorithms and computational geometry, and compare the results to various other languages (C++, Java, JavaScript).

Related Benchmarks

This page focuses on method inlining in the presence of user-defined value types, a test case that is notoriously problematic for the .NET CLR. The article “Head-to-head benchmark: C++ vs .NET” by “Qwertie” on Code Project offers a much broader comparison of computational performance on both platforms, including cases that are more favorable for the CLR and sometimes allow C# to approximate Visual C++ speed.

“Is C++ worth it?” by Daniel Lemire compares a simple numerical loop in Oracle Java 7 and various C++ compilers, with embarrassing results for the latter. Running his test on my own system, I found that Java outperformed both Visual C# and Visual C++ by a factor of three!

“Ternary operator is twice as slow as an if-else block?” reveals an amazing CLR optimization failure: C# ternary operators are 10% (x64) or up to 170% (x86) slower than equivalent if-else statements!

Finally, those interested in the relative performance of JavaScript and Java should also check out “Box2D as a Measure of Runtime Performance” by Joel Webber.

Classes and Structs

The .NET Common Language Runtime (CLR) offers two kinds of user-defined objects: reference types (declared as class in C#) and value types (declared as struct in C#). This does not correspond to the C++ distinction between class and struct (which only affects default visibility), but rather to the C++ choice between allocating an object on the heap with the new keyword and declaring it directly as a variable or member.

CLR classes are always allocated on the heap and accessed by reference, just like all user-defined objects in Java and JavaScript (unless the optimizer puts them on the stack), whereas CLR structs are allocated directly within their surrounding memory context. Struct member access thereby avoids the extra dereferencing step that classes require.

Structs are allocated on the stack when not embedded within other objects, and so do not increase the garbage collector’s workload. Struct allocation within other objects (e.g. array elements) has the same benefit and also avoids storing an extra reference per element, potentially saving large amounts of memory.

Another consequence is that struct contents are copied wholesale on variable assignments, whereas each assignment of a class variable only copies a single reference. These “copying assignments” occur more often than you might expect, e.g. when an object is passed to a method or retrieved from a collection. From a performance viewpoint, the extra time spent on copying large structs eventually erases the benefits of embedded allocation – hence the general recommendation to use structs only for small amounts of data.
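The two kinds of assignment can be sketched in C++, where a plain object behaves much like a CLR struct and a shared pointer behaves roughly like a CLR reference. This is only an analogy under that assumption; the Point type and all function names are hypothetical and not taken from the test programs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical two-field point, used only for illustration.
struct Point {
    double x;
    double y;
};

// Value semantics, as with a CLR struct: assignment copies both fields,
// so writing to the copy leaves the original untouched.
double original_x_after_value_copy() {
    Point original{1.0, 1.0};
    Point copy = original;
    copy.x = 5.0;
    return original.x;
}

// Reference semantics, as with a CLR class: assignment copies a single
// reference, so both names refer to the same heap object.
double original_x_after_reference_copy() {
    std::shared_ptr<Point> original = std::make_shared<Point>(Point{1.0, 1.0});
    std::shared_ptr<Point> alias = original;
    alias->x = 5.0;
    return original->x;
}

// Usage example: the asserts state the value each function returns.
void example_copy_semantics() {
    assert(original_x_after_value_copy() == 1.0);
    assert(original_x_after_reference_copy() == 5.0);
}
```

The value copy costs two double writes here, while the reference copy costs one pointer write plus reference counting; with larger structs the balance shifts toward references, which is the trade-off described above.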

Copying contents versus references also constitutes an important semantic distinction, but here we’ll focus on runtime performance. Structs should perform better than classes when objects are frequently created and accessed, provided content copying is inexpensive or can be optimized away. The applications that should benefit most are numerical algorithms and computational geometry, as they require efficient types for small tuples of floating-point values: complex numbers, two- or three-dimensional coordinates, etc.

Primitive Types and Structs

The following benchmark does not compare structs to classes, but rather user-defined structs (where available) or classes (where not) to equivalent tuples of built-in primitive types. Passing a struct to a method (by value) is semantically equivalent to passing its individual fields, and accessing its fields is equivalent to accessing individual variables of the same type within the same storage context.

A good optimizer should be able to exploit this equivalence and produce struct handling code that is indistinguishable from using the “naked” field-equivalent variables directly. This is what we’re going to examine, for the specific case of methods that don’t change their parameters and are small enough to be inlined.
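As a rough illustration of that equivalence, the following C++ sketch checks the properties an optimizer could rely on and compares a struct-based function with its “naked” counterpart. The Point type, the cross-product functions, and the no-padding assumption are illustrative choices, not part of the test suite.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical point type with two double fields.
struct Point {
    double x;
    double y;
};

// Copying a Point is nothing more than copying its two fields.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable");
static_assert(std::is_standard_layout<Point>::value,
              "Point must have standard layout");
// Assumption: no padding between or after the fields. This holds on
// mainstream compilers but is not guaranteed by the C++ standard.
static_assert(sizeof(Point) == 2 * sizeof(double),
              "Point is expected to occupy exactly two doubles");

// Struct version: parameters are passed by value and never modified.
double cross_struct(Point a, Point b) {
    return a.x * b.y - a.y * b.x;
}

// Naked version: the same fields passed as individual doubles.
double cross_naked(double ax, double ay, double bx, double by) {
    return ax * by - ay * bx;
}

// Usage example: both versions compute 3*6 - 4*5 = -2.
void example_equivalence() {
    const Point a{3.0, 4.0};
    const Point b{5.0, 6.0};
    assert(cross_struct(a, b) == cross_naked(a.x, a.y, b.x, b.y));
    assert(cross_struct(a, b) == -2.0);
}
```

Once both functions are inlined, nothing in the source forces the struct version to do more work than the naked one; whether a given compiler or runtime actually achieves that is what the benchmark measures.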

Struct Test Programs

All results shown below were obtained with a suite of small test programs. The download package StructTest.zip (118 KB, ZIP archive) comprises the precompiled executables and their complete source code. Please refer to the enclosed ReadMe.txt file and the various batch files for the required development tools and expected file paths.

All tests perform 1,000,000,000 loop iterations over two pairs of double-precision values, a and b, each representing a point’s x- and y-coordinates. We initialize all coordinates to 1, then in each iteration assign the cross-wise sum of all coordinates to the first pair: a := (ax + by, ay + bx). The final coordinates are printed before each result to ensure the calculations were performed correctly (and not optimized away entirely, which C++ can actually do!).
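The update rule is easy to check by hand: b stays at (1, 1), so every iteration adds 1 to both coordinates of a, and after n iterations a = (1 + n, 1 + n). A full run should therefore end at 1,000,000,001 for both coordinates, a value that a double represents exactly. The C++ sketch below shows the loop with plain doubles; the function names and the iteration parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Applies a := (ax + by, ay + bx) the given number of times, starting
// with all coordinates set to 1, and returns the final (ax, ay).
std::pair<double, double> run_cross_sum(std::uint64_t iterations) {
    double ax = 1.0, ay = 1.0;
    const double bx = 1.0, by = 1.0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        // Compute both sums before storing them, so the assignment
        // behaves like a simultaneous update of the pair.
        const double new_x = ax + by;
        const double new_y = ay + bx;
        ax = new_x;
        ay = new_y;
    }
    return {ax, ay};
}

// Usage example with a short run: 1 + 1000 = 1001 for both coordinates.
void example_cross_sum() {
    const std::pair<double, double> a = run_cross_sum(1000);
    assert(a.first == 1001.0);
    assert(a.second == 1001.0);
}
```

Because the result has such a simple closed form, an aggressive compiler may replace the whole loop with a constant, which is why the result has to be printed and the timings sanity-checked.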

Most tests represent points as a simple user-defined type: a struct in C#, a class with stack-allocated instances in C++, and reference-type objects in Java and JavaScript which allow no other option. All tests use property accessors to read a point’s coordinates. We run 2–4 tests for each language and runtime, each calling a different static method to add the coordinates. All methods are short enough to be inlined. The test methods are identified as follows:

  • AddByVal: The simplest variant. Two Point arguments are supplied by value (i.e. their contents are copied), and a new Point is returned by value. We skip this test for Java and JavaScript, which don’t support passing or returning the actual contents of user-defined objects.
  • AddByRef: Two Point arguments are supplied by reference, and a new Point is returned by value. Wrapping a reference around a tiny struct should reduce performance, but actually has the opposite effect on all tested CLRs due to optimization. Again, we skip this test for Java and JavaScript.
  • AddByOut: Two Point arguments are supplied by reference, and a new Point is returned by reference in a third argument. Java and JavaScript directly return the method result, since their objects are always passed “by reference” (see below).
  • AddNaked: This variant uses no Point objects at all. All coordinates are defined, supplied, and returned as “naked” double values. Java and JavaScript use the popular array trick to return two coordinates by reference.
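The four variants might look as follows in C++. This is a minimal sketch under my reading of the descriptions above, not the code from StructTest.zip: the Point class, its accessor names, and the exact signatures (const references for the inputs, non-const references for the outputs) are assumptions.

```cpp
#include <cassert>

// Hypothetical point class with read-only property-style accessors.
class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }
private:
    double x_;
    double y_;
};

// Arguments copied in, result copied out.
Point AddByVal(Point a, Point b) {
    return Point(a.x() + b.y(), a.y() + b.x());
}

// Arguments passed by reference, result copied out.
Point AddByRef(const Point& a, const Point& b) {
    return Point(a.x() + b.y(), a.y() + b.x());
}

// Arguments passed by reference, result written to a third argument.
// The temporary is built first, so result may alias a or b.
void AddByOut(const Point& a, const Point& b, Point& result) {
    result = Point(a.x() + b.y(), a.y() + b.x());
}

// No Point objects at all: four doubles in, two doubles out.
void AddNaked(double ax, double ay, double bx, double by,
              double& rx, double& ry) {
    rx = ax + by;
    ry = ay + bx;
}

// Usage example: all variants yield (ax + by, ay + bx) = (21, 12).
void example_add_variants() {
    const Point a(1.0, 2.0);
    const Point b(10.0, 20.0);
    const Point by_val = AddByVal(a, b);
    const Point by_ref = AddByRef(a, b);
    Point by_out(0.0, 0.0);
    AddByOut(a, b, by_out);
    double rx = 0.0, ry = 0.0;
    AddNaked(a.x(), a.y(), b.x(), b.y(), rx, ry);
    assert(by_val.x() == 21.0 && by_val.y() == 12.0);
    assert(by_ref.x() == 21.0 && by_ref.y() == 12.0);
    assert(by_out.x() == 21.0 && by_out.y() == 12.0);
    assert(rx == 21.0 && ry == 12.0);
}
```

All four bodies perform the same two additions, so after inlining they should in principle compile to identical code; any timing difference between them reflects the optimizer rather than the arithmetic.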

Technical Note: Java and JavaScript always use pass-by-value for their method arguments, but what is actually being passed by value in the AddByOut case is itself a reference to an object, and therefore equivalent (for the purpose of our benchmark) to C++ and C# passing that object by reference.
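C++ pointers give a close analogy for what this note describes, so here is a small sketch of it. It is written in C++ rather than Java or JavaScript, and the Point type and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical point type.
struct Point {
    double x;
    double y;
};

// The pointer is passed by value, but the copy still points at the
// caller's object, so writes through it are visible to the caller.
void write_through(Point* p) {
    assert(p != nullptr);
    p->x = 42.0;
}

// Re-pointing the copy affects only the callee; the caller's pointer
// and the caller's object are left alone.
void reseat(Point* p) {
    static Point other{-1.0, -1.0};
    p = &other;   // changes only the local copy of the pointer
    p->x = 99.0;  // writes to 'other', not to the caller's object
}

// Usage example.
void example_reference_passing() {
    Point point{1.0, 1.0};
    write_through(&point);
    assert(point.x == 42.0);
    reseat(&point);
    assert(point.x == 42.0);  // unchanged by reseat
}
```

The first function is the behavior the AddByOut test depends on: the callee can fill in an object owned by the caller even though the argument itself was copied.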

Sample Test Results

The following tables show sample test results on my system, comprising Windows 8.1 (64 bit) on an MSI X58 Pro-E motherboard with an Intel Core i7 920 CPU (2.67 GHz) and 6 GB RAM (DDR3-1333). The tests were not conducted with any kind of scientific rigor; I simply ran each test several times and picked a nice round average. All non-JavaScript tests use full optimization and were conducted in both 32-bit and 64-bit mode. All times are in milliseconds.
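For readers who want to repeat such measurements, a simple way to time a test and average several runs is sketched below in C++. This is not the harness from the download package; the function name, the repetition count, and the use of std::chrono::steady_clock are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <chrono>

// Runs the given callable several times and returns the mean wall-clock
// time per run in milliseconds.
template <typename F>
double average_ms(F&& run, int repetitions) {
    assert(repetitions > 0);
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < repetitions; ++i) {
        const clock::time_point start = clock::now();
        run();
        const clock::time_point stop = clock::now();
        total_ms +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / repetitions;
}

// Usage example: time a tiny loop three times. The volatile sink keeps
// the compiler from removing the loop.
void example_timing() {
    volatile double sink = 0.0;
    const double ms = average_ms(
        [&sink] {
            for (int i = 0; i < 1000; ++i) {
                sink = sink + 1.0;
            }
        },
        3);
    assert(ms >= 0.0);
    assert(sink == 3000.0);
}
```

A monotonic clock is used so that system clock adjustments cannot distort the result; a more careful benchmark would also discard warm-up runs and report the spread, not just the mean.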

The present test results were obtained in February 2014. The Struct Performance tests were first published in June 2011 and updated in July 2012 and February 2013, so I have test results for various older versions of the tested compilers and runtimes. Some of the old results were originally obtained on Windows 7 SP1 (64 bit) and an Intel DX58SO motherboard, but the MSI board was a drop-in replacement with identical specifications and components. Those tests that I remeasured with old versions showed very little performance change, so the results should be comparable.

GCC & Visual C++

Table 1 shows C++ test results for gcc 4.8.2 (Windows port by MinGW-w64), Microsoft Visual C++ 2010 SP1, and Visual C++ 2013. Previously tested versions with identical results are not shown, including gcc 4.5.2 (32-bit only) and 4.7.0.

The VC++ 2010 results were obtained with the keyword inline preceding all measured functions. Retesting with current compilers I found this keyword made no difference, so I removed it.

Table 1      gcc               VC++ 2010 SP1     VC++ 2013
             32 bit   64 bit   32 bit   64 bit   32 bit   64 bit
AddByVal      1,030    1,030    1,030    8,660    5,940    4,160
AddByRef      1,030    1,030    1,030    8,660    9,260    8,920
AddByOut      1,030    1,030    1,030    8,660    9,260    8,910
AddNaked      1,030    1,030    1,030    1,030    1,030    1,030

gcc — The only compiler in this comparison that correctly optimizes all test cases and delivers the same excellent performance in each of them.

Visual C++ — Microsoft had one optimizer that could match gcc, and that was the 32-bit version of VC++ 2010. The 64-bit version of that compiler and both versions of VC++ 2013 exhibit embarrassing optimization failures with simple user-defined types, falling behind both CLR and JVM. This should caution you against using VC++ as representative for “C++ performance,” by the way.

.NET CLR & Mono

Table 2 shows C# test results for Microsoft Visual C# 2013 (.NET Framework 4.5.1) and Mono 3.2.3. Previously tested versions with nearly identical results are not shown, including VC# 2010 (.NET 4), VC# 2012 (.NET 4.5), Mono 2.10.9 (32-bit only), and Mono 3.0.3.

.NET 4.5 introduced the method implementation option MethodImplOptions.AggressiveInlining, which does have a noticeable effect – but not always a good one. Decorating all measured methods with this attribute yielded speedups of 10–25% but also slowdowns of 25–70%, depending on the test case. I decided to omit the attribute.

Table 2      Visual C#         Mono
             32 bit   64 bit   32 bit   64 bit
AddByVal      5,450    8,600   22,000   22,100
AddByRef      3,430    6,880   15,830   15,880
AddByOut      3,430    5,150    8,270    8,300
AddNaked      3,430    3,430    5,850    5,870

Visual C# — Counterintuitively, the optimizer of the 32-bit CLR works correctly only when structs are passed by reference rather than by value – not a great alternative due to the changed semantics. While the 64-bit CLR also profits from this trick, it is slower to begin with, and struct handling never reaches the speed of naked double values. The latter are over 3× slower than gcc for either CLR, which is also rather unimpressive.

The likely cause for the call-by-reference speedup is the fact that our small test methods can be inlined. My guess:

  1. The optimizer identifies call-by-reference structs with the caller’s objects, and so wastes no time creating references.
  2. The optimizer realizes that naked double values are not changed in the test method except on return, and so wastes no time copying them.
  3. However, the optimizer fails to correctly analyze the use of call-by-value structs, and so always wastes time copying them.

Mono, the major third-party CLR, is slower than Microsoft’s implementation by a factor of 1.6–4.6, and we again note the counterintuitive result that passing a small struct by reference is faster than passing it by value. This result is not a scathing criticism of Mono – merely keeping up with new .NET features while porting the CLR to many more platforms is quite an achievement! However, it does demonstrate that Mono is not an option if you’re looking for better performance.
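The slowdown factors can be recomputed from Table 2 by dividing each Mono time by the Visual C# time in the same row and bitness. The C++ sketch below does only that; the Range type, the array layout, and the function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Smallest and largest Mono / Visual C# time ratio.
struct Range {
    double lowest;
    double highest;
};

Range mono_slowdown_range() {
    // Table 2 in row order AddByVal, AddByRef, AddByOut, AddNaked;
    // each row contributes its 32-bit value, then its 64-bit value.
    const std::array<double, 8> visual_csharp = {
        5450, 8600, 3430, 6880, 3430, 5150, 3430, 3430};
    const std::array<double, 8> mono = {
        22000, 22100, 15830, 15880, 8270, 8300, 5850, 5870};

    Range range{mono[0] / visual_csharp[0], mono[0] / visual_csharp[0]};
    for (std::size_t i = 1; i < mono.size(); ++i) {
        const double ratio = mono[i] / visual_csharp[i];
        range.lowest = std::min(range.lowest, ratio);
        range.highest = std::max(range.highest, ratio);
    }
    return range;
}

// Usage example: the extremes come from 64-bit AddByOut (8,300 / 5,150)
// and 32-bit AddByRef (15,830 / 3,430).
void example_slowdown() {
    const Range range = mono_slowdown_range();
    assert(range.lowest > 1.61 && range.lowest < 1.62);
    assert(range.highest > 4.61 && range.highest < 4.62);
}
```

The smallest gap appears where the 64-bit CLR itself handles structs poorly, so part of Mono’s best showing is due to Microsoft’s weakest case.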

Oracle Java

Table 3 shows Java test results for Oracle Java Development Kit 7u13 and 8, using all available flavors of Client and Server VM. Other tested versions with nearly identical results are not shown, including JDK 7u3 (same as 7u13) and JDK 7u51 (same as 8).

Table 3      Oracle JDK 7u13            Oracle JDK 8
             Clt/32   Svr/32   Svr/64   Clt/32   Svr/32   Svr/64
AddByOut      8,780    5,785    4,875    8,290    5,480    5,400
AddNaked      3,775    1,645    1,645    3,775    3,440    3,440

Java Client VM — This obsolete 32-bit VM performs as expected, i.e. somewhat slower than the CLR.

Java Server VM — On the other hand, the Server VM’s optimizer is so excellent that user-defined types roughly match pass-by-value structs on the 32-bit CLR, and all struct tests on the 64-bit CLR. As of 7u13, naked double values even came within 60% of the gcc baseline!

Sadly, this amazing result was lost in some optimizer tweak between 7u13 and 7u51. Running the Client vs Server benchmarks, I found that 7u51 is just as fast as 7u13 in the Fibonacci tests and actually 10-20% faster in SciMark, so this does not represent an overall performance regression.

JavaScript

Table 4 shows JavaScript test results for the three major desktop browsers across several versions, all running in 32-bit mode.

Table 4      Chrome                     Firefox                    Internet Explorer
                 16       24       33       10       19       27        9       10       11
AddByOut     31,200   14,875   10,400   25,900   27,045   11,550  107,000   53,650   23,800
AddNaked     34,800    8,975    8,390   12,500   14,300    2,270   27,000   17,095   20,400

Chrome & Firefox — Both browsers come within 50% of the best Mono performance for user-defined types, making them perfectly suitable for general application development.

Shockingly, among the current runtimes Firefox is second only to C++ for naked double values! (Only the older JDK 7u13 Server VM was faster, at 1,645 ms.) I suspect it cheats a bit, though: my test does not require fractional values, so these double values may be internally represented as integers. Also note that browsers are prone to regressions in this benchmark (FF 10-19, IE 10-11).

Internet Explorer — While remaining proudly a year or two behind the competition in terms of performance, even IE11 finally matches the worst case of the Mono runtime for user-defined types. Only primitive operations remain problematic.

Test Conclusions

From a C# perspective, these results are fairly depressing. Our test was tailored to the CLR’s user-defined value types which neither Java nor JavaScript offers, so one would expect the CLR to win handily. Instead, it can barely keep up with ancient Java! I applaud the designers of the Server VM for what they managed to tweak out of that language. Meanwhile, the CLR’s greater complexity provides no clear performance benefits, despite the greater burden on the developer – its speed is unimpressive even for hand-optimized code.

To be sure, most developers won’t care. The chief purpose of .NET was to replace various older Microsoft technologies, including Visual Basic, ASP, and Office scripting, which were mostly used for business in-house projects. .NET continues to fill these roles very well, and the CLR (even Mono’s!) is certainly fast enough for them. Other applications are rarely based on .NET and this is unlikely to change, given Microsoft’s current shift back to C++ and onward to JavaScript. If a platform already dominates its appointed niche, why bother improving it?

But one can’t help feeling disappointed that the CLR’s great potential goes to waste in this way. Couldn’t Microsoft have spent one percent of one percent of its annual monopoly rent to write a decent optimizer for the CLR? Instead, C# is being eclipsed by JavaScript or modern derivatives like TypeScript – tellingly created by C# designer Anders Hejlsberg. For great performance you’ll need abstruse C++ and a compiler not written by Microsoft. Otherwise, you might as well use Java or even JavaScript.