Ever wonder why your 64-bit Delphi Apps run so slowly? I’ve found that my 64-bit performance is roughly half what my 32-bit performance is.
Delphi’s 64-bit compiler basically performs no optimizations at all in many or even most situations.
This post came together in a roundabout way. I originally published it claiming that the Delphi 64-bit compiler performs no optimizations at all, but comments to the contrary led me to find that the real problem is far more complex and finicky than I had first observed. I took the original post down while I investigated why my findings didn’t match some of my readers’ comments, and I have now reposted it with some almost equally shocking findings.
I have a bit of code that needs to be 64-bit (it allocates many GB of RAM) and also needs to be super fast and optimized. I was shocked yesterday when I set out to figure out why my throughput was settling around 60MB/sec on average when I expected it to run at 200MB/sec or more. I concentrated on the two functions that 99% of all the data goes through, the “WriteData” and “ReadData” requests. These two functions were fairly complex, and I wanted to see if I could change the order of some calls, introduce new local variables, and triple-check that inline functions and assembler were being used wherever they were beneficial.
I cracked it open in disassembly mode. I was immediately shocked to find all kinds of redundant instructions, sometimes repeatedly just loading the same constant into the same register over and over for no real reason.
Further investigation showed that the EXE generated with $O+ was identical in size to the EXE generated with $O-.
I did some more prodding and added instructions into my project that served no purpose whatsoever.
    var
      useless: nativeint;
      //...etc...
    begin
      useless := 0;
      //...etc...
    end;
I did this with the purpose of determining whether the optimizer was running at all and to my shock, initially it appeared as if there was basically no optimization in my 64-bit application.
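For anyone who wants to repeat the experiment, here is a minimal sketch of the kind of test I mean. The program name OptCheck and the function SumTo are made up for illustration and are not from my real project. Build it for the 64-bit target once with {$O+} and once with {$O-}, then compare the code generated for SumTo in the CPU view. Identical EXE sizes are only a rough hint, so the disassembly is the better check.

```pascal
program OptCheck; // hypothetical test program, not part of the original project
{$APPTYPE CONSOLE}

{$O+} // same as {$OPTIMIZATION ON}; change to {$O-} and rebuild to compare
function SumTo(n: NativeInt): NativeInt;
var
  i, useless: NativeInt;
begin
  useless := 0; // dead store: an optimizer that is running should drop this
  Result := 0;
  for i := 1 to n do
    Result := Result + i;
end;

begin
  Writeln(SumTo(100)); // prints 5050
end.
```

If the store to useless still shows up in the $O+ build, that suggests the optimizer did not touch the routine. The compiler will also emit a hint that the value assigned to useless is never used, which is expected here.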
I posted a blog entry with my findings in anger, showing all the evidence and screen captures, until one reader responded that he was not seeing the behavior I was reporting.
I immediately went searching for answers. Was it a difference between XE5 and XE6? Were my project settings corrupt? Did I need to rebuild my dproj from scratch? What if I start a new, simple app with just a few lines of code, would it optimize then?
Eventually, through some kicking and prodding, I found out the secret formula and, not to sound like a BuzzFeed post, the answer may shock you.
The following code does NOT optimize.
    procedure TForm1.Button1Click(Sender: TObject);
    var
      i,a,b,c,d,e,f,g,h: nativeint;
    begin
      a := 2;
      b := 3;
      c := 4;
      d := 5;
      e := 6;
      f := 7;
      g := 8;
      h := 9;
      i := 0;
      i := 666;
      try
        TButton(sender).caption := inttostr(i);
      finally
        application.Title := 'stupid delphi';
      end;
    end;
…yet… the following code DOES….
    procedure TForm1.Button1Click(Sender: TObject);
    var
      i,a,b,c,d,e,f,g,h: nativeint;
    begin
      a := 2;
      b := 3;
      c := 4;
      d := 5;
      e := 6;
      f := 7;
      g := 8;
      h := 9;
      i := 0;
      i := 666;
    //  try
        TButton(sender).caption := inttostr(i);
    //  finally
        application.Title := 'stupid delphi';
    //  end;
    end;
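The only difference between the two listings is the try..finally block. If the effect is confined to the routine that contains the try..finally (the listings suggest this, but I have not proven it), then one thing worth trying is to keep the hot code in its own routine with no exception handling in it. The sketch below illustrates the idea; Checksum and Run are hypothetical names, and you should confirm the result in the disassembly rather than take it on faith.

```pascal
program IsolateTry; // hypothetical example, names are illustrative only
{$APPTYPE CONSOLE}
{$O+}

// Hot routine: deliberately contains no try..finally or try..except.
function Checksum(const Data: array of Byte): NativeUInt;
var
  i: NativeInt;
begin
  Result := 0;
  for i := Low(Data) to High(Data) do
    Result := Result + Data[i];
end;

// Wrapper: the cleanup logic lives here, away from the hot loop.
procedure Run;
var
  Buf: array[0..3] of Byte;
begin
  Buf[0] := 1;
  Buf[1] := 2;
  Buf[2] := 3;
  Buf[3] := 4;
  try
    Writeln(Checksum(Buf)); // prints 10
  finally
    Writeln('done');
  end;
end;

begin
  Run;
end.
```

Whether the wrapper itself gets optimized matters much less, since it only runs once per call into the hot path.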
At least I found that it supports the “inline” keyword and uses it… but that’s hardly a consolation prize at this point. This is another shameful stain on the Delphi product line!
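For reference, this is roughly what I mean by using inline. Square is a made-up function for illustration. Keep in mind that inline is a request and not a guarantee, so the compiler may still emit a normal call in some situations.

```pascal
program InlineDemo; // hypothetical example
{$APPTYPE CONSOLE}

// The inline directive asks the compiler to expand the body at the call site.
function Square(x: NativeInt): NativeInt; inline;
begin
  Result := x * x;
end;

begin
  Writeln(Square(12)); // prints 144
end.
```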
I think it truly is time to get away from this language. I intend to build a Delphi-to-C++ compiler that takes Delphi code and generates a single C++ file that can be compiled with all the optimization your heart desires. If you want to join my project, get in touch with me.