floating point performance on x64 (AMD Opteron)

I am porting a numerically intensive application from VS7 to VS8.  Surprisingly, the Win32 Release exe (VS8) is 3 times as fast as the x64 Release exe in a matrix solver solving an array of doubles.  I am using the same optimizations (/O2) for both x64 and Win32.  Does this seem right?  I had been hoping for a speed improvement with the 64-bit version.



  • Toper

    Are you using the new floating point switches?

    If you are looking for performance gains, try using /fp:fast, which enables many of the floating point optimizations. The default FP model is /fp:precise, which disables some of the FP optimizations.

    See the paper below for more info:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/floapoint.asp

    Try using /fp:fast on both Win32 and x64 and see if you still get such a huge difference.

    Thanks,
      Ayman Shoukry
      VC++ Team

  • MeanOldDBA

    So many variables influence speed across different processors, even on the same platform (x64), that general-purpose compiler speed switches can behave very differently.
    And don't forget that everything you're using is still beta and is very likely to behave differently in the final release.

    My advice:

    1. Still port to x64, as I would.
    2. Wait until the final version of Whidbey is released and redo the test.
    3. Maybe optimize some procedures with loop unrolling, ASM optimizations, etc.

    But keep in mind that you should use only SSE2/SSE3 in the future. Don't use the old x87 FP opcodes in MASM64 or anything similar.

    Bye
    Martin



  • Tomasz Staroszczyk Poland

    Hi John,
      I can see that you already reported the bug at http://lab.msdn.microsoft.com/ProductFeedback/viewFeedback.aspx?feedbackid=68338e56-a013-48af-811e-454ca3b955b3

    Thanks a lot for spending the time to log the issue!

    Unfortunately, you haven't attached any code to reproduce the issue. Without such a repro case, the responsible folks won't be able to dig deeper into what is really happening.

    It would be great if you could generate a preprocessed version (using the /P compiler switch) of the code in question. Also, please include all compiler switches being used (build log). Please attach this info and the repro case to the bug entry using the above link.

    Thanks,
      Ayman Shoukry
      VC++ Team

  • popsdawg

    I was already using /fp:fast on the matrix solver.  I rebuilt with /fp:fast for all numeric code, but there was no significant improvement and Win32 was still 3 times faster than x64 (~8 seconds for 200 Win32 matrix solutions vs. ~26 seconds for 200 x64).

    Thanks

    John


  • Stubabe D

    Hi John,
      I would follow what Martin is suggesting, but still make sure to log a performance bug at http://lab.msdn.microsoft.com/. Our developers would certainly be interested in looking at the issue and analyzing it.

    Please include a sample where the issue could be reproduced.

    Thanks,
      Ayman Shoukry
      VC++ Team.
