I'm experimenting with the various compiler options that might improve performance. /arch:SSE has the benefit of allowing the compiler to use the cvttss2si (truncating) instruction in some circumstances. However, when converting to unsigned types it still calls one of the _ftol helper routines (or inlines equivalent code).
In addition, I find that when converting from float to unsigned long, the compiler _sometimes_ generates code that converts any value above UINT_MAX/2 to 0x80000000.
The following code:
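#include "stdafx.h"

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned long r = 1;
    float f = 3.0e9f;   // float literal avoids a double-to-float truncation warning

    for (int i = 0; i < 3; i++)
    {
        // float-to-unsigned conversion under test; by the third iteration
        // f (about 4.57e9) exceeds ULONG_MAX, so that conversion is out of range
        r *= (unsigned long)f;
        f *= 1.234f;
    }
    return r;
}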
... can demonstrate some of these issues. Converting 3×10^9 to an unsigned long should produce 0xB2D05E00, and this should be the value of r after the first iteration. If compiled with /arch:SSE, the float-to-unsigned-long conversion is done with a call to _ftol2 and the result is correct. With /arch:SSE2 the code is inlined and produces 0x80000000. I have also seen the compiler generate this code when using /arch:SSE, but cannot reproduce it in a simple example.
Can anyone tell me (a) the recommended way to get the best performance for float-to-integer (unguarded) conversions, (b) whether the problem above is a known bug, and (c) whether this area has changed at all in the 2005 compiler.
Thanks!
Phil Atkin

float to integer conversion
WhyteKnight
I just tried your posted sample with VC2005, and it seems to behave as expected even with /arch:SSE2.
The whole floating-point area has changed a lot in VC2005. VC2005 introduces a completely new FP model that gives the programmer more control over the FP optimizations performed by the compiler. Take a look at this article, which explains the new model in detail: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/floapoint.asp
Hope this helps!
Thanks,
Ayman Shoukry
VC++ Team
LaurentiuH
The new model for floating point in VC2005 looks excellent and I think I won't waste any more time trying to squeeze the last drop of performance out of VC7.
Thanks again,
Phil Atkin