Hello code gurus,
I'm a recent arrival from the CodeWarrior PPC world, and hope someone can orient me toward the appropriate docs and examples.
I need to do some floating-point (double preferred) operations on vectors: basic ones like +, -, *, /, sqrt, and dot product. Vectors are sized at runtime and don't change once created. I've read about hardware SIMD instructions and want to use these.
1) Do I need to use the SIMD instructions explicitly? Prefer not to!
2) Should I use a vector library? Can you recommend one that uses SIMD, and allows the vector to be sized at runtime (valarray can't do this)?
3) If I write my own C++ code to operate on these vectors in explicit loops, is the VC++ compiler smart enough to use SIMD?
If so, how can I find out how my code needs to look to enable this optimization? In particular, the required data alignment if vectors are in memory blocks, the required form of loops, and how to tell whether an optimization was done.
Sorry for the flood of questions, but all I've been able to locate is a bunch of disconnected documents that only hint at these answers. If someone can point me to some definitive docs and examples, I would really appreciate it.

floating point vector optimization
Brett Nieland
You can use the /arch:SSE & /arch:SSE2 compiler switches.
Check the following links for more details:
http://msdn2.microsoft.com/en-us/library/7t5yh4fd.aspx
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vctchoptimizingyourcodewithvisualc.asp
Also, take a look at the new floating-point model introduced in VC2005 and described in detail at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/floapoint.asp
Hope this helps!
Thanks, Ayman Shoukry VC++ Team
Rido
Thanks for your answers. After reviewing the articles I see I want to set the /arch:SSE2 and /fp:fast flags in my application.
The particular optimizations I want to ensure are happening are (from the third article):
1) scalar reduction, in which looping through an array is done in chunks of 4 using SSE2 instructions
2) contraction, in which a C++ multiply and add is combined into one SIMD instruction.
However, the article states that these optimizations "may" be done by the compiler, and the examples given are quite simple. How can I find out whether they actually ARE done in my more complex code, or even better, find out the form I need to follow to ensure that they are done?
Alternatively, do you know of any existing C++ library that implements vector fp operations using SIMD?
I appreciate your help and expertise!
Wolfgang Rockelein
There are two ways to make sure such optimizations are taking place.
Either feed in values that would give different floating-point results if the optimization is carried out, or dump the generated code (assembly) and check whether the optimization is taking place.
I am personally not aware of an existing library for vector operations using SIMD, but other folks on the forum might have more info.
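To illustrate the first approach, here is a minimal sketch of probe data for a float sum. The names sum_in_order, sum_in_four_lanes and kProbe are made up for this example, and the four-lane function only imitates what a 4-wide reduction might do; the compiler's actual grouping may differ. The idea is to pick values where a strict left-to-right sum and a regrouped sum round differently.

```cpp
#include <cstddef>

// Strict left-to-right float sum. volatile forces every partial sum to be
// rounded to float and keeps the optimizer from regrouping this reference.
float sum_in_order(const float* v, std::size_t n)
{
    volatile float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s = s + v[i];
    return s;
}

// Imitates a 4-wide reduction: four independent partial sums, combined at
// the end. Assumes n is a multiple of 4 (any leftover elements are ignored).
float sum_in_four_lanes(const float* v, std::size_t n)
{
    volatile float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] = lane[k] + v[i + k];
    volatile float lo = lane[0] + lane[1];
    volatile float hi = lane[2] + lane[3];
    return lo + hi;
}

// Probe values: in float, 1e8f + 1.0f rounds back to 1e8f, so the in-order
// sum loses the first three 1.0f terms, while the regrouped sum cancels
// 1e8f against -1e8f first and keeps all six.
static const float kProbe[8] = {1e8f, 1.0f, 1.0f, 1.0f,
                                -1e8f, 1.0f, 1.0f, 1.0f};
```

Running your own summation loop over kProbe and comparing against these two references gives a hint about what the compiler did. A result that differs from the in-order one only shows that the sum was not computed strictly in order at float precision; extra intermediate precision (for example x87 code) can have the same effect. For the second approach, the cl /FAs switch writes an assembly listing with the source interleaved; packed instructions such as addps and mulps indicate vectorized code, while addss and mulss are the scalar forms.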
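On the runtime-sizing part of the question, it may be worth noting that std::valarray can be constructed with a length chosen at run time. Whether a particular implementation uses SIMD underneath is not guaranteed, so treat this only as a sketch of the interface. The helper names make_ramp and dot are made up for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <valarray>

// Build a vector of n doubles holding 1, 2, ..., n; n is only known at run time.
std::valarray<double> make_ramp(std::size_t n)
{
    std::valarray<double> v(n);  // length fixed here, at construction
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<double>(i + 1);
    return v;
}

// Dot product via element-wise multiply followed by sum().
// Assumes a and b have the same length.
double dot(const std::valarray<double>& a, const std::valarray<double>& b)
{
    std::valarray<double> prod = a * b;
    return prod.sum();
}

// Element-wise square root, shown as an example of the math overloads.
std::valarray<double> elementwise_sqrt(const std::valarray<double>& a)
{
    std::valarray<double> r = std::sqrt(a);
    return r;
}
```

The element-wise +, -, * and / operators work the same way as the multiply used in dot.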
Thanks, Ayman Shoukry VC++ Team
ANeelima
/arch:SSE[2] doesn't guarantee generating only SSE[2] instructions. Depending on certain compiler heuristics, the generated code is a mix of SSE[2] and x87.
As for contraction (the fma instruction), I believe the VC2005 compiler does it only for 64-bit code generation.
Thanks, Ayman Shoukry VC++ Team
Pat Jones
Well, I put the computations (essentially a dot product) into the simplest loop possible. The gVec's are global float* that point to arrays allocated with _aligned_malloc on a 16-byte boundary. isr, sum, and lp0 are local floats.
unsigned int ni;
for (ni = 0; ni < nSize; ni++) {
    float isr = gVec1[ni];
    sum += (lp0 - gVec2[ni]) * isr;
    p0c += isr;
}
What I see in the code listing is the use of SSE instructions, but only on one array element at a time (no associative grouping by 4's), and no contraction of multiply with add.
Any ideas? Thanks.
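If the compiler will not group the loop by fours on its own, one option is to write that grouping by hand with the SSE intrinsics from xmmintrin.h. The following is a minimal sketch, not a tuned implementation. It assumes both arrays are 16-byte aligned (as with _aligned_malloc above) and that p0c is a float accumulator like sum; the struct LoopSums and the function weighted_sums_sse are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_*_ps)

// Hypothetical holder for the two accumulators of the loop.
struct LoopSums {
    float sum;  // accumulates (lp0 - b[i]) * a[i]
    float p0c;  // accumulates a[i]
};

// a plays the role of gVec1 and b of gVec2; both must be 16-byte aligned.
LoopSums weighted_sums_sse(const float* a, const float* b,
                           std::size_t n, float lp0)
{
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);

    const __m128 vlp0 = _mm_set1_ps(lp0);  // lp0 in all four lanes
    __m128 vsum = _mm_setzero_ps();        // four partial sums
    __m128 vp0c = _mm_setzero_ps();

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 isr  = _mm_load_ps(a + i);  // aligned load of 4 floats
        const __m128 diff = _mm_sub_ps(vlp0, _mm_load_ps(b + i));
        vsum = _mm_add_ps(vsum, _mm_mul_ps(diff, isr));
        vp0c = _mm_add_ps(vp0c, isr);
    }

    // Combine the four lanes of each accumulator.
    float s[4], p[4];
    _mm_storeu_ps(s, vsum);
    _mm_storeu_ps(p, vp0c);
    LoopSums r = { (s[0] + s[1]) + (s[2] + s[3]),
                   (p[0] + p[1]) + (p[2] + p[3]) };

    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i) {
        r.sum += (lp0 - b[i]) * a[i];
        r.p0c += a[i];
    }
    return r;
}
```

Note that summing in four lanes adds the terms in a different order than the scalar loop, so the result can differ in the last bits. The multiply and the add are still separate instructions here; SSE/SSE2 has no fused multiply-add.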