Hi!
When I execute my pixel shader on the reference device, the results are OK. But when I run it on my new graphics card, a Quadro FX 3400/4400, the results are rubbish.
Here is exactly what happens.
In my pixel shader I have these four operations:
Output.weights.x = (1-frac2.x)*(1-frac2.y); // 1 float output
Output.weights.y = frac2.x*(1-frac2.y);     // 1 float output
Output.weights.z = (1-frac2.x)*frac2.y;     // 1 float output
Output.weights.w = frac2.x*frac2.y;         // 1 float output
Using the REF device, the results are, for example:
0.5625
0.1875
0.1875
0.0625
which is exactly what I'm expecting, but using my graphics card the results are quite different:
0.56175613
0.19117355
0.18433762
0.062732697
frac2.x and frac2.y are computed in the pixel shader as:
coord = In.oTex1 + (float2)transformation/texsize;
coord2[0] = coord*texsize;
frac2 = frac (coord2[0]);
and In.oTex1 is the output of my vertex shader:
Output.oTex1 = mul(vTex1, (float2x2)g_mWorld) + (float2)g_mWorld[3];
where in this case g_mWorld is the identity apart from g_mWorld[3], which is (0.75, 0.75).
My vertex coordinates are just a quad with values:
(0,0)
(0,1)
(1,0)
(1,1)
The program works really well with software emulation, but with the graphics card it doesn't seem to work properly.
Can anyone give me an idea about what's going on?
Thank you,
Jesus

Different results between REF and HAL device
SSwam
First, thanks a lot for your answer.
texsize is 256, so 8 bits. I don't really know how float instructions work. But I'm sure that with 32-bit floats, if normal software gives me good results, the graphics card should give me the same.
Can you explain to me how you know that
you're only getting 7 or so binary digits of precision after doing the frac()?
The graphics card is a good one. Is there any place where I can give it a precision parameter?
I'm doing algorithms for biomedical image applications. The problem is that, given this error and a 256*256 image, when I sum over all the points I get errors of more than 28 in my histogram for one of my combinations of pixel values, and this is not acceptable in the field where I am working.
That is why I need to sort it out! And I'm sure it is possible, because with similar operations in the normal pipeline I got perfect results. There is some bug, but I have no idea where.
In any case, thanks, and I'm looking forward to any suggestions.
Jesus
MetroP
Seems to me like what you're doing is:
frac2 = frac(in.oTex1 * texsize + transformation)
Any particular reason you're dividing and then multiplying? That's a sure source of precision loss.
Another thing you could try is reading "Directly Mapping Texels to Pixels" in the docs. If your texture coords are 0 to 1 as your vertices are, it's likely that you're not getting the interpolated values you're expecting.
If modifying the texture coords according to that text fixes the problem, let us know, since in that case I'd say that the REF results are wrong.
AmirRony2
Tom is not kidding. The difference between the REF and HAL device that you're seeing is that the HAL device is using lower precision calculations than the REF device. This is allowed in Direct3D; you can't expect to get the same numbers on differing kinds of hardware.
In general, regardless of whether you're using Direct3D or not, you should realize that you can almost never assume floating point numbers are exact. They should always be seen as approximations.
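To make the "approximations" point concrete, here is a minimal sketch of a tolerance-based comparison in C++. The function name `nearlyEqual` and the default tolerances are assumptions picked for illustration. What tolerance is acceptable depends entirely on the application.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true if a and b agree within a relative tolerance,
// with an absolute floor so that values near zero can still compare equal.
bool nearlyEqual(double a, double b, double relTol = 1e-9, double absTol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

For example, `0.1 + 0.2 == 0.3` is false in IEEE double arithmetic, while `nearlyEqual(0.1 + 0.2, 0.3)` is true. The REF and HAL weights 0.5625 and 0.56175613 from this thread only compare equal once the relative tolerance is loosened to about 1e-2.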
RSF
Calebs Garage
I don't know if I'd call these results "rubbish", so much as poor approximations. They are accurate to 7 binary digits after the decimal place.
So it looks like you're only getting 7 or so binary digits of precision after doing the frac(). How big is texsize? If it's, say, 512, then that's where 9 bits of your precision went. If your card is only using 24-bit floating point math, which has only 16 bits of precision, then that would explain the results you're seeing.
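The bit counting above can be sketched in C++ as follows. Both function names are made up for this illustration. The second one is only a rough model: it assumes the coordinate is scaled to texel units, so that its integer part uses about log2(texsize) of the available mantissa bits before frac() throws it away.

```cpp
#include <cmath>

// Approximate number of binary digits after the binary point on which a and b
// agree, taken as -log2(|a - b|). Returns +infinity if a == b.
double bitsOfAgreement(double a, double b) {
    return -std::log2(std::fabs(a - b));
}

// Rough model (an assumption, not a hardware spec): mantissa bits left for the
// fraction once the integer part of a value in [0, texsize) has been stored.
int fractionBitsLeft(int mantissaBits, int texsize) {
    int integerBits = 0;
    while ((1 << integerBits) < texsize) ++integerBits;
    return mantissaBits - integerBits;
}
```

Applied to the four REF/HAL pairs from the first post, the worst case (0.1875 vs 0.19117355) agrees to about 8 bits. In the model, `fractionBitsLeft(16, 512)` is 7 and `fractionBitsLeft(16, 256)` is 8, which is in the same range as the observed agreement.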
Fernando Ruano
But anyway thanks!
If you visit http://www.gpgpu.org/ you will see some really nice medical algorithms and conferences about that! Anyway, thanks!
Jesus
vivisad
Hi!
The interpolator is not doing the same as the software does. But why?
First thing: the steps of my texture coordinates are not always the same.
The correct value of the step would be 1/256 = 0.00390625.
But the values of my steps keep changing! Examples of step values:
0.00390621
0.0039062
0.0039063
It seems to be a precision error, but... how can I get a step value of 0.00390621?
That is the first thing. Beyond that, my values continue to be incorrect, because the first value is completely wrong.
I have made a table with some coordinates and the results using software and the graphics card. This is what I found!!
Pixel      Software                  Hardware
           x           y             x           y
(0,0)      0.00292969  0.00292969    0           0
(1,0)      0.00683594  0.00292969    0.00682449  0.00294113
(2,0)      0.0107422   0.00292969    0.0107307   0.00294113
(3,0)      0.0146484   0.00292969    0.014637    0.00294113
....
(254,0)    0.995117    0.0029629     0.995106    0.00294113
(255,0)    0           0             0           0            (I have discarded it)
(0,1)      0.00292969  0.00683594    0           0
(1,1)      0.00683594  0.00683594    0.00682449  0.00684738
(2,1)      0.0107422   0.00683594    0.0107307   0.00684738
......
(0,2)      0.00292969  0.0107422     0           0
(1,2)      0.00683594  0.0107422     0.00682449  0.107356
The only change between the two programs is in the call that creates my device, which will work with the graphics card:
hr |= this->direct3DObject->CreateDevice(
D3DADAPTER_DEFAULT,
D3DDEVTYPE_REF,
0,
D3DCREATE_SOFTWARE_VERTEXPROCESSING,
&this->presentParameters,
&this->direct3DDevice );
changes to
hr |= this->direct3DObject->CreateDevice(
D3DADAPTER_DEFAULT,
D3DDEVTYPE_HAL, //Hardware!!!!!!!
0,
D3DCREATE_HARDWARE_VERTEXPROCESSING,//Hardware!!!!
&this->presentParameters,
&this->direct3DDevice );
The program runs well if I do the interpolator's work myself and give it the coordinates, by defining one vertex per pixel and changing the DrawPrimitive type from triangles to a point list. But it would be slower! Can anyone give me an idea of what is going on?
Thanks!!
Leif Greenman
I have just visited this page and entered my numbers to check whether there is an exact float representation for the numbers I am looking for, and there is!
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
So the problem is somewhere else... still looking for suggestions. 32-bit float precision is more accurate than the errors I have.
Anyway, thanks. I have been reading some docs about floating point, and I learned a little bit more!
Christopher Lord
Just to answer you: my graphics card uses 128-bit floating point as its math calculation precision. I think that should be enough! I know it could be different, but if you pay 600 pounds for a graphics card, it is because it's a good one.
I have made the same program doing the interpolation myself, taking the fractional and integer values, multiplying and doing whatever I wanted, and the results are exactly the same as when simulating in software. There is some bug, I know! And the first one is why I get 0 for the first value when it should be 0.75/256 (0.75 because that is the translation I give it).
But I don't think it is the graphics card's fault! I'm using the same precision as the software does.
In any case, thanks a lot! The problem is that I'm not happy until things work well, and I think it is something I have done wrong myself. I really wish it were just a hardware problem!!
Thank you!
slimbim
I doubt that. I think it will be using four 32bit floating-point values - R,G,B,A. 4x32 = 128. That's the way they express these things. Most cards also only support a subset of the IEEE754 32-bit floating-point number format. Usual differences are:
-Reduced precision. IEEE754 specifies 0.5ULP error, most cards get around 1ULP for add, subtract, multiply.
-Most transcendental functions, as well as reciprocal and divide, are computed at a much lower precision than 1ULP.
-GPUs may have odd rounding modes and not adhere to round-to-nearest-even.
-Some cards never generate NaNs and infs, but instead produce MAX_FLT or -MAX_FLT values, or something moderately random.
-Even if NaN and inf are generated, many operations will not handle them fully correctly, especially in corners of the spec such as whether NaN==NaN.
-The distinction between quiet NaNs and signalling NaNs obviously doesn't happen - there is no way for a GPU to "signal".
-No GPU I know of does anything with denorms except flush them to zero before and after use.
In short, if you are going to use GPUs as general-purpose computing devices, you need to be acutely aware of the limits of floating-point mathematics (even IEEE754-spec is just an approximation), and how each GPU implements its own odd variant of it. The GPGPU papers are full of advice about this and how to deal with the changing precision.
ryanlcs