Different results between REF and HAL device

Hi!
When I run my pixel shader on the reference device, the results are OK. But when I run it on my new graphics card (Quadro FX 3400/4400), the results are rubbish.
Here is exactly what happens.
In my pixel shader I have these four operations:

Output.weights.x = (1-frac2.x)*(1-frac2.y); // 1 float output
Output.weights.y = frac2.x*(1-frac2.y);     // 1 float output
Output.weights.z = (1-frac2.x)*frac2.y;     // 1 float output
Output.weights.w = frac2.x*frac2.y;         // 1 float output

Using D3DDEVTYPE_REF, the results are, for example:
0.5625
0.1875
0.1875
0.0625
which is exactly what I expect, but on my graphics card the results are quite different:
0.56175613
0.19117355
0.18433762
0.062732697
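
As a cross-check, the four operations can be reproduced on the CPU. This is only a sketch: the function name `bilinearWeights` is made up for illustration, and it assumes the REF values correspond to frac2 = (0.25, 0.25), which is the only input that yields 0.5625, 0.1875, 0.1875, 0.0625.

```cpp
#include <array>
#include <cassert>

// Bilinear weights for a fractional position (fx, fy) inside one texel cell.
// The element order matches Output.weights.x/y/z/w in the shader.
// bilinearWeights is a hypothetical helper, not part of the original program.
std::array<double, 4> bilinearWeights(double fx, double fy) {
    return {{
        (1.0 - fx) * (1.0 - fy),  // weights.x
        fx * (1.0 - fy),          // weights.y
        (1.0 - fx) * fy,          // weights.z
        fx * fy                   // weights.w
    }};
}
```

Calling `bilinearWeights(0.25, 0.25)` gives the REF numbers, so the shader arithmetic itself is not in question; the difference has to come from the frac2 values the hardware feeds into it.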

frac2.x and frac2.y are computed in the pixel shader as

coord = In.oTex1 + (float2)transformation/texsize;
coord2[0] = coord*texsize;
frac2 = frac (coord2[0]);

and In.oTex1 is the output of my vertex shader:

Output.oTex1 = mul(vTex1, (float2x2)g_mWorld) + (float2)g_mWorld[3];
where in this case the 2x2 part of g_mWorld is the identity and g_mWorld[3] is (0.75, 0.75).

My vertices are just a quad with the coordinates
(0,0)
(0,1)
(1,0)
(1,1)

The program works fine with software emulation, but it doesn't seem to work correctly on the graphics card.

Can anyone give me an idea of what's going on?
Thank you,
Jesus





  • Yaniv Feinberg

     yurixd wrote:

    coord = In.oTex1 + (float2)transformation/texsize;
    coord2[0] = coord*texsize;
    frac2 = frac (coord2[0]);



    Seems to me like what you're doing is:

    frac2 = frac(In.oTex1 * texsize + transformation)

    Any particular reason you're dividing and then multiplying? That's a sure source of precision loss.
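
    The two forms can be compared side by side on the CPU. This is a rough sketch with made-up helper names (`hlslFrac`, `fracDivideThenMultiply`, `fracScaleThenAdd`), and it treats the float2 values as single floats. With IEEE floats and a power-of-two texsize both forms round identically, so on the CPU they agree; on a GPU whose divide is only approximate, the first form has one more place to lose bits.

```cpp
#include <cassert>
#include <cmath>

// frac() as HLSL defines it: x - floor(x).
float hlslFrac(float x) { return x - std::floor(x); }

// Form used in the original shader: divide by texsize, add, multiply back.
float fracDivideThenMultiply(float tex, float transformation, float texsize) {
    float coord = tex + transformation / texsize;
    return hlslFrac(coord * texsize);
}

// Suggested form: scale first, then add. No division at all.
float fracScaleThenAdd(float tex, float transformation, float texsize) {
    return hlslFrac(tex * texsize + transformation);
}
```

    For example, with tex = 100/256, transformation = 0.25 and texsize = 256, both functions return 0.25.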


    Another thing you could try is reading "Directly Mapping Texels to Pixels" in the docs. If your texture coords are 0 to 1 as your vertices are, it's likely that you're not getting the interpolated values you're expecting.

    If modifying the texture coords according to that text fixes the problem, let us know, since in that case I'd say that the REF results are wrong.
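
    To make the texel-to-pixel point concrete, here is a small sketch. It assumes the Direct3D 9 convention that pixel centers sit at integer screen coordinates, and a quad that spans texsize pixels starting at screen position x0 with u running from 0 to 1. The function name `texelCoordAtPixel` is hypothetical.

```cpp
#include <cassert>

// Texel-space coordinate (u * texsize) that the interpolator delivers at the
// center of pixel px, for a quad covering screen x in [x0, x0 + texsize].
// Assumption: pixel centers are at integer coordinates (Direct3D 9 rule).
double texelCoordAtPixel(int px, double x0, double texsize) {
    double u = (px - x0) / texsize;  // interpolated texture coordinate
    return u * texsize;              // same value expressed in texels
}
```

    With x0 = 0 pixel 3 sees texel coordinate 3.0, which is the edge between two texels; with the quad shifted by half a pixel (x0 = -0.5) it sees 3.5, the texel center.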

  • The New Guy

    Hi,
    First, thanks a lot for your answer.

    texsize is 256, so 8 bits. I don't really know how the float instructions work. But I'm sure that with 32-bit floats, if ordinary software gives me good results, the graphics card should give me the same.

    Can you explain how you know that
    you're only getting 7 or so binary digits of precision after doing the frac()?

    The graphics card is a good one. Is there anywhere I can set a precision parameter?

    I'm writing algorithms for biomedical imaging applications. The problem is that, given this error and a 256*256 image, when I sum over all the points I get errors of more than 28 in my histogram for one of my combinations of pixel values, and this is not acceptable in the field I work in.
    That is why I need to solve it! And I'm sure it is possible, because with similar operations in the normal pipeline I got perfect results. There is some bug, but I have no idea where.
    In any case, thanks, and I look forward to any suggestions.

    Jesus


  • F0PS

    Hi!

    The interpolator is not doing the same thing as the software does. But why?

    First thing: the steps between my texture coordinates are not always the same.

    The correct step value would be 1/256 = 0.00390625

    But the values of my steps vary. Example step values:

    0.00390621

    0.0039062

    0.0039063

    It looks like a precision error, but... how can I get a step value of 0.00390621?

    That is the first thing. Beyond that, my values are still incorrect, because the first value is completely wrong.

    I have made a table of some coordinates with the results using software and the graphics card. This is what I found:

     

                       Software                      Hardware
                   x            y                x            y
    (0,0)      0.00292969   0.00292969       0            0
    (1,0)      0.00683594   0.00292969       0.00682449   0.00294113
    (2,0)      0.0107422    0.00292969       0.0107307    0.00294113
    (3,0)      0.0146484    0.00292969       0.014637     0.00294113
    ....
    (254,0)    0.995117     0.0029629        0.995106     0.00294113
    (255,0)    0            0                0            0            (I have discarded it)

    (0,1)      0.00292969   0.00683594       0            0
    (1,1)      0.00683594   0.00683594       0.00682449   0.00684738
    (2,1)      0.0107422    0.00683594       0.0107307    0.00684738
    ......
    (0,2)      0.00292969   0.0107422        0            0
    (1,2)      0.00683594   0.0107422        0.00682449   0.107356

    The only change between the two programs is in the call that creates the device. The software version uses

       hr |= this->direct3DObject->CreateDevice(
          D3DADAPTER_DEFAULT,
          D3DDEVTYPE_REF,
          0,
          D3DCREATE_SOFTWARE_VERTEXPROCESSING,
          &this->presentParameters,
          &this->direct3DDevice );

    which changes to

       hr |= this->direct3DObject->CreateDevice(
          D3DADAPTER_DEFAULT,
          D3DDEVTYPE_HAL, //Hardware!!!!!!!
          0,
          D3DCREATE_HARDWARE_VERTEXPROCESSING,//Hardware!!!!
          &this->presentParameters,
          &this->direct3DDevice );

     

     

    The program runs well if I do the interpolator's work myself and pass in the coordinates, by defining one vertex per pixel and changing the DrawPrimitive type from triangles to a point list. But that would be slower! Can anyone give me an idea of what is going on?

    Thanks!!



  • CodieMorgan

    Are you kidding? This is not a precision error... I think the hardware is just doing something different from what the software does!
    But anyway, thanks!
    If you visit http://www.gpgpu.org/ you will see some really nice medical algorithms and conferences on this! Anyway, thanks!
    Jesus

  • sunwei2004

    Just to answer you: my graphics card uses 128-bit floating point for its math precision. I think that should be enough! I know it could be different, but if you pay 600 pounds for a graphics card, it is because it is a good one.
    I have written the same program doing the interpolation myself, taking the fractional and integer values, multiplying and doing whatever I wanted, and the results are exactly the same as the software simulation. There is some bug, I know! And the first one is why I get 0 for the first value when it should be 0.75/256 (0.75 because that is the translation I give it).

    But I don't think it is the graphics card's fault! I'm using the same precision as the software does.

    In any case, thanks a lot! The problem is that I'm not happy until things work well, and I think it is something I have done wrong myself. I really wish it were just a hardware problem!!

    Thank you!



  • BobbyRayudu83

    That won't help you. GPUs do not usually use IEEE754 precision. That's the root of your problem!

  • ekekakos

     yurixd wrote:
    My graphics card uses 128-bit floating point for its math precision.

    I doubt that. I think it will be using four 32-bit floating-point values - R,G,B,A. 4x32 = 128. That's the way they express these things. Most cards also only support a subset of the IEEE754 32-bit floating-point number format. Usual differences are:

    -Reduced precision. IEEE754 specifies 0.5ULP error, most cards get around 1ULP for add, subtract, multiply.

    -Most transcendental functions, as well as reciprocal and divide, are computed at a much lower precision than 1ULP.

    -GPUs may have odd rounding modes and not adhere to round-to-nearest-even.

    -Some cards never generate NaNs and infs, but instead produce MAX_FLT or -MAX_FLT values, or something moderately random.

    -Even if NaN and inf are generated, many operations will not handle them fully correctly, especially in corners of the spec such as whether NaN==NaN.

    -The distinction between quiet NaNs and signalling NaNs obviously doesn't happen - there is no way for a GPU to "signal".

    -No GPU I know of does anything with denorms except flush them to zero before and after use.

    In short, if you are going to use GPUs as general-purpose computing devices, you need to be acutely aware of the limits of floating-point mathematics (even IEEE754-spec is just an approximation), and how each GPU implements its own odd variant of it. The GPGPU papers are full of advice about this and how to deal with the changing precision.
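
    For anyone who wants to put numbers on "ULP error", here is one common way to count how many representable 32-bit floats lie between two values. It is a sketch only: the helper names are invented, and it assumes neither input is NaN or infinity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that increases with the float's
// value, so that adjacent floats map to adjacent integers.
// orderedBits and ulpDistance are hypothetical helper names.
std::int64_t orderedBits(float x) {
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type punning
    // Floats are sign-magnitude: mirror the negative range below zero.
    if (bits < 0) {
        return -static_cast<std::int64_t>(bits & 0x7fffffff);
    }
    return bits;
}

// Number of float steps between a and b (0 means bit-identical, or +0 vs -0).
std::int64_t ulpDistance(float a, float b) {
    std::int64_t d = orderedBits(a) - orderedBits(b);
    return d < 0 ? -d : d;
}
```

    Applied to the first weight in this thread, `ulpDistance(0.5625f, 0.56175613f)` is in the thousands, far more than the 1 ULP or so expected from a single add or multiply, which suggests the error enters before the final multiplications.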

  • GordonBJ

     yurixd wrote:
    Are you kidding? This is not a precision error... I think the hardware is just doing something different from what the software does!

    Tom is not kidding.  The difference between the REF and HAL device that you're seeing is that the HAL device is using lower precision calculations than the REF device.  This is allowed in Direct3D; you can't expect to get the same numbers on differing kinds of hardware.

    In general, regardless of whether you're using Direct3D or not, you should realize that you can almost never assume floating point numbers are exact.  They should always be seen as approximations.

  • ratster

    I have just visited this page and entered my numbers to check whether the values I am looking for have an exact float representation, and they do!
    http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
    So the problem is somewhere else... still looking for suggestions. 32-bit float precision is more accurate than the errors I am seeing.

    Anyway, thanks. I have been reading some docs about floating point, and I learned a little more!



  • Jacques Laurin

     yurixd wrote:
    Using D3DDEVTYPE_REF, the results are, for example:
    0.5625
    0.1875
    0.1875
    0.0625
    which is exactly what I expect, but on my graphics card the results are quite different:
    0.56175613
    0.19117355
    0.18433762
    0.062732697

    I don't know if I'd call these results "rubbish", so much as poor approximations.  They are accurate to 7 binary digits after the decimal place.
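
    Because the four weights always sum to one, the frac2 values that the hardware actually used can be recovered from the reported output: fx = weights.y + weights.w and fy = weights.z + weights.w. The sketch below does that; `recoverFrac` and `Frac2` are illustrative names, not anything from the original program.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Frac2 { double x; double y; };

// Invert the bilinear weight formulas:
//   w.y + w.w = fx*(1-fy) + fx*fy = fx
//   w.z + w.w = (1-fx)*fy + fx*fy = fy
// recoverFrac is a hypothetical helper for this thread's numbers.
Frac2 recoverFrac(const std::array<double, 4>& w) {
    return Frac2{ w[1] + w[3], w[2] + w[3] };
}
```

    Fed the HAL weights above, this gives roughly fx = 0.25390625 and fy = 0.24707031, against 0.25 for both on REF. Those are off by 2^-8 and 3*2^-10 respectively, so the error looks like it is already present in frac2 at about the 8th to 10th binary digit, and the multiplications merely carry it through.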
     yurixd wrote:
    frac2.x and frac2.y are computed in the pixel shader as

    coord = In.oTex1 + (float2)transformation/texsize;
    coord2[0] = coord*texsize;
    frac2 = frac (coord2[0]);


    So it looks like you're only getting 7 or so binary digits of precision after doing the frac().  How big is texsize?  If it's, say, 512, then that's where 9 bits of your precision went.  If your card is only using 24-bit floating point math, which has only 16 bits of precision, then that would explain the results you're seeing.
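
    The bit budget can be written down as a small calculation. This is a worst-case sketch for coordinates just below texsize, and the function names are made up for illustration: the integer part of coord*texsize uses up log2(texsize) bits of the mantissa, and whatever is left is all that frac() can return.

```cpp
#include <cassert>
#include <cmath>

// Bits of mantissa left for the fractional part once the integer part of
// coord * texsize (up to texsize - 1) has taken its share.
// fractionBitsLeft and fracResolution are hypothetical helper names.
int fractionBitsLeft(int mantissaBits, int texsize) {
    int integerBits = 0;
    while ((1 << integerBits) < texsize) {
        ++integerBits;  // smallest n with 2^n >= texsize
    }
    return mantissaBits - integerBits;
}

// Smallest step frac() can resolve under that budget: 2^-(bits left).
double fracResolution(int mantissaBits, int texsize) {
    return std::ldexp(1.0, -fractionBitsLeft(mantissaBits, texsize));
}
```

    With a 16-bit mantissa and texsize 512 this leaves 7 bits, a resolution of 1/128. With texsize 256 it leaves 8 bits, and a full 23-bit float32 mantissa would leave 15.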


  • Seidel1

    Using graphics cards to generate medical data is such an outrageously bad idea, I don't know where to even begin. They are extremely imprecise and make approximations all over the place in an effort to increase speed. Use them for rendering pretty pictures, nothing more.