Early Out

Hi.
I tried the "regular" DirectX forums, but since there were no answers or even faint ideas, I am going to post my question here.
I have a fairly simple shader which loops over a number of calculations.
With ps_3_0, can we exit a loop, and the fragment program, early
if a certain condition is met?
for( ... ){
  if( b > constant )
    return Output;
}
return Output;
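To pin down what the question asks for, here is a small CPU-side model in Python (not shader code) of a loop that returns as soon as an accumulated value b exceeds a threshold. The names early_out, step_values and threshold are hypothetical; the point is only that an early return skips the remaining iterations.

```python
# CPU-side model of the question's loop (plain Python, not HLSL).
# step_values and threshold are hypothetical stand-ins for the real shader math.

def early_out(step_values, threshold):
    """Accumulate b; return (b, iterations_run) as soon as b > threshold."""
    b = 0.0
    iterations_run = 0
    for v in step_values:
        iterations_run += 1
        b += v
        if b > threshold:
            return b, iterations_run  # early out: remaining iterations are skipped
    return b, iterations_run          # condition never met: the full loop ran


print(early_out([0.25] * 8, 0.9))  # (1.0, 4)
```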



  • XZH

    To clarify, the "% 50" part explicitly tells the compiler that the dynamicCount variable can never reach 50 (for non-negative values it stays in the range 0 to 49), so it needn't worry about an unbounded instruction count. % is the modulus operator: it returns the remainder of dividing the left operand by the right operand.
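A minimal Python sketch of the "% 50" bound, assuming a non-negative count (negative operands behave differently in C-like languages and are not modelled here). MAX_ITERATIONS and run_bounded_loop are hypothetical names for illustration.

```python
# Sketch of bounding a dynamic loop count with the modulus operator.
# Assumes a non-negative count; names are hypothetical.

MAX_ITERATIONS = 50


def run_bounded_loop(dynamic_count):
    """Count how many iterations a loop bounded with % MAX_ITERATIONS executes."""
    assert dynamic_count >= 0, "sketch assumes a non-negative count"
    bounded = dynamic_count % MAX_ITERATIONS  # always in 0..49
    executed = 0
    for _ in range(bounded):
        executed += 1
    return executed


print(run_bounded_loop(10))   # 10
print(run_bounded_loop(50))   # 0  (wraps around, it does not clamp)
print(run_bounded_loop(120))  # 20
```

Note that % wraps rather than clamps: a count of exactly 50 runs zero times. A clamp such as min(dynamicCount, 49) would keep large counts at the maximum instead; whether a particular compiler version accepts that as a loop bound is untested here.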


  • Li-Jen

    On the other hand, while HLSL's for loop is on target, the while and do loops have no direct parallel in asm, which only has rep (perform n iterations) and loop (iterate i = start to start+n). The lack of other looping instructions in asm means there is no way to generate a loop that may be infinite: there will always be a fixed number of iterations if nothing breaks out of the loop before the repeat count is reached.

    I would be curious to see how HLSL is compiling those two loop types, though I don't have time to test that just now. It would seem to me that it must be wrapping one of the break* statements in a rep or loop block to limit the maximum number of iterations. The difference between do and while would be whether the break* test comes at the beginning or at the end of the loop.

    Robert Dunlop
    Microsoft DirectX MVP
    www.directxzone.org
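Robert's guess about how while and do loops could be lowered can be sketched as a model: a fixed-count rep with the exit test either at the top (while) or at the bottom (do). This is plain Python, not actual compiler output, and REP_LIMIT = 255 is an assumed cap chosen for illustration.

```python
# Model of lowering while / do-while onto a fixed-count `rep` plus a break test.
# An assumption about what the compiler might do, not real compiler output.

REP_LIMIT = 255  # hypothetical maximum repeat count


def while_loop(cond, body, state):
    """while (cond(state)) state = body(state); break test at the top."""
    for _ in range(REP_LIMIT):   # rep REP_LIMIT
        if not cond(state):      # break test at the beginning
            break
        state = body(state)
    return state


def do_while_loop(cond, body, state):
    """do { state = body(state); } while (cond(state)); break test at the bottom."""
    for _ in range(REP_LIMIT):   # rep REP_LIMIT
        state = body(state)
        if not cond(state):      # break test at the end
            break
    return state


print(while_loop(lambda s: s < 0, lambda s: s + 1, 0))     # 0: body never runs
print(do_while_loop(lambda s: s < 0, lambda s: s + 1, 0))  # 1: body runs once
print(while_loop(lambda s: True, lambda s: s + 1, 0))      # 255: capped, never infinite
```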


  • NeeTrihT

    I haven't worked with dynamic flow control much myself, but from what I've read, sampling operations are restricted within dynamic flow control blocks when they use texture coordinates calculated within the shader, because the flow control may take different code paths in adjacent pixels of a primitive. In that case the compiler will either try to unroll the loop or, if it can't find such a solution, fail validation. Take a look towards the bottom of this page:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/dx9_graphics_reference_asm_ps_instructions_flow_control.asp

    It sounds like you may be able to use the partial-derivative version of tex3D (tex3Dgrad) within a dynamic flow control block if you pass in derivatives calculated outside of the loop.
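To illustrate why derivatives become a problem, here is a toy Python model. It assumes (commonly true of this hardware generation, but treat it as an assumption) that ddx is computed by differencing neighbouring pixels in a 2x2 quad; once one pixel of the quad has left the loop, its neighbour's derivative cannot be formed. Computing the derivative before the loop, while all four pixels are still active, and passing it into a tex3Dgrad-style call sidesteps that.

```python
# Toy model: ddx as finite differences inside a 2x2 pixel quad (assumed scheme).
# None marks a pixel that has already left the loop.

def ddx(quad):
    """quad = [[top_left, top_right], [bottom_left, bottom_right]].
    Returns the horizontal difference for each row, or None if it can't be formed."""
    result = []
    for left, right in quad:
        if left is None or right is None:
            result.append(None)  # a neighbour is inactive: derivative undefined
        else:
            result.append(right - left)
    return result


# Coordinates computed before the loop, all four pixels active: safe to hoist.
print(ddx([[0.0, 0.5], [0.0, 0.5]]))   # [0.5, 0.5]
# Inside divergent flow control the top-right pixel has already exited.
print(ddx([[0.0, None], [0.0, 0.5]]))  # [None, 0.5]
```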

    Note that the for loop is intended for executing a static number of times, with starting and ending values defined by constants and a read-only index variable. HLSL may be able to compile code that varies from this, I'm not sure about that, but used with constant inputs it lines up directly with the loop instruction in ps_3_0. As for how an early out would be implemented: although HLSL doesn't advertise any break statement, ps_3_0 assembly does include several forms of break instruction, including break on comparison. In assembler the loop might look something like this:

    ; 32 iterations
    defi i1,32,0,1,0 ; integer constant: count 32, start 0, step 1
    def c0,0.9,0,0,0
    loop aL, i1 ; or if aL not needed, just "rep i1"
    ; do some operations that result in vResult in register r1
    break_gt r1.a,c0.r
    endloop
    mov oC0,r1
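The control flow of that assembly can be modelled in a few lines of Python: loop aL, i1 with i1 = (32, 0, 1) runs aL from 0 in steps of 1 for at most 32 passes, and break_gt r1.a, c0.r leaves as soon as r1.a > 0.9. The step function is a hypothetical stand-in for "do some operations".

```python
# Python model of the loop / break_gt assembly above (not shader code).
# `step` is a hypothetical stand-in for the operations that produce r1.

def asm_loop(step, count=32, start=0, increment=1, threshold=0.9):
    r1_a = 0.0
    aL = start
    for _ in range(count):        # loop aL, i1
        r1_a = step(r1_a, aL)     # ; do some operations -> r1
        if r1_a > threshold:      # break_gt r1.a, c0.r
            break
        aL += increment
    return r1_a, aL               # mov oC0, r1 (aL returned for inspection)


print(asm_loop(lambda a, aL: a + 0.25))  # (1.0, 3): breaks on the 4th pass
print(asm_loop(lambda a, aL: a))         # (0.0, 32): runs all 32 passes
```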

    Assuming that this is the optimal form in assembler, one might be able to figure out what is likely to be friendly to HLSL and let it generate an optimal solution (assuming there are no restricted instructions in the loop that would cause it to unroll). You could try compiling different forms with the command-line fx compiler (fxc), setting it to output an assembly listing, to see what it generates in different cases.

    Robert Dunlop
    Microsoft DirectX MVP
    www.directxzone.com


  • leumas111

    I just tried to implement Etienne's and Stephan Mantler's suggestions, but neither compiles in my context. I'm using a tex3D inside the loop, which may cause the failure to compile (see Robert's comment). Or maybe the reason is what Robert pointed out here:

    "Note that the For loop is intended for executing a static number of times, with starting and ending values defined by constants, and the index variable being read only. HLSL may be able to compile code that varies from this, I'm not sure on that, but used with constant inputs it lines up directly with the loop instruction in PS_3."

    But if there is a way of breaking out of the loop in assembler, there should be a way of doing it in HLSL as well, hm, but I can't see it right now. Does anybody know if (and how) the assembler instruction "break_gt" in Robert's post (or similar break instructions) translates to HLSL?


  • GreatLaker

    Hi shade,

    No question, your question is universal... ;-) I just noticed from your code that you are doing something similar to what I do.

    >>Perhaps you could post a compileable HLSL loop using early out

    I also haven't found a way to get early loop termination, but an early discard of the pixel can be achieved, for example, like this:

    float a = 10;
    for ( int i = 0; i < 50; i++ )
    {
        a -= 1.0;
        clip(a);
    }

    Yeah, it's a nonsense loop, but I guess you understand what I mean. The pixel is discarded early, on the 11th iteration, when a first drops below zero (clip discards when its argument is < 0).
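A toy Python model of that clip loop. It only reports on which iteration the pixel is flagged as discarded; whether the GPU actually stops executing the remaining iterations at that point depends on the hardware and driver, and this sketch does not claim it does. clip_loop is a hypothetical name.

```python
# Toy model of clip() inside a fixed-count loop (plain Python, not HLSL).

def clip_loop(a=10.0, iterations=50):
    """Return the iteration on which clip(a) would discard the pixel, or None."""
    for i in range(1, iterations + 1):
        a -= 1.0
        if a < 0.0:      # clip(a) discards when its argument is < 0
            return i
    return None          # argument never went negative: the pixel survives


print(clip_loop())         # 11
print(clip_loop(a=100.0))  # None
```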

    About the modulo: I remember my compiler sometimes complained when I used a dynamic loop count like:

    for ( int i = 0; i < dynamicCount; i++ )

    But what worked was:

    dynamicCount = dynamicCount % 50;
    for ( int i = 0; i < dynamicCount; i++ )

    Btw: Did you manually format your code snippets so nicely? ;-)


  • zymore

    Yes, we can.


  • LosDude

    I have successfully tried the shader below; the solution was pointed out by Mike Houston at gpgpu.org. It seems that because nearby fragments/pixels can branch differently, the dx/dy derivatives used to select the LOD are not guaranteed, so we have to use tex3Dlod with an LOD value of zero. However, early experiments don't show any performance gain.

    bool bNonTerminated= true;
    //! traversal along view ray
    for(int i=0; i<iStepsViewRay && bNonTerminated; i++)
    {
       vSample.rgba = tex3Dlod( sVolume, float4(P,0.0f) ).r;   
       vResult.rgba = max(vSample.rgba, vResult.rgba); //!shUnder
       if(vResult.a > 0.95f){
          bNonTerminated= false;
       }
       //! increment index
       P += fSamplingDistance*vRayDirection;
    } //! end outer loop   
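For reference, the control flow of that loop as a plain Python sketch: a maximum-intensity accumulation that stops once the running value exceeds 0.95. samples is a hypothetical list standing in for tex3Dlod(sVolume, float4(P, 0)).r at successive ray positions, and march is a hypothetical name.

```python
# CPU sketch of the termination-flag loop above (plain Python, not HLSL).

def march(samples, steps, threshold=0.95):
    """Return (result, steps_taken) for a max-intensity march with early termination."""
    result = 0.0
    not_terminated = True
    i = 0
    while i < steps and not_terminated:   # i < iStepsViewRay && bNonTerminated
        result = max(samples[i], result)  # vResult = max(vSample, vResult)
        if result > threshold:
            not_terminated = False        # ray is saturated: stop marching
        i += 1
    return result, i


print(march([0.2, 0.5, 0.97, 0.1], 4))  # (0.97, 3)
```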


  • A.Hadi

    An alternative that I've been using is to add a second condition to the loop:

    bool bDone = false; // "break" is a reserved keyword, so it can't be used as a variable name
    int i;
    for(i=0; i<loopcount && !bDone; i++)
    {
        if(condition)
        {
            /* ... */
            bDone = true;
        }
    }

    This preserves the loop variable (sometimes desirable) and also works quite well for breaking all the way out of nested loops. One thing to be careful about: in this case i is incremented once more before the loop terminates, which must be taken into account when using it for further calculations (or avoided by moving the increment to a 'safe' place).
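A small Python model of the off-by-one described above (find_with_flag and condition are hypothetical names). The index that actually met the condition is i - 1.

```python
# Model of a flag-terminated for loop: the increment still runs once after the flag is set.

def find_with_flag(loopcount, condition):
    done = False
    i = 0
    while i < loopcount and not done:  # for (i = 0; i < loopcount && !bDone; i++)
        if condition(i):
            done = True
        i += 1                         # the for-loop increment still executes
    return i, done


print(find_with_flag(10, lambda i: i == 3))  # (4, True): matching index is 4 - 1 = 3
```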

  • dwj

    Just an idea... if you want an early out, you can just set i to a huge value;
    the loop will end at the next iteration. Instead of this:

    for(int i=0; i<iStepsViewRay; i++)
    {
        if(vResult.a>0.9f)
            break;
    }
    return Output;

    this should do:

    for(int i=0; i<iStepsViewRay; i++)
    {
        if(vResult.a>0.9f)
            i = iStepsViewRay;
    }
    return Output;

    Sure, it's not pretty :-) But it will definitely do the job...
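The index-bump trick as a Python model. A while loop is used because a Python for loop doesn't let the body change its own index; bump_index and its parameters are hypothetical.

```python
# Model of ending a counted loop early by assigning the loop bound to the index.

def bump_index(values, steps, threshold=0.9):
    """Return how many iterations ran before the bump ended the loop."""
    i = 0
    visited = 0
    while i < steps:
        visited += 1
        if values[i] > threshold:
            i = steps        # i = iStepsViewRay;
        i += 1               # the for-loop increment; i < steps now fails
    return visited


print(bump_index([0.1, 0.95, 0.2, 0.3], 4))  # 2
```

Unlike break, the rest of the body still runs in the iteration that sets i, so the assignment belongs at the end of the body, or the remaining work needs its own guard.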


  • Tony Toews

    The compiler tells me that it doesn't support asymmetric returns.

    For reference I will post a little more of the shader which causes the compiler to complain:

    for(int i=0; (i<iStepsViewRay); i++){
       //! fetch sample
       G.rgb = tex3D(sGradient, P).rgb;
       vSample.rgba = tex3D(sVolume, P).rgba;
       /*! Perform computations

       */
       vResult.rgba = foo( vSample.rgba, vResult.rgba );
       Output.vColor.rgba = vResult.rgba; 
       if(vResult.a>0.9f)
          return Output;
    }
    return Output;

    Another idea I had was to use the for loop in another way as seen below

    for(int i=0; (i<iStepsViewRay) && (vResult.a < 0.9f); i++)
    {

    }
    return Output;

    Then the compiler complains that the loop doesn't look like it will terminate in a timely manner. The 1024-iteration cap shouldn't come into play, since the regular loop expression in the first shader never reaches that cap.

    Then I thought, well, let's try break, a standard keyword in any C-like language. The shader
    is provided below. However, break doesn't seem to be known by the compiler.

    for(int i=0; i<iStepsViewRay; i++)
    {
       if(vResult.a>0.9f)
          break;
    }
    return Output;

    So I can't really see how to perform an early out. Perhaps there's something I have misunderstood or am not using correctly; if so, please correct me. Most of the above shaders are actually valid under NVIDIA Cg (older versions too).

    As a side note, I am working with the Feb 2006 SDK on an NV40 GPU.



  • Amir_E

    I am doing volume ray casting, kind of like ray tracing but without surfaces... Still, the question about loops and early outs is universal.

    Perhaps you could post a compilable HLSL loop using an early out?

    Previously and currently I am using early-z as an intermediate step to perform ray termination, but I would like to be able to investigate early outs.

    Perhaps some of the MVP people could comment; I am very confused about whether HLSL should support asymmetric returns, as the MVP claimed.



  • Boompty

    Hi!

    You are doing pixel shader ray tracing, hm. I've been busy in that field for quite a while now and had lots of similar troubles. My conclusion so far is that graphics hardware is not really efficient at dynamic loops right now. I am also using NVIDIA; maybe I should try experimenting with ATI cards, which are said to be more efficient at branching. Anyway, I ended up using only fixed loop counts in combination with the clip() function, which discards a pixel entirely if its argument is < 0.

    What you also can do is to constrain dynamic loop counts by using % to not make your compiler complain.

    Hope this helps a little.

    Nico

