Sparse Matrix-vector multiplication on GPU using DirectX 9

I want to implement Sparse Matrix-vector multiplication on GPU.

I have a matrix where each component is of the form
struct cell {
    int value;
    int row;
    int col;
}

I store the row and column with the value because I rearrange the matrix into the Compressed Row (CRS) sparse matrix storage format. Can I pass the matrix to the GPU as a texture that stores the above values.

Can I pass the vector as another texture and use the values stored in the matrix texture to determine which values from the vector texture to use to do the multiplication.

How do I use DirectX to populate the matrix texture with the values above.

I desparate for an answer......Tongue Tied


Answer this question

Sparse Matrix-vector multiplication on GPU using DirectX 9

  • SreenivasRao.K

    I am a newbie at DirectX 9.0 programming and I need help implementing  Sparse Matrix-vector multiplication on GPU using DirectX 9.  I was advised to use multitexturing to effect the implementation. The idea is to use three textures. One texture stores a non-zero value from the sparse matrix, another texture stores the vector value that must be used to effect the multiplication and the third texture is used to store the results of the multiplication.

    The application program determines which elements will be used in the multiplication and the values are then stored in textures and passed to the GPU. A pixel shader is then used to extract the values from the textures and to perform the matrix-vector multiplication.

    How can I do this



  • vaidhe

    Thanks Jack:

    thanks for the info i am making some head way in the programming with the help of a nother master's student.

    However tell me this.....I defined a varibale of type float4 in my pixel shader......I want to pass four values from Direct X into the rgba components of the float4 variable.....how can I do this

    Kirk

  • CJDuva

    Interesting problem Smile

    Whether you can accurately pass those values to the GPU depends on what sort of range you're using. From everyone I've seen trying to do any sort of scientific/mathematical modelling on the GPU they seem to have trouble in the way that D3D (being primarily games/real-time) can cut a few corners and introduce precision issues...

    You'd need to look through the D3DFORMAT enumeration to try and find a texture format that contains the storage for what you want. You've got 3 integer values so you should have a fair bit of choice. Bare in mind that you'll need to verify hardware support - such that some exotic formats might not be available on all GPU's. I would personally consider the following:

    D3DFMT_R8G8B8 - Good if those integers can be expressed as unsigned 0-255 values
    D3DFMT_A2B10G10R10 - Same as above, but allowing 10bits (0-1023) for storage
    D3DFMT_A16B16G16R16 - Probably a bit better, still unsigned, but now 0-65535 range

    If you need signed formats, try D3DFMT_Q8W8V8U8 or D3DFMT_Q16W16V16U16.

    As mentioned, you'll need to verify hardware support before you start using them...

    Similar rules apply for storing your vector(s) as a texture... except, if you only have a few vectors you're far better of sending them as shader constants than textures. Depending on what shader model you targetted you'd be able to store a reasonable number of 4 component floating point vectors.

    w.r.t. creating the texture for your matrix... How familiar are you with Direct3D programming

    Basically you would create a texture using IDirect3DDevice9::CreateTexture(), then using the IDirect3DTexture9::LockRect() function you'd gain access to the raw data (subject to the format you picked). You'd then fill in all the data and unlock the texture.

    The next step is to write a pixel shader that operates on this data. I can try to explain that, but it's a big topic Smile

    If you need more help, post back with some more questions - the more details the better ;-)

    hth
    Jack


  • Jobby

    Thansk Tom...I like your idea and will explore it......I will let you know what is the outcome.

  • pretzelfisch

    Okay, so let me get this multiple-textures thing straight...

    Value: D3DFMT_R32F = [ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 ... ]
    Row: D3DFMT_L16 = [ 1, 1, 2, 2, 3, 3 .. ]
    Col: D3DFMT_L16 = [ 1, 2, 1, 2, 1, 2 ... ]

    Thus defining a 2x3 matrix (or is it 3x2 ... I always get the terminology muddled up!) looking like this:

    [ 0.1, 0.2 ]
    [ 0.3, 0.4 ]
    [ 0.5, 0.6 ]

    Provided that is correct, you would first want to create three textures using IDirect3DDevice9::CreateTexture():

    value_texture has width/height defined by the number of rows and columns. It's format should be a single-channel floating point texture - D3DFMT_R16F or D3DFMT_R32F.

    row_texture has a width equal to the number of rows. It's height should be 1. Format should also be single-channel, integer format - I suggested D3DFMT_L16, but there are some other options (the '8' formats might be better supported, but automatically limit your matrix to being 256x256 in size).

    column_texture has a width equal to the number of columns. it's height should also be 1. There isn't any particular reason why it couldn't be a 'tall' texture rather than a 'wide' one - matter of preference really. The format should be the same as for 'row_texture'.

    The next step is to use the D3DXFillTexture() function to put your actual data in:


    // This function will be used as a callback:
    void FillRowTexture( D3DXVECTOR4 *pOut, const D3DXVECTOR2* pTexCoord, const D3DXVECTOR2* pTexelSize, LPVOID pData )
    {
        // pTexCoord will be a 0.0 to 1.0 value for X, you can ignore the 'Y' component
        // decode 'pData' to be an array of int's and use pTexCoord to look into this array
        // Once you have the value, write it out as the 'X' component of 'pOut'.
    }

    int* data = new int[ 256 ];
    data[0] = ...
    data[1] = ...
    D3DXFillTexture( row_texture, FillRowTexture, (LPVOID*)&data );

     

    Repeat as appropriate for the 3 textures to get the data from your file into the textures.

    When it comes to doing the computation you'll need a 4th texture - the place where the output is stored. Given you're feeding in floating point values, it makes sense to have the output texure at least as high resolution as your input.

    With regards to your question about specifying where the result is sent - you can't do that. Pixel Shader based rendering can do 'gather' operations but can't do 'scatter' operations. It's one of the big limitations when you try and use a GPU for non-graphics type work...

    I'm a bit stuck here because I don't full understand the mathematics behind what you're trying to do Sad

    The 4th, output, texture will probably be square (but again, not a requirement)... such that you'll want to express your function in terms of:

    f( u, v ) = row_texture( u, 0 ) * column_texture( 0, v ) * value_texture( u, v )

    You have 3 resources available - 2x1D arrays and 1x2D array. You need to find a way of addressing them based on whatever dimensions (e.g. 1D, 2D, 3D) your output texture is defined in.

    hth
    Jack



  • eddie_r

    multi-texturing is a common, and quite convenient, method for texture-related 'effects' - sounds like a reasonable route to take to me.

    The texture that you write to will have to be a render-target (D3DUSAGE_RENDERTARGET when creating your texture) and will likely have to be one of a few specific formats (most hardware supports less than 10 of the D3DFORMAT enumeations). The other two textures can be pretty much any format that you want (less of an issue, but still hardware dependent).

    Another thing you need to bare in mind (especially if you're offloading to the GPU to get better performance!) is that locking and modifying textures on a regular basis can be a pretty nasty performance hit. The *best* results come from resources that aren't modified...

    Basically, it's worth giving thought early on to ways that you can reduce the amount of changes requried or the frequency of any changes.

    Are you okay with the creating/modifying textures stage (the application side of things) or do you need someone to go over those parts

    Otherwise, for the pixel shader, sounds like you need to do a couple of fetch operations (tex2D() ops) and make sure you multiply accordingly to get the correct value from a subsequent texture.


    sampler s0 : register( s0 ); // whatever is stored with device->SetTexture( 0, ... );
    sampler s1 : register( s1 ); // whatever is stored wuth device->SetTexture( 1, ... );

    float4 ps( in float2 t : TEXCOORD ) : COLOR
    {
        // Fetch the matrix value:
        float4 matrix_value = tex2D( s0, t );

        // Depending on how you packed values into the 's0' texture:
        // matrix_value.r = non-zero matrix value
        // matrix_value.g = original matrix x coordinate
        // matrix_value.b = original matrix y coordinate
        // Although, worth noting that they'll be 0.0 - 1.0 (unless you're using a FP texture)

        // At this point you need to use one or more of those values to look up your
        // vector texture

        // Once you've done that, you can then multiply through and return the value
        return matrix_value.r * vector_value;
    }

     


    I'm not too familiar with the mathematical concepts of sparse matrix vector multiplication, so I've had to guess a few bits above Smile

    hth
    Jack


  • Matthew Weyer

    I think nobody else has responded because this is an incredibly tricky subject. Most DX people are writing graphics code, and that stuff doesn't care much about a few bits of precision here and there! If you haven't already, you need to check out http://www.gpgpu.org/  There's a whole bunch of pitfalls to using GPUs to do this stuff:

    -Read-only random-access to structures. Writes can be to only one place (the "screen pixel"), So you can do "gathers", but no "scatters".

    -Precision will hurt you. GPUs cut a huge numbers of corners. These include reduced precision (even if the format claims to be 32-bit IEEE), none or almost no support for NaNs, denorms, INFs and other exceptional numbers. Even in normal numbers, consistency will be pathetic. Really basic stuff, like "multiply-add" sometimes being a fused instruction and sometimes not, so you will get variable precision for an operation depending on SURROUNDING code. That even annoys graphics coders (we get bad Z-fighting).

    -No double precision stuff. Don't even think it!

    -Flaky hardware support for all sorts of things. Things will claim to work and then won't.

    -Irritating limits in pixel shaders - number of instructions, etc.

    But, IF (and it's a big if) you can get your algo working within those confines, you can get a decent speed boost over current CPUs.

    To answer your specific question, you can't really mix types in a pixel format - you basically get up to four elements of the same type. Since one of them is guaranteed to be a float32 (the actual value), that pretty much means the rest have to be as well, but of course you can store small integers in them, even if it's a bit of a waste. The other approach is to have multiple textures, one per component of your structure. So you have an integer-format texture that just stores the row, one that stores the column, and one (of a float32 format) that stores the actual value. That's much cheaper in terms of memory use (since you can choose the extact format of texture for the row and column values), but it's more actual "texld" instructions. So there's a tradeoff to be made. I would start with whatever is easiest :-)

    Be aware that unless you are careful, hardware will filter your texture reads. You need to turn texture filterng off AND you need to make sure you adjust for where the center of texels is.

    "I need help to understand how data passes from the application to the vertex processor and from the vertex processor to the pixel processor."

    I would completely ignore the vertex shader for now. It can do useful work, but on current hardware it's nowhere near the speed of the pixel shader, and the link between the two is fairly complex.

    The biggest conceptual hurdle you will probably have to face for more general algos is that you can't just write to anywhere like on a CPU, you have exactly one output, and the program (the pixel shader) can't change where that output is going to go. However, your current algo for sparse matrix multiplies sounds like it takes that into account pretty well.

    Good luck! But I'd be prepared for rather a lot of frustration :-(

  • pcb

    Above post was actually me. No idea quite what happened there... seems to have signed in using my other passport account and still posted despite throwing up a million errors about how I wasn't authorized to do something or another Smile

    Jack



  • dwloeb

     Kirk Barham wrote:
    I defined a varibale of type float4 in my pixel shader......I want to pass four values from Direct X into the rgba components of the float4 variable.....how can I do this

    If it's defined as a single variable in your HLSL I'm pretty sure you'll have to pass all four components over to the shader in one go. I don't think you can make four seperate calls, one for each channel. Maybe using SetFloat() or SetFloatArray() can be abused to do this though - I haven't tried Smile

    If you're using a pixel/vertex shader directly, then during creation you should've acquired a pointer to an ID3DXConstantTable; from which you can use ID3DXConstantTable::SetVector( ... ).

    If you didn't go for the D3DX route you'd have to manually go right down to the actual registers (again, still in a float4 block layout): IDirect3DDevice9::Set***ShaderConstantF()

    If you're using the effects framework, then ID3DXEffect::SetVector( ... ) should do the same job.

    hth
    Jack


  • Fred Hommel

    Thanks for Jack for your feed back.

    And to answer your question..no I am not really familar with creating/modifying textures..especially ensuring that the application puts the values in the right place of the texture.

    I would like some one to go over tose parts with me....thanks in advance.

    Kirk

  • Albert Pascual

    Thank you, Jack for replying to my post.

    First of all I am trying to implement this program for my master's thesis and actually this is my first DirectX program. I have succeded in passing the matrix to the GPU and I can do normal matrix vector multiplication using the GPU...that is trivial.

    My problem is that I am compressing the matrix so that I only work on the  non-zero elements of the matrix. This compression requires that the non-zero elements be compressed by the rows or by the columns. This compression causes the non-zero elements to change columns in the case of compressed rows or change rows in the case of compressed columns therefore using these compressed formats require that you know which vector component to use to do the product and which component to store the result.

    I have ordered a beginners book for to help me understand the concepts behind graphics program and how store values in textures and retrieve them for use in pixel shaders...this is a grey area for me.

    I have the source code for my program which implements normal matrix vector multiplication perhaps you could look at it if you have the time.

    Essentially what I want to do is pass the matrix as a  texture that stores the value of the non-zero element and the its original coordinates in the matrix before it was compressed.

    Then pass the vector also as a texture which has only one column.

    Extract the value and its original coordinates form the matrix texture and use the coordinates to determine which vector component in the vector texture to use to do the multiplication.

    Don't know if it can be done the way I envisage for it seems that with textures more than one 'value' can be stored in a texture component.

    I need help to understand how data passes from the application to the vertex processor and from the vertex processor to the pixel processor.

    I am going to read up on the D3DFORMAT to understand some more about textures......please keep in touch....because you are the only person that has responded to the 'voice in the wilderness'

    Kirk Barham


  • Chumplybum

    As a follow up to the previous posts can the following be done to implement the Sparse Matrix-Vector multiplication on the GPU:
     
    Using a two dimension array to read in the matrix from a text file, each component of the 2D array is a structure which is defined below.
     
    struct cell {
        int value;
        int row;
        int col;
    }
     
    The idea is to use multiple textures. One integer-format texture that just stores the row, one that stores the column, and one (of a float32 format) that stores the actual value. (I got the idea from this forum)
     
    Then try to use the following pixel shader to effect the multiplication:
     
    sampler s0 : register( s0 ); // which is used to store the matrix value
    sampler s1 : register( s1 ); // which is used to store the row
    sampler s2 : register( s2 ); // which is used to store the column
    sampler s3 : register( s3 ); // which is used to store the vector
     
    float4 ps( in float2 t : TEXCOORD ) : COLOR
    {
        // Fetch the matrix values:
        float4 matrix_value = tex2D( s0, t );
        float4 matrix_row = tex2D( s1, t );
        float4 matrix_column = tex2D( s2, t );

        // At this point the matrix_row and/or matrix_column values would be used to look up the vector texture value
     
        // Once that is done then multiply the matrix value by the vector value and return the value
        return matrix_value * vector_value;
    }
     
     
    The problem I am having is, how to pack the values into the different textures so that when they are extracted in the pixel shader the correct value is retrieved. There is also a problem of ensuring that the result is stored in the right place in the output texture. I am currently trying to determine how to do both.
     
    I don't think you can determine where the results are stored, from what I can see the graphics card determines where it is placed.
     
    If there are any suggestions please do not hesitate to let me know what you think.

  • Sparse Matrix-vector multiplication on GPU using DirectX 9