Is it possible to have a texture with more than 4 values per texel?
My problem is that I need to look up a chunk of many values (about 16) multiple times per pixel, and I would like to do that quickly. In my setting, three 2D texture lookups each time (= 12 float values) is the maximum I can allow without a significant performance drop.
Also, if anybody has experimented with 64 or 128 bit textures, I'm interested to know how the performance of a lookup compares to a 32 bit tex lookup. Maybe it is reasonable to pack more values into those larger formats, if that is possible at all.
Nico

retrieving more than 4 values per texture lookup
Ayan Debnath
The size of the caches is a closely guarded secret of the IHVs, but you should not expect more than a few kilobytes. When I run bandwidth tests I always force all textures down to 1 pixel. I use something like an additional Direct3D debug DLL for this job because it saves me from adding such options directly to the engine code. The performance tools from nVidia contain similar options.
uoknor
Thanks. I corrected the technique and pass parameters and now I get some results. The shader is compiled to asm code, but after that it says: ** Error compiling shader **
So the shader is first compiled to assembler and then compiled a second time, optimized for the specific graphics card? It seems like that.
Peter Theill
I just downloaded ShaderPerf. Does it work on my fx file without knowing about its actual input from the CPU?
I executed it, but so far it simply creates an output file with
"Running performance on file ... " in it.
Ibrahim Y
As I said, it looks like you can only get 32 bits (overall) per clock from the texture unit. Using textures with more bits per component will slow down the transfer.
By testing I meant something different: use a smaller texture that fits in the cache to make sure you are not hitting a bandwidth limit. As you are using a GeForce card, you should try the ShaderPerf tool with your different shaders.
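To make the 32-bits-per-clock observation concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the figure reported in this post holds exactly and that a lookup returns at most 4 components; neither is a vendor specification, and the function name `lookups_and_clocks` is made up for illustration.

```python
import math

# ASSUMPTION: transfer rate observed by the poster, not an official number.
BITS_PER_CLOCK = 32

def lookups_and_clocks(num_values, bits_per_component, components=4):
    """Estimate lookups and transfer clocks needed to fetch num_values values."""
    texel_bits = bits_per_component * components
    # One lookup returns at most `components` values.
    lookups = math.ceil(num_values / components)
    # Each lookup moves one whole texel at BITS_PER_CLOCK bits per clock.
    clocks = lookups * math.ceil(texel_bits / BITS_PER_CLOCK)
    return lookups, clocks

if __name__ == "__main__":
    for bits in (8, 16, 32):
        lookups, clocks = lookups_and_clocks(16, bits)
        print(f"{bits}-bit components: {lookups} lookups, ~{clocks} transfer clocks")
```

Under these assumptions the number of lookups for 16 values stays at 4 in every format, and only the transfer time grows with the component width.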
Texas2Wheeler
The best way to run performance tests is always to use a real world scenario and only change one part. There are some good papers about this on the nVidia developer’s page.
Dmiti
HLSL shaders are always compiled to asm shaders, as the core Direct3D system only supports asm shaders. There are two versions of asm shaders: the first is human readable, the second uses binary tokens. Direct3D uses the binary form. But the GPUs cannot execute these shaders directly. The driver needs to translate them into an internal code. During this translation it will try to optimize the shader for faster execution. ShaderPerf contains the same translator and optimizer as the driver. Because of this it can tell you how many clocks a shader program needs to execute.
I don't know why it does not work. Maybe there is still a parameter missing.
tomG
Shaunkerr
Thanks for the tips! Yes, you seem to point in the right direction. I just noticed that the fourth texture I mentioned is much bigger than the others, so the bigger frame drop is obvious... ;-)
So, how small should the texture be? And how big is the texture cache?
Neil Kiser
>Have you already run some test to check if your performance drops are really based on the tex* instructions?
Yes, I did. When I add one tex2D lookup to my for-loop, which is executed up to 20 or even 30 times per pixel, I lose only about 4 frames per second. I see the same linear performance drop when I add another two lookups. But interestingly, with the fourth lookup I add, I get a drop in frame rate of about ten. (GeForce 7800 GTX.)
Do you know anything about the performance of 64 or 128 bit textures? Is it possible to pack/unpack multiple values per texel component?
Thanks.
Nico
kts
>ShaderPerf does only calculate how many cycles the shader needs if you are not bandwidth and/or texture filter limited.
As I wrote I had no result so far... But I will try again.
I just tried the tex lookups with reduced texture sizes, and there is no difference. And I have to correct what I wrote before: when I add the first of the four lookups, the frame rate drops by about 12 frames; when I add another lookup, there is a drop of just about 4 frames. (I haven't re-tried adding a third and fourth lookup.)
Anyway, thanks for your help so far, Ralf! :)
GMan6
It works when I compile for the 7800 GT chip, but it doesn't when I compile for the 7800 GTX. Anyway, I don't really know how to evaluate the output:
Target: GeForce 7800 GT (G70) :: Unified Compiler: v81.95
Cycles: 386.50 :: R Regs Used: 6 :: R Regs Max Index (0 based): 5
Pixel throughput (assuming 1 cycle texture lookup) 24.87 MP/s
386.50 cycles seems a lot, hm? Yeah, it's a long shader... ;-)
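One rough way to read those two numbers is to turn the pixel throughput into an upper bound on frame rate. The sketch below only does arithmetic on the figures printed above. The 1024x768 resolution and the "every pixel runs the shader exactly once" premise are assumptions made for illustration, and, as ShaderPerf itself states, the throughput assumes 1-cycle texture lookups.

```python
# Figures copied from the ShaderPerf output in this post.
cycles_per_pixel = 386.50
pixel_throughput_mps = 24.87  # megapixels per second

# Shader cycles per second implied by the two figures together.
implied_cycles_per_second = cycles_per_pixel * pixel_throughput_mps * 1e6

# ASSUMPTION: 1024x768 render target, each pixel shaded exactly once per frame.
width, height = 1024, 768
pixels_per_frame = width * height

# Upper bound on frame rate if the shader were the only cost.
max_fps = pixel_throughput_mps * 1e6 / pixels_per_frame

if __name__ == "__main__":
    print(f"implied shader cycles/s: {implied_cycles_per_second:.3e}")
    print(f"pixels per frame:        {pixels_per_frame}")
    print(f"frame rate upper bound:  {max_fps:.1f} fps")
```

With these assumed numbers the bound comes out at a little over 31 fps. Overdraw, bandwidth limits or texture filtering would push the real figure lower.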
ChampAmp
No, you can’t get more than 4 components from a texture with one tex* instruction.
From my tests it looks like current chips can only transfer up to 32 bits per clock between the texture unit and the ALU.
Have you already run some tests to check whether your performance drops are really caused by the tex* instructions?
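On the packing question raised in this thread, the basic idea can be sketched on the CPU side in Python: two 16-bit values share one 32-bit component, so 16 values fit into 8 components, i.e. two 4-component texels. This is only an illustration of the bit layout. The helper names `pack2x16` and `unpack2x16` are made up here, and whether a pixel shader on this hardware can unpack such values cheaply enough to pay off is not something this thread establishes.

```python
def pack2x16(lo, hi):
    """Pack two unsigned 16-bit integers into one 32-bit word."""
    if not (0 <= lo <= 0xFFFF and 0 <= hi <= 0xFFFF):
        raise ValueError("both values must fit in 16 bits")
    return (hi << 16) | lo

def unpack2x16(word):
    """Recover the two 16-bit integers from a packed 32-bit word."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF

def pack_values(values):
    """Pack an even-length list of 16-bit values pairwise into 32-bit words."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    return [pack2x16(values[i], values[i + 1]) for i in range(0, len(values), 2)]

if __name__ == "__main__":
    values = list(range(1000, 1016))       # 16 example values
    words = pack_values(values)            # 8 packed components
    texels = [words[0:4], words[4:8]]      # two 4-component texels
    restored = [v for w in words for v in unpack2x16(w)]
    print(len(words), "components in", len(texels), "texels; round trip ok:", restored == values)
```

Note that packing reduces the number of lookups but not the number of bits moved, so if the transfer rate really is the limit, the gain may be small.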
Omar Khorshid
Yes, you can use fx files, but in this case you have to tell it which shaders should be analyzed. You can find the details about the parameters in the “Readme.txt”.
ShaderPerf does only calculate how many cycles the shader needs if you are not bandwidth and/or texture filter limited.
iLo1630
Maybe your additional lookups increase the number of temp registers that your shader needs. All GeForce chips are very sensitive to this. Using more temps can give you huge drops. This is something that can be tested with ShaderPerf.
If you have trouble with your fx file, you can try saving your shader in its own file. Do you use an asm or HLSL shader? In the case of HLSL you have to define the profile and the function it should use.