My application locks up when I have a device (with two back buffers) in one thread, which I use for rendering, while another thread attempts to lock and read the other back buffer. I get no error; the application just freezes on the call to getBackBuffer.
I attempt to lock buffer 1 while rendering into buffer 0.
How does the graphics device work in relation to threads? Can you work on a buffer while rendering? I was hoping to render in one thread while reading previously rendered data in another thread.

Thread safety and graphics device
Ram Pamulapati
Jack, the GPU itself is serial, but memory operations on Direct3D objects can be done in parallel. I know of at least one game that uses a second thread to update a dynamic sky texture.
Thomas, apart from the typical multithreading problems, I don’t see any reason why doing the memory transfers on another thread should not work. But you have to make sure yourself that everything stays in sync. In your case I would use multiple (at least 3) textures, render targets and query objects to make sure that every part is working on a different frame.
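The idea of letting each part work on a different frame can be sketched without any graphics API at all. The following is only a minimal illustration in Python, not Direct3D code: the byte arrays stand in for render targets, and `run_pipeline`, `NUM_BUFFERS` and `NUM_FRAMES` are hypothetical names chosen for this sketch. One thread "renders" into whichever buffer is free while a second thread reads back finished buffers and returns them to the pool.

```python
import queue
import threading

NUM_BUFFERS = 3   # assumption: follows the "at least 3" suggested above
NUM_FRAMES = 10   # hypothetical frame count for the demo


def run_pipeline(num_buffers=NUM_BUFFERS, num_frames=NUM_FRAMES):
    """Render into free buffers on one thread while another reads finished ones."""
    free = queue.Queue()    # buffers nobody is using right now
    ready = queue.Queue()   # buffers that hold a finished frame
    for _ in range(num_buffers):
        free.put(bytearray(4))  # stand-in for a render target
    frames_read = []

    def render():
        for frame in range(num_frames):
            buf = free.get()                      # blocks only if every buffer is busy
            buf[:] = frame.to_bytes(4, "little")  # "render" the frame number
            ready.put(buf)
        ready.put(None)                           # sentinel: no more frames

    def read_back():
        while True:
            buf = ready.get()
            if buf is None:
                break
            frames_read.append(int.from_bytes(buf, "little"))
            free.put(buf)                         # hand the buffer back for reuse

    threads = [threading.Thread(target=render), threading.Thread(target=read_back)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return frames_read
```

With two buffers the same code still works, but the render thread has to wait whenever the reader has not yet returned a buffer; a third buffer gives it somewhere to go in the meantime.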
I hope you have PCIe hardware, because it has better read-back speed than most AGP hardware. Do you want to run this program on a multicore/multi-CPU system? I ask because splitting work across more threads than you have cores does not necessarily make it faster. On a dual-core system I would use only two threads.
If you use Managed DirectX, it could be necessary to deactivate the multithread protection (ForceNoMultiThreadedFlag in the PresentParameters object) of the device to get full speed. I know this sounds strange, but as you need to do your own synchronization anyway, the locks inside the device would not help you.
It should not be that complicated to write a short test program as a proof of concept.
Daksh Khatter
I can see where you are going.. and I don't like it.
It makes perfect sense though. If reading data from a back buffer is not a real DMA, which it is not (at least not the way I am doing it), then the GPU has to be involved in the transfer, and it will only slow the actual rendering down.
Ok, my situation is this: for each frame I need to write a lot of pixels into a texture, render a scene and copy the back buffer out into system memory to do other things with it. I was planning to have one thread writing the texture data while another was rendering and a third was reading data back. If the data transfer was through DMA, this could make sense (with proper synchronization and multiple buffers), but as it looks now, I cannot do it any faster than the time to write the texture plus the time to render plus the time to read back. There is no way my speed can benefit from some form of full or partial overlapping of the tasks.
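To put numbers on that worry, here is a small back-of-the-envelope helper in Python. The function name `frame_time_ms` and the stage timings are made up for illustration; the "overlapped" figure is a best case that assumes the three stages really can run fully in parallel, which is exactly the point in dispute here.

```python
def frame_time_ms(write_ms, render_ms, read_ms):
    """Return (serial, best-case overlapped) time per frame in milliseconds."""
    # No overlap at all: every stage waits for the previous one.
    serial = write_ms + render_ms + read_ms
    # Perfect overlap: throughput is limited by the slowest stage only.
    overlapped = max(write_ms, render_ms, read_ms)
    return serial, overlapped


# Hypothetical timings: 4 ms texture write, 10 ms render, 6 ms read back.
serial, overlapped = frame_time_ms(4.0, 10.0, 6.0)
print(serial, overlapped)
```

Any real setup would land somewhere between the two figures, depending on how much of the transfer the GPU has to take part in.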
I can still make it work. Just not as fast as I was hoping. If you have any suggestions as to how to speed this up, I would be happy to hear about it. If it is a given that this can not be sped up, then I would like to hear that as well.
Peter Tübben
I believe you have created a deadlock because the back buffers are chained together.
You should try to copy (StretchRectangle) the back buffer to an additional render target (CreateRenderTarget) and then read from this target. Maybe you will have to use a Query object to check if the GPU is already done with the copy before you lock it for reading.
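The "check whether the copy is done before you lock" step can be sketched as a non-blocking poll. This is a Python stand-in only, not the Direct3D query API: `FakeCopy`, `read_when_ready` and the 10 ms delay are hypothetical, with a background thread playing the role of the GPU and a `threading.Event` playing the role of the query.

```python
import threading
import time


class FakeCopy:
    """Hypothetical stand-in for a GPU copy plus an event query."""

    def __init__(self, source):
        self._source = source
        self._done = threading.Event()
        self.target = None

    def issue(self):
        # Start the copy asynchronously, as if it were queued behind other GPU work.
        threading.Thread(target=self._run).start()

    def _run(self):
        time.sleep(0.01)                # pretend the GPU is busy for a while
        self.target = bytes(self._source)
        self._done.set()                # the "query" is now signalled

    def is_finished(self):
        return self._done.is_set()      # non-blocking poll


def read_when_ready(copy, idle_work):
    """Do other work until the copy has finished, then return its data."""
    polls = 0
    while not copy.is_finished():
        idle_work()
        polls += 1
    return copy.target, polls
```

The point of polling instead of locking straight away is that the calling thread stays free to do something useful (here `idle_work`) rather than freezing inside the lock call.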
Ramana G.V
To read from graphics RAM you need some support from the GPU, but only from the memory controller. A memory controller can do more than one thing at the same time. Details will depend on the GPU and driver you use.
All commands you send to the GPU are stored in a command buffer. This is a kind of ring buffer. You can add commands for up to 3 frames ahead. Because of this, the card can still be rendering your commands for one frame while commands for the next 3 frames are already in the buffer.
hemant Kanchan
Yes, I realize that if the copy operation demands the participation of a CPU and you have more than one, then multithreading can be faster. That is the reason I started doing it this way after all.
The question is whether the GPU needs to be involved in the copy. There is just one GPU after all. How is data read to and from the graphics card's memory in parallel without DMA and with just one GPU?
By "send the commands for frame 4 even if the card still works on frame 0", are you simply talking about rendering into frame 4 while still displaying frame 0? If not, then what kind of work is the GPU still performing on frame 0 while you render into frame 4?
koiravahti
Ok, so the reading and writing is sort of half DMA, it seems. I will try out a few setups and see which performs best.
How do you tell the card which buffer a given command is for? I usually do clear, begin/end scene and present. Anything in between begin and end is then rendered to the back buffer. How do I tell it to render into another of the back buffers?
christinamarygeprge
Yes, “forceNoMultiThreadedFlag” is false by default, and this means that the multithread protection is active. This is one of the differences between Managed and unmanaged DirectX.
If something is not done via DMA, you will need the CPU to do the copy. If you move the copy operation to a second CPU, you will have more CPU power left on the other.
More textures and render targets are better because Direct3D already contains a buffer system. Up to 3 frames can be stored in the command buffer. This means you can already send the commands for frame 4 even if the card still works on frame 0. This is called “prerender”. If you use only two targets you cannot make optimal use of this prerender buffer. The reason is that any time you lock a Direct3D object that is still used by an instruction in the buffer, you have to wait until the card has executed that instruction. Using more buffers makes sure that your GPU has enough time to do the work.
PaloMisik
Yes, I have two CPUs but no PCIe.
The reason for three, or actually four, threads was that I was unsure about the speed of each, so I would let the system decide if perhaps three threads on one CPU and one on the other was better than some other combination. Apart from that possibly more optimal division, more than two threads naturally doesn't help anything with only two processors.
The forceNoMultiThreadedFlag is false by default, as far as I can see.
You say that memory access can be done in parallel. By that, do you imply that it is done by DMA? If not, then how will it improve speed? Can the GPU service two reading threads with double the throughput of a single thread?
Why would you want three textures and render targets? It seems to me that two is enough. The card renders into one buffer and someone reads from the other buffer at the same time. The same goes for the textures. The read will then have to wait for the render to complete and the buffers to swap before starting a new read, but I only want the tasks to run in parallel, not a buffering system.
flodis79
To set another render target you have to use the SetRenderTarget method. Every draw call is always done on the render target that is set at the moment you call the draw method.
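That "current target" behaviour is just device state, which a tiny model can show. The sketch below is Python and purely illustrative: `FakeDevice`, `set_render_target` and `draw` are hypothetical stand-ins, not the real Direct3D methods. Each draw is recorded against whichever target is set at the moment of the call.

```python
class FakeDevice:
    """Hypothetical model of a device that remembers its current render target."""

    def __init__(self, targets):
        self.targets = {name: [] for name in targets}  # what was drawn where
        self.current = targets[0]                      # first target is the default

    def set_render_target(self, name):
        if name not in self.targets:
            raise KeyError(name)
        self.current = name

    def draw(self, what):
        # A draw call never names its target; it uses whatever is set right now.
        self.targets[self.current].append(what)


device = FakeDevice(["backbuffer", "offscreen"])
device.draw("sky")                      # lands in the default target
device.set_render_target("offscreen")
device.draw("mesh")                     # lands in the newly set target
```

So the order of calls is what ties a command to a buffer: switch the target first, then issue the draws meant for it.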
K42
How exactly are they chained? Can I not call Present while having a locked buffer, or can I not even render into a buffer while another is locked?
If I create another render target, perform the copy and later read from it, will this be as quick as a single read, or will I end up doing a double copy?
SteveHagget
Direct3D doesn't typically like multithreaded access. You can force it to take critical sections using the D3DCREATE_MULTITHREADED flag in your CreateDevice() call, but performance can suffer.
From a purely conceptual point of view, the GPU is itself a serial device, especially the communication to and from it. Yes, you get massively parallel transform/shading, but that's more of an internal optimization/method than something that is inherently exposed as a feature of the hardware. With this in mind, it is a little pointless to try to access the D3D device from multiple threads. Even if you can do it safely, chances are you're not actually going to get any performance benefit from it; more likely the opposite (due to locking/waiting etc.).
Common convention is to have a single thread deal with the actual API calls - you can have many threads that manipulate data or make calls on this central thread, but it is only that one single thread that makes any API calls.
hth
Jack
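The single-API-thread convention described above can be sketched with a plain command queue. This is a Python illustration under stated assumptions: `ApiThread`, `submit`, `close` and `demo` are hypothetical names, and appending to a list stands in for the actual API call. Any thread may submit work, but only the thread named "api" ever executes it.

```python
import queue
import threading


class ApiThread:
    """Hypothetical owner of the device: the only thread that makes 'API calls'."""

    def __init__(self):
        self._commands = queue.Queue()
        self.calls = []  # (executing thread name, command) pairs, for inspection
        self._thread = threading.Thread(target=self._run, name="api")
        self._thread.start()

    def submit(self, command):
        # Safe to call from any thread; the command is only queued here.
        self._commands.put(command)

    def _run(self):
        while True:
            command = self._commands.get()
            if command is None:          # sentinel: shut down
                break
            # Stand-in for the real API call, always made on this one thread.
            self.calls.append((threading.current_thread().name, command))

    def close(self):
        self._commands.put(None)
        self._thread.join()


def demo():
    api = ApiThread()
    workers = [
        threading.Thread(target=api.submit, args=(f"draw {i}",)) for i in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    api.close()
    return api.calls
```

The worker threads can prepare data in any order they like; the queue serializes the calls, so no locking inside the device is needed.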