Canvas, Rectangles, ClipToBounds slowness

We are thinking about starting a new moderately complex graphical application in WPF. But before committing to the platform, we are testing features to see if WPF can handle it. >:)

One test application so far has a Canvas with ~800 Rectangles (added in C# using “canvas.Children.Add(..)”). These rectangles are movable by dragging them… implemented with Canvas.SetLeft(rect, newLeft) and Canvas.SetTop(..) from the MouseMove handler.

I need some tips on how to speed this up. Currently dragging a single rectangle around causes 100% utilization with 20 FPS on a Xeon 2.8GHz… even turning down to 50 rectangles seems sluggish. Perforator shows only HW drawing, with only the currently dragged rectangle being invalidated. If I set ClipToBounds on the canvas (which would be nice to have), Perforator shows the entire canvas being invalidated and things _really_ slow down.
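For readers who want to reproduce the setup described above, here is a minimal sketch of such a test window. It is a reconstruction, not the original poster's code: the class name, the rectangle sizes and grid placement, and the `grabOffset` field are all assumptions, and it is written against the released WPF API rather than the Feb CTP.

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Shapes;

// Hypothetical reconstruction of the test app; every name here is illustrative.
public class DragTestWindow : Window
{
    private readonly Canvas canvas = new Canvas();
    private Point grabOffset;   // mouse position inside the rectangle when the drag began

    public DragTestWindow()
    {
        Content = canvas;
        for (int i = 0; i < 800; i++)
        {
            Rectangle rect = new Rectangle();
            rect.Width = 20;
            rect.Height = 20;
            rect.Fill = Brushes.SteelBlue;
            // Lay the rectangles out in a 40-column grid (arbitrary choice).
            Canvas.SetLeft(rect, (i % 40) * 25);
            Canvas.SetTop(rect, (i / 40) * 25);
            rect.MouseLeftButtonDown += OnRectDown;
            rect.MouseMove += OnRectMove;
            rect.MouseLeftButtonUp += OnRectUp;
            canvas.Children.Add(rect);
        }
    }

    private void OnRectDown(object sender, MouseButtonEventArgs e)
    {
        Rectangle rect = (Rectangle)sender;
        grabOffset = e.GetPosition(rect);
        rect.CaptureMouse();   // keep receiving MouseMove even if the pointer leaves the shape
    }

    private void OnRectMove(object sender, MouseEventArgs e)
    {
        Rectangle rect = (Rectangle)sender;
        if (!rect.IsMouseCaptured) return;   // only move while a drag is in progress
        Point p = e.GetPosition(canvas);
        Canvas.SetLeft(rect, p.X - grabOffset.X);
        Canvas.SetTop(rect, p.Y - grabOffset.Y);
    }

    private void OnRectUp(object sender, MouseButtonEventArgs e)
    {
        ((Rectangle)sender).ReleaseMouseCapture();
    }

    [STAThread]
    public static void Main()
    {
        new Application().Run(new DragTestWindow());
    }
}
```

Setting `canvas.ClipToBounds = true` in the constructor would reproduce the clipped variant mentioned above.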

Is there a better content container to use for this application? Is this approach to handling 800 rectangles not possible with WPF?

I like the extra features on the higher level objects (DP, animation, grid layout, etc.) and they would be very helpful for development. But is WPF off recompiling itself in the background between frames? It seems implausible that it can be _this_ slow. I could write this in C++/GDI and expect literally 10,000 times the speed.

I’m using FebCTP and VS2005 of course… any help would be greatly appreciated!




  • lordali

    Hello,


    Other folks may know why dragging the rectangle is slow without profiling, but the sure-fire way to find out is to put your app under a profiler. Instructions for using the VSTS profiler can be found on my blog: http://blogs.msdn.com/timothyc/archive/2006/02/28/540291.aspx. If you need any help interpreting the results just let me know.



    Using ClipToBounds is always going to be slow, so designing your app without clipping is ideal. This is because when clipping anti-aliased UI, you must use anti-aliased clips to avoid introducing artifacts. Unfortunately, implementing anti-aliased clipping in a composited scene typically requires an intermediate bitmap. And those are expensive.



    Hope that helps,



    Tim Cahill

    Software Design Engineer

    Windows Presentation Foundation Performance



    This message is provided "As Is" with no warranties, and confers no rights. Opinions and views expressed here are not necessarily those of Microsoft Corporation.

  • sewyew

    I have a problem with single pixel lines that is discussed in thread:

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=211255&SiteID=1

    One solution is to scale the geometry objects and use a single-pixel path to draw them, but I get speed problems. To test this I wrote a test app. Basically:

    1. Create 500 geometry entities (simple straight lines).

    2. Set their render transform to the same ScaleTransform.

    3. Change the scaling of the ScaleTransform, e.g. with the mouse wheel.

    The function is very slow and cannot follow the mouse wheel. It takes some seconds to redraw the screen.
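    The three steps above can be sketched roughly as follows. This is an assumed reconstruction, not the test app that was actually profiled: the class name, line coordinates, random seed, and the 1.1 zoom factor are made up for illustration, and the code targets the released WPF API.

    ```csharp
    using System;
    using System.Windows;
    using System.Windows.Controls;
    using System.Windows.Input;
    using System.Windows.Media;
    using System.Windows.Shapes;

    // Hypothetical reconstruction of the zoom test; names and numbers are illustrative.
    public class ZoomTestWindow : Window
    {
        // Step 2: one ScaleTransform instance shared by every line.
        private readonly ScaleTransform sharedScale = new ScaleTransform(1.0, 1.0);

        public ZoomTestWindow()
        {
            Canvas canvas = new Canvas();
            Content = canvas;

            // Step 1: create 500 simple straight lines.
            Random random = new Random(0);
            for (int i = 0; i < 500; i++)
            {
                Line line = new Line();
                line.X1 = random.Next(600);
                line.Y1 = random.Next(400);
                line.X2 = random.Next(600);
                line.Y2 = random.Next(400);
                line.Stroke = Brushes.Black;
                line.StrokeThickness = 1;
                line.RenderTransform = sharedScale;
                canvas.Children.Add(line);
            }

            MouseWheel += OnMouseWheel;
        }

        // Step 3: change the shared scale from the mouse wheel.
        private void OnMouseWheel(object sender, MouseWheelEventArgs e)
        {
            double factor = e.Delta > 0 ? 1.1 : 1.0 / 1.1;
            sharedScale.ScaleX *= factor;
            sharedScale.ScaleY *= factor;
        }

        [STAThread]
        public static void Main()
        {
            new Application().Run(new ZoomTestWindow());
        }
    }
    ```

    Because all 500 lines reference the same transform object, a single wheel notch changes one object but affects every line.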

    I sent the test app to Tim Cahill and he kindly profiled it:

    UI Thread: 74.4%

    Layout - Arrange: 30.3%

    Layout - Measure: 8.48%

    Render: 23.78%

    Rendering: 25%

    As CrazyAboutWPF says, I have done drawing apps in VB/GDI and could do similar functionality much faster, e.g. zoom with the mouse wheel.

    Currently, drawing in WPF is unusable, which is very disappointing as I thought that the hardware rendering would enable a lot more functionality.

    I am hoping that Tim may come up with some ideas since, as the speed issue is in background WPF functionality, I can't do much to improve the speed.

    John


  • Giri12

    I would not assume that updates to the geometry result in a simple display list update. That would be a very advanced optimization. I would think it does in fact recreate the required object. WPF is very functional but is never going to perform as well as a hand-tuned D3D application with its own display model.

    When I really think about the fact that a laptop can execute on the order of half a billion instructions per second, I find it amazing that anything is slow. How can it possibly take that long to update a few pixels on the screen? For animations, of course, it is doing the work 60 times a second, and so on. On any Windows machine there is never just one thing running. I recently found an Oracle process taking 30% of the CPU on a regular basis, but I mostly did not notice. That is presumably not the case here.

    One design decision in WPF that may be at issue is the use of a separate rendering thread. That forces context switches (which are slow) when any change is sent to the other thread. I do not know how good the current bits are at batching those changes. If it is currently sending each property change, you could see quite a bit of overhead. And for sure changing the width is going to send property change events and such. WPF provides so much functionality that locating where the time goes is the issue; that comes back to a profiler.



  • Dooshan

    It is reassuring that you’re having problems too. I understand that this is CTP code and optimizations haven’t been implemented. But I’d rather hear this from Microsofties: “We’re working to make this code 100x faster by RTM (as it needs to be)”, or at least “Vista runs the MIL natively and your code will run fast there; the backport is crappy on XP.”

    But the response “profile your code” (the CPU spends 0.01% of the time in mine), and “use Visual objects at a low level” makes me think…. Don’t use WPF, you’re going to have to hack around it to get any performance.

    But if I must use low-level objects without Styles, and do all the work myself… then I might as well go back to owner draw, D3D hacks, and user32/GDI controls!

    Using WPF reminds me of the joke from Joel on Software:

    Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

    The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

    The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

    "I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"

    http://www.joelonsoftware.com/articles/fog0000000319.html

    There must be a few Shlemiel algorithms hidden inside WPF!


  • Yogesh Shah

    This may be the disconnect. I believe that they are not suggesting that your code (CodeDjinn) is "at fault", but rather that your scenario is poorly implemented/optimized in WPF. I saw nothing suggesting they did not own that it is WPF that is slow. They just want your profile to pinpoint where in THEIR code work is needed. Since they do not know what you are doing, they cannot get more detailed until they get some measure of what is going on. I imagine that if you offered to send them your application they would be glad to run it on their machines and see if it performs as badly. If so, great: they know where they can improve. If not, then there may be something with your setup that is at issue (not your code). In particular, WPF performance is VERY dependent on D3D drivers and other issues in the machine's software configuration. I cannot speak for them, but if you have a simple test application, they have always been willing to accept those in the past.

  • Michael Pang

    CrazyAboutWPF, I couldn’t say for sure why updating is so slow. I should have some time early next week to stick your sample under a profiler & find out though (right after I get done with johnvarney’s profile, since he asked earlier …). I know johnvarney’s sample was causing a new Layout pass to happen. Usually I can turn these things around faster, but my schedule’s been super-crammed this week :-(.



    Between now & then, if you could send me a complete sample that’d speed things up a bit. And if you have extra time, sending me a profile using the steps on my blog would speed it up even more.

    Best regards,



    Tim Cahill

    timothyc@removethisantispamtext.microsoft.com

    Software Design Engineer

    Windows Presentation Foundation Performance

    http://blogs.msdn.com/timothyc

  • Michael Walsh

    Hey all,



    First, let me say I can totally understand the frustration when dealing with these problems. They don’t seem tractable, and it’s hard to understand what’s going on. It’s not like a functionality bug in your code where you can provide a repro with the problem. At least, that’s how most of our team felt when they started working on performance.



    It turns out these things are completely tractable – once you know where the time is being spent. When it comes to perf bugs, the profile *is* the repro. Until a profile exists, you don’t know what part of the system is behaving poorly. For example, I could go and optimize the heck out of the animation system, but if your bottleneck is in layout, or you’re using animation in a different way than I optimized for, you’re not going to see much improvement.



    Another example: CodeDjinn pointed out that he’s getting slow performance on a very decent video card, where we’re definitely rendering in Hardware. That says to me that rendering isn’t the bottleneck at all – some other part of the system (layout, databinding, animation, composition, the list goes on) is taking too much time. Assuming the time is spent in WPF, with a profile in hand we can A) start making improvements because we’ll know where the problem is and B) suggest alternative ways to get around the bottlenecks.



    So I wasn’t at all suggesting you profile your code to make your code faster, or in any way passing the buck back onto you. We’ve been working hard optimizing WPF for the many, many scenarios we know about. But there’s always going to be more. By getting us your profile, we can add your scenario to that list, and start making improvements that will be meaningful to you.



    Tim Cahill

    Software Design Engineer

    Windows Presentation Foundation Performance




  • phankhanhhung

    Remember that a Rectangle is a UIElement so it has event handling, layout, styles etc. If you don't need them and you want better performance you can try visual layer programming - see
    http://windowssdk.msdn.microsoft.com/library/default.asp?url=/library/en-us/wpf_conceptual/html/6dec9657-4d8c-4e46-8c54-40fb80008265.asp
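
    As a rough illustration of what visual layer programming looks like, the sketch below hosts each rectangle as a DrawingVisual inside one FrameworkElement. It is only a sketch under assumptions: `RectangleHost`, `AddRectangle` and `Program` are hypothetical names, the sizes and grid layout are arbitrary, and the code targets the released WPF API, which may differ from the CTP.

    ```csharp
    using System;
    using System.Windows;
    using System.Windows.Media;

    // Hypothetical host element: each rectangle is a DrawingVisual, not a UIElement.
    public class RectangleHost : FrameworkElement
    {
        private readonly VisualCollection visuals;

        public RectangleHost()
        {
            // The collection registers its visuals as children of this element.
            visuals = new VisualCollection(this);
        }

        public DrawingVisual AddRectangle(Rect bounds, Brush fill)
        {
            DrawingVisual visual = new DrawingVisual();
            using (DrawingContext dc = visual.RenderOpen())
            {
                dc.DrawRectangle(fill, null, bounds);   // null pen: fill only, no outline
            }
            visuals.Add(visual);
            return visual;
        }

        // WPF queries these two members to walk the element's visual children.
        protected override int VisualChildrenCount
        {
            get { return visuals.Count; }
        }

        protected override Visual GetVisualChild(int index)
        {
            return visuals[index];
        }
    }

    public static class Program
    {
        [STAThread]
        public static void Main()
        {
            RectangleHost host = new RectangleHost();
            for (int i = 0; i < 800; i++)
            {
                Rect bounds = new Rect((i % 40) * 25, (i / 40) * 25, 20, 20);
                host.AddRectangle(bounds, Brushes.SteelBlue);
            }

            Window window = new Window();
            window.Content = host;
            new Application().Run(window);
        }
    }
    ```

    A visual created this way can be moved by assigning its Offset property (for example `visual.Offset = new Vector(dx, dy);`). The trade-off is the one described above: a DrawingVisual has no layout, styles or input events of its own, so dragging would need manual hit testing, for instance with VisualTreeHelper.HitTest.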



  • Troby

    I was not trying to imply that performance would be magically solved with better drivers, just that better drivers will make things, well, better. Also, there have been several cases of bugs in the drivers causing problems for WPF. In your case that would not be an issue, but the hardware specs alone are not enough.

    I agree with you that "hardware accelerated", in particular when the WPF team talks about all the primitives that are "hardware accelerated", implies that those primitives will be processed at GPU-limited speeds. Which is clearly not the case.

    What I would like to see is a performance tool like the OpenGL profiler on OS/X. This has counters for every OpenGL entry point, and tracks % of CPU and GPU time for each. This starts to give a real good idea where time is spent. And, sometimes it is very surprising. Like when you do 20 calls and only the last one takes time because all the other calls just did client side state setup, but you did not know that from the API descriptions (like pointer arrays). Having something like that for WPF is the only way we are going to really be self sufficient in resolving performance issues. Of course in WPF there would need to be about 6 levels of that: rendering thread stats, UI thread rendering, UI thread layout, UI thread data binding, UI thread animation, UI thread event processing. This would all need to be done with cheap counters that can be polled for display with minimal disruption of the application (as in shared memory, not web services).

    While this discussion is about performance the issue of debugging all that functionality is also an issue. Tracking down some types of data binding problems is very difficult. Having in-process debugging related events might do much in this area to expose what is happening behind the curtain.

    Another performance tool to look at that exists for Apple and Sun systems is a whole-system trace. In the Apple case you can initiate a sample of the entire system over a period of time. Every thread, every process, every level including the kernel is sampled, and the results can be very informative. As in: you only used 10% of the CPU, but initiated 90% of the kernel-level logic that does not show up in your process counter. This also shows context switches and the time incurred, and device drivers, everything. It just samples whatever is running on the CPU and provides a symbolic reference so you can see that routine A called system service B, which called the kernel C. The Sun technology is similar in that it can trace from Java to the OS to the kernel and back. This is not statistical sampling as in profiling; it is a trace. Every sample collected is displayed and represents a time slice of the system. That holistic view of the computer may show just where those 120,000 cycles are going.



  • Mostafa Hafez

    Why then, still, can't he use those functions, seeing that it is being "hardware" rendered? Why then does it appear that a CPU can render 2D vector graphics much better than a graphics card? Is it not true that utilizing the graphics card should have made drawing complex UIs much faster? Instead, even drawing just a few objects like rectangles and canvases now slows the XAML down to a snail's pace.

    For example, my computer has a Pentium 4 3.2 GHz, 3 GB of RAM and an NVIDIA GeForce 6600 GT. My guess is that if XAML is being hardware rendered, it should not be so slow as to render 1 Canvas, 2 Rectangles, 1 Group box, 2 text boxes and a button at snail speed. (I did a clean install, and installed the WinFX runtime components and Cider.) So what am I missing here?


  • ChrisMorley

    I agree about providing a frank and open discussion. The fact that Vista has approximately zero .NET/WinFX apps says a lot... there are performance bottlenecks _by design_ that won't ever be addressed. WPF couldn't cut it for Vista or Office12. The smart guys at MSFT know this... they have tons of internal benchmarks... so this "pillar of Longhorn" was removed. It will of course still be available for us third parties to install. Let us ISV people "Go-Live" with shoddy/slow managed code. We don't have the native C/MIL libraries and headers to do it right like MSFT.

    My question is... how can we produce binaries that utilize the Avalon native engine? Will symbols and headers for MilCore.dll be provided at some point? Maybe some documentation for dwmapi.lib? Throw us developers a bone... if we can't produce cool apps/gadgets available for Vista's release, why would users bother upgrading? (Oops, forgot there’s no WPF in the gadget bar now.)

    Below is a sampling profile of rectangles panning like snails on a canvas as I twiddle the mouse wheel... just a regular scrolling operation, no clipping (although doesn't the GPU provide this?), solid fill on rectangles. Sure there is room for optimization… maybe copying the already visible regions… then only drawing the part that is new. But then why bother with this fancy $500 video card? It says “1,300,000,000 triangles/sec fill rate” on the box, not “130 triangles/sec fill rate”.

    I’ll gladly send my 3 lines of XAML and 10 lines of C# code to MSFT for "Professional Profiling"... at which time I'll forget I ever spent 2 weeks messing with this stuff and go back to real software development.

    MilCore.dll 54.029%
    WindowsBase.ni.dll 11.949%
    System.Windows.DependencyObject.LookupEntry(int32) 2.699% (you have got to be kidding…)
    mscorwks.dll 8.972%
    PresentationCore.ni.dll 7.265%
    PresentationFramework.ni.dll 3.017%


    My EXE: 0.000%

    Sorry if I seem a bit steamed, but geez… please understand my frustrations, ambitions, deadlines. ;)


  • MichaelX

    I really appreciate the feedback, but I still don't understand something.

    Stating my machine's hardware was just to give a reference point showing that the specs for running WPF are decent enough. I understand what you are saying, but how can something else be the bottleneck, when all that is in my code when running the UI is basically WPF controls?

    Let's take CrazyAboutWPF's scenario. Let's just have an app that is going to render about 10 Canvases. No extra code! By just displaying these 10 canvases at once, I am going to experience very poor performance, so I fail to see how the bottleneck can exist on my side. Running normal Windows Forms, I could probably show 100 Forms without a hitch of performance degradation. The reason I am so confused is that hardware rendering the UI should be blazing fast! So what you are actually saying is: use WPF, but use as little as possible? How are we, then, going to build impressive UIs?

    Thanks,

    Jaco


  • chribonn

    Even an update of some canvas elements is a head scratcher for me:

    foreach (UIElement e in canvas.Children)
    {
        Rectangle r = e as Rectangle;   // null for children that are not Rectangles
        if (r != null)
            r.Width += 3;               // widen each rectangle by 3 units
    }


    On a canvas of 100 rectangles. Theoretically the graphics card receives coordinate updates on a display list, visual tree, or whatever (not full reconstruction of the scene). The rectangles are then colored with pixel shaders... things are drawn with HW... right?

    Why does this operation take 4-6 milliseconds? Does it really take 120,000 cycles per rectangle to change a couple of bytes in memory and propagate those changes to the underlying hardware?

    I must be fundamentally missing something. I’ve been imagining WPF with this wonderful display surface that I can animate and freely toss around graphics and do super-cool stuff. What I’m seeing is performance on par (or possibly worse) than a Java VM (*gasp*).


  • OAF-NOR

    I agree with Michael here. In my own dealings with Tim, he has been very clear that the most useful comment you can make is something along the lines of "I am trying to do X, and in my application, aspect Y is slow as evidenced by profiling function F on line LLL of file ABC.cs in the code attached". On the other hand, I can almost guarantee that CodeDjinn's code will be slow on any machine it runs on, and that the problem is deep down in WPF's rendering architecture, which, by my estimation, is a long way from being hardware accelerated.

    WPF does need to be improved 100%, nay 1000% or 10000%. It is way too slow right now, and the promise of hardware acceleration is completely misleading, considering how much of the load is obviously being carried by the CPU.

    Frankly, I think that the WPF MSFTers should keep quiet about hardware acceleration until they can demonstrate the kind of hardware accelerated performance we have all seen in games. It makes no sense to say, oh yeah this application is hardware accelerated, when 99% of the work is being done by the CPU. Hardware accelerated means to me being limited only by the power of the embarrassingly parallel graphics hardware available.

    As for the line "WPF is highly dependent on D3D drivers", well that's a crock of *** too. I'm running a latest gen 7900GTX512 graphics card, with the latest drivers, and WPF is slow for me. Are you saying that there's a graphics card and driver combo more suited to WPF than that one? Please, do tell. I'm not seeing too much of a difference in performance compared to the Radeon 9700Pro I was using earlier.

    We need a better tool than Perforator. We need a tool that can tell us exactly what's going on under the hood, as far as rendering goes. Perforator is telling me that all my rendering is happening in hardware, and yet I'm seeing 100% CPU usage. I need Perforator to tell me much, much more, i.e. that clipping routine XXX is being called so many times per frame, or that text rendering routine YYY is being called so many times per frame, or that vector drawing ZZZ is being converted into a bitmap so often. I'm also being told that 75 hardware IRTs are being used; well, give me some idea what they are: their sizes, or maybe even pictures of them, what is being rendered into them. To get the most out of WPF, we need to know what's going on under the hood. I have yet to see any comments in this forum in which MSFT people look at an application and come back with information about what is going on under the hood that is making applications slow, nor have I seen any workarounds offered, except for minimalia such as using one property accessor form over another.

    As one poster mentioned, perhaps a frank and open dialogue about what is going on, what we can see improving in the future, and what we will never see, is in order. As that poster suggested, maybe we will never see WPF under XP perform anything like at hardware accelerated speeds. Maybe we will only see that kind of performance using DirectX 10 hardware under the LDDM on Vista. It wouldn't beggar the imagination to learn that WPF on XP is simply considered a sandbox for developers to cut their teeth on until Vista comes out.

