Heeeeellllllllppppppppp!!!! Replication Latency in Dist Agents

Hi,

I have Transactional Replication running, pushed to 2 subscribers.  I am noticing that, sporadically, the latency shown on the Distribution Agents is extreme.  This means it's taking a very long time for transactions to get to the subscribers... which in turn means that my 'High Availability' scenario is down the tubes... which is a disaster.

Usually I see latency numbers under 5000... and they are reset periodically.  Today, for example, the numbers just keep piling up and exceed 1000000 already. 

I'm not a DBA... but I am the guy that has to figure this out.  What could be causing this?  How do I approach this?

Thanks,

Tom



  • SVerhalle

    Hi Greg,

    Thanks for the response. 

    On the subject of Topology...

    Publisher and Distributor reside locally on the Transaction server.  The subscriptions are 'pushed' to 2 backup machines.  I've been running this config for a few years now and have only recently observed the conditions that I reported in the original post.

    I replicate the execution of stored procedures rather than just moving the data.  (I have thought of moving the data instead, and tested it as an option, since we don't update large quantities of data... we have lots of single insert/update activity for the most part.)

    In any case, I did read some of the relevant parts of the white paper.  It's very general, as is usually the case. 

    The Log Reader never has a problem with latency... it's always the distribution agents (on those few occasions where I have seen this behavior).  Today, for example, my latency is well under 5000 milliseconds on the distribution agents and I see that we're current on delivery.

    As far as hardware, there's not a lot I can tell you.  Not my bailiwick.

    Thanks again.  If you have any suggestions, please feel free to pass them on.

    Regards,

    Tom

  • Fredrik Bergström

    Greg,

    What you're suggesting makes sense.  Monitoring the subscribers is probably the way to go. 

    I wonder if pulling the subscriptions will make a difference....

    Thanks again.

    Tom

  • wbartussek

    This could be for a number of reasons, but there are some questions to answer first.  Just to cover the basics, I assume the distribution agent and logreader agent are running continuously?  Which agent is performing slower, the logreader agent or the distribution agent?  Is the distributor a remote machine, or is it local to the publisher?  How about the subscriber?  Maybe you can explain your topology to us.

    Are there users hitting the subscribers?  Could there be more indexes on the subscriber-side tables that are slowing down the distribution agent commands?  What about triggers?  What does the disk subsystem look like?  Any filters anywhere?

    Can you take a look at this whitepaper first and see if any of it applies?

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql2k_replperf_tran4.asp


  • Josh Zana

    If you're just now seeing a slowdown on the distribution agent side, then usually something's changed at the subscribers, or at the distribution agent.  Since you replicate stored proc execution, have those procs changed at the subscriber?  Have the indexes changed?

    If it's periodic latency, then maybe the workload is increasing, or there's some background task that's running at the distribution agent.

    It sounds like the high latency is sporadic?  If that's the case, the best thing to do is monitor the distributor/subscriber machines at their quiet times (cpu, memory usage, disk io, agent cmds/sec) and at their high-latency times.  From there, you'll have to dig in.
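
    To make the quiet-time versus high-latency comparison concrete, here is a rough Python sketch.  It assumes you have already collected a few samples of each counter by some other means; the counter names, the sample numbers and the 2x threshold are all made up for illustration and do not come from any SQL Server tool.

```python
from statistics import mean

# Hypothetical samples: every counter name and number below is invented
# for illustration, not taken from a real monitoring tool.
quiet = [
    {"cpu_pct": 22, "mem_pct": 61, "disk_io_mbps": 14, "agent_cmds_per_sec": 480},
    {"cpu_pct": 25, "mem_pct": 63, "disk_io_mbps": 16, "agent_cmds_per_sec": 510},
]
busy = [
    {"cpu_pct": 31, "mem_pct": 64, "disk_io_mbps": 95, "agent_cmds_per_sec": 40},
    {"cpu_pct": 29, "mem_pct": 66, "disk_io_mbps": 105, "agent_cmds_per_sec": 35},
]


def average(samples):
    """Average each counter across a list of samples."""
    return {key: mean(s[key] for s in samples) for key in samples[0]}


def compare(quiet_samples, busy_samples, factor=2.0):
    """Return counters whose average moved by at least `factor`, up or down."""
    base = average(quiet_samples)
    high = average(busy_samples)
    changed = {}
    for key in base:
        if base[key] == 0 or high[key] == 0:
            continue  # a ratio is meaningless when either average is zero
        ratio = high[key] / base[key]
        if ratio >= factor or ratio <= 1 / factor:
            changed[key] = round(ratio, 2)
    return changed


suspects = compare(quiet, busy)
for name, ratio in sorted(suspects.items()):
    print(f"{name}: {ratio}x the quiet-time average")
```

    Counters that barely move between the two periods can probably be ruled out; the ones that jump or collapse are where to start digging.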


  • Stahl

    If cpu/memory at the publisher/distributor machine is the bottleneck, then yes, making them pull agents might help alleviate some of the problem (provided the agents are consuming enough cpu/memory).  A recommended/standard solution would be to make the distributor remote from the publisher, but that's not always feasible.  As mentioned, you'll need to do some investigation work to see exactly where the bottleneck is occurring (9 times out of 10 it's at the subscriber), and then take action.

     

