Poison Message Sample in BOL

Hi, 

We've been trying to diagnose a problem with service broker's poison message detection and tried to run the sample script in BOL to see how it solved the problem.  It appears the example doesn't seem to work correctly,

          IF (@messageTypeName =
              '//Adventure-Works.com/AccountsPayable/ExpenseReport')
          BEGIN
            DECLARE @expenseReport NVARCHAR(MAX) ;
            SET @expenseReport = CAST(@messageBody AS NVARCHAR(MAX)) ;
            EXEC AdventureWorks.dbo.AddExpenseReport
              @report = @expenseReport ;
            IF @@ERROR <> 0
             BEGIN
               /* THIS ROLLBACK INCREMENTS THE FAILURE COUNT */
               ROLLBACK TRANSACTION UndoReceive ;
               EXEC TrackMessage @conversationHandle ;
             END ;
            ELSE
             BEGIN
               EXEC AdventureWorks.dbo.ClearMessageTracking
                 @conversationHandle ;
             END ;
           END ;
        ELSE

In the scenario where AddExpenseReport fails it will retry the message the next time round the loop - TrackMessage looks to see how many times the message has failed:

  IF @count > 20
    /* @count WILL NEVER GET ABOVE 5 */
    BEGIN
      EXEC dbo.ClearMessageTracking @conversationHandle ;
      END CONVERSATION @conversationHandle
        WITH ERROR = 500
        DESCRIPTION = 'Unable to process message.' ;
    END ;
  ELSE


But the counter will not get to 20 as after 5 the queue will be disabled which causes ProcessExpenseReport to fail.  Subsequent message cannot then be processed until the queue is re-enabled.

We have changed the sample so that once the message has failed four times it is then written to a dead message table and we can then call commit so that the queue can carry on without being disabled:

[in ProcessExpenseReport]
             ...
            EXEC AdventureWorks.dbo.AddExpenseReport  @report = @expenseReport ;
            IF @@error <> 0
             BEGIN
              
-- The message has failed, we need to work out if we are safe to
               -- retry or not.  If this message has failed four times already
               -- we need to recieve it, save it into another table and commit to
               -- avoid service broker from disabling the queue.  Otherwise we can
               -- rollback and try again to see if it works next time.

               -- Find out how many times this message has failed
               declare @failCount int
               exec GetFailCount @conversationHandle, @failCount output
 
              
-- If we are about to reach the threshold
               if @failCount >= 4
               begin
                 
-- Insert the message into the dead message table.  A sql agent
                  -- job will at some point in the future put the contents of this
                  -- table back onto the queue for reprocessing
                  EXEC DeadMessage @conversationHandle, @expenseReport
               end
               else
               begin
                 
-- We are safe to rollback and take another run at it
                  ROLLBACK TRANSACTION UndoReceive ;
                  EXEC TrackMessage @conversationHandle ;
               end
             END ;
            ELSE
             BEGIN
              
-- Call to SP worked we can clear down our message tracking
               EXEC ClearMessageTracking @conversationHandle ;
             END ;
           END ;
...


/* Procedure to find out how many times this message
   has failed to be processed */

CREATE PROCEDURE GetFailCount
   @conversationHandle uniqueidentifier, 
   @failCount int output
as
begin
   SELECT @failCount = [count] FROM ExpenseServiceFailedMessages
                                WHERE conversation_handle = @conversationHandle
end
go

/* Procedure to insert the message into the Dead message
   table and end the conversation */

create procedure DeadMessage
   @conversation_handle uniqueidentifier,
   @message_body varchar(max)
as
begin
   insert into ExpenseServiceDeadMessages ( message_body ) values ( @message_body )

   EXEC dbo.ClearMessageTracking @conversation_Handle ;
   END CONVERSATION @conversation_Handle
       WITH ERROR = 500
            DESCRIPTION = 'Unable to process message.' ;
  
end
go

/* Updated version of TrackMessage which no longer checks the
   fail count - it just increments each time */
CREATE PROCEDURE TrackMessage
@conversationHandle uniqueidentifier
AS
BEGIN
  IF @conversationHandle IS NULL
    RETURN ;

  DECLARE @count INT ;
  SET @count = NULL ;
  SET @count = (SELECT count FROM dbo.ExpenseServiceFailedMessages
                  WHERE conversation_handle = @conversationHandle) ;

  /* TrackMessage no longer checks the failure count */
  IF @count IS NULL
    BEGIN
      INSERT INTO dbo.ExpenseServiceFailedMessages
        (count, conversation_handle)
        VALUES (1, @conversationHandle) ;
    END ;
  ELSE
    BEGIN
      UPDATE dbo.ExpenseServiceFailedMessages
        SET count=count+1
        WHERE conversation_handle = @conversationHandle ;
    END ;
END ;
GO

/* New table to hold the Dead messages */
CREATE TABLE ExpenseServiceDeadMessages (
  message_body varchar(max)
) ;


Now this solves the problem.  If a message arrives that can't be processed (either becuase the message itself is bad, or because of an system fault with another system that we interact with) the message is moved to the DeadMessage table and the queue carries on and does not get disabled.

In our application however none of this code should be required, as long as we are confident that the data in the messages are "valid".  In our case the only reason the processing would fail is if an external system we are dependant fails for some reason  (e.g.  Maybe writing the message to disk and the disk is full ).  In this scenario we want to retry the message indefinately until the error clears. If we could disable the poision message detection this would solve the problem and remove the need for this solution.

Any thoughts on this   Have we missed the point somewhere....

Cheers,

Neil.



Answer this question

Poison Message Sample in BOL

  • Dinana

    I'd like to add that when a queue is being disabled (e.g. because of a poison message), a BROKER_QUEUE_DISABLED event notification is fired. Here is a small demo to ilustrate this:

    use tempdb;
    go

    create procedure sp_rollback_on_pm
    as
    begin
       set nocount on;
       begin transaction;
       receive * from q_pm;
       rollback;
    end;
    go

    create queue q_pm
       
    with activation (
          status = on,
          max_queue_readers = 1,
          procedure_name = [sp_rollback_on_pm],
          execute as owner);
    go

    create service s_demo_pm on queue q_pm ([DEFAULT]);
    go

    create queue q_en;
    create service s_en on queue q_en ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
    go

    create event notification ev_qm on queue q_pm for broker_queue_disabled to service 's_en', 'current database';
    go

    declare @dh uniqueidentifier;
    begin dialog conversation @dh
       from service s_demo_pm   
       
    to service 's_demo_pm'
       
    with encryption = off;
    send on conversation @dh;
    send on conversation @dh;
    send on conversation @dh;
    send on conversation @dh;
    send on conversation @dh;
    send on conversation @dh;
    go

    select cast(message_body as xml),* from q_en
    go

    An administrator can receive notifications when queues are being disabled as a result of a poison message and it can inspect the queue and fix the issue or remove the problem dialog. Or an activated procedure on the q_en queue can automatically fix the issue and re-enable the q_pm queue (of course, assuming that the issue can be fixed automatically).

    HTH,
    ~ Remus


     



  • Igor.Kantor

    For the scenario you describe I'd recommend using conversation timers, not rollback the message. Rolling back would make the message visible again immeadetly, triggering another attempt. Instead, using a timer would allow for a delayed retry, like this:

    begin transaction
    Receive 
       if web request message
          store the request data in table
       else if timer message
          read request data from table
       begin conversation timer (e.g. 1 minute)
    commit
    do the web request
    if succeeded
       begin transaction
          reset the conversation timer
          send back response
       commit


    Ultimately, everything in the server is optimized for the commit case. Rollbacks should really be extraordinarily and this is how we approached the problem.

    As a last resort, the BROKER_QUEUE_DISABLED event notification can be used to enable back the queue automatically.

    HTH,
    ~ Remus

  • Ravs Kaur

    Everything you did in the transaction is undone when the transaction is roled back. The message is back in the queueand any other data updates are reversed. The queue is disabled as a result of the rollback, after the rollback occured.

    In the case you describe, since the application is able to detect the problem message, I wouldn't actually call it a poison message. Is more like a message your app understands but is not willing to process. It seems to me that a more appropiate action would be to move the message into a different queue (i.e. forward the message to a secondary service) or end the dialog with an error.
    The poison message was indeed thought more for cases when the application crashes when processing the message and the system spins on this message, w/o making any progress.



  • user11

    I can see Neil's issue with the poison message handling, and the inability to turn it off.  I guess he's got a queue whose processing ultimately depends on some external system (say a web service call or something).  In the scenario where the web service is unavailable, you'd pull a message from the queue and try to call the web service.  If the web service fails (for example, due to network issues), then you want to rollback the receive to enable that message to be retried in the future.  It's not really possible for an application to know whether a network-based call is going to work in advance.

    Moving the message of to another queue merely delays the inevitable - the other queue needs processing, and of course it can fail in the same way.  I guess you could just take messages from the second queue and add them back to the first, but this seems like a poor solution (it also causes the messages to be reordered, which could have its own issues).

    Much as I understand (and applaud) the reasoning behind poison message detection, it does seem that there are scenarios where it would be useful to just turn it off.


  • Benjamin Wulfe

    Thanks for the replies - this has allowed us to progress to a past this problem.

    I do entirely understand why service broker is doing what it's doing, it will quickly enable you to locate any messages that break the system doe to an application bug.  However in the scenario where a message cannot be processed due to a situation beyond your control this features requires you to implement a lot more code then perhaps you should.

    In our scenario, if we cannot process message A which is at the front of the queue, we are not going to be able to process message B either, so there's no point trying.  The ability to turn of poision message detection would be very useful.

    I imagine this has been discussed elsewhere but it also strikes me as odd that I can now write code:

    -- About to start a transaction
    BEGIN TRANSACTION
    RECEIVE * FROM myQueue
    ROLLBACK TRANSACTION
    -- The database is now in a different state from when begin tran was called!!

    I personally have always relied on the premise that EVERYTHING I DO (with SQL) inside a transaction is undone when I call rollback.  This no longer seems to be true

    Cheers,

    Neil

  • kdogg2188

    The main problem with just disabling the poison message check is that when you roll back a message, it's still at the top of the queue so nothing else gets processed until the message succeeds.  You're probably better off commiting the receive and re-sending the meaasge - maybe to some kind of retry queue - so that you don't block other messages on the queue.  Keep in mind that if you have multiple messages on the same dialog you'll have to be careful how you handel this so you don't end up changing the message order.  I've seen a couple examples of people who use conversation timers to implement a timed retry for cases like this when there is a possibility of the remote resource being unavailable for an extended time.
  • Poison Message Sample in BOL