Feb 26, 2011

Troubleshooting asynchronous communication with WCF, MSMQ and IIS (WAS)


The Requirement
This post is dedicated to everyone who has attempted to work with this amazing technology combination for asynchronous communication and found that it is not trivial to make it work consistently. However, once all the components are understood and harnessed, you’ll realize the beauty and power of these “guys” working together.
My main goal is not to provide a detailed, step-by-step example of how to set this configuration up, but to lay out a comprehensive checklist of all the things you should look at if your setup is NOT working. I’ve gathered these items over time and, although some come pretty close, I haven’t found an article that discusses all of them in one place.
NOTE: If you need to learn about asynchronous communication with WCF and MSMQ from the ground up, I recommend Tom Hollander’s amazing post; he presents this subject more eloquently than I ever could: http://blogs.msdn.com/b/tomholl/archive/2008/07/12/msmq-wcf-and-iis-getting-them-to-play-nice-part-1.aspx
A quick background
The idea here is to combine the flexibility of the WCF programming model with the power of MSMQ to achieve asynchronous communication, all harnessed from IIS’s robust infrastructure. However, there are many components involved, and, if you’ve dealt with this scenario before, you’ll realize there are many little details you have to keep an eye on in order to get this working smoothly.
When everything works correctly, you’ll rely on WCF to get a message from point A to point B, and MSMQ will be used behind the scenes to persist your messages if point B is unavailable. There are several ways to configure the behavior when the destination is unreachable, but the main point is that the infrastructure should be able to forward and/or retry the messages on its own, all the time.
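As a rough sketch of what the sending side can look like, here is a client writing to the queue through an ordinary WCF channel. The contract IMyService, the operation SubmitOrder, the Sender class, and the queue address are all hypothetical names chosen for illustration, and security mode None is an assumption that fits a workgroup setup, not a recommendation:

```csharp
using System;
using System.ServiceModel;
using System.Transactions;

namespace MyNamespace
{
    // Queued operations are one-way: the sender never waits for a reply.
    [ServiceContract]
    public interface IMyService
    {
        [OperationContract(IsOneWay = true)]
        void SubmitOrder(string orderId);
    }

    public static class Sender
    {
        public static void Send(string orderId)
        {
            // Assumption: no transport or message security (workgroup mode).
            var binding = new NetMsmqBinding(NetMsmqSecurityMode.None);

            // Hypothetical address: private queue "MyApp/MyService.svc" on this machine.
            var address = new EndpointAddress("net.msmq://localhost/private/MyApp/MyService.svc");

            var factory = new ChannelFactory<IMyService>(binding, address);
            try
            {
                IMyService channel = factory.CreateChannel();

                // Sending inside a transaction means the message only reaches
                // the queue if the scope completes.
                using (var scope = new TransactionScope(TransactionScopeOption.Required))
                {
                    channel.SubmitOrder(orderId);
                    scope.Complete();
                }

                ((IClientChannel)channel).Close();
                factory.Close();
            }
            catch
            {
                // Abort tears down the factory and its channels without throwing.
                factory.Abort();
                throw;
            }
        }
    }
}
```

The call returns as soon as the message is handed to MSMQ; whether the service on the other end is up at that moment does not matter to the sender.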
When things are not working you’ll start asking some of the following questions:
·         Why are my messages not even being written to the queue?
·         Why are my messages not being picked up from the queue at all?
·         Why do my messages stop being picked up from the queue after a certain amount of time?
The checklist
I’ll be discussing private (transactional) queues only in this post; public queues differ only in that they’re published in Active Directory. Also, I assume all the queues reside on the same box; the principles are the same, except you’ll have to configure MSDTC in scenarios where the queues are set up remotely.
Use this checklist to quickly review your setup in the event of failure or unstable behavior.
MSMQ related:
1.       Check that the queue you’re attempting to send messages to has the right permissions. This normally involves granting privileges to the account the sender uses. However, if you’re sending messages from a remote machine, you’ll also need to add the ANONYMOUS LOGON account.
2.       Check that the Message Queuing Windows service is running. It can be left running under the default account, NETWORK SERVICE.
3.       Check that the Net.Msmq Listener Adapter Windows service is started AND that the account it is running under has permission to receive (read) from the queue.
4.       Run: sc sidtype NetMsmqActivator unrestricted at a command prompt. This allows the previous service to run with an unrestricted token in workgroup mode.
5.       In Server Manager, right-click Features -> Message Queuing and select Properties. Go to the Server Security tab and make sure the last 2 options (“Disable…”) are checked.
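If you prefer to script the queue setup from item 1 rather than click through the MMC snap-in, something along these lines may help. It is only a sketch using System.Messaging: the QueueSetup class and its parameters are hypothetical, and the exact set of rights each account needs is an assumption you should verify against your own environment.

```csharp
using System.Messaging;

public static class QueueSetup
{
    // queuePath example (hypothetical): @".\private$\MyApp/MyService.svc"
    // senderAccount / receiverAccount: the Windows accounts used by the
    // sending application and by the service that reads from the queue.
    public static void EnsureQueue(string queuePath, string senderAccount, string receiverAccount)
    {
        // The second argument to Create makes the new queue transactional.
        MessageQueue queue = MessageQueue.Exists(queuePath)
            ? new MessageQueue(queuePath)
            : MessageQueue.Create(queuePath, true);

        using (queue)
        {
            // Assumed minimum for a sender: write messages and read queue properties.
            queue.SetPermissions(senderAccount,
                MessageQueueAccessRights.WriteMessage |
                MessageQueueAccessRights.GetQueueProperties);

            // Assumed minimum for a receiver: receive, peek, and read queue properties.
            queue.SetPermissions(receiverAccount,
                MessageQueueAccessRights.ReceiveMessage |
                MessageQueueAccessRights.PeekMessage |
                MessageQueueAccessRights.GetQueueProperties);
        }
    }
}
```

Running this requires a reference to System.Messaging.dll and an account that is allowed to create queues and change their permissions.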
IIS related:
6.       In IIS, check that the Web Site hosting your service has the net.msmq binding declared. The Binding Information value should be the name of the server where the queue is located.
7.       Also, make sure the Web Site/Virtual Application hosting your service has the net.msmq protocol enabled. You can enable other protocols as well if your services require them.
8.       Check that the identity your service is running under has the right permissions on the queue. Depending on the authentication mechanism/mode you’re using, this is most likely your service’s Application Pool identity.
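Items 6 and 7 can also be checked or applied from code through the Microsoft.Web.Administration API that ships with IIS 7. The following is a sketch under assumptions, not a tested deployment script: the IisSetup class and its parameters are hypothetical names, and it has to run elevated on the IIS box.

```csharp
using System;
using Microsoft.Web.Administration;

public static class IisSetup
{
    // siteName:    e.g. "Default Web Site"
    // appPath:     e.g. "/MyApp" (hypothetical application)
    // queueServer: server where the queue lives, e.g. "localhost"
    public static void EnableNetMsmq(string siteName, string appPath, string queueServer)
    {
        using (var manager = new ServerManager())
        {
            Site site = manager.Sites[siteName];
            if (site == null)
                throw new ArgumentException("Site not found: " + siteName);

            // Item 6: declare the net.msmq binding on the site if it is missing.
            bool declared = false;
            foreach (Binding binding in site.Bindings)
            {
                if (string.Equals(binding.Protocol, "net.msmq", StringComparison.OrdinalIgnoreCase))
                    declared = true;
            }
            if (!declared)
                site.Bindings.Add(queueServer, "net.msmq");

            Application app = site.Applications[appPath];
            if (app == null)
                throw new ArgumentException("Application not found: " + appPath);

            // Item 7: add net.msmq to the application's enabled protocols.
            bool enabled = false;
            foreach (string protocol in app.EnabledProtocols.Split(','))
            {
                if (string.Equals(protocol.Trim(), "net.msmq", StringComparison.OrdinalIgnoreCase))
                    enabled = true;
            }
            if (!enabled)
                app.EnabledProtocols = app.EnabledProtocols + ",net.msmq";

            // Nothing is written to applicationHost.config until this call.
            manager.CommitChanges();
        }
    }
}
```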
Application related:
9.       Check the queue name; it has to match the address of your service’s net.msmq endpoint (for a WAS-hosted service, that is the application path plus the .svc file name).
10.   If you require transactional support, check that DTC is configured and running properly. I recommend reading Hollander’s post (part 3) for a detailed explanation of how to do this.
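To make item 9 concrete, here is a minimal web.config sketch. The application path MyApp, the service MyNamespace.MyService, and the contract MyNamespace.IMyService are hypothetical names; the point is only that the part of the address after /private/ is the queue name, and that it mirrors the virtual path of the .svc file.

```xml
<system.serviceModel>
    <services>
        <service name="MyNamespace.MyService">
            <!-- The private queue on this machine must be named: MyApp/MyService.svc -->
            <endpoint address="net.msmq://localhost/private/MyApp/MyService.svc"
                      binding="netMsmqBinding"
                      contract="MyNamespace.IMyService" />
        </service>
    </services>
</system.serviceModel>
```

If the queue name and the endpoint address drift apart (for example after renaming the virtual application), messages can sit in the queue without the service ever being activated.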
The big catch: Faulted State
Prepare your application so it can recover itself in the unlikely event the WCF/MSMQ channel goes into faulted state.
Going through the above steps multiple times and still finding that my messages were getting stuck in the queue after a few minutes was driving me nuts. Yes, I could see some exceptions thrown when the Net.Msmq Listener Adapter service attempted to deliver the message, but I figured there shouldn’t be a problem since the message would be placed in the Retry subqueue and life would go on… well, not always.
In some cases, such as a problem while receiving a message from a queue, the channel between the queue and the Net.Msmq Listener Adapter will go into the Faulted state. Once this happens, you can only re-establish the connection by hitting the .svc http endpoint manually or by restarting the Adapter Windows service (which is even less desirable).
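Separately from the recovery code, it may be worth reviewing how the binding treats messages that repeatedly fail. The fragment below is a sketch with illustrative values only; the binding name MyMsmqBinding is hypothetical, and the "Move" option assumes MSMQ 4.0 (Windows Server 2008 / Vista or later). The default for receiveErrorHandling is Fault, which is documented as faulting the service host when a poison message is detected, so it is one plausible route into the situation described above.

```xml
<bindings>
    <netMsmqBinding>
        <!-- Illustrative values: 3 immediate retries, then 2 retry cycles
             5 minutes apart, then move the message to the poison subqueue
             instead of faulting the host. -->
        <binding name="MyMsmqBinding"
                 exactlyOnce="true"
                 receiveRetryCount="3"
                 maxRetryCycles="2"
                 retryCycleDelay="00:05:00"
                 receiveErrorHandling="Move" />
    </netMsmqBinding>
</bindings>
```

This element goes inside system.serviceModel, and the endpoint opts in through bindingConfiguration="MyMsmqBinding".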
There are 2 things you should do in your application to get better error tracking and, potentially, recovery:
1.       Implement a custom ServiceHostFactory
By doing this, you can hook your application up to the different events a service host can raise; we’re particularly interested in the Faulted one. The custom ServiceHostFactory won’t give you a detailed error description, but you’ll be able to handle the fault gracefully and re-open the communication channel.
Here is the generic CustomServiceHost you can use initially:
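using System;
using System.Linq;
using System.Reflection;
using System.ServiceModel;
using System.ServiceModel.Activation;

namespace MyNamespace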
{
       public class CustomServiceHost : ServiceHostFactory
       {
              private Uri[] _baseAddresses;
              private Type _serviceType; 
              private ServiceHost _serviceHost;

              public override ServiceHostBase CreateServiceHost(string constructorString, Uri[] baseAddresses)
              {
                     // Use whatever method necessary to get the type from the constructorString and save it
                     // (the service type calling this factory)
                     _serviceType = Type.GetType(string.Format("{0}, {1}",
                           constructorString, Assembly.GetExecutingAssembly().FullName.Split(new char[] { ',' }).First()));

                     // Save the base addresses in a class variable as well
                     _baseAddresses = baseAddresses;

                     // Create the service host for the first time
                     CreateServiceHost();

                     return _serviceHost;
              }

              protected override ServiceHost CreateServiceHost(Type serviceType, Uri[] baseAddresses)
              {
                     // Save the service type in a class variable
                     _serviceType = serviceType;
                    
                     // Save the base addresses in a class variable as well
                     _baseAddresses = baseAddresses;

                     // Create the service host for the first time
                     CreateServiceHost();

                     return _serviceHost;
              }

              private void CreateServiceHost()
              {
                     // Instantiate the ServiceHost and put it in a class variable so we can manage it later
                     _serviceHost = new ServiceHost(_serviceType, _baseAddresses);

                     // Hook up to the Faulted event
                     _serviceHost.Faulted += new EventHandler(serviceHost_Faulted);

                     // These are the other possible events we can hook up to

                     //_serviceHost.Opening += new EventHandler(serviceHost_Opening);
                     //_serviceHost.Opened += new EventHandler(serviceHost_Opened);
                     //_serviceHost.Closing += new EventHandler(serviceHost_Closing);
                     //_serviceHost.Closed += new EventHandler(serviceHost_Closed);
                     //_serviceHost.UnknownMessageReceived += new EventHandler<UnknownMessageReceivedEventArgs>(serviceHost_UnknownMessageReceived);
              }

              public void serviceHost_Faulted(object sender, EventArgs args)
              {
                     // Try to clean up as much as possible by aborting the faulted host
                     _serviceHost.Abort();

                     // Re-create it
                     CreateServiceHost();

                     // Re-open it
                     _serviceHost.Open();
              }
       }
}

And reference it in your .svc file like this:
<%@ ServiceHost Language="C#" Debug="true" Service="MyNamespace.MyService" Factory="MyNamespace.CustomServiceHost" %>

2.       Implement IErrorHandler
By implementing IErrorHandler you get access to all the exceptions thrown by the service. This includes both the ones you can explicitly catch in your code and the ones you can’t (such as the ones we’re after).
You can customize your IErrorHandler implementation in many ways, but you’ll have to deal with 2 methods: HandleError and ProvideFault. In this generic example, I show how to use HandleError to simply log all exceptions; combined with the logging provided by your ServiceHostFactory above, this can be very useful in tracking down “weird” exceptions/errors in your WCF/MSMQ infrastructure.
You can use the following code to start. I also implement IServiceBehavior and BehaviorExtensionElement so I can declare and use this ServiceErrorHandler as a behavior extension in my web.config.

NOTE: For an in-depth explanation of the IErrorHandler and BehaviorExtensionElement implementations, check out this great post: http://weblogs.asp.net/pglavich/archive/2008/10/16/wcf-ierrorhandler-and-propagating-faults.aspx
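using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Configuration;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;

namespace MyNamespace
{
       public class ServiceErrorHandler : IErrorHandler, IServiceBehavior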
       {
              #region IErrorHandler Members

              public bool HandleError(Exception error)
              {
                     // Log the error

                     return true;  // error has been handled.
              }

              public void ProvideFault(Exception error, System.ServiceModel.Channels.MessageVersion version, ref System.ServiceModel.Channels.Message fault)
              {
                     // If the exception is not a fault, create one from it
                     if(!(error is FaultException))
                     {
                           MessageFault msgFault = null;

                           FaultException<InvalidOperationException> fex =
                                  new FaultException<InvalidOperationException>(new InvalidOperationException("ServiceErrorHandler ProvideFault: " + error.Message, error.InnerException), error.Message, new FaultCode("InvalidOperationException"));

                           msgFault = fex.CreateMessageFault();

                           fault = Message.CreateMessage(version, msgFault, ServiceConstants.FaultAction);
                     }
              }

              #endregion

              #region IServiceBehavior Members

              public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase, System.Collections.ObjectModel.Collection<ServiceEndpoint> endpoints, System.ServiceModel.Channels.BindingParameterCollection bindingParameters)
              {
                     return;
              }

              public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
              {
                     foreach(ChannelDispatcher channDisp in serviceHostBase.ChannelDispatchers)
                     {
                           channDisp.ErrorHandlers.Add(this);
                     }
              }

              public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
              {
                     //Implement your own validation
              }

              #endregion
       }

       public class ServiceErrorHandlerBehaviorExtensionElement : BehaviorExtensionElement
       {
              public override Type BehaviorType
              {
                     get { return typeof(ServiceErrorHandler); }
              }

              protected override object CreateBehavior()
              {
                     return new ServiceErrorHandler();
              }
       }

       public static class ServiceConstants
       {
              public const string FaultAction = "http://MyNamespace.com/FaultAction";
       }
}
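
The "Log the error" placeholder in HandleError is left open above. One minimal way to fill it, assuming you are happy to route errors through System.Diagnostics tracing (the ErrorLog helper is a hypothetical name, not part of the original code), could be:

```csharp
using System;
using System.Diagnostics;

namespace MyNamespace
{
    public static class ErrorLog
    {
        public static void Write(Exception error)
        {
            if (error == null)
                return;

            // Exception.ToString() includes the type, message, inner
            // exceptions and stack trace, which is what you want when
            // chasing errors raised outside your own code.
            Trace.TraceError("[{0:u}] Unhandled service exception: {1}", DateTime.UtcNow, error);
        }
    }
}
```

HandleError would then call ErrorLog.Write(error) before returning, and the output goes to whatever trace listeners are configured in web.config.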

Then, declare the extension in your web.config:
       <system.serviceModel>
           <extensions>
               <behaviorExtensions>
                   <!-- Use your own namespace and assembly name here; MyAssembly is a placeholder -->
                   <add name="ServiceErrorHandler" type="MyNamespace.ServiceErrorHandlerBehaviorExtensionElement, MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null" />
               </behaviorExtensions>
           </extensions>

And use it in any behavior:
           <behaviors>
               <serviceBehaviors>
                   <behavior name="MyBehavior">
                        <serviceMetadata httpGetEnabled="true"/>
                        <serviceDebug includeExceptionDetailInFaults="true" />
                        <ServiceErrorHandler />
                   </behavior>
               </serviceBehaviors>
           </behaviors>
       </system.serviceModel>

Conclusion
That’s it. It takes a bit of work to bring it all together, but the effort pays off when you see your asynchronous system, pub-sub or ESB running smoothly.
