Monthly Archives: February 2008

Using the Automatic Recovery Features of Windows Services

Windows Services support the ability to automatically perform some defined action in response to a failure. The recovery action is specified in the Recovery tab of the service property page (which can be found in Settings->Control Panel->Administrative Tools -> Services). The Recovery tab allows you to define actions that can be performed on the 1st failure, 2nd failure, and subsequent failures, and also provides support for resetting the failure counters and how long to wait before taking the action. The allowed actions are

  • Take No Action (default)
  • Restart the Service
  • Run a Program
  • Restart the Computer

Having this type of functionality is really helpful from the perspective of a developer of services. Who wants to re-invent the wheel and have to write recovery code in the service if you can get it for free. Plus it allows the recovery to be reconfigured as an IT task as opposed to rebuilding the software. I did discover a few gotchas along the way, though.

Building a Basic Service
Let’s start with the basics though. You can create a service using Visual Studio by adding a project and selecting the type “Windows Service”. This will populate an empty service. First thing I like to do is give the classes and service real names. In this case I am testing that the recovery from a failure works so I renamed my class to FailingService.cs (not something I would recommend calling a product, but in this case it fits) . I also changed the ServiceName using the Properties of the window from Service1 (the default) to TestFailingService. This is the name that is going to appear in the Windows Services dialog, so I strongly recommend changing it.

You will also want to add an installer for your service as this will make it much easier to install the service onto your computer. You can’t run the service like a regular EXE file, so you definitely want an installer here.

Now that we’ve done all the basics, you should be able to build your service and install it. You can install the service using the InstallUtil.exe program that comes with the .NET Framework. Open a Visual Studio Command Prompt (this will already have a path to InstallUtil) and run

     InstallUtil ServicePath

Note: If there are spaces in the path you should enclose the path in “” as in InstallUtil “C:Documents and SettingsuserMy DocumentsVisual Studio 2005ProjectsServiceRecoveryTestsTestServicebinDebugTestService.exe”

Depending on how your PC is configured, you may be prompted to log in as part of installing the service. If you are on a domain you should use the full user name of domainuser.

From the Services tab you should now be able to start and stop your service. You should also be able to go to the Windows Event Viewer and see events related to starting and stopping your service in the Application and System logs.

To uninstall the service use the commands above, but with the /u option, as in

     InstallUtil /u ServicePath

Error Recovery
What I really wanted to do was to work with the failure recovery features of the service, so the first thing I did was create a thread that would simulate an error on some background processing of the service. The thread sleeps for 30 seconds and then throws an exception.

Since this exception is being thrown on a different thread than the main thread, I need to subscribe to the AppDomain’s UnhandledException event. If I don’t do this the thread will just die silently and the service will continue to run, which is not what I want.

Initially I thought I could just take advantage of the ServiceBase class ExitCode property. I figured that if I set the ExitCode property to a non-zero value and stopped the service that would be interpreted as a failure and the service would automatically be restarted, as in

        void UnhandledExceptionHandler(object sender, UnhandledExceptionEventArgs e)
        {
            ExitCode = 99;
            Stop();
        }

That is not the case though as I found out here. “A service is considered failed when it terminates without reporting a status of SERVICE_STOPPED to the service controller.”

So from this definition I decided that I needed to throw an exception in the Stop event handler, otherwise Stop would return normally, and it would not be considered a failure by the SCM.What I finally ended up doing was having the unhandled exception handler cache the unhandled exception and call Stop. The Stop event handler then checks if there is an unhandled exception and wraps that in an exception and throws it. Wrapping the exception, and passing the asynchronous exception as the InnerException, preserves the call stack of the asynchronous exception.

Examining the Event Viewer
I configured the service to restart after the 1st and 2nd failures, but to take no action after the third. The fail counters will reset after 1 day and the service will restart immediately after a failure.

Configuring Service Recovery 2

The service is written to fail after 30 seconds and the recovery mechanism should restart it immediately. If you start the log file you should see entries in the Event Viewer’s Application or System log showing the service starting, failing, and restarting.

Viewing Service Events

You can also get more information by opening up some of these events, such as service state transitions, how many times the error has occurred, or detailed error messages. Here is an exampleExamining Service Events

Resetting the Error Counters
Unfortunately the Reset fail count after and Restart service after fields in the Recovery tab only takes integers. It would have been nice to be able to have granularity less than a whole day for the reset of the counters to take effect. If you are running this service repeatedly (like I was doing during testing) the counters may be too high to automatically restart the service. If you expected a restart and didn’t get one, examine the entries in the Event Viewer’s System log and you should see something like this

If you want to reset the counters you can set the Reset fail count after field to zero which will cause the counters to reset after each failure.

Turning Off the JIT Debugger
One of the main reasons you write a service is to do something without user intervention (or even with a user logged in). One of the problems with this implementation is that you end up throwing an unhandled exception, by design, from the Stop event handler. If the Microsoft Just-In-Time (JIT) debugger is configured to run on your system it will prompt you if you want to debug the application. For a service that you want to auto-recover this is a really bad thing, since….well there might not be anyone there to answer the prompt.

I found a few tips on how to turn this off. You can read about them here.

Note: If you have multiple versions of Visual Studio installed (I have 2005 and 2003 installed) you have to turn off the JIT debugger for each version.

Service Source Code
Here is the source code for the service. The service is designed to fail after 30 seconds to test the recovery mechanisms of the Windows services. The code automatically generated for the installer was not modified at all, except to change the service name which was described above.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.ServiceProcess;
using System.Text;
using System.Threading;

namespace TestService
{
    public partial class FailingService : ServiceBase
    {
        private Thread _workerThread;

        // Used to cache any unhandled exception
        private Exception _asyncException;

        public FailingService()
        {
            InitializeComponent();

            // Wire up the UnhandledExcepetion event of the current AppDomain.  
            // This will fire any time an undandled exception is thrown
            AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(UnhandledExceptionHandler);
        }

        void UnhandledExceptionHandler(object sender, UnhandledExceptionEventArgs e)
        {
            // Cache the unhandled excpetion and begin a shutdown of the service
            _asyncException = e.ExceptionObject as Exception;
            Stop();
        }

        protected override void OnStart(string[] args)      
        {
            // Start up a thread that will simulate a failure by throwing an unhandled exception asynchronously
            _workerThread = new Thread(new ThreadStart(SimulateAsyncFailure));
            _workerThread.Start();

        }

        protected override void OnStop()
        {
            // If there was an unhandled exception, throw it here
            // Make sure to wrap the unhandled exception, as opposed to just rethrowing it, to preserve its StackTrace
            // To wrap it, create a new Exception and pass the unhandled exception as the InnerException
            // The exception info will be in the Windows Event Viewer's Application log
            // You could also use some logging/tracing mechanism to capture information about the exception 
            if( _asyncException != null )
                throw new InvalidOperationException("Unhandled exception in service", _asyncException );
        }

        private void SimulateAsyncFailure()
        {
            // Simulate an asynchronous unhandled excpetion by sleeping for some time and then throwing
            // There is no caller to catch this exception since it is running in a separate thread
            Thread.Sleep((int)TimeSpan.FromSeconds(30).TotalMilliseconds);
            throw new InvalidOperationException("Simulating a service error");
        }
    }
}

Programmatically Configuring Recovery
The installers do not provide any support for programmatically setting up the recovery actions, which I find a little frustrating. I did find some code here that will allow you to do this, but have not had a chance to test it out or incorporate it into the sample.

The Importance of Sharpening the Saw

Today I popped into Starbucks and saw a sign at the register that said they would be closed from 5:30 – 9:00 this Tuesday for Espresso Training. The sign went on to say how important quality is and that the training is necessary to maintain quality. I thought maybe it was just this one store, but now I see it is nation wide.

So what about our industry, why isn’t regular training more of a priority? Some of the companies I have worked for have, on occasion, sent me to training but usually that was because a specific skill was required. What about the regular training and learning that makes you a better developer, where is the priority for that?

In a perfect world, maybe your company would arrange for your continued development, but that’s pretty unlikely. The reality is that we as developers are responsible for staying current and looking for tools and techniques to make us better. We’re the only ones that are going to know what we need to learn to fill in the gopher holes. Whenever I am in crunch mode, the first thing that suffers is learning: I don’t take the time to try new tools, I read fewer blog posts, and I certainly don’t take the time to write any blog posts.

I’m going to try and make regular time for sharpening the saw. If it’s good enough for Starbucks, it’s good enough for me.

Rethinking the C# using statement

I’ve typically used the C# using construct to wrap an instance on an object that has a short lifetime and requires a call to Dispose. The using keyword hides the need to call Dispose explicitly and avoids having to use a try-finally to ensure that Dispose is always called. One example of a class that I often use the using keyword is a stream class

        using (FileStream fs = File.Create(path))
        {
            // Do something with the file stream
            ...

        }  //  Dispose gets called automatically here

Recently, while using NMock2, I came across a usage of the using statement that forced me to stop and think about what was going on under the covers. NMock2 is a mocking framework and the Order/Unordered properties are used to tell the framework whether you care that the expectations are satisfied in a particular order. The Ordered attribute tells NMock2 that the expectations must match the actual order of execution, while Unordered indicates that all the expectations must be satisfied, but the order doesn’t matter. When you are defining the expectations you put the Ordered/Unordered attributes in a using statement. They can even be nested, so a test might look something like this (and no I don’t use class/method names like this):

        [Test]
        public void CanDoIt()
        {
            using (_mockery.Ordered)
            {
                Expect.Once.On(_thing1).Method("Method1").WithNoArguments();

                using (_mockery.Unordered)
                {
                    Expect.Once.On(_thing1).Method("Method2").WithNoArguments();
                    Expect.Once.On(_thing1).Method("Method3").With(foo, bar);
                }

                Expect.Once.On(_thing1).Method("Method4").WithNoArguments();
            }

            classUnderTest.DoIt();
        }

In the above test we are telling NMock that we expect the following methods to be done in a certain order. Method1 should be called first, followed by Method2 and Method3 in any order, followed by Method4.

Using NMock2 for unit testing can be the subject of another post, but it was the use of the Ordered/Unordered attributes inside of a using statement that really made we think about what was going on. What I found intriguing about the use of this construct is that the Ordered/Unordered properties are returning an object that implements IDisposable but you never do anything with the returned property. NMock2 is using the using contruct with Ordered and Unordered as syntactic sugar to make the test cases easier to read, and I think it works well in NMocks case.

Under the covers, the constructor of the object returned from the Ordered/Unordered properties is actually being used to set state in the underlying owning Mockery object. I don’t think I can show only a snippet of the NMock2 source code under the license agreement of the NMock2, but if you want to see for yourself what it is doing you can download it from here.

Basically the object that gets created by the Ordered/Unordered properties takes a reference to the “parent” object in the constructor. In this case the parent object is the Mockery object. The Dispose method cleans up the state in the parent object. I came up with a somewhat contrived package shipping class that does something similar to what NMock is doing as an example, although in my example you can not nest using statements like in NMock.

using System;
using System.Collections.Generic;
using System.Text;

namespace TestUsing
{
    class Program
    {
        static void Main(string[] args)
        {
            PrioritizedShipping shipping = new PrioritizedShipping();

            // Setup a bunch of packages for priority
            using (shipping.Overnight)
            {
                shipping.ScheduleForDelivery(new Package(10.0, 51.2));
                shipping.ScheduleForDelivery(new Package(11.0, 52.3));

                // Note you can not nest usings because there is no Push/Pop mechanism implemented
            }

            // Default is standard shipping, could also use a using( shipping.Standard)
            shipping.ScheduleForDelivery(new Package(300.0, 1248.0));

            // For this example I prefer a more straightforward method that is clear to a manintainer
            shipping.ABetterScheduleForDelivery(new Package(500.0, 2448.0), ShipPriority.Standard);
            shipping.ABetterScheduleForDelivery(new Package(20.0, 12.0), ShipPriority.Overnight);
            shipping.ABetterScheduleForDelivery(new Package(21.0, 13.0), ShipPriority.Overnight);
        }
    }

    public class Package
    {
        private double _weight;
        private double _girth;

        public Package(double weight, double girth)
        {
            _weight = weight;
            _girth = girth;
        }
    }

    public enum ShipPriority { Overnight, Standard };

    public class PrioritizedShipping
    {
        private List<Package> _priorityPackages = new List<Package>();
        private List<Package> _standardPackages = new List<Package>();
        private List<Package> _activeList;

        public IDisposable Overnight
        {
            get { return new SetupShippingHelper(this, ShipPriority.Overnight); }
        }

        public IDisposable Standard
        {
            get { return new SetupShippingHelper(this, ShipPriority.Standard); }
        }

        public void SetupPriority(ShipPriority priority)
        {
            if (priority == ShipPriority.Overnight)
                _activeList = _priorityPackages;
            else
                _activeList = _standardPackages;
        }

        public void ScheduleForDelivery(Package package)
        {
            _activeList.Add(package);
        }

        public void ABetterScheduleForDelivery(Package package, ShipPriority priority)
        {
            if (priority == ShipPriority.Overnight)
                _priorityPackages.Add(package);
            else
                _standardPackages.Add(package);
        }

    }

    public class SetupShippingHelper : IDisposable
    {
        PrioritizedShipping _parent;

        public SetupShippingHelper( PrioritizedShipping parent, ShipPriority priority )
        {
            _parent = parent;
            _parent.SetupPriority(priority);
        }

        #region IDisposable Members

        public void Dispose()
        {
            _parent.SetupPriority(ShipPriority.Standard);
        }

        #endregion
    }
}

Note in the contrived example I’ve come up with I think it is actually less clear and I prefer a more explicit method, ABetterScheduleForDelivery, to indicate what priority to use. I would also have some serious concerns about how something like this would work in a multi-threaded environment. I can see how you could make it work, but at the end of the day I think this type of approach would be harder to maintain and could have some unforeseen consequences. I think it works really well for the NMock2 case, but I would have to think long and hard before using it in my own code.

I haven’t found a use for this particular usage of the using statement, but it does give me something to think about and a tool that perhaps I can use in the future. I think there is a lot to be said for code that reads well, but that also needs to be balanced against a maintainer being able to understand what is happening under the covers to avoid inadvertent consequences.

Pair Programming Follow-Up

In my last post I described some of my early impressions on my first pair programming experience. Still no complaints, although a few additional observations.

I am surprised at the end of the day to find that I am a little mentally drained and I think that is a good thing. The effort that we are putting in is sustainable, but it is also very focused. There is little down time since at any given moment at least one of us is “on” and good progress is being made. Even when we are at a point where we are stuck the ideas are still flowing and we are both mentally engaged in the process.

Given the focus that we both have on the task, I find it beneficial to take breaks from time to time to check email, grab coffee, and even read a few blog posts. It would be a tough pace to maintain continuously and I think the occasional 5-minute break keeps both of us fresh.

We are doing a good job of share time at the keyboard, and I see a real difference in the approach I take depending on whether I am at the keyboard or not. When I am at the keyboard my focus is on the details of programming: variable names, style, and yes…just being able to type with someone looking over my shoulder. When I am not at the keyboard I seem to see the big picture better: are we repeating code, what is the complexity like, is this code easily testable.

I just started reading James Shore’s The Art of Agile Development and I hope to learn more about the mechanics of pair programming and tips/techniques to be more successful at it. Mike and I are pairing well together (I think), but I am also looking forward to pairing with someone else on our next iteration , getting a better understanding of how they work and how we can work together.

Pair Programming

I’ve been doing some pair programming this week and have found the experience to be extremely productive and informative. I am pairing with a guy named Mike, who is someone I have worked with in the past and really enjoying working with. We are in the midst of a fairly significant refactoring effort for a particular component.

Although Mike and I often bounce ideas off each other when we are working individually, I think the quality and flow of ideas is significantly better now that we are pairing. That only makes sense I guess. Even if it is code that you are familiar with, when someone asks you a question while you are working on something else it is difficult to grasp the subtleties of the problem. During this pairing effort we are both intimately familiar with the underlying problems and can quickly comment on strengths and weaknesses of a particular approach. The design is really coming along well and we have been able to address issues regarding complexity, IOC, and enhanced testing.

Speaking of testing, this component has a good suite of tests and that is a real blessing when you are refactoring. I feel a lot more confident after a significant change to see “All tests passed”. I have been using NUnit/MbUnit for about 3 years now and I am trying to remember what programming was like before I had these tools. We are trying to work incrementally and each time we get to a stable point we are checking in the changes.

I have to admit that at times pair programming was tough to get used to. On the first day I found myself saying things like “if you want to go work on XYZ I can run with this for a bit and we can get back together later”. Usually this would happen when we hit a snag and I needed time to think though the problem. I was uncomfortable not writing code in front of someone else, even though when I am working alone I recognize this non-coding time as part of moving forward. Something about not having an immediate answer in front of someone else made me feel very uncomfortable.

There have been a lot of positives to the effort, and so far no negatives. I am sure that some of that is due to that fact that Mike and I work well together and have similar beliefs about software development. That is not to say we have always been in agreement, and indeed have had some spirited discussions. We have come up with a design that is stronger than either of us would have come up with individually. We are pushing each other to develop and maintain good tests, and also to push back with the occasional YAGNI criticism. We are also sharing productivity tips, comments about programming styles, ideas about new tools, blog posts, etc, etc.

One unexpected benefit of this pair programming effort has been that people have been less willing to interrupt us when we are working together. Maybe it is seeing two people working productively and enthusiastically that makes people hesitate before interrupting. There have been interruptions but they are fewer and we have also been more willing to say “let me finish up what we are doing here and I can swing by in 30 minutes”.

I am really looking forward to doing more of this. Based on the quality of the code and tests and our overall productivity I would feel confident defending the use of two developers for one task to management. I would also like to try pairing up with other members of the team. I am sure we will go through some of the same awkward startup moments but I think there is a lot to be learned and shared.