Friday, February 19, 2010

Caching and Memoization

A while back Wes Dyer wrote a great article on function memoization, a technique used to speed up subsequent function evaluations by caching the results of previous executions, keyed by input value.

In a nutshell, given a function f(x), its memoized version m(x) behaves like this:

m(x) =
    if (a result for x is already stored)
        return the stored result
    else
        result = f(x)
        store the result for x
        return result

Memoization is frequently used when the following conditions are true:
1. Input values of x are frequently recalculated
2. f(x) takes long enough to execute that the time saved by reusing a cached result far outweighs the cost of looking it up.
3. There are not so many distinct values of x that caching every precomputed result would exhaust memory. Memoization would not be helpful across a set of 10^10 distinct input values, for example.
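
For reference, a minimal memoization wrapper in the spirit of Wes's article would look something like the sketch below (my own illustration, not his code; MemoizationExtensions is just a name for the example). Note that the cache grows without bound, which is exactly the concern in item #3:

using System;
using System.Collections.Generic;

public static class MemoizationExtensions
{
    /// <summary>
    /// Wraps a function so that results are cached forever, keyed by input value. No eviction.
    /// </summary>
    public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> function)
    {
        Dictionary<T, TResult> cache = new Dictionary<T, TResult>();
        return x =>
        {
            TResult result;
            if (!cache.TryGetValue(x, out result))
            {
                result = function(x);
                cache.Add(x, result);
            }
            return result;
        };
    }
}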

Wes's examples are great and I strongly encourage you to read his article. This post expands on it by applying some additional concepts. The primary issue I had with straightforward memoization was item #3. Let's assume we have a long-running function f(x). We then want to apply the following logic:

Define a cache timeout time t.
If we have computed f(x) within t, then return the cached value of f(x).
If we have not, then recalculate f(x) and update the cache.
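
Written in the same pseudocode style as before, the time-aware version c(x) is:

c(x) =
    if (a result for x is stored and it was computed within the last t)
        return the stored result
    else
        result = f(x)
        store the result for x along with the current time
        return result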

This solves item #3 fairly gracefully. As long as the set of input values doesn't grow by more than n new items within any window of length t, you can support an effectively unbounded number of values of x, provided you understand how x changes over time t. It also solves the problem of evicting stale values from the cache. So, knowing what I wanted to do, let's look at the implementation.

The class I created is called CachedFunction<T, TKey, TResult>, where T is the input type (which can be a class), TKey is a struct that can be reliably used as a key for the caching dictionary, and TResult is the output type of the function we are caching. I also created a simpler version for when the function takes a struct, rather than a class, as its input value. In that case you can just use CachedFunction<T, TResult>; internally it maps T to TKey via a one-to-one mapping function.

Let’s look at an example:

Func<int, int> addOne = x => { System.Threading.Thread.Sleep(1000); return x + 1; }; // Wait one second, then add one
Action<int, TimeSpan> printTime =
(x, time) =>
{
string message = string.Format("result={0}, computationtime={1}", x, time);
System.Diagnostics.Debug.WriteLine(message);
}; // Helper for printing output

var addOneCached = addOne.CreateCachedFunction(new TimeSpan(0, 1, 0)); // Create a caching version of the function with a one minute timeout
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
var result = addOneCached(1); // Compute the value. Should take 1 second because of sleep.
printTime(result, sw.Elapsed);
sw.Reset();
sw.Start();
var result2 = addOneCached(1); // Compute the value. Should be instantaneous because it's cached.
printTime(result2, sw.Elapsed);
sw.Stop();


And the results:
result=2, computationtime=00:00:01.0039687
result=2, computationtime=00:00:00.0005103

Let’s look at the code, starting with the caching function itself:

using System;
using System.Collections.Generic;

namespace Intercerve
{
/// <summary>
/// Provides functionality for wrapping functions and caching computed values
/// </summary>
/// <typeparam name="T"></typeparam>
/// <typeparam name="TKey"></typeparam>
/// <typeparam name="TResult"></typeparam>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design", "CA1005:AvoidExcessiveParametersOnGenericTypes")]
public class CachedFunction<T, TKey, TResult> where TKey : struct
{
/// <summary>
/// An internal class used to hold the time an individual result was cached and the corresponding value.
/// </summary>
private class ResultAndCacheTime
{
/// <summary>
/// The value of the cached result.
/// </summary>
public TResult Result { get; set; }
/// <summary>
/// The time the value was cached.
/// </summary>
public DateTime CacheTime { get; set; }
}

private readonly Dictionary<TKey, ResultAndCacheTime> _Cache = new Dictionary<TKey, ResultAndCacheTime>();
private readonly Func<T, TResult> _Function;
private readonly Func<T, TKey> _KeyMap;
private readonly TimeSpan _CacheTimeout;
private readonly object _SyncLock = new object();

/// <summary>
/// Creates a new CachedFunction to provide automatic caching and eviction for computed values.
/// </summary>
/// <param name="function">The function to wrap.</param>
/// <param name="keyMap">A mapping function that returns a key of type TKey for an input value of type T.</param>
/// <param name="cacheTimeout">The cache timeout threshold for flushing the cache.</param>
public CachedFunction(Func<T, TResult> function, Func<T, TKey> keyMap, TimeSpan cacheTimeout)
{
_Function = function;
_KeyMap = keyMap;
_CacheTimeout = cacheTimeout;
}

/// <summary>
/// Computes the value of f(value) or returns the cached value if within the cache timeout threshold.
/// </summary>
/// <param name="value">The value to retrieve the result for.</param>
/// <returns>The value of f(value) or the last cached value.</returns>
public TResult Compute(T value)
{
TKey key = _KeyMap(value);
ResultAndCacheTime resultAndCacheTime;

lock (_SyncLock)
{
// Acquire the lock and see if we have the value already cached.
if (_Cache.TryGetValue(key, out resultAndCacheTime))
{
// We already have the value. How old is it?
TimeSpan elapsedTime = DateTime.UtcNow.Subtract(resultAndCacheTime.CacheTime);
if (elapsedTime >= _CacheTimeout)
{
// The value is too old so remove it.
_Cache.Remove(key);
}
else
{
// The value is within the cache threshold so return it.
return resultAndCacheTime.Result;
}
}
}

// We don't have the value cached. Compute it. Note we don't hold the lock here.
// This can result in the operation executing twice instead of once when the value isn't cached,
// but if we held _SyncLock while computing, Compute(T value) would become a
// blocking operation for the duration of _Function(value).
TResult computedResult = _Function(value);
resultAndCacheTime = new ResultAndCacheTime { Result = computedResult, CacheTime = DateTime.UtcNow };

lock (_SyncLock)
{
ResultAndCacheTime resultAndCacheTimeExisting;
if (_Cache.TryGetValue(key, out resultAndCacheTimeExisting))
{
// This is for thread synchronization. _Function(value) could potentially take a long time, so
// we can't hold the lock while it runs. Because of that we use a last-write-wins policy when
// two threads compute the same value concurrently: the most recently computed result wins.
if (resultAndCacheTime.CacheTime > resultAndCacheTimeExisting.CacheTime)
{
_Cache.Remove(key);
_Cache.Add(key, resultAndCacheTime);
}
}
else
{
_Cache.Add(key, resultAndCacheTime);
}
}

return computedResult;
}

/// <summary>
/// Clears the entire cache all at once for all values.
/// </summary>
public void ClearCache()
{
lock (_SyncLock)
{
_Cache.Clear();
}
}

/// <summary>
/// Clears a specific value from the cache.
/// </summary>
public void ClearCacheForValue(T value)
{
TKey key = _KeyMap(value);
lock (_SyncLock)
{
if (_Cache.ContainsKey(key))
{
_Cache.Remove(key);
}
}
}
}

/// <summary>
/// Provides functionality for wrapping functions and caching computed values
/// </summary>
public class CachedFunction<T, TResult> where T : struct
{
private CachedFunction<T, T, TResult> _CachedFunction;

/// <summary>
/// Creates a new CachedFunction to provide automatic caching and eviction for computed values.
/// </summary>
/// <param name="function">The function to wrap.</param>
/// <param name="cacheTimeout">The cache timeout threshold for flushing the cache.</param>
public CachedFunction(Func<T, TResult> function, TimeSpan cacheTimeout)
{
_CachedFunction = new CachedFunction<T, T, TResult>(function, GetKey, cacheTimeout);
}

private T GetKey(T value)
{
return value;
}

/// <summary>
/// Computes the result of value.
/// </summary>
/// <param name="value">The value to evaluate.</param>
/// <returns>The result.</returns>
public TResult Compute(T value)
{
return _CachedFunction.Compute(value);
}

/// <summary>
/// Clears the entire cache all at once for all values.
/// </summary>
public void ClearCache()
{
_CachedFunction.ClearCache();
}

/// <summary>
/// Clears a specific value from the cache.
/// </summary>
public void ClearCacheForValue(T value)
{
_CachedFunction.ClearCacheForValue(value);
}
}
}


That is reasonably usable, but to create a caching delegate directly from the class we have to write:

Func<int, int> addOneCachedLong = new CachedFunction<int, int>(addOne, new TimeSpan(0, 1, 0)).Compute;

To reduce the ceremony we can create helper extension methods and let type inference ease the construction. To do so I defined the following:

using System;

namespace Intercerve
{
/// <summary>
/// Various extension methods for creating alternate versions of functions.
/// </summary>
public static class FunctionExtensions
{
/// <summary>
/// Creates a caching wrapper around a function.
/// </summary>
/// <typeparam name="T">The function argument type</typeparam>
/// <typeparam name="TResult">The function return type</typeparam>
/// <param name="function">The function to wrap</param>
/// <param name="cacheTimeout">The cache timeout for cached results</param>
/// <returns></returns>
public static Func<T, TResult> CreateCachedFunction<T, TResult>(this Func<T, TResult> function, TimeSpan cacheTimeout) where T : struct
{
CachedFunction<T, TResult> cachedFunction = new CachedFunction<T, TResult>(function, cacheTimeout);
return cachedFunction.Compute;
}

/// <summary>
/// Creates a caching wrapper around a function.
/// </summary>
/// <typeparam name="T">The function argument type</typeparam>
/// <typeparam name="TKey">The cached item key type</typeparam>
/// <typeparam name="TResult">The function return type</typeparam>
/// <param name="function">The function to wrap</param>
/// <param name="keyMap">The mapping function from T to TKey</param>
/// <param name="cacheTimeout">The cache timeout for cached results</param>
/// <returns></returns>
public static Func<T, TResult> CreateCachedFunction<T, TKey, TResult>(this Func<T, TResult> function, Func<T, TKey> keyMap, TimeSpan cacheTimeout) where TKey : struct
{
CachedFunction<T, TKey, TResult> cachedFunction = new CachedFunction<T, TKey, TResult>(function, keyMap, cacheTimeout);
return cachedFunction.Compute;
}
}
}


We can then do what we did in the example, which is simply:
var addOneCached = addOne.CreateCachedFunction(new TimeSpan(0, 1, 0));
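
When the input type is a class rather than a struct, the three-type-parameter overload works the same way; you just supply the key-mapping function. For example (CustomerRequest, CustomerRecord, and LoadRecordFromDatabase are made-up names, purely for illustration):

// Cache an expensive lookup, keyed by the request's integer Id (the key must be a struct).
Func<CustomerRequest, CustomerRecord> lookup = request => LoadRecordFromDatabase(request);
var lookupCached = lookup.CreateCachedFunction(request => request.Id, new TimeSpan(0, 5, 0));
// Within any five-minute window, repeated calls for the same Id return the cached record.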


I also added support for evicting individual items from the cache if need be, or flushing the cache entirely. The primary use I've found for CachedFunction is adding a bounded caching layer around a function that has already been written; it's much easier to wrap a method this way than to go into its guts and reorganize. During an optimization phase we noticed that certain database calls were running very often. It was a tricky problem: depending on various factors, the method making those calls could run very often or hardly at all, and we wanted to restrict how often the inner function actually executed.

Put another way, take a function outer(x) that calls inner(x). We can't control how often outer(x) runs; sometimes it executes multiple times per second, sometimes once per minute. We can't change that behavior, and it needs to run as often as it needs to run. The result of inner(x), however, rarely changes; most of the time its value is the same as it was on the previous execution. So we used this class to wrap inner(x) and make cached_inner(x) with a threshold of five minutes or so. Voila, problem solved. No matter how often outer(x) runs, inner(x) runs at most once every five minutes but always yields a value, and better yet, we did this without modifying the original function and in one line of code.
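
A stripped-down sketch of that pattern (Status and QueryStatusFromDatabase are stand-ins for the real database call):

Func<int, Status> inner = id => QueryStatusFromDatabase(id); // the expensive call we want to throttle
Func<int, Status> cachedInner = inner.CreateCachedFunction(new TimeSpan(0, 5, 0));

Func<int, Status> outer = id =>
{
    // ...whatever else outer(x) has to do on every call...
    return cachedInner(id); // hits the database at most once every five minutes per id
};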



Of course, right now this only supports functions with a single input parameter; however, it wouldn't be too hard to extend the class to support more. Wes demonstrates how to do this easily in another excellent blog post: http://blogs.msdn.com/wesdyer/archive/2007/02/11/baby-names-nameless-keys-and-mumbling.aspx
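
In the meantime, a quick workaround that requires no changes to CachedFunction is to fold the parameters into a single struct key and cache the one-argument form. A rough sketch:

// KeyValuePair<int, int> is a struct, so it satisfies the T : struct constraint,
// and its default equality is good enough for simple value types like this.
Func<int, int, int> slowAdd = (x, y) => { System.Threading.Thread.Sleep(1000); return x + y; };
Func<KeyValuePair<int, int>, int> slowAddPair = pair => slowAdd(pair.Key, pair.Value);
var slowAddPairCached = slowAddPair.CreateCachedFunction(new TimeSpan(0, 1, 0));
Func<int, int, int> slowAddCached = (x, y) => slowAddPairCached(new KeyValuePair<int, int>(x, y));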



Until next time…

Friday, February 5, 2010

Win32_Service Memory Leak

During the development of SQL Sentry 5.5 we noticed we were receiving errors from some of our watched development servers. The error was from the WMI subsystem and simply stated “Out of Memory.” After searching for a bit to try to determine the cause, we realized that on all the affected watched servers the wmiprvse.exe process was using around 512MB of memory. Doing some additional searches turned up the following blog post:

http://blogs.technet.com/askperf/archive/2008/09/16/memory-and-handle-quotas-in-the-wmi-provider-service.aspx

in which Mark Ghazai, a member of the Windows Performance Team, discussed the wmiprvse.exe process and the 512MB cap. In a nutshell, the wmiprvse.exe process is the WMI Provider Service, which acts as a host for WMI providers such as Win32_Service. It has a cap of 512MB, which can be adjusted, but in the case of a memory leak that would just be a band-aid. We needed to get to the root of the problem: why was this process spiking to 512MB to begin with?
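
As an aside, the quotas Mark describes are exposed through the __ProviderHostQuotaConfiguration class in the root WMI namespace, so a PowerShell one-liner along these lines should display the current limits (MemoryPerHost being the 512MB cap), though as noted, raising it would only have delayed the problem:

Get-WmiObject -Namespace root -Class __ProviderHostQuotaConfiguration | Format-List *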

The first thing we noticed was that this problem only showed up on Windows 7 and Windows Server 2008 R2, so it was specific to Windows 6.1. It also happened only on systems we watched, which makes sense because we use WMI heavily. We could look at the wmiprvse.exe process throughout the day and see that the memory usage was steadily rising. A mitigating factor is that this process will actually terminate itself after a period of inactivity, but in the case of a monitoring system like SQL Sentry, we don’t ever wait long enough for that period of inactivity to elapse. The question remained, exactly what were we doing that was causing this process to increase in memory on Windows 7 and 2008 R2?

The next step was to try to profile the process for a memory leak. A quick search in the Debugging Tools for Windows (WinDbg) help revealed a helpful topic called “Using UMDH to Find a User-Mode Memory Leak.” Seeing as that was exactly what I wanted, I got started in earnest.

The first step involves setting up your symbols. In order to analyze a memory leak you have to be able to look at the call stacks, and the only way you can get call stack information from an unmanaged executable is with symbols. Fortunately this is pretty easy since Microsoft provides symbol servers. The following command, taken from the documentation, can be used to set up the symbol path.

set _NT_SYMBOL_PATH=c:\mysymbols;srv*c:\mycache*http://msdl.microsoft.com/download/symbols

The next step was to use GFlags to enable UMDH stack traces as outlined in the WinDbg documentation. We started GFlags and turned on Stack Backtrace (Megs) for the wmiprvse.exe image by clicking the checkbox. After that you have to restart the process, so I just killed wmiprvse.exe. It gets auto-launched the first time a WMI query is executed, so it respawned right away.
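
If you prefer the command line over the GUI, the same user-mode stack trace database can, as far as I know, be enabled with a one-liner like the following (the checkbox route described above is what we actually used):

gflags /i wmiprvse.exe +ust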

Once the process was running we needed to collect our allocation snapshots. To do so, you use:
umdh -p:<processid> -f:<logfilename>
Each time you run the above command, it generates a snapshot of the current allocations. What we are doing here is taking a peek at all the unmanaged memory allocations from the process and their corresponding call stacks. So I ran that once, waited for the memory used by that process to increase by about 1 megabyte, then ran it again using a different log file name.

The next step is to run these files back through umdh to create a differential file. UMDH will compare the allocations in one file to the allocations in the other and determine what memory allocations made in the earlier file still exist and have not been cleaned up by the time the second file was created. This is done using the following command:

umdh <file1> <file2> > <outfile>

The > before <outfile> is just a redirect indicating where you want the output to go. This generates a new, human-readable file. After the symbol listing at the top of the file come the allocations. Not everything in this list is a problem; something could appear simply because it hasn't been cleaned up yet. In our case, though, one entry always showed up at the top, and the numbers got larger as time went on (I only included the top six lines of the call stack).

+   c28ba ( 185174 - c28ba)   1078 allocs    BackTrace2980620
+     83c (  1078 -   83c)    BackTrace2980620    allocations

    ntdll! ?? ::FNODOBFM::`string'+0001A81B
    msvcrt!malloc+00000070
    cimwin32!operator new+00000009
    cimwin32!CWin32Service::LoadPropertyValuesWin2K+000004A1
    cimwin32!CWin32Service::AddDynamicInstancesNT+00000200
    framedynos!Provider::CreateInstanceEnum+00000034

As you can see, CWin32Service is the leaky class, and I presumed that it was the code that supplied the functionality for the Win32_Service WMI provider. The next step was validating this outside our code, so I got on a system that SQL Sentry was not looking at to ensure there wasn’t any interference in my metrics and ran the following query in wbemtest:

select * from win32_service

Each time, the wmiprvse.exe process memory went up, but never down. I then decided to throw a heavier test at it, so I whipped up a little PowerShell loop:

for ($i=0; $i -le 100; $i++) { get-wmiobject win32_service | format-table }

Running that caused wmiprvse.exe to continually increase in memory while it was running, so I had my smoking gun and proceeded to file a bug report with Microsoft.

So, where are we now? After going back and forth with Microsoft on this, they have filed it for the next major release of the OS; in other words, it won't be fixed in Windows 7 or 2008 R2 in any service pack or hotfix. Apparently the changes are “too invasive.” We are currently working with Microsoft to see if we can escalate this and get it fixed. In the meantime we have other options for querying service status, such as using the Service Control Manager directly; we're just making sure that switching doesn't introduce any issues we haven't seen before. In 5.5 we'll be including an App.Config option called useScmForServiceStatus that we can turn on and off for testing, or that you can use to switch to SCM if WMI is causing problems in your environment.

Monday, January 19, 2009

The SQL Sentry Console is now 64-bit capable

I wanted to take a departure from the language-focused posts I've done in the past and relay some information regarding the next release of our product. Starting with the next point release of our software (currently slated to be 4.3), the SQL Sentry Console will be able to run in native 64-bit mode. The purpose of this post is to describe why this is available now and hasn't been in the past, as well as what it means to the end user.

SQL Sentry has two components, the SQL Sentry Console and the SQL Sentry Server Service. Each of these talks to a number of different systems and uses many supporting DLLs/assemblies to provide that support. The SQL Sentry Server Service has run natively in 64-bit mode for some time now. The console, however, has been forced to run in 32-bit mode. The reason is that in Windows, a 64-bit process can only load 64-bit DLLs; you can't mix and match 64-bit and 32-bit code in the same process.

SQL Sentry was conceived when SQL Server 2000 was the dominant SQL Server version. As such we decided early on to support reading SQL Server Enterprise Manager registrations into the console using SQL-DMO. We also found we could (with some work) show Enterprise Manager property windows and dialogs using the SQL-NS library. Both of these libraries were COM libraries, and since SQL Server 2000 was written at a time when 32-bit was your only OS option, the COM libraries were 32-bit. Eventually Microsoft added support for 64-bit operating systems in SQL Server 2000, but the client tools, running in 32-bit mode anyway, had no need to get these updates.

Because the console linked directly to these DLLs to read registrations and show SQL Server 2000 dialogs, it was forced to run in 32-bit mode; otherwise it would not be able to load the 32-bit DLLs. This wasn't a major issue, as it's easy to flag an executable to run in x86 mode (vs. the default automatic detection for managed applications) by setting a compilation option. It worked, and that's how SQL Sentry has been able to interoperate with SQL Server 2000 until now.
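
For reference, that compilation option is just the C# compiler's platform switch, surfaced in Visual Studio as the "Platform target" setting on the project's Build tab (the source file name below is only a placeholder):

csc /platform:x86 Program.cs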

In 4.2 we wanted to support reading SQL Server 2008 registrations. This proved to be quite an engineering task, as we found that Microsoft had changed many of the APIs used to read this registration data. Because of this we decided to go back to the drawing board and reengineer the way we read registration information. As stated previously, we had linked directly to the registration COM objects (for SQL Server 2000) and the SMO assemblies (for 2005). For 4.2 we elected to use an intermediate layer: we abstracted all the functionality we needed into a set of interfaces we could read from and write to, and then created plugins, one for 2000, one for 2005, and one for 2008. That allowed us to manage each plugin and its references independently and decouple their implementations from the console. It was an undertaking, but it paid off: we have a much better separation of code layers, and we will also be able to support the next version of SQL Server without any major work.
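
To give a rough idea of the shape of that abstraction, here is a simplified, hypothetical sketch; it is not the actual SQL Sentry interface, and ServerRegistration stands in for whatever type carries a registration:

public interface IServerRegistrationProvider
{
    // The SQL Server version this plugin handles (2000, 2005, 2008, ...).
    string SupportedVersion { get; }

    // Reads the registrations that version's client tools know about.
    IEnumerable<ServerRegistration> ReadRegistrations();
}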

Around the time when I’d looked at the feasibility of making the console work in 64-bit mode I quickly realized that doing so would require such a decoupling, because I had to get the 32-bit dlls out of the console process. It just so happens that the work done for 4.2 did just that.

In 4.3 the default build will not have the 32-bit flag in the manifest, so it will launch using whatever architecture the processor/OS supports. The side effect is that if you are using the SQL Sentry Console on a 64-bit system and running it in 64-bit mode, you will be unable to read SQL Server 2000 registrations, because it will not load the 2000 plugin (and if it did, that plugin wouldn't work, due to the 64-bit/32-bit process/DLL interoperability issue). The workaround is that we're also shipping an x86 SQL Sentry Console that is flagged to run in x86 mode. This console is only installed on 64-bit operating systems. If you install SQL Sentry on a 64-bit OS starting with 4.3, you'll see two executable shortcuts in the Start menu:

SQL Sentry Console and
SQL Sentry Console (x86)

If you require interoperation with SQL Server Enterprise Manager registrations, run the x86 version; otherwise, run the regular SQL Sentry Console, which will run in 64-bit mode on a 64-bit OS.

Tuesday, January 6, 2009

Dictionary Replication

It is very common in programming to need to replicate data. In SQL Sentry, when we pull objects from remote systems, we need to figure out whether they already exist in our database. We index objects by specific keys, and during the synchronization process we compare those keys to the remote objects' keys to figure out which objects are new, which have been deleted, and which have changed. The details of everything we do are beyond the scope of this article, but I find myself doing enough of these dictionary replications that I decided to blog about it.

Take the following scenario:
You have two dictionaries,

Dictionary<K,VTarget> targetDictionary
and
Dictionary<K,VSource> sourceDictionary

You want to update targetDictionary with all the contents of sourceDictionary in one call, and be able to generate callbacks for new, deleted, and changed entries. The value types are different, since you plan to map VSource to VTarget (more on that later), but they share a common key type (we could later allow that to differ too if it suited us). It seems a bit overwhelming, but it's a perfect example of where generics can make your life a lot easier. Traditionally I'd find myself writing three loops. In the first loop I update changed items and add new items that are in the source but not in the target. In the second I find the keys that are in the target but not in the source; these are deleted items. I can't remove them yet, though, because I'm inside an enumerator and removing would generate an error, so I collect them in a temp list and do one more loop at the end to remove them from the target.

// Initialize the collections
Dictionary<int, string> peopleByIDTarget = new Dictionary<int, string>();
Dictionary<int, string> peopleByIDSource = new Dictionary<int, string>();

peopleByIDTarget.Add(1, "Brooke");
peopleByIDTarget.Add(2, "Tommy");

peopleByIDSource.Add(2, "Rick");
peopleByIDSource.Add(3, "Dom");

// Loop through the source
foreach (KeyValuePair<int, string> sourceKeyValuePair in peopleByIDSource)
{
    string existingName;

    if (peopleByIDTarget.TryGetValue(sourceKeyValuePair.Key, out existingName))
    {
        // ID exists. Change the name!
        peopleByIDTarget[sourceKeyValuePair.Key] = sourceKeyValuePair.Value;
    }
    else
    {
        // ID doesn't exist, so add it with the name.
        peopleByIDTarget.Add(sourceKeyValuePair.Key, sourceKeyValuePair.Value);
    }
}

// Create a temp list to hold items we need to remove. You can't remove items
// while enumerating or you get an error.
List<int> keysToRemove = new List<int>();

// Loop through the target to see which items don't exist in the source
foreach (KeyValuePair<int, string> targetKeyValuePair in peopleByIDTarget)
{
    // The target item doesn't exist in the source so we must remove it.
    // Add it to the removal list.
    if (!peopleByIDSource.ContainsKey(targetKeyValuePair.Key))
    {
        keysToRemove.Add(targetKeyValuePair.Key);
    }
}

// Remove the keys we marked for removal
foreach (int key in keysToRemove)
{
    peopleByIDTarget.Remove(key);
}

This works but it’s quite a bit of code, especially if this is happening with a lot of collections. I like to promote code reuse so the goal was to make this routine generic so that I could simply call:

peopleByIDTarget.Merge(peopleByIDSource);

Another thing the above example lacks is support for the callbacks I mentioned earlier. I'd like to know when an item is removed, added, or changed, and specify that in an easily defined way, like

peopleByIDTarget.Merge(peopleByIDSource, itemAddedCallback, itemChangedCallback, itemRemovedCallback)

and get the item that was removed, added, or changed.

It's pretty straightforward to convert the above code into a generic method. The following is the most fully featured version, supporting optional callbacks and a comparison step to decide whether two values are really the same (you may not always wish to fire the changed event if no properties of the V instance actually differ).

First we need to define some helper classes for the callbacks:

   1: /// <summary>
   2: /// Provides event arguments for items that are added to a dictionary.
   3: /// </summary>
   4: /// <typeparam name="K">The Key type</typeparam>
   5: /// <typeparam name="V">The Value type</typeparam>
   6: public class DictionaryItemAddedEventArgs<K, V> : EventArgs
   7: {
   8:     /// <summary>
   9:     /// The Key
  10:     /// </summary>
  11:     public K Key { get; private set; }
  12:  
  13:     /// <summary>
  14:     /// The new value
  15:     /// </summary>
  16:     public V NewValue { get; private set; }
  17:  
  18:     /// <summary>
  19:     /// Creates a new DictionaryItemAddedEventArgs
  20:     /// </summary>
  21:     /// <param name="key">The key</param>
  22:     /// <param name="newValue">The new value</param>
  23:     public DictionaryItemAddedEventArgs(K key, V newValue)
  24:     {
  25:         this.Key = key;
  26:         this.NewValue = newValue;
  27:     }
  28: }
  29:  
  30: /// <summary>
  31: /// Provides event arguments for items that are changed in a dictionary.
  32: /// </summary>
  33: /// <typeparam name="K">The Key type</typeparam>
  34: /// <typeparam name="V">The Value type</typeparam>
  35: public class DictionaryItemChangedEventArgs<K, V> : EventArgs
  36: {
  37:     /// <summary>
  38:     /// The Key
  39:     /// </summary>
  40:     public K Key{ get; private set; }
  41:  
  42:     /// <summary>
  43:     /// The previous value
  44:     /// </summary>
  45:     public V OldValue { get; private set; }
  46:  
  47:     /// <summary>
  48:     /// The new value
  49:     /// </summary>
  50:     public V NewValue { get; private set; }
  51:  
  52:     /// <summary>
  53:     /// Creates a new DictionaryItemChangedEventArgs
  54:     /// </summary>
  55:     /// <param name="key">The key</param>
  56:     /// <param name="oldValue">The previous value</param>
  57:     /// <param name="newValue">The new value</param>
  58:     public DictionaryItemChangedEventArgs(K key, V oldValue, V newValue)
  59:     {
  60:         this.Key = key;
  61:         this.OldValue = oldValue;
  62:         this.NewValue = newValue;
  63:     }
  64: }
  65:  
  66: /// <summary>
  67: /// Provides event arguments for items that are deleted from a dictionary.
  68: /// </summary>
  69: /// <typeparam name="K">The Key type</typeparam>
  70: /// <typeparam name="V">The Value type</typeparam>
  71: public class DictionaryItemDeletedEventArgs<K, V> : EventArgs
  72: {
  73:     /// <summary>
  74:     /// The Key
  75:     /// </summary>
  76:     public K Key { get; private set; }
  77:  
  78:     /// <summary>
  79:     /// The previous value
  80:     /// </summary>
  81:     public V OldValue { get; private set; }
  82:  
  83:     /// <summary>
  84:     /// Creates a new DictionaryItemDeletedEventArgs
  85:     /// </summary>
  86:     /// <param name="key">The key</param>
  87:     /// <param name="oldValue">The previous value</param>
  88:     public DictionaryItemDeletedEventArgs(K key, V oldValue)
  89:     {
  90:         this.Key = key;
  91:         this.OldValue = oldValue;
  92:     }
  93: }

Then we can get to the actual dictionary extensions class. The primary work starts on line 180, and I've included a couple of other helper extensions I added for other uses:

   1: /// <summary>
   2: /// Provides extension methods to the dictionary class
   3: /// </summary>
   4: public static class DictionaryExtensions
   5: {
   6:     /// <summary>
   7:     /// Creates a new Dictionary with the key and value of the current dictionary reversed.
   8:     /// This method should not be used when duplicate values are expected because collisions will occur.
   9:     /// </summary>
  10:     /// <typeparam name="K">The Key type</typeparam>
  11:     /// <typeparam name="V">The Value type</typeparam>
  12:     /// <param name="dictionary">The dictionary to use</param>
  13:     /// <returns>A Dictionary with the keys and values of the current dictionary reversed</returns>
  14:     public static Dictionary<V, K> CreateDictionaryOfValueAndKey<K, V>(this Dictionary<K, V> dictionary)
  15:     {
  16:         Dictionary<V, K> result = new Dictionary<V,K>();
  17:         foreach (KeyValuePair<K, V> keyValuePair in dictionary)
  18:         {
  19:             result.Add(keyValuePair.Value, keyValuePair.Key);
  20:         }
  21:  
  22:         return result;
  23:     }
  24:  
  25:     /// <summary>
  26:     /// Creates a new MultiDictionary with the key and value of the current dictionary reversed.
  27:     /// This method should be used when duplicate values are expected.
  28:     /// </summary>
  29:     /// <typeparam name="K">The Key type</typeparam>
  30:     /// <typeparam name="V">The Value type</typeparam>
  31:     /// <param name="dictionary">The dictionary to use</param>
  32:     /// <returns>A MultiDictionary with the keys and values of the current dictionary reversed</returns>
  33:     public static MultiDictionary<V, K> CreateMultiDictionaryOfValueAndKey<K, V>(this Dictionary<K, V> dictionary)
  34:     {
  35:         MultiDictionary<V, K> result = new MultiDictionary<V, K>();
  36:         foreach (KeyValuePair<K, V> keyValuePair in dictionary)
  37:         {
  38:             result.Add(keyValuePair.Value, keyValuePair.Key);
  39:         }
  40:  
  41:         return result;
  42:     }
  43:  
  44:     /// <summary>
  45:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
  46:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
  47:     /// setting them to the value in sourceDictionary
  48:     /// </summary>
  49:     /// <typeparam name="K">The Key type</typeparam>
  50:     /// <typeparam name="V">The Value type of the source and target dictionaries</typeparam>        
  51:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
  52:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
  53:     public static void Merge<K, V>(
  54:         this Dictionary<K, V> targetDictionary,
  55:         Dictionary<K, V> sourceDictionary)
  56:     {
  57:         Merge(targetDictionary, sourceDictionary, null, null, null, null, null);
  58:     }
  59:  
  60:     /// <summary>
  61:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
  62:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
  63:     /// setting them to the value in sourceDictionary
  64:     /// </summary>
  65:     /// <typeparam name="K">The Key type</typeparam>
  66:     /// <typeparam name="VTarget">The Value type of the target dictionary</typeparam>
  67:     /// <typeparam name="VSource">The Value type of the source dictionary</typeparam>
  68:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
  69:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
  70:     public static void Merge<K, VTarget, VSource>(
  71:         this Dictionary<K, VTarget> targetDictionary,
  72:         Dictionary<K, VSource> sourceDictionary)
  73:     {
  74:         Merge(targetDictionary, sourceDictionary, null, null, null, null, null);
  75:     }
  76:  
  77:     /// <summary>
  78:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
  79:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
  80:     /// setting them to the value in sourceDictionary
  81:     /// </summary>
  82:     /// <typeparam name="K">The Key type</typeparam>
  83:     /// <typeparam name="V">The Value type of the source and target dictionaries</typeparam>
  84:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
  85:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
  86:     /// <param name="valueUpdater">The action to use to update values that share the same key</param>
  87:     public static void Merge<K, V>(
  88:         this Dictionary<K, V> targetDictionary,
  89:         Dictionary<K, V> sourceDictionary,
  90:         Func<V, V, bool> valueUpdater)
  91:     {
  92:         Func<V, V> valueMapper = x => x;
  93:         Merge(targetDictionary, sourceDictionary, valueMapper, valueUpdater, null, null, null);
  94:     }
  95:  
  96:     /// <summary>
  97:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
  98:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
  99:     /// setting them to the value in sourceDictionary
 100:     /// </summary>
 101:     /// <typeparam name="K">The Key type</typeparam>
 102:     /// <typeparam name="VTarget">The Value type of the target dictionary</typeparam>
 103:     /// <typeparam name="VSource">The Value type of the source dictionary</typeparam>
 104:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
 105:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
 106:     /// <param name="valueMapper">The transform to convert VSource to VTarget</param>
 107:     /// <param name="valueUpdater">The action to use to update values that share the same key</param>
 108:     public static void Merge<K, VTarget, VSource>(
 109:         this Dictionary<K, VTarget> targetDictionary,
 110:         Dictionary<K, VSource> sourceDictionary,
 111:         Func<VSource, VTarget> valueMapper,
 112:         Func<VTarget, VTarget, bool> valueUpdater)
 113:     {
 114:         Merge(targetDictionary, sourceDictionary, valueMapper, valueUpdater, null, null, null);
 115:     }
 116:  
 117:     /// <summary>
 118:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
 119:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
 120:     /// setting them to the value in sourceDictionary
 121:     /// </summary>
 122:     /// <typeparam name="K">The Key type</typeparam>
 123:     /// <typeparam name="V">The Value type of the source and target dictionaries</typeparam>
 124:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
 125:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
 126:     /// <param name="itemAddedEventHandler">The event to fire for items that were added</param>
 127:     /// <param name="itemChangedEventHandler">The event to fire for items that were changed</param>
 128:     /// <param name="itemDeletedEventHandler">The event to fire for items that were deleted</param>
 129:     public static void Merge<K, V>(
 130:         this Dictionary<K, V> targetDictionary,
 131:         Dictionary<K, V> sourceDictionary,
 132:         EventHandler<DictionaryItemAddedEventArgs<K, V>> itemAddedEventHandler,
 133:         EventHandler<DictionaryItemChangedEventArgs<K, V>> itemChangedEventHandler,
 134:         EventHandler<DictionaryItemDeletedEventArgs<K, V>> itemDeletedEventHandler)
 135:     {
 136:         Func<V, V> valueMapper = x => x;
 137:         Merge(targetDictionary, sourceDictionary, valueMapper, null, itemAddedEventHandler, itemChangedEventHandler, itemDeletedEventHandler);
 138:     }
 139:  
 140:     /// <summary>
 141:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
 142:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
 143:     /// setting them to the value in sourceDictionary
 144:     /// </summary>
 145:     /// <typeparam name="K">The Key type</typeparam>
 146:     /// <typeparam name="VTarget">The Value type of the target dictionary</typeparam>
 147:     /// <typeparam name="VSource">The Value type of the source dictionary</typeparam>
 148:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
 149:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
 150:     /// <param name="valueMapper">The transform to convert VSource to VTarget</param>
 151:     /// <param name="itemAddedEventHandler">The event to fire for items that were added</param>
 152:     /// <param name="itemChangedEventHandler">The event to fire for items that were changed</param>
 153:     /// <param name="itemDeletedEventHandler">The event to fire for items that were deleted</param>
 154:     public static void Merge<K, VTarget, VSource>(
 155:         this Dictionary<K, VTarget> targetDictionary,
 156:         Dictionary<K, VSource> sourceDictionary,
 157:         Func<VSource, VTarget> valueMapper,
 158:         EventHandler<DictionaryItemAddedEventArgs<K, VTarget>> itemAddedEventHandler,
 159:         EventHandler<DictionaryItemChangedEventArgs<K, VTarget>> itemChangedEventHandler,
 160:         EventHandler<DictionaryItemDeletedEventArgs<K, VTarget>> itemDeletedEventHandler)
 161:     {
 162:         Merge(targetDictionary, sourceDictionary, valueMapper, null, itemAddedEventHandler, itemChangedEventHandler, itemDeletedEventHandler);
 163:     }        
 164:  
 165:     /// <summary>
 166:     /// Merges the targetDictionary with the sourceDictionary, deleting items that aren't in sourceDictionary, 
 167:     /// adding items that are in sourceDictionary but not in the targetDictionary, and updating items that are in both, 
 168:     /// setting them to the value in sourceDictionary
 169:     /// </summary>
 170:     /// <typeparam name="K">The Key type</typeparam>
 171:     /// <typeparam name="VTarget">The Value type of the target dictionary</typeparam>
 172:     /// <typeparam name="VSource">The Value type of the source dictionary</typeparam>
 173:     /// <param name="targetDictionary">The current dictionary to merge entries into</param>
 174:     /// <param name="sourceDictionary">The new dictionary with the most recent data</param>
 175:     /// <param name="valueMapper">The transform to convert VSource to VTarget</param>
 176:     /// <param name="valueUpdater">The action to use to update values that share the same key</param>
 177:     /// <param name="itemAddedEventHandler">The event to fire for items that were added</param>
 178:     /// <param name="itemChangedEventHandler">The event to fire for items that were changed</param>
 179:     /// <param name="itemDeletedEventHandler">The event to fire for items that were deleted</param>
 180:     public static void Merge<K, VTarget, VSource>(
 181:         this Dictionary<K, VTarget> targetDictionary,
 182:         Dictionary<K, VSource> sourceDictionary,
 183:         Func<VSource, VTarget> valueMapper,
 184:         Func<VTarget, VTarget, bool> valueUpdater,
 185:         EventHandler<DictionaryItemAddedEventArgs<K, VTarget>> itemAddedEventHandler,
 186:         EventHandler<DictionaryItemChangedEventArgs<K, VTarget>> itemChangedEventHandler,
 187:         EventHandler<DictionaryItemDeletedEventArgs<K, VTarget>> itemDeletedEventHandler)
 188:     {
 189:         foreach (var keyValuePair in sourceDictionary)
 190:         {
 191:             VTarget newValue = valueMapper(keyValuePair.Value);
 192:  
 193:             VTarget oldValue;
 194:             if (targetDictionary.TryGetValue(keyValuePair.Key, out oldValue))
 195:             {
 196:                 bool changed = true;
 197:                 if (valueUpdater == null)
 198:                 {
 199:                     targetDictionary[keyValuePair.Key] = newValue;
 200:                 }
 201:                 else
 202:                 {
 203:                     changed = valueUpdater(oldValue, newValue);
 204:                 }
 205:  
 206:                 if (itemChangedEventHandler != null && changed)
 207:                 {
 208:                     itemChangedEventHandler(targetDictionary, new DictionaryItemChangedEventArgs<K, VTarget>(keyValuePair.Key, oldValue, newValue));
 209:                 }
 210:             }
 211:             else
 212:             {
 213:                 targetDictionary.Add(keyValuePair.Key, newValue);
 214:                 if (itemAddedEventHandler != null)
 215:                 {
 216:                     itemAddedEventHandler(targetDictionary, new DictionaryItemAddedEventArgs<K, VTarget>(keyValuePair.Key, newValue));
 217:                 }
 218:             }
 219:         }
 220:  
 221:         List<KeyValuePair<K, VTarget>> itemsToDelete = new List<KeyValuePair<K, VTarget>>();
 222:         foreach (var keyValuePair in targetDictionary)
 223:         {
 224:             if (!sourceDictionary.ContainsKey(keyValuePair.Key))
 225:             {
 226:                 itemsToDelete.Add(keyValuePair);
 227:             }
 228:         }
 229:  
 230:         foreach (var keyValuePair in itemsToDelete)
 231:         {
 232:             targetDictionary.Remove(keyValuePair.Key);
 233:             if (itemDeletedEventHandler != null)
 234:             {
 235:                 itemDeletedEventHandler(targetDictionary, new DictionaryItemDeletedEventArgs<K, VTarget>(keyValuePair.Key, keyValuePair.Value));
 236:             }
 237:         }
 238:     }
 239: }

It's essentially the original code, just made generic with the callbacks included. There are a couple of additional Func<> delegates. valueMapper transforms the source value type into the target value type when the two differ; there are overloads that don't require it, and those simply use an x => x mapping. There is also a valueUpdater delegate, which receives the existing and incoming values so it can update the existing one and report whether anything really changed. This is useful when the value type is a class with properties and you only want the changed callback to fire when those properties actually differ.
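
For instance, merging a source dictionary of objects into a target dictionary with a different value type might look like this (Person, its Name property, and LoadPeople are hypothetical, purely for illustration):

Dictionary<int, string> namesByID = new Dictionary<int, string>();
Dictionary<int, Person> peopleByID = LoadPeople(); // hypothetical source data

namesByID.Merge(
    peopleByID,
    person => person.Name, // valueMapper: VSource (Person) -> VTarget (string)
    (dictionary, e) => System.Diagnostics.Debug.WriteLine("Added " + e.NewValue),
    (dictionary, e) => System.Diagnostics.Debug.WriteLine(e.OldValue + " changed to " + e.NewValue),
    (dictionary, e) => System.Diagnostics.Debug.WriteLine("Removed " + e.OldValue));

// Pass a valueUpdater as well if you only want the changed callback to fire when the mapped
// value actually differs (the updater is then responsible for applying the change itself).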

Taking the initial example we can now do this:

int countRemoved = 0;
int countAdded = 0;
int countChanged = 0;

peopleByIDTarget.Merge(
    peopleByIDSource,
    (dictionary, itemAddedArgs) =>
    {
        countAdded++;
        System.Diagnostics.Debug.WriteLine("Item " + itemAddedArgs.NewValue + " Added");
    },
    (dictionary, itemChangedArgs) =>
    {
        countChanged++;
        System.Diagnostics.Debug.WriteLine("Item " + itemChangedArgs.OldValue + " changed to " + itemChangedArgs.NewValue);
    },
    (dictionary, itemRemovedArgs) =>
    {
        countRemoved++;
        System.Diagnostics.Debug.WriteLine("Item " + itemRemovedArgs.OldValue + " Removed");
    });

System.Diagnostics.Debug.WriteLine(countAdded.ToString() + " Items Added");
System.Diagnostics.Debug.WriteLine(countChanged.ToString() + " Items Changed");
System.Diagnostics.Debug.WriteLine(countRemoved.ToString() + " Items Removed");

Generics combined with closures allow for some very rapid development.