Bucketizing – A simple approach for solving hidden memory issues

Sometimes, seemingly simple loops may hide memory consumption bugs. Let’s look at the following C# code snippet that’s responsible for doing maintenance on a list of users.

long[] userIds = GetUserIdsForMaintenance();
using (DbContext dbContext = new DbContext())
{
    foreach (long id in userIds)
    {
        User user = dbContext.GetUser(id);
        // ... Do maintenance on user ...
    }
}

As implied, each dbContext.GetUser(id) call issues a DB query that fetches a User. Many popular O/R Mapping frameworks, such as Entity Framework or NHibernate, keep the entities they fetch in a first-level cache that lives as long as the context (or session) does. In our example, every fetched User may therefore stay referenced by dbContext until the loop is over (More about first-level caching: Entity Framework, NHibernate).

When our userIds list is very long, this cache can quickly fill up to a point where we run out of memory and receive an OutOfMemoryException.

How Bucketizing can help memory issues

One way to avoid these memory issues without turning off the caching feature is to periodically clear the cache before it fills up.

An easy way to do that is to split our userIds into buckets and create a new DbContext instance for each bucket:

IEnumerable<long> userIds = GetUserIdsForMaintenance();
foreach (IEnumerable<long> idBucket in userIds.Bucketize(5000))
{
    using (DbContext dbContext = new DbContext())
    {
        foreach (long id in idBucket)
        {
            User user = dbContext.GetUser(id);
            // ... Do maintenance on user ...
        }
    }
}

What we see here is a new extension method called Bucketize that splits the long userId list into buckets, each containing 5,000 IDs.

For each bucket, we create a new instance of DbContext. Once the bucket is done, its DbContext is disposed and nothing references it anymore, so the garbage collector can reclaim the whole object together with everything it cached. At any point in time, the cache holds at most one bucket's worth of Users.
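To make the effect concrete, here is a minimal sketch that compares the peak cache size of the two loops. FakeContext is a hypothetical stand-in for a real O/R mapper context: its dictionary only imitates a first-level cache, and none of these names come from Entity Framework or NHibernate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an O/R mapper context with a first-level cache.
public sealed class FakeContext : IDisposable
{
    private readonly Dictionary<long, string> _cache = new Dictionary<long, string>();

    public int CachedCount => _cache.Count;

    public string GetUser(long id)
    {
        if (!_cache.TryGetValue(id, out var user))
        {
            user = "user-" + id; // pretend this row came from the DB
            _cache[id] = user;
        }
        return user;
    }

    public void Dispose() => _cache.Clear();
}

public static class CacheDemo
{
    // One context for the whole run: the cache grows with every ID.
    public static int PeakWithSingleContext(IReadOnlyList<long> ids)
    {
        int peak = 0;
        using (var context = new FakeContext())
        {
            foreach (long id in ids)
            {
                context.GetUser(id);
                peak = Math.Max(peak, context.CachedCount);
            }
        }
        return peak;
    }

    // One context per bucket: the cache never holds more than bucketSize entries.
    public static int PeakWithContextPerBucket(IReadOnlyList<long> ids, int bucketSize)
    {
        int peak = 0;
        for (int start = 0; start < ids.Count; start += bucketSize)
        {
            using (var context = new FakeContext())
            {
                int end = Math.Min(start + bucketSize, ids.Count);
                for (int i = start; i < end; i++)
                {
                    context.GetUser(ids[i]);
                    peak = Math.Max(peak, context.CachedCount);
                }
            }
        }
        return peak;
    }

    public static void Main()
    {
        List<long> ids = Enumerable.Range(1, 10000).Select(i => (long)i).ToList();
        Console.WriteLine(PeakWithSingleContext(ids));          // 10000
        Console.WriteLine(PeakWithContextPerBucket(ids, 1000)); // 1000
    }
}
```

The sketch slices the list by index only to stay self-contained. The point is the same as in the loop above: the peak is bounded by the bucket size rather than by the total number of IDs.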

What does the Bucketize code look like?

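using System.Collections.Generic;

// Extension methods must be declared inside a static, non-generic class.
public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> vals, int bucketSize)
{
    var currentList = new List<T>();
    foreach (var element in vals)
    {
        // The current bucket is full: hand it to the caller and start a new one.
        if (currentList.Count == bucketSize)
        {
            yield return currentList;
            currentList = new List<T>();
        }
        currentList.Add(element);
    }
    // Emit the last, possibly partial, bucket; an empty source yields nothing.
    if (currentList.Count > 0)
    {
        yield return currentList;
    }
}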

As you can see, Bucketize is an extension method for IEnumerable<T> that uses the yield keyword to build each bucket only when the caller asks for it, instead of iterating over the entire collection up front.
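The lazy behavior is easy to observe with a source that logs every ID it produces. The following is a small sketch, not part of the original code: GenerateIds is a hypothetical helper, and Bucketize is repeated with its generic type parameters spelled out so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> vals, int bucketSize)
    {
        var currentList = new List<T>();
        foreach (var element in vals)
        {
            if (currentList.Count == bucketSize)
            {
                yield return currentList;
                currentList = new List<T>();
            }
            currentList.Add(element);
        }
        if (currentList.Count > 0)
        {
            yield return currentList;
        }
    }
}

public static class LazyDemo
{
    // Hypothetical ID source that logs each ID at the moment it is produced.
    public static IEnumerable<long> GenerateIds(int count)
    {
        for (long id = 1; id <= count; id++)
        {
            Console.WriteLine("  produced id " + id);
            yield return id;
        }
    }

    public static void Main()
    {
        foreach (IEnumerable<long> bucket in GenerateIds(7).Bucketize(3))
        {
            Console.WriteLine("bucket: [" + string.Join(", ", bucket) + "]");
        }
    }
}
```

Running this interleaves the "produced id" lines with the "bucket" lines: the first bucket is printed after only four IDs have been read, not after all seven. On .NET 6 and later, System.Linq also provides Enumerable.Chunk, which splits a sequence into arrays of a given size and may be enough if you don't need your own implementation.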

“Bucketizing” large data collections can help us overcome memory issues that sometimes hide behind seemingly simple loops.
