Food for thought: A scenario for handling massive amounts of data

Today I overheard a very interesting discussion at the office.

For one of our customers we’ve built a massive CRM implementation based on CRM 2015 online. An implementation in which a large number of invoices is going to be created; at least 500.000 invoices per year.

In order to create those invoices, a massive amount of data in a large number of entities is required. Information that remain inactive in the system once the invoices have been created.

Millions and millions of inactive records will reside in CRM’s database: In my eyes a waste of resources.

The inactive data will clutter up the system; making it harder to navigatie, slowing down processes, eating up storage space. My prediction is that the performance of the invoice generation process will degrade seriously over time.

Discarding the data once it is inactive is no option, as for auditing processes (read: tax legislation) the data needs to be verificable. In the Netherlands companies are obliged by law to keep a history of 7 years of their financial transactions.

Even though the CRM implmentation is running on top of CRM 2015 online, performance is a serious concern. We face a huge gap, a gap beween speed versus auditability.

  • For speed you want the CRM database as compact as possible.
  • For auditing / history you want to keep all data ever created.

One of the scenarios that crosses my mind is offloading data. Just keep the active data in CRM and move the inactive data to an external storage. A good moment in time could be the fiscal closing of a year.

For external storage multiple options are available, such as SQL Server or Azure Table Storage.
Troy Hunt wrote a great article on his blog in which he describes how to handle 154 million records that made me an instant fan of Azure Table Storage.
Which storage option is the best for you, depends on the requirements you have to meet.

The big question is: “What do you want to do with the data?”

Regardless of storage option, in general the data offloading process and the data retrieval process will look as follows:

Offloading data


As for the inactive CRM records that need to be eliminated from CRM, I would transform the data into a flat denormalized structure resulting in a set of records with all attributes present, eliminating the entity relations.
In our case we could have a record set with invoice data, and a record set containing the data that is used to create the invoices. On both record sets we could add a column with a retain date.

Using the retain date we can implement a process that will clean up the data once the legal date expired. Ensuring that only that data is stored that should be stored.

Once the CRM entity data has been transformed and committed in the Azure Storage table, the entity records can be removed from CRM. Freeing up space, keeping CRM fast and small.

In order to facilitate the data transformation and offloaded data clean-up, we can build a couple of Azure jobs. The jobs can either be scheduled or be triggered by a process in CRM (e.g. custom workflow activity).

Retrieving offloaded data


In an ideal world we could retrieve the offloaded records in exact the same fashion as the entity records stored in CRM. Dynamics CRM does not offer this functionality so you would have to design it yourself. In this case one or more custom actions that retrieve the data for us and display it on a custom form or on a custom web control would do the job.

The big why?

Writing this article one question is crossing my mind: Why doesn’t Microsoft offer an out-of-the-box data offloading functionality within CRM?

I think we are not unique in building massive CRM implementations in which we store massive amounts of data. The SQL Server powering Dynamics CRM is powerful but will be lacking horsepower when it comes to handling massive amounts of data (one of the shortcomings of a relational database).

In the years to come, the amount of data to store will only increase. Especially when you take into consideration that in the years to come machine to machine communication will be the domaninant information stream on the web (Internet of Things).

Dynamics CRM as we know is not suited for this (in terms of connectivity, speed and storage). Fortunately for us, the Azure platform can provide us with all the tools and technologies we need to handle the increasing amount of data.

Technologies that will also help us to take the hurdle that my colleagues and I are going to face in the coming time: “How are we going to handle the massive amount of inactive entity records that need to be retained, while keeping CRM speedy?”

Food for thought…