When NoSQL first came on the scene, it was touted by many as a new kind of data store that solved many of the problems RDBMSs couldn't. We had shoehorned RDBMS solutions into problems they were unsuitable to address, and here was an alternative that could help us out. Many years have passed, and I'm concerned that all we've achieved is changing our shoe... with the shoehorn still firmly in hand!

One of the difficulties with talking about NoSQL is that there are a large number of vastly different data stores grouped together under the same umbrella — not because of their similarities, but because of what they're not. The strength of NoSQL is that there is such a diverse range of storage options available that simply didn't exist a few years ago. We have columnar stores, graph stores, key-value stores, document stores, a bunch of hybrids, and many others not named here. Each of these stores addresses a particular need in a specialised way, with tradeoffs that play to the advantages of the intended usage pattern. This is all great, as it creates a rich ecosystem of data stores to choose from — rather than the old world of an ecosystem of rich vendors to choose from!

Fast-forward a few years... and where are we today? Many projects still shoehorn a single product into all their needs — just as they did a decade ago. We still force Cassandra to work for all our needs, or MongoDB, or whatever was chosen at the beginning of our project as the 'preferred data store' — regardless of whether it is a good fit or not. In fact, many projects don't even bother to evaluate data stores... they simply pick the favoured product up off the shelf and carry on as normal.

We evaluate programming languages... should we use Java or Scala? We evaluate third-party libraries... should we use Spring Boot or Dropwizard or Play Framework? We evaluate testing tools... is it Mockito or EasyMock this time; what about Cucumber? But the one thing that very few teams even consider is whether they're using the right data store.

So what am I saying?

To stop the technology extremists from putting words in my mouth... I'm not suggesting we use a different data store for each project. I'm not even suggesting we use the best data store for every scenario. What I am suggesting is that we have a responsibility to evaluate our options and make conscious, justifiable tradeoffs. Of course there are many valid reasons to reduce the number of data stores you have to operate and maintain, and the number of skills you need in your teams. But I'm confident that many teams cannot articulate why they're using their particular data store... other than "because that's what we use". They're becoming the people they were pointing the finger at only a few years ago, and they don't even realise it.

It is never acceptable to answer a question with "that's the way we've always done it". Firstly, it demonstrates that you don't understand why you're doing something; you're just blindly following without challenge. Be careful... robotic behaviour can and will be replaced by a robot! Secondly, it creates dinosaurs: old systems and stale-minded people who refuse to change. No business can survive if its employees refuse to change. Certainly don't change for the sake of it, but if you don't seek to understand the motivation behind decisions, and recognise when the circumstances that led to those decisions no longer apply, then you'll never change!

So what are the risks?

If we don't evaluate the options when we have the opportunity, we won't realise when the world has changed beneath us. Our choice of Product A might well have been the right decision at the time, but is it still the right decision? Have we suddenly expanded to multiple data centres and now need a data store that handles cross-data-centre (XDC) consistency? Did we start out write-heavy and are now read-heavy? Did we expect to always retrieve by unique key, but are now searching on a wider range of data attributes? Is Project X really similar enough to Project Y to justify using the same data store? Or even worse, was our data store only chosen because that's what the first people on the project knew best? Did they even consider alternatives?

We may evaluate the options and still choose the same data store... that's perfectly fine. We may even choose it knowing that it's not the best fit... that's perfectly fine too. As long as it is an informed decision that can be justified! There are many factors to consider, including availability of development skills, maintainability, performance, security, operational support, disaster recovery, third-party support agreements, and loads more. All of these play a part and their weighting will vary from project to project and from organisation to organisation.
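One way to make that weighting explicit, and so end up with a decision you can actually justify later, is a simple weighted scoring matrix. The sketch below is only an illustration under stated assumptions: the criteria, the weights and the 1-5 scores are hypothetical placeholders rather than measurements of any real product, and "Product B" is invented purely for comparison. It shows how the same two candidates can swap places when a different project or organisation weights the factors differently.

```python
# Hypothetical weighted scoring matrix for comparing data stores.
# All criteria, weights and scores are illustrative placeholders, not benchmarks.

# Hypothetical weighting for a project where raw performance matters most.
WEIGHTS_PERF_HEAVY: dict[str, int] = {
    "team_skills": 3,
    "performance": 5,
    "operational_support": 4,
    "disaster_recovery": 2,
}

# Hypothetical weighting for a team that values existing skills far more.
WEIGHTS_SKILLS_HEAVY: dict[str, int] = {**WEIGHTS_PERF_HEAVY, "team_skills": 6}

# Hypothetical 1-5 scores per candidate against each criterion.
CANDIDATES: dict[str, dict[str, int]] = {
    "Product A": {"team_skills": 5, "performance": 2,
                  "operational_support": 4, "disaster_recovery": 3},
    "Product B": {"team_skills": 2, "performance": 5,
                  "operational_support": 3, "disaster_recovery": 4},
}


def weighted_score(scores: dict[str, int], weights: dict[str, int]) -> int:
    """Sum of weight * score over every weighted criterion."""
    missing = weights.keys() - scores.keys()
    if missing:
        # Refuse to score a candidate that was never assessed on a criterion.
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates: dict[str, dict[str, int]],
         weights: dict[str, int]) -> list[tuple[str, int]]:
    """Return (name, total) pairs, highest total first."""
    totals = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank(CANDIDATES, WEIGHTS_PERF_HEAVY))    # [('Product B', 51), ('Product A', 47)]
    print(rank(CANDIDATES, WEIGHTS_SKILLS_HEAVY))  # [('Product A', 62), ('Product B', 57)]
```

The numbers themselves matter far less than the fact that they are written down: when circumstances change (a new data centre, a shift from write-heavy to read-heavy), you can revisit the weights and see whether the original choice still holds.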

So what do we do?

My advice is simple and obvious... apply your mind to the problem at hand and make an informed decision. Don't expect to always get it right, and don't expect that it will never change. But do the best you can and act responsibly as a software engineer, tester, architect, or whatever your role is. You may not always agree with the outcome, but you can always present your case and be confident that you've fulfilled your professional responsibility to your employer or client.

Be aware of your biases. I have my favourite data stores, just like most others I know. More often than not, our favourites are the ones we know best, which seems obvious really. Don't let that cloud your judgement. Get others involved who do not share your views and work together to determine the best outcome objectively.


30 April 2015