Thursday, September 22, 2011

SQL or NoSQL: that is the question

So what's the deal with NoSQL?

Is NoSQL just a controversial buzzword? Could you imagine if the term 'Object Oriented' didn't exist and instead architectures based on concepts such as encapsulation, polymorphism and inheritance were referred to as 'NoProcedural'? Could you imagine if .NET was called 'NoJava'? Or if Leinster was called 'NoMunster'?

Well, controversial name aside, a good way to appreciate the hype about NoSQL is to consider scalability - the classic non-functional architectural concern. In a classical OLTP architecture, when load increases and your JVM is under pressure, you need to scale. You have two choices:
  • vertical scaling - adding more CPU power to your JVM
  • horizontal scaling - adding more JVMs (usually on more boxes)
Scaling the business tier horizontally is rarely a problem. Follow the J2EE / JEE specs and, unless you've done something crazy, your business tier will scale. Just add more JVMs and load balance between them. However, while the business tier may be straightforward, the persistence tier ain't so easy. Let's say you are using a classical relational database (such as MySQL, SQL Server, DB2 or Oracle) for your persistence: you can't just add database machines like you can add JVMs. Why not? Doing a SQL join is one thing when the tables are on the same machine; imagine doing it when the tables are on different machines! Imagine trying to maintain ACID characteristics for your transactions when your database is split across various machines. Now think about trying to do all that on 5 machines, 50, 500, 5000 machines. The more machines, the harder it gets.
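To make the join problem concrete, here is a minimal sketch in Python. It assumes rows are spread across database machines by hashing a key, which is only one of several possible partitioning schemes. The names `NUM_SHARDS`, `shard_for` and `shards_touched_by_join` are made up for illustration and do not come from any particular database.

```python
import zlib
from typing import Iterable, Set

NUM_SHARDS = 4  # assumed number of database machines (illustrative only)


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a machine deterministically by hashing the row key."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shards_touched_by_join(user_id: str, activity_ids: Iterable[str]) -> Set[int]:
    """Machines that must be contacted to join one user with its activities.

    Assumes users are partitioned by user id and activities by activity id,
    so related rows are not guaranteed to live on the same machine.
    """
    touched = {shard_for(user_id)}
    touched.update(shard_for(activity_id) for activity_id in activity_ids)
    return touched


if __name__ == "__main__":
    machines = shards_touched_by_join("alice", ["act-1", "act-2", "act-3"])
    print("machines involved in the join:", sorted(machines))
```

On a single machine the join never leaves the box. Once the rows are spread out, the same join may have to pull data over the network from several machines, and a transaction covering those rows has to be coordinated across all of them.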

The leading relational databases will scale horizontally, but only so far. To get around this, an architect will usually consider:
  • Scaling vertically - putting the database on the best hardware that can be afforded
  • Partitioning out legacy data and thus reducing things like the size of indexes. This will boost performance and ease the pressure to scale
  • Reducing the pressure on the database by caching more in the business tier
  • Pay a DBA a lot of money!
But what if you run out of database optimisation options and you have to scale horizontally? Not just to a few machines but to a few hundred, if not thousands. This is where NoSQL architectures become relevant.

With a NoSQL database there is no strict schema. Everything is effectively collapsed into one very fat table - a bit like an old skool flat file, but where each row stores a huge amount of data. So, instead of having a table for Users and a table for Activities (representing a User's activities), you put all the User information together in one fat row. This means there are no joins across tables. It also means there is a lot of data redundancy, which means more storage space is required. In addition, more computational power will be needed for writes, because the same piece of data may have to be updated in several places. But because data that is used together is located in the very same place - within the same row - there are no complex joins and hence it is easier to scale. The computational requirement for reads is also lower, so reads can go faster.
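The difference between the two layouts can be sketched with plain Python dictionaries. This is only an illustration of the idea, not how any particular NoSQL product stores its data. The sample data and the function names `read_normalised` and `read_fat_row` are invented for the example.

```python
# Relational style: two "tables", stitched together at read time.
users = {1: {"name": "Alice"}}
activities = [
    {"user_id": 1, "activity": "login"},
    {"user_id": 1, "activity": "post"},
]


def read_normalised(user_id):
    """Look up the user, then scan the activities table (the 'join')."""
    user = users[user_id]
    user_activities = [a["activity"] for a in activities if a["user_id"] == user_id]
    return {"name": user["name"], "activities": user_activities}


# NoSQL style: one fat row per user holding everything that is read together.
fat_rows = {1: {"name": "Alice", "activities": ["login", "post"]}}


def read_fat_row(user_id):
    """A single key lookup; no second table is consulted."""
    return fat_rows[user_id]


if __name__ == "__main__":
    print(read_normalised(1))
    print(read_fat_row(1))
```

Both reads return the same information, but the fat-row read is one lookup by key, so the whole row can live on whichever machine owns that key.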

Another advantage of NoSQL databases comes from the freedom of not being tied to a strict schema. You know that headache where a change to a data model can cause big problems? Well, since there is no strict schema with NoSQL, that headache largely goes away at the database level (though your application code still has to cope with rows of different shapes). This makes the architecture more flexible and more extensible.
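Here is a small Python sketch of what "no strict schema" can look like in practice. It assumes rows are simple key/value maps. The `twitter` field and the `twitter_handle` helper are hypothetical, chosen just to show a field being introduced without migrating older rows.

```python
# Rows written at different times; only the newer one has a "twitter" field.
rows = [
    {"name": "Alice"},
    {"name": "Bob", "twitter": "@bob"},
]


def twitter_handle(row, default="(none)"):
    """Read an optional field, falling back to a default for older rows."""
    return row.get("twitter", default)


if __name__ == "__main__":
    for row in rows:
        print(row["name"], twitter_handle(row))
```

No table had to be altered to add the new field. The trade-off is that the reading code must decide what a missing field means.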

Right now, it's fair to say NoSQL is only relevant in a minority of architectures. But could this be another case of technical innovation driving business innovation, as we have seen with smartphones? There wasn't a need for smartphones, but the technical innovation provided business opportunities. I think the same could happen with NoSQL architectures.

Take a step back from Computer Science and just think Science. Science used to be hypothesis-centric; now it is becoming more and more data-centric. CERN, genome sequencing, climate change analysis - all involve tonnes and tonnes of data. Surely NoSQL architectures allied with data-processing technologies such as MapReduce / Hadoop will open up new ways to do Science?

So, any disadvantages with NoSQL architectures? Well, it's still an immature technology. Indexing and security models are just not as sophisticated as they are with classical relational databases. And because most of it is coming from the open source community, the support is not as good as it is for relational databases. So don't throw out your SQL just yet!

PS Well done Dublin on winning the All-Ireland!


