Ever wondered what it means to use NoSQL or SQL? What's the difference? Here’s a little post to help you identify a suitable solution: if not the perfect fit, at least a good one – and the ones you should not consider for your specific task.
A while ago I participated in a meetup focused on Node.JS and Database. Once again I noticed that most questions from the audience concerned the basic concepts of NoSQL and SQL. Databases are currently in a huge flux as they are being developed, open sourced, combined and so on. Just four years ago, having a ten minute conversation about three to four databases would have made you a serious DB consultant. Today, there are hundreds of databases – DB-Engines page, for example, lists 277 systems – each with a specific purpose, with its unique pros and cons. As a database administrator or a backend developer, you should be familiar with at least the top 10 DB’s just to understand the different solutions and the reasons behind using them in projects.
ACID means that the database should follow the principles of Atomicity, Consistency, Isolation and Durability. This means that after the database verifies that the data is inside it, it will stay there. CRUD by computer programming means create, read, update and delete. For a database to qualify as CRUD, you need to be able to run these commands. Transaction is a coherent and reliable way for an independent query to work against a database. Transaction represents any change in a database. CAP means Consistency Availability and partition tolerance. Achieving all three with distributed systems has been deemed impossible. This model has received a lot of criticism and would require its own blog post to explain of those conversations in depth. However, our partners at VoltDB, for example, have written a good blog post about CAP.
First group of databases are "pure" SQL engines. Most of these use transactions and support ACID and CRUD. These are your SQL servers, MySQL, PostgreSQL, MariaDB and similar databases. These usually follow relational mathematics and are queried using SQL queries. When creating a system, they are a safe bet, having a track record of 30 years with standardized interfaces. SQL is standardized by both ANSI and ISO. You can learn more about SQL here. The problem with these databases, however, is that they are not scalable or scaling is hard and will consume your performance. Therefore, scaling SQL engines normally means just putting more hardware to work on it. SQL engines are also highly configurable. This poses a problem: if you configure it wrong you might end up crippling the whole database.
Let’s get to the part that is a bit more modern. This means NoSQL, or rather the multiple parts that NoSQL constitutes of. The term NoSQL in itself comes from words “Not only SQL”. The databases which belong to it are categorized as wide column, document based, graph and key/value databases. There are also some newcomers that haven’t really been classified as NoSQL, but aren’t really SQL databases either, such as time series and event based databases. Both of these sort of fit under the term NoSQL, but they haven’t been officially qualified as NoSQL database types yet. Wide Column databases Wide column databases (e.g. HBase, Cassandra and Accumulo) are used to store extensible records and data with the ability to hold a lot of data within dynamical columns. They can be seen as two-dimensional key/value stores that have no fixed fields. You could argue that Vertica Flex tables fit into this category as well. These databases were originally developed by Google with their BigTable publication and later by their cloud service, which gives you access to the Google BigTable. Graph databases Graph databases are probably the closest thing to relational databases or maybe even surpass them in certain aspects of relational connectivity. Graph databases are heavily designed around joins. They work with heavily joined data such as social graphs or alike and are built with the mindset that there will be a lot of joins in the dataset. Most known such databases are Neo4j and OrientDB. The challenge with these databases is that they don’t offer indexes on all nodes. Moreover, direct access to the nodes based on value is not possible, which makes them work quite like key/value in that aspect. Key/Value databases Key/value databases, such as Riak, Redis and Memcached, are used to store values that are fetched by knowing their key. They are extremely fast due to their simplistic design. Document databases Document databases usually store JSON or XML objects, but can also be used to store other kinds of documents, depending on the database. These are usually schema-free and can have nested objects. Most well-known document databases are probably MongoDB and CouchDB.
My view on NoSQL
NoSQL’s biggest problem is that it’s not SQL: you have to use new API’s, and these API’s change a lot from version to version. Moreover, analytical capabilities with most of these databases fall short as you must hack your way through to the analytics. Compared with mature SQL technologies, these technologies are still on the path towards growth. However, great innovations in this field have already been made, and one can assume to see more in the near future. The biggest effect that NoSQL brought, however, was the fact that SQL developers realized the need for scaling out and other distributed needs for traditional database management systems. This created a new category called NewSQL which includes databases such as VoltDB and NuoDB. These are databases that focus on bringing SQL to highly scalable environments and increasing performance while adding nodes.
After working with all of these, I must say that all of the technologies still hold their place in backend development. It would be hard to prove one to do a better job than the other in its own special purpose. Personally, for example, I prefer working with SQL, due to always designing analytical features to the backends that I create. On the other hand, I have seen awesome softwares and analytics being built over NoSQL too. To sum up, your path towards mastering databases relies on your ability to identify suitable solutions for specific purposes. When designing software, you should therefore always ask yourself whether you are storing data as key/value, document, graph or SQL style. This will ultimately determine how your database connection works.