The future of the backend: lower cost, more and better analytics, and a more flexible design?
The future of the server side: how will analytics revolutionize server-side design? Here at Eficode I work as a data analyst, but I’ve also had a lot of experience with the server side. Here’s my view on how data storage will change in the future.
First, here’s some terminology. ACID means Atomicity, Consistency, Isolation and Durability. It’s a set of guarantees that a database transaction either completes correctly or leaves the data untouched, so it works as a sort of standard for data safety. NoSQL means Not Only SQL and usually refers to stores with a simple design and better horizontal scaling than traditional SQL databases. These solutions are usually categorized as graph stores, document stores or key-value stores. Some notable NoSQL stores are MongoDB, Redis, Memcached, Dynamo, Solr and CouchDB. NoSQL solutions usually relax one or more of the ACID guarantees. NewSQL databases, on the other hand, keep the relational model and ACID compliance of traditional relational databases while adding the performance and scaling associated with NoSQL solutions. Some examples of NewSQL solutions are NuoDB, VoltDB and MemSQL.
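To make the "A" in ACID concrete, here is a minimal sketch of an atomic transfer. It uses Python's built-in sqlite3 module only as a convenient stand-in for any ACID-compliant database; the accounts table and the transfer, make_db and balances helpers are hypothetical names made up for this illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True on success."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back too: both changes happen or neither does.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account fails as a whole and leaves both balances as they were. A store that gives up atomicity would leave it to the application to clean up the half-finished update.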
Let’s start from the time before the NoSQL trend, when we had multiple relational databases for different pieces of software. We were using MySQL, Microsoft SQL Server, Oracle, etc. This meant we had to keep juggling systems that didn’t necessarily fit our solution, or we could have been using a system completely wrong without ever knowing about it. After this came the NoSQL trend which was, in my opinion, horrible. Everything was done with key-value stores. This meant that things that needed SQL were done with NoSQL. We were modeling relations in systems that were bad at it, or we were throwing ACID away for things that had no need to sacrifice it. This was followed by the good design practice of using each tool where it fits: NoSQL for storing event data and a relational database for relational data. This led us to the big data era. Sadly we still hit a wall: we used tools such as MySQL, MongoDB or Riak to store the data and then had to move it to Hadoop, Vertica or other big data tools. This meant having a lot of duplicate data. We also had a lot of NewSQL solutions that worked well as updates to the old relational database model.
Currently we use NewSQL, SQL and NoSQL as needed. The future of the server side, as I see it, is that we get rid of the useless storage layer. We are seeing a lot of ACID-capable NewSQL solutions that work in memory: VoltDB is a good example. What if we used an in-memory database for the information that needs to be constantly up to date and usable in milliseconds, and used HP HAVEn or something comparable to store history and everything else that doesn’t need to be updated every second? This would remove the step of moving data out of e.g. MySQL and MongoDB and take us towards a flowing water model, where the event data goes straight to Hadoop. The big data side then creates batch inserts into our in-memory database, so that our software can show new situations as soon as we get them.
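The flowing water model can be sketched in a few lines. This is only an illustration under loud assumptions: a plain Python list stands in for the big data side (Hadoop, HP HAVEn), an in-memory SQLite table stands in for an in-memory NewSQL database such as VoltDB, and the HistoryStore, ServingStore and run_batch names as well as the page view events are hypothetical.

```python
import sqlite3
from collections import Counter


class HistoryStore:
    """Append-only event log; a stand-in for the big data side."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        """Return all events recorded at or after the given offset."""
        return self._events[offset:]


class ServingStore:
    """In-memory table holding only the current aggregates."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER NOT NULL)"
        )

    def batch_add(self, counts):
        """Apply one batch of per-page increments in a single transaction."""
        with self.conn:
            self.conn.executemany(
                "INSERT OR IGNORE INTO page_views VALUES (?, 0)",
                [(page,) for page in counts],
            )
            self.conn.executemany(
                "UPDATE page_views SET views = views + ? WHERE page = ?",
                [(n, page) for page, n in counts.items()],
            )

    def views(self, page):
        row = self.conn.execute(
            "SELECT views FROM page_views WHERE page = ?", (page,)
        ).fetchone()
        return row[0] if row else 0


def run_batch(history, serving, offset):
    """Aggregate events since offset, push them to serving, return new offset."""
    new_events = history.read_from(offset)
    counts = Counter(event["page"] for event in new_events)
    serving.batch_add(counts)
    return offset + len(new_events)
```

Events are written once, straight to the history side, and the application only ever reads the small in-memory table. There is no intermediate operational database whose contents have to be copied over later, which is where the duplicate data came from.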
So: lower cost, more and better analytics, and a more flexible design. Sounds good to me, but what do you think? How should we approach the cost of infrastructure and duplicate data?