I read an interesting article on how cheaper RAM is making BigData solutions redundant. This thought has been playing on my mind for some time. There has been a lot of hype around BigData tools, primarily because even working through a few hundred GB in memory was becoming a problem, never mind terabytes or petabytes of data. This problem is felt especially by data scientists who want to build models and experiment with data. Yet not many organizations need more than a few hundred GB of data for analysis (at least after some aggregation is done before the core analysis).

This problem has spawned several BigData tools with multiple layers, leading to complex stacks with a lot of overhead: MLlib on Spark on HBase on YARN on HDFS on Java on a virtual machine on bare metal. All this to do tasks that are quite "immature" compared to the high-end analytics data scientists actually want to do. The whole complexity falls away if the memory available on a single machine increases drastically, which is exactly what is happening right now. Ordinary desktops can easily have up to 64GB, and higher-memory servers are readily available on AWS or other cloud providers.

With that much memory, most data science problems can be solved beautifully using traditional tools like R or Python. Data scientists can then focus on the actual analysis rather than on engineering a complex stack of tools.
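To make this concrete, here is a minimal sketch of what single-machine analysis looks like in Python with pandas and scikit-learn. The file name and columns (transactions.csv, customer_id, amount) are hypothetical and only for illustration; the point is that a dataset of tens of GB can be loaded, aggregated and modelled entirely in RAM on one machine, with no cluster involved.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the whole (hypothetical) dataset into memory in one go.
df = pd.read_csv("transactions.csv")

# Aggregate per customer: the kind of summarisation mentioned above,
# done with a single groupby call instead of a distributed job.
per_customer = df.groupby("customer_id")["amount"].agg(["count", "sum", "mean"])

# Fit a simple model on the aggregated features, still entirely in memory.
model = LinearRegression()
model.fit(per_customer[["count", "mean"]], per_customer["sum"])
print(model.coef_)

The same workflow on a BigData stack would involve Spark jobs, cluster configuration and data movement; on a machine with enough RAM it is a dozen lines of plain Python.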
