- Do you have petabytes of data (1 PB = 1,000,000 GB)?
- Are you willing to wait in long queues before even simple queries get answered?
- Are your computational requirements embarrassingly parallel?
If the answer to all of the above questions is “true”, then sure, go ahead with Hadoop. But even then there may be alternatives, and for simpler problems there are much better ways to solve them. Here are some alternative solutions:
- Scale up by adding more memory, processors, and storage to a single machine
- Use a scalable unstructured/NoSQL database; MySQL Cluster is strongly recommended
- Scale out using virtual machines
- Most programming frameworks have distributed computing capabilities and can implement MapReduce without heavy frameworks (see the Python sketch after this list)
- Try faster alternatives to Hadoop, like Apache Spark, for real-time analytics (a minimal Spark sketch also follows)
- Check out cloud solutions like Google’s Bigtable
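
To illustrate the “no heavy framework” point, here is a minimal sketch of the classic word-count MapReduce job written with nothing but Python’s standard library. The input data and chunk size are made up for the example; the point is that a map step fanned out over a process pool plus a local reduce step covers many “big data” jobs that fit on one machine.

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    """Map step: count words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Toy input data, stands in for whatever you would feed Hadoop.
    lines = ["big data is big", "hadoop is not always the answer"] * 1000
    chunk_size = 500
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

    with Pool() as pool:
        partials = pool.map(map_count, chunks)  # parallel "map" across processes
    totals = reduce_counts(partials)            # local "reduce"
    print(totals.most_common(3))
```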
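And if you do want a cluster-capable engine, the same word count in Spark is only a few lines. This is a rough sketch assuming a local PySpark installation; names like the app name are just placeholders.

```python
from pyspark import SparkContext

# Run Spark locally on all cores; swap in a cluster master URL when you outgrow one box.
sc = SparkContext("local[*]", "wordcount-sketch")

lines = sc.parallelize(["big data is big", "hadoop is not always the answer"] * 1000)
counts = (lines.flatMap(lambda line: line.split())   # map: emit individual words
               .map(lambda word: (word, 1))          # pair each word with a count of 1
               .reduceByKey(lambda a, b: a + b))     # reduce: sum counts per word

print(counts.takeOrdered(3, key=lambda kv: -kv[1]))
sc.stop()
```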
Happy big dataing 🙂