- Do you have petabytes of data (1 PB = 1,000,000 GB)?
- Are you willing to wait in long queues before even simple queries get answered?
- Are your computational requirements embarrassingly parallel?
If the answer to all of the above questions is “true”, then sure, go ahead with Hadoop. But even then there may be alternatives, and for simpler problems there are much better ways to solve them. Here are some alternative solutions:
- Scale up by adding more memory, processors, and storage to a single machine
- Use a scalable unstructured/NoSQL database; MySQL Cluster is strongly recommended
- Scale out using virtual machines
- Most programming frameworks have distributed computing capabilities and can implement MapReduce without heavy frameworks (see the Python sketch after this list)
- Try faster alternatives to Hadoop, like Apache Spark, for real-time analytics (a minimal Spark sketch also follows)
- Check out cloud solutions like Google’s Bigtable
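
To illustrate the “no heavy framework” point, here is a minimal sketch of the classic word-count MapReduce job written with nothing but Python’s standard library. The input data and chunk size are made up for the example; the point is that a map step fanned out over a process pool plus a local reduce step covers many “big data” jobs that fit on one machine.

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    """Map step: count words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Toy input data, stands in for whatever you would feed Hadoop.
    lines = ["big data is big", "hadoop is not always the answer"] * 1000
    chunk_size = 500
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

    with Pool() as pool:
        partials = pool.map(map_count, chunks)  # parallel "map" across processes
    totals = reduce_counts(partials)            # local "reduce"
    print(totals.most_common(3))
```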
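And if you do want a cluster-capable engine, the same word count in Spark is only a few lines. This is a rough sketch assuming a local PySpark installation; names like the app name are just placeholders.

```python
from pyspark import SparkContext

# Run Spark locally on all cores; swap in a cluster master URL when you outgrow one box.
sc = SparkContext("local[*]", "wordcount-sketch")

lines = sc.parallelize(["big data is big", "hadoop is not always the answer"] * 1000)
counts = (lines.flatMap(lambda line: line.split())   # map: emit individual words
               .map(lambda word: (word, 1))          # pair each word with a count of 1
               .reduceByKey(lambda a, b: a + b))     # reduce: sum counts per word

print(counts.takeOrdered(3, key=lambda kv: -kv[1]))
sc.stop()
```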
Happy big dataing 🙂