Tuesday, January 31, 2012

Hadoop and MapReduce

Hadoop and MapReduce continue to pick up steam and generate "noise" in periodicals and online publications. As an Information Management Architect, I am often asked about the value of Hadoop and its impact on existing information and architecture roadmaps.

This short entry is meant to clarify the purpose of Hadoop, which I view as a complementary rather than a competing solution (assuming the majority of the data to be managed is structured). Hadoop addresses the need for highly distributed, parallel cloud computing over high volumes of unstructured and structured data where fast performance is required. Hence the success stories and implementations at the Yahoos, Facebooks, and Googles of the world.
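For readers new to the programming model, here is a minimal single-process sketch of the MapReduce idea in Python, using the classic word-count example. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own illustrative choices, and everything runs in memory on one machine. On a real cluster the framework would run the map and reduce steps in parallel across many nodes and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data in the cloud"]
    print(word_count(docs))
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, the work can be split across machines without coordination, which is what makes the model attractive at very high volumes.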

As of the date of this post, some of the drawbacks include the maturity and support of the open-source components (HDFS, HBase, MapReduce, Pig, Hive, Avro, Sqoop, Chukwa, and ZooKeeper), security, backup/recovery, integration with other systems, weak consistency, the cost of specialized skills, and the retraining of existing staff.

Given the expanding market, many of the enterprise Business Intelligence vendors (Informatica, SAP, etc.) are building plug-ins to support Hadoop in order to protect and expand their client base.