muSOAing for 10/14/11 – The Big Data Universe

It seems that with each passing day the Big Data universe keeps growing, with new offerings appearing in this space. With all this noise and clutter, how is one to formulate a Big Data strategy? A few patterns of Big Data usage do seem to be emerging. The basic tenet is still the same: massively parallel processing (MPP) on commodity servers. The code is co-located with the data, each node processes its own local share, and the results are sent back to a master node.
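
To make that paradigm concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how Hadoop or any of the products below is implemented: the "nodes" are just entries in a dictionary, threads stand in for separate servers, and the names PARTITIONS, count_words and run_on_cluster are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds only its own slice of the data.
PARTITIONS = {
    "node-1": ["big data", "hadoop hive"],
    "node-2": ["hadoop pig", "big hadoop"],
    "node-3": ["hive data"],
}

def count_words(lines):
    """The code shipped to a node: count words in that node's local slice."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def run_on_cluster(task, partitions):
    """The master: run the task on every partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Each worker sees one partition only (threads stand in for servers).
        partials = list(pool.map(task, partitions.values()))
    total = Counter()
    for partial in partials:
        # Only the small partial results travel back, not the raw data.
        total.update(partial)
    return total

totals = run_on_cluster(count_words, PARTITIONS)
print(totals.most_common())
```

The point of the design is that the raw data never moves: only the small function goes out to the nodes and only the small per-node counts come back to be merged.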

With this as the paradigm, there is now a plethora of offerings in the Big Data space. You of course have the Hadoop universe with HBase, Hive, Pig etc., along with other open source platforms like Cassandra. Then you have the commercially backed ones like MongoDB, Couchbase, AllegroGraph, MapR, LexisNexis… Finally, there is a third category of vendors: erstwhile purveyors of traditional data management technologies like Teradata, EMC and Oracle, now trying to re-invent themselves as Big Data leaders with offerings like Aster Data, Greenplum and Exadata.

So how are these folks getting mindshare? Here is my take. Hadoop is still the bread and butter of Big Data computing. The barrier to entry is low: anyone can download and play around with the distributions from Apache and Cloudera, and its ecosystem of products like HBase and Hive can solve, and is solving, real-life data management problems across verticals. This is evidenced by the implementations running in production today, of which Yahoo is probably the best example.

The MongoDBs and Couchbases of the world seem to be not so much general-purpose solutions as ones aligned with specific verticals like advertising, telcos or healthcare. Given the unique requirements of these domains, the vendors have built value-added layers, in the form of search and aggregation algorithms, on top of the basic frameworks. For now, they can probably be viewed as purpose-built Big Data platforms.

At the other end you have Teradata/Aster Data, EMC/Greenplum and Oracle/Exadata. These can be viewed as vertically integrated solutions that have everything you need in a box, kind of like a Big Data Happy Meal. For all intents and purposes it is a black box: you get a refrigerator-sized Big Data appliance with the software, the processors and the storage all packaged into one, based on proprietary standards. These would probably be most useful for folks who have been using the traditional offerings from these companies and now want to upgrade their existing infrastructure to support Big Data paradigms.

