Archive for August, 2010

muSOAing for 8/30/10 – NoSQL

August 31, 2010

As data and information proliferate, so does the NoSQL movement. NoSQL basically deals with data that is not necessarily stored in a traditional relational database. With the proliferation of grid based and MPP architectures, new techniques such as running a query in parallel on each node of a distributed data store are now the norm.

Of these, Map/Reduce seems to have the largest mind share. Hadoop and Hadoop based implementations make extensive use of it. Similar grid based architectures from commercial vendors like AsterData and GreenPlum also have enhanced implementations that marry the best features of SQL with Map/Reduce to come up with SQL-MR.

If you thought SQL was cool, then wait till you see what Map/Reduce can do. It is really SQL on steroids. It is text processing elevated to the nth degree, supported by a truly parallel processing engine that can execute at lightning speed and collate the results for you. To put it in plain terms, it is kinda like the Google search results you get based on the search keywords you submit. In fact, it was Google who invented and pioneered Map/Reduce and later published the algorithm. BigTable, often mentioned in the same breath, is Google’s distributed data store that Map/Reduce jobs can read from and write to, not a Map/Reduce implementation itself.
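To make the idea a little more concrete, here is a minimal single-machine sketch of the Map/Reduce pattern using the classic word count. It is only an illustration: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this post and are not the Hadoop or Google APIs, and a real engine would run the map and reduce steps on many nodes at once.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


docs = ["big data is big", "data is everywhere"]
counts = word_count(docs)
print(counts)
```

Because each call to `map_phase` only looks at one document and each call to `reduce_phase` only looks at one key, both steps can be farmed out to different machines, which is where the speed comes from.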

I feel that this field is going to create, or has already created, a whole new career group, and in future you are increasingly going to see requirements for positions like “Big Data Architect”, “Hadoop Programmer” etc. Data and information are only going to increase and are expected to reach the zettabytes per day stage very soon, if they have not already. So there are going to be challenges in all the areas of information management, from transferring and storing to mining and analyzing. So are you ready for Big Data?


muSOAing for 8/26/10 – Infosphere

August 26, 2010

Big Data is on everyone’s lips. It is ubiquitous, and yet few know what it is and how it works. Well, given the rate at which the volume of information is going to explode, you had better get your act together.

Forget petabytes, now we are in the zettabyte range, and according to Computerworld (CW) we will be generating that much data in a really short period of time. That is a lot of data. Now imagine having to mine this ocean of data and convert it into meaningful Information and then into Intelligence, and you get the picture.

The Big Data Analyst/Architect is going to be one of the hottest positions in demand in the coming years, and it is already beginning to get that way. After all, one of the goals of Google is to digitize all the books and create an immense store of knowledge. I would say that it is a very noble goal indeed.

muSOAing for 8/25/10 – Mesosphere

August 25, 2010

A small segue: Computerworld, that very respectable rag that reports on IT trends, has in its issue of 23rd Aug predicted a list of hot careers for 2020, and among them are Cloud Enterprise Architect, Cloud Capacity Planner, Cloud Infrastructure Administrator and Cloud Integration Architect.

Very true, I say, but I hasten to add: why wait till 2020? Some of these positions have already been created and will be hot and in demand well before 2020. Cloud Enterprise Architects and Cloud Integration Architects are proliferating, and even though the folks playing those roles may not realize it, they are playing dual roles. One role is anchored in the traditional world, and as applications are hived off to be executed elsewhere, outside the four walls of the company, they are unconsciously also wearing the hat of Cloud Architects.

To this I would like to add one more position, that of Data and Information Architect, which will become as important as these other roles. Why? Well, that will have to wait for the next muSOAing.

muSOAing for 8/22/10 – Stratosphere

August 22, 2010

To Big Data or not to Big Data, that is the question. Being a man of letters and hence data, even Shakespeare would have run into this conundrum. The Shakespeares of the present digital world, the folks who belong to the traditional DW/BI world, are presented with exactly that.

The term Big Data is really a misnomer, and why I say that requires some explanation. While the term may be appropriate when it comes to dealing with vast amounts of information, given its inherently distributed DB architecture, it may not be when it comes to advanced analytics, as this can be availed of by data stores of any size, big data or not. The lynchpin of these Big Data stores is really the distributed architecture and the Map/Reduce-led algorithmic analytics, which the Big Data engines parallelize to deliver a truly multiplexed query and data mining architecture.

Is this getting interesting enough for you? Well, wait till we start exploring the gory details of Map/Reduce. Hardcore UNIX addicts will rejoice: a Woodstock for UNIX is being played.

muSOAing for 8/19/10 – Troposphere

August 19, 2010

The whole Big Data space has become a very cool one to be in. It deals with all the aspects of data, right from collecting it, moving it to your big data platform and managing it, to mining it, and all of it is done very differently from the traditional way. A whole other ecosystem has evolved around this concept of storing and accessing big data sets, fueled primarily by algorithm- and heuristic-driven paradigms like Map/Reduce, platforms like Hadoop, tools like Pig and Hive, and similar implementations like AsterData, Karmasphere etc.

It is a happy amalgam and confluence of paradigms like MPP and Grid Computing, algorithms like Map/Reduce, Java, SQL, UNIX-like text processing, Inter Process Communication, Data Structures and all the other cool stuff, some of which you dealt with in your early college years on a green phosphor UNIX terminal. Think of the days and nights you spent studying sed, awk, grep, pipes, shared memory, linked lists, semaphores, threads, processes and regular expressions. UNIX was and still is the programmer’s nirvana compared to the dull intellectual wasteland of windoze.

All of it has converged in this utopian landscape of Big Data, only pumped up several levels with the addition of Java, commodity servers, innovation, hungry, nerdy, smart and UNIX-fueled programmers, venture capital, California wine, sunshine, pale ale, mountain air, sunny beaches…

muSOAing for 8/17/10 – Got Big Data?

August 17, 2010

Big Data has arrived and is now mainstream. While a lot of folks are still grappling with the intricacies of traditional Relational DB stores and problems like DW and getting actionable intelligence that can be used for business agility, many do not realize that Big Data can be a true panacea for all those worries.

There are several aspects to Big Data management, starting with the migration of data from traditional DBs to Big Data stores. Having data stored in BD stores by itself brings several advantages associated with distributed DBs, such as partitioned data, parallel queries, data distribution, Queen/Worker architecture, grid based computing, dynamic server provisioning etc.
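Here is a rough sketch of how partitioned data and parallel queries fit together in a Queen/Worker setup. Everything in it is an assumption made for illustration: the three-worker cluster, the hash-by-key placement, the helper names (`partition_for`, `load`, `worker_query`, `queen_query`) and the sample rows. Threads stand in for real worker nodes, and no actual product is claimed to work exactly this way.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 3  # hypothetical cluster size


def partition_for(key, num_workers=NUM_WORKERS):
    """Pick a worker for a key using a stable hash (crc32)."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def load(rows, num_workers=NUM_WORKERS):
    """Distribute (key, amount) rows across the worker partitions."""
    partitions = [[] for _ in range(num_workers)]
    for key, amount in rows:
        partitions[partition_for(key, num_workers)].append((key, amount))
    return partitions


def worker_query(partition, min_amount):
    """Runs on one worker: filter and sum only the rows stored locally."""
    return sum(amount for _, amount in partition if amount >= min_amount)


def queen_query(partitions, min_amount):
    """Runs on the queen: fan the query out, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda part: worker_query(part, min_amount), partitions)
        return sum(partials)


rows = [("alice", 120), ("bob", 40), ("carol", 75), ("dave", 10), ("erin", 200)]
partitions = load(rows)
total = queen_query(partitions, min_amount=50)
print(total)
```

The queen never touches the raw rows during the query. Each worker answers for its own slice, and only the small partial results travel back to be combined.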

The true power inherent in such infrastructures can really be summarized in two words: Map and Reduce. The real power of actionable BI is truly Map/Reduce. To explain Map/Reduce in a short sentence, it is a truly harmonious confluence of Java, SQL, Procedural SQL and UNIX regular expressions. It is SQL on very heavy steroids. Its power really cannot be overestimated.
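As a small taste of that confluence, the sketch below uses a regular expression in the map step to pull a field out of raw text, and a reduce step that behaves much like `SELECT status, COUNT(*) ... GROUP BY status` would in SQL. The log format, the sample lines and the function names are all hypothetical, and Python is used here in place of Java just to keep it short.

```python
import re
from collections import Counter

# Hypothetical log format: "<METHOD> <path> <status>"
LINE_PATTERN = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$")


def map_line(line):
    """Map step: extract the status code with a regex, skip lines that do not match."""
    match = LINE_PATTERN.match(line)
    return [(match.group("status"), 1)] if match else []


def reduce_counts(pairs):
    """Reduce step: total the values per key, like COUNT(*) with GROUP BY."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "garbage line",
]
pairs = [pair for line in log_lines for pair in map_line(line)]
status_counts = reduce_counts(pairs)
print(status_counts)
```

The unstructured line that does not fit the pattern is simply dropped in the map step, which is the kind of text wrangling plain SQL is not built for.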

More on this later.

muSOAing for 8/9/10 – Atmosphere

August 9, 2010

One very common question these days is “What is the Cloud?” The answer depends on who is asking the question, and you have to craft your answer accordingly. I have broadly classified users under these categories.

So what does the cloud mean to each of them?

End User: Service available on the Web/Internet, no S/W or H/W to own, with pay-per-use or monthly subscription payment models
Developer: Development Platform (S/W tools and H/W) provided on the internet, no S/W or H/W costs and everything built and deployed in the cloud
CIO: Infrastructure (S/W, H/W) to build and run apps provided off premises in the cloud, so there are no capital costs and the Server Room is empty
Cloud Service Providers: Provide grid based virtual and elastic environment to support SaaS, IaaS and PaaS platforms that grow organically based on user needs
Cloud Integrators: Understand the various Cloud Offerings and build tailor made solutions for customers

muSOAing for 8/4/10 – Microclimate

August 4, 2010

Big Data infrastructures make for some interesting study. They differ vastly from your traditional Relational DB architecture. Here are some key characteristics of Big Data databases:

– Designed as Master/Slave Architectures with Queen and Worker Nodes
– Designed to deal with myriad data sources including raw, text, unstructured and relational
– Column oriented rather than row oriented
– Capable of expanding elastically in a grid based architecture along with growth of data
– Designed for real-time operations and heuristics for on-demand results
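To illustrate the column oriented point from the list above, here is a tiny sketch that stores the same made-up sales records both ways. The table, the field names and the `to_columns` helper are assumptions for illustration only. Real column stores add compression and on-disk layouts that this toy ignores.

```python
# The same hypothetical table, first as row records.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]


def to_columns(records):
    """Pivot row records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited in full just to read one field.
total_from_rows = sum(row["sales"] for row in rows)

# Column layout: the aggregate touches only the "sales" column.
total_from_columns = sum(columns["sales"])

print(total_from_rows, total_from_columns)
```

Both layouts give the same answer. The difference is how much data has to be read to get there, which matters a great deal once a table has hundreds of columns and billions of rows.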

These are just a few key characteristics, and there are a whole lot more that set these apart from the MySQLs and Oracles of the world. Big Data Analytics is another interesting area, and there are a lot of players in this arena. The idea is to cater to both real-time and historical analysis, and there are a few interesting quirks that come into play when you are dealing with terabytes of information.

muSOAing for 8/3/10 – Sleet

August 3, 2010

Big Data, Schmig Data. This is the common refrain. A lot of folks from the traditional DW/BI world tend to dismiss this outright by bandying about phrases like “There is no such thing as BigData” or “BigData is Big Lie”. I have got news for you folks: Big Data is really big, getting bigger by the minute and is here to stay, so hop on the bus or be left out forever.

Such is the case that a whole new ecosystem has evolved around this, starting with software (HBase, BigTable, AsterData, GreenPlum, Pig, Hive, Pentaho…) and extending to servers and storage (increasingly based on Grid and MPP, as they have to be elastic). BD is a big component of the ever expanding Cloud family. When talking about BD we are beyond the realm of Relational DBs and even VLDBs, so now we are talking many terabytes and even petabytes of information.

These days, with even small organizations generating enormous amounts of data, gleaning meaningful Information and Intelligence from the bits and bytes and using them as strategic business weapons can mean the life or death of a company. Data has always been the life blood of any organization, but the constant challenge has been to make meaningful sense out of it to grow the business. Now, with this host of tools and infrastructure, that dream is increasingly being realized, but there are still significant challenges.

More on this and typical architectures of Big Data later.