Archive for April, 2011

muSOAing for 4/27/11 – Relax and be RESTful

April 27, 2011

Having always dealt with WSDL-based SOAP services, I was really curious when the RESTful mantra started to be bandied about a few years ago. I firmly believed that there should always be a formal contract between the caller and the provider of a web service, even if the caller or client is internal to the organization. Having lived in that comfort zone, I was one for dismissing REST services as trivial and not to be taken too seriously. The idea of overloading the URL to carry the metadata, then sending just the payload and processing it at the backend, did not appeal to me much.

However, with increased adoption and annotation support from frameworks like Jersey, I have started taking a serious look at REST. The ease with which one can churn out a service warrants a second look at this paradigm. I am now of the firm opinion that unless services need to be published for external (B2B) or enterprise-wide consumption through a service registry, there is really no need to adopt WSDL-based SOAP services. Where a contract is not of prime importance and you can do away with the overhead, REST services will suffice.
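To make the "URL carries the metadata, body carries the payload" idea concrete, here is a minimal sketch in plain Python. It is not Jersey and does not use its API; the `route` decorator, the `/orders/{order_id}` template, and the in-memory `ORDERS` store are all hypothetical names chosen for illustration. It only mimics the general shape of annotation-driven REST frameworks, where a handler is bound to an HTTP method and a URL template.

```python
import json
import re

# Hypothetical in-memory route table: (HTTP method, compiled URL pattern, handler).
ROUTES = []

# Hypothetical stand-in for whatever the backend would really persist to.
ORDERS = {}


def route(method, template):
    """Bind a handler to an HTTP method and a URL template such as /orders/{order_id}."""
    # Turn each {name} placeholder into a named regex group.
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    pattern = re.compile("^" + regex + "$")

    def decorator(handler):
        ROUTES.append((method, pattern, handler))
        return handler

    return decorator


@route("PUT", "/orders/{order_id}")
def put_order(order_id, body):
    # The URL names the resource; the body is just the payload.
    ORDERS[order_id] = json.loads(body)
    return 200, json.dumps(ORDERS[order_id])


@route("GET", "/orders/{order_id}")
def get_order(order_id, body):
    if order_id not in ORDERS:
        return 404, ""
    return 200, json.dumps(ORDERS[order_id])


def dispatch(method, url, body=""):
    """Call the first handler whose method and URL pattern match the request."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(url)
        if route_method == method and match:
            return handler(body=body, **match.groupdict())
    return 404, ""
```

Note that nothing here plays the role of a WSDL: the only "contract" is the URL template plus whatever the handler expects in the body, which is exactly the trade-off described above.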

muSOAing for 4/17/11 – Write once Read Many?

April 17, 2011

One of the features of a Big Data setup is its Write once Read Many paradigm. A Big Data infrastructure like Hadoop is still a data warehousing infrastructure used for analyzing historical information. Your relational store will still be your repository for ongoing OLTP needs, with data being ETL'd into your Big Data infrastructure. There the data is written to file systems and, at the lowest level, analyzed using map/reduce. Advocates encourage the use of higher level tools like Pig and Hive to perform analytics. These tools still execute map/reduce for you, but they give you higher level, SQL-like interfaces that you are already familiar with; the commands you issue are translated into map/reduce directives under the covers.
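As a rough illustration of what "translated into map/reduce under the covers" means, the sketch below walks a simple aggregate query through map, shuffle, and reduce phases in plain, single-process Python. The query, the `sales` table, and its columns are hypothetical, and this is not how Hive or Pig actually generate or run their jobs; it only shows the shape of the computation.

```python
from collections import defaultdict

# Hypothetical query this sketch stands in for:
#   SELECT region, SUM(amount) FROM sales GROUP BY region


def map_phase(records):
    """Emit one (key, value) pair per input record: the GROUP BY column and the amount."""
    for region, amount in records:
        yield region, amount


def shuffle(pairs):
    """Group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single aggregate, here SUM."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records):
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7)]
    print(run_job(sales))
```

On a real cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data between them; the higher level tools spare you from writing these phases by hand.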

With the adoption of Hadoop increasing by the day across all verticals, the need in this area is only going to increase. It also has something for everybody, from the technology nerd who can get started on the cheap to your CIO, who can now have a multi-node Big Data infrastructure up and running in no time, churning out useful and timely business analytics.

muSOAing for 4/9/11 – What is all this buzz around Hive?

April 9, 2011

It need not be stressed anymore that Hadoop has taken the lead in Big Data infrastructures. Nearly everyone I speak to has a Hadoop cluster installation. While Hadoop by itself has been quite groundbreaking, the tools that are evolving in its ever-growing ecosystem are even more interesting. Of these, I want to focus today on Hive. For someone from the relational world, exposure to Hive is like a kid being let into a candy store.

Hadoop should still be viewed as a massive data warehouse on steroids which adheres to the write once read many paradigm, with the data still stored in HDFS and the bulk of the analytics done in memory. Hive, on the other hand, acts as a layer over HDFS by providing two key features. One is its ability to map the files in HDFS as tables in its own relational metastore. The other is a SQL-like query language for running analytics on the data those tables describe.
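The first of those two features can be pictured with a small sketch. This is an assumption-laden toy in plain Python, not Hive's metastore or its API: the `METASTORE` dictionary, the `page_views` table, its location, and its columns are all hypothetical. The point is only that a "table" here is a description laid over files that already exist, applied when the data is read.

```python
# Hypothetical metastore entry: the table is metadata about existing files,
# not a copy of the data.
METASTORE = {
    "page_views": {
        "location": "/data/page_views",  # hypothetical HDFS directory
        "delimiter": "\t",
        "columns": [("user_id", int), ("url", str), ("seconds", int)],
    }
}


def scan(table, lines):
    """Apply the table's column names and types to raw delimited lines as they are read."""
    meta = METASTORE[table]
    for line in lines:
        fields = line.rstrip("\n").split(meta["delimiter"])
        yield {
            name: cast(value)
            for (name, cast), value in zip(meta["columns"], fields)
        }


def total_seconds(table, lines, url):
    """Stand-in for: SELECT SUM(seconds) FROM page_views WHERE url = <url>."""
    return sum(row["seconds"] for row in scan(table, lines) if row["url"] == url)
```

In this sketch `lines` would come from the files under the table's location; passing them in directly keeps the example runnable without a cluster.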

Another amazing feature is its ability to deal with multiple terabytes of information, possibly even a few petabytes. As if this were not enough, along comes HBase with its massively distributed filesystem management system overlaid on HDFS, sort of a Hive on steroids.