Big Data: Hadoop Still Dominates


Big data continues to drive the growing trend among businesses to rely less on intuition and more on data analysis when making business decisions.

The promises of better customer segmentation, improved customer service, more efficient product development, and the discovery of meaningful customer insights have the C-suite planning new Big Data implementations. CIOs continue to follow the major developments in big data technology as it evolves. Increasingly, Hadoop has become the technology of choice when CIOs consider new Big Data initiatives.


We recently showcased the advantages and disadvantages of using Hadoop (NoSQL) database technologies:

 

 

NoSQL - In-memory non-relational databases
These don't support the SQL language (hence the name) and, more significantly, don't support ACID transactions or relationships between tables. Instead, they're designed to query document data very quickly.
Examples: Hadoop, MongoDB, CouchDB, Riak, Redis, Cassandra, Neo4j, Membase, HBase, etc.

 

Benefits:
Cheap, mostly open source implementations. Systems can scale out very easily, tables can be readily sharded/federated across servers.
Most store native programmer objects, so no translation to tables.
Very, very fast at finding records from massive datasets.
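
To make the scale-out point concrete, here is a minimal sketch of hash-based sharding in Python. It is an illustration only, not how any particular product listed above distributes data; the function names and the "id" field are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    Hypothetical helper: real systems use their own partitioning schemes.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(records, num_shards):
    """Split records across num_shards lists, keyed on the assumed "id" field."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_for(record["id"], num_shards)].append(record)
    return shards
```

Because the hash is stable, the same key always lands on the same shard, so a lookup by key only needs to contact one server. Note that changing the number of shards in this simple scheme moves most keys, which is one reason production systems tend to use more elaborate approaches.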

 

Problems:
No common model, and there are considerable differences between the many solutions.
No ACID guarantees; instead, high fault tolerance must be built into the application.
Transactions are at the row level only (if supported at all).
Poor at aggregation: where an RDBMS solution would use SUM, AVG and GROUP BY, a NoSQL solution has map-reduce, which (some minor optimizations aside) has to do the equivalent of a table scan.
Poor at complex joins, although arguably this is something you'd design differently for.
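
The aggregation point can be sketched in plain Python. The example below mimics the map, shuffle, and reduce phases to compute what an RDBMS would express as SUM and AVG with GROUP BY. It is a single-process illustration, not the Hadoop API, and the "region" and "amount" field names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, value) pair per input row; every row is visited."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute the per-key SUM and AVG from the grouped values."""
    return {
        key: {"sum": sum(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


def aggregate(rows):
    """Rough equivalent of: SELECT region, SUM(amount), AVG(amount) ... GROUP BY region."""
    return reduce_phase(shuffle(map_phase(rows)))
```

The map phase touches every row because there is no index to consult, which is the table-scan cost described in the list above.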


In a recent survey, IDC asked businesses about their plans for implementing a Hadoop-based project. The data shows that enterprises seeking a big data solution will most likely select Hadoop. Among the businesses surveyed, 32 per cent reported already deploying a Hadoop-based system.

 

An additional 31 per cent said they planned to deploy Hadoop within the next year, and 36 per cent said they planned to deploy Hadoop but expected deployment to be more than a year away.


Why Hadoop?
The survey found the primary motivator to be the advanced tools for analysis of raw data, including both operational and transactional data, and the capability to associate it with customer behavioural data sets.


The participants' second reason for selecting Hadoop was modelling "if-then" scenarios for products and services.


Other motivators mentioned included replacing an older data warehouse and using Hadoop as a platform for Web analytics and content-sharing.

 

Business benefits are, of course, the ultimate measure of whether a Hadoop solution is a worthy investment or not. When asked what the quantified business benefits of their Hadoop solution were projected to be, 82 per cent of those surveyed were able to provide a numerical range. The most common response (about 24 per cent of those surveyed) was that the business benefit would range from $5 million to $9.9 million.

 

A common pattern the surveyors found was that businesses tended to use a mix of databases to complete big data analytics. After the traditional databases that most businesses had already been using for many years, NoSQL databases such as HBase were the ones most commonly mentioned as being used in conjunction with Hadoop.

 

Another trend the surveyors noted was the use of commercial Hadoop distributions rather than the open source version, which requires a lot of manual setup and maintenance. Respondents said the main criteria they used when selecting a Hadoop platform were support, management and storage costs. For respondents who used Hadoop for critical business operations or sensitive data, a more secure and reliable alternative to HDFS, similar to what MapR offers, also became important.

 

Finally, the survey addressed how businesses handled data security in their Hadoop solutions. A large majority indicated that sensitive information is removed before it is imported into Hadoop and after it is exported from Hadoop. Common security measures in place include storage-based cloning or snapshots and data protection software.
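
As a rough illustration of removing sensitive information before import, the sketch below drops a set of fields from each record. The survey does not describe how respondents actually did this; the field names and the scrub function are hypothetical.

```python
# Hypothetical set of field names treated as sensitive for this example.
SENSITIVE_FIELDS = frozenset({"ssn", "card_number", "email"})


def scrub(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record without the sensitive fields.

    The input record is left unchanged.
    """
    return {key: value for key, value in record.items() if key not in sensitive}


def scrub_all(records, sensitive=SENSITIVE_FIELDS):
    """Scrub every record in a batch before it is handed to the import job."""
    return [scrub(record, sensitive) for record in records]
```

Dropping fields is the simplest option. Depending on the analysis, masking or tokenizing values may be a better fit, since it keeps records joinable without exposing the raw data.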

 

Implementation Considerations

An interesting fact about most successful Big Data systems is that the value of an individual piece of data decreases with time, while the value of a collection of data rises with time. Additionally, the value of aggregated data should continue to increase over time, and shortening the time taken to extract, transform, and load a data item will increase the value of the data more rapidly as the system approaches the theoretical ideal of real-time decision making. As with many well-engineered systems, the closer we get to zero defects and real-time processing, the more expensive the implementation becomes for the system owner.

 

So how do we most effectively achieve our Big Data decision-making objectives given the tools available today? By selecting the database management tool that most closely matches our analytical decision-making requirements. Increasingly, that tool seems to be Hadoop.

 

 


Bill has been a member of the technology and publishing industries for more than 25 years and brings extensive expertise to the roles of CEO, CIO, and Executive Editor. Most recently, Bill was COO and Co-Founder of CIOZone.com and the parent company PSN Inc. Previously, Bill held the position of CTO of both Wiseads New Media and About.com.
