– Jack Norris, Vice President, Marketing, MapR Technologies, says:
Big Data analytics
The demand for Big Data analytics is challenging traditional data storage and processing methods. Enterprises are capturing large amounts of data from many new sources, like the web or social media, and now need a new tool to rapidly process and analyze the mix of structured and unstructured data. Traditional data warehousing technology was not designed to deal with this kind of volume or blend of structured and unstructured data. Instead, organizations are tapping into Hadoop’s power to provide the rapid processing and analytics to unlock the informational value of their data.
Hadoop applications are increasingly moving out of the lab and into production environments. The shift coincides with a push for Big Data analysis to move from batch-oriented historical analytics to processes that deliver more immediate results. Enterprises are looking to tap into their data as quickly as they capture it and use it to add more value to their products and services. As organizations look to leverage Hadoop for Big Data analytics, enterprise-level requirements are becoming increasingly important.
Enterprise readiness comprises critical elements such as high availability, security, data protection, and multi-tenancy capabilities that allow a cluster to be shared effectively across disparate users and solutions.
When evaluating a Hadoop solution, it is essential to understand where a cluster is vulnerable to failure and what can cause downtime. There are alternatives that offer automated failover and the ability to protect against multiple simultaneous failures.
With multiple users trying to access and change data on the cluster, there is a need to ensure proper authentication and authorization, as well as isolation at the file system and job-execution layers. Not all distributions provide these capabilities.
Using point-in-time snapshots to restart a process from a point before a data error or corruption is something enterprise storage administrators have come to expect. However, snapshots are not available in all Hadoop distributions. The same is true of mirroring: for Hadoop to be included in an enterprise’s disaster recovery plan, remote mirroring must be supported.
Multi-tenancy support, which lets different users and groups effectively share a cluster, is a required feature for most large deployments. Sharing the cluster allows for efficient utilization of the storage and computational power of Hadoop. This requires features such as segregated user environments, quotas, and data placement control, which are available only with some Hadoop distributions.
Meeting all of these enterprise-level requirements while continually adding new analytical capabilities will ensure Hadoop’s strong position in the Big Data industry. The newly emerging data analytics processes driven by Hadoop will push well beyond what is currently possible with Big Data. Hadoop has emerged as a standard platform and will soon be found in all the well-established production data centers of the Fortune 1000.
About the Author
Jack Norris, vice president, marketing, MapR Technologies
Jack has over 20 years of enterprise software marketing experience. As VP of Marketing at MapR Technologies, Jack leads worldwide marketing for the industry’s most advanced distribution for Hadoop. Jack’s experience ranges from defining new markets for small companies and leading marketing and business development for an early-stage cloud storage software provider to increasing sales of new products for large public companies. Jack has also held senior executive roles with Brio Technology, SQRIBE, EMC, Rainfinity, and Bain and Company.