MapR

NoSQL

– M.C. Srivas, Co-founder and CTO, MapR Technologies, says:

Organizations are increasingly using clusters of commodity servers to build low-latency Big Data applications. NoSQL (Not Only SQL) technology is a compelling alternative to relational databases because it can store and process large amounts of loosely structured data.

There are different types of NoSQL databases, and some are more enterprise-ready than others. But at the core, they all rely on distributed processing by sharding (splitting) the data across multiple commodity servers. Let’s take a look at the implications of NoSQL databases across two key operational aspects of the data center:

Cluster Deployment and Expansion

It’s relatively easy to deploy a NoSQL database system on a few machines when you’re just getting started. However, it can quickly become more complex once you start expanding your cluster. Some NoSQL databases use a ring topology, for which the recommended practice is to double the cluster so that shards stay easy to manage, while others do not scale very well at all. Hadoop-based NoSQL databases, another category, provide scalability by relying on the underlying Hadoop platform for storage and processing.
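To illustrate why doubling is the comfortable way to grow a ring, here is a minimal sketch in Python. It assumes a simplified ring with one token per node and a 32-bit token space; the names RING_SIZE, token_for, owner and imbalance are hypothetical and do not come from any particular product. Doubling places a new node at the midpoint of every existing range, so load stays even, while adding a single node splits only one range and leaves the ring lopsided until tokens are moved by hand.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 32  # hypothetical token space for this sketch


def token_for(key):
    """Hash a key onto the ring (first 4 bytes of its MD5 digest)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(tokens, key):
    """Return the token of the node that owns the key.

    A key belongs to the first node whose token is >= the key's token,
    wrapping around to the first node at the end of the ring.
    """
    idx = bisect.bisect_left(tokens, token_for(key))
    return tokens[idx % len(tokens)]


def imbalance(tokens, keys):
    """Ratio of the busiest node's key count to the quietest node's."""
    load = Counter(owner(tokens, k) for k in keys)
    return max(load.values()) / min(load.values())


def even_ring(num_nodes):
    """Sorted tokens for num_nodes nodes spaced evenly around the ring."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


keys = ["user:%d" % i for i in range(20000)]

four_nodes = even_ring(4)
# Add one node in the middle of a single existing range.
plus_one = sorted(four_nodes + [RING_SIZE // 8])
# Double the cluster: one new node in the middle of every range.
doubled = even_ring(8)

print("4 nodes :", round(imbalance(four_nodes, keys), 2))
print("5 nodes :", round(imbalance(plus_one, keys), 2))
print("8 nodes :", round(imbalance(doubled, keys), 2))
```

In this sketch the five-node ring ends up with some nodes holding roughly twice as many keys as others, whereas the doubled ring stays close to even. Real ring-based systems differ in detail (virtual nodes, for example, change this picture), so treat this only as an illustration of the shard management concern.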

Daily Operations

How do you manage NoSQL databases on a daily basis? Reliability, performance consistency and data consistency play key roles in your ability to manage daily operational tasks:

Reliability – System reliability and how the database is intrinsically architected are key factors in meeting SLAs and in determining how much administrative pressure you will face on a daily basis. Choose wisely.

Performance Consistency – Many NoSQL databases use log-structured merge-trees (LSM trees) to manage disk writes, and the compactions these structures require can cause huge I/O storms in the system. Some NoSQL databases require you to run manual compactions, typically over the weekend, to ensure that performance is back up for the following week. Other types of NoSQL databases attempt to solve this performance problem intrinsically.
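The link between LSM trees and compaction I/O can be sketched with a toy store in Python. This is an illustrative model only, not the storage engine of any real database: the class TinyLSM, its memtable_limit parameter and the entries_rewritten counter are hypothetical names. Writes land in an in-memory table, full tables are flushed as immutable sorted runs, reads have to check every run, and compaction rewrites all of the data to get back to a single run.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store (illustrative assumption, in memory)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}        # recent writes, not yet flushed
        self.runs = []            # flushed sorted runs, newest first
        self.memtable_limit = memtable_limit
        self.entries_rewritten = 0  # stand-in for compaction I/O

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new immutable sorted run."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable, then each run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one; every stored entry is read and rewritten."""
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values win
            merged.update(run)
            self.entries_rewritten += len(run)
        self.runs = [sorted(merged.items())]


db = TinyLSM(memtable_limit=4)
for i in range(20):
    db.put("k%d" % (i % 10), i)   # 20 writes over 10 distinct keys

runs_before = len(db.runs)        # reads may have to touch all of these
db.compact()
runs_after = len(db.runs)

print(runs_before, "runs before compaction,", runs_after, "after")
print("entries rewritten during compaction:", db.entries_rewritten)
```

Until compaction runs, each read may have to search several runs, which is why performance drifts; the compaction itself touches every stored entry, which is the burst of I/O described above.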

Data Consistency – Data consistency is perhaps the largest issue in managing daily operations. Eventually consistent databases require that you run daily anti-entropy checks to ensure that the data is consistent across all replicas. If nodes go down, the system becomes complex to manage, as there is no easy way to ascertain whether you are in a data loss situation. In NoSQL databases that provide strong data consistency, much like relational databases, the impact of hardware and software issues is addressed by the system, and there is no onus on administrators to manually address High Availability (HA) issues and determine data loss exposure.
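As a rough illustration of what an anti-entropy check does, the Python sketch below compares a digest of each replica and, when the digests differ, copies the newest version of every key to the replicas that lack it. It makes several simplifying assumptions: replicas are plain dictionaries mapping a key to a (value, timestamp) pair, conflicts are resolved by last write wins, and whole replicas are hashed rather than using the hash trees that production systems typically rely on. The function names digest and anti_entropy are hypothetical.

```python
import hashlib


def digest(replica):
    """Fingerprint a replica's full contents in a stable key order."""
    h = hashlib.sha256()
    for key in sorted(replica):
        value, timestamp = replica[key]
        h.update(("%s=%s@%s;" % (key, value, timestamp)).encode("utf-8"))
    return h.hexdigest()


def anti_entropy(replicas):
    """Repair replicas in place and return the set of keys that were fixed.

    Assumes last-write-wins: the version with the highest timestamp is
    treated as correct and copied to every replica that disagrees.
    """
    if len({digest(r) for r in replicas}) == 1:
        return set()  # all replicas already agree

    repaired = set()
    all_keys = set().union(*replicas)
    for key in all_keys:
        newest = max(
            (r[key] for r in replicas if key in r),
            key=lambda versioned: versioned[1],
        )
        for r in replicas:
            if r.get(key) != newest:
                r[key] = newest
                repaired.add(key)
    return repaired


# Three replicas of the same shard; b has a stale value, c missed a write.
a = {"x": ("1", 10), "y": ("2", 11)}
b = {"x": ("1", 10), "y": ("2-old", 5)}
c = {"x": ("1", 10)}

fixed = anti_entropy([a, b, c])
print("repaired keys:", sorted(fixed))
print("replicas agree:", digest(a) == digest(b) == digest(c))
```

Note what the sketch cannot do: if the only replica holding the newest write is the node that went down, the check will happily converge on older data, which is the data loss exposure that administrators have to reason about.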

One powerful NoSQL solution is MapR’s M7 Enterprise NoSQL Edition for Hadoop, a Big Data platform that delivers ease of use, dependability and performance advantages for NoSQL and Hadoop applications. M7 removes the trade-offs organizations face when deploying a NoSQL solution by providing scale, strong data consistency, reliability and continuous low-latency performance. With the M7 Hadoop-plus-NoSQL database distribution, the convergence of NoSQL with MapReduce analytics can offer significant advantages to IT administrators who are assessing Hadoop solutions versus traditional RDBMS products. Because M7 tightly integrates NoSQL and Hadoop, there is no need to run OLTP and analytics workloads separately; you can run them on one cluster.

Today’s NoSQL database systems can be powerful tools for storing and processing large amounts of structured, semi-structured, and unstructured data. When looking at NoSQL options, be sure that the one you choose scales well, offers strong reliability, performance consistency, and data consistency, and is easy for your operations team to manage.

About the Author

M.C. Srivas, Co-founder and CTO, MapR Technologies

Srivas ran one of the major search infrastructure teams at Google, where GFS, BigTable and MapReduce were used extensively. He wanted to provide that powerful capability to everyone, and started MapR with the vision of building the next-generation platform for semi-structured big data. His strategy was to evolve Hadoop and bring simplicity of use, extreme speed and complete reliability to Hadoop users everywhere, and to make it easy for enterprises to use this powerful new way to get deep insights. That vision is shared by all at MapR. Srivas brings to MapR his experience at Google, Spinnaker Networks and Transarc in building game-changing products that advance the state of the art.

Srivas was Chief Architect at Spinnaker Networks (now part of NetApp), which built the industry’s fastest single-box NAS filer, as well as the industry’s most scalable clustered filer. Previously, he managed the Andrew File System (AFS) engineering team at Transarc (now IBM). AFS is now standard classroom material in operating systems courses. When not writing code, Srivas enjoys playing tennis, badminton and volleyball. M.C. has an MS in Computer Science from the University of Delaware, and a B.Tech. in electrical engineering from IIT Delhi.