Tackling Big Data Through a New Approach to Distributed, Unified Storage Systems

– Bryan Bogensberger, GM Ceph Distributed Storage and vice president of Business Strategy at DreamHost (http://dreamhost.com/), says:

In this era of Big Data, enterprise storage needs are increasing at an exponential rate. With the advent of the cloud, IT professionals are expected to provide instant access to data, as well as auto-scaling to handle ever-growing data stores. However, incumbent storage systems are often proprietary and expensive to maintain. As the deluge of Big Data in the enterprise continues to grow, these storage systems are being left further behind, simply unable to scale to meet today’s data needs.

To meet the growing challenge of Big Data, IT professionals are looking for a new distributed, unified storage system capable of massive scaling. Such a system must go beyond a file system and also include object storage and block storage to ensure optimum flexibility and performance. In an object-based file system, file metadata is separated from file data, and the data is then split into flexibly sized data containers called objects. One advantage of an object-based architecture is the elimination of metadata bottlenecks: because the metadata is stored separately from the data, metadata servers are contacted only once when the file is accessed, and the reads and writes that follow go straight to the nodes holding the objects. Block storage provides a reliable, scalable disk interface for compute within cloud and legacy environments.
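The separation of metadata from data described above can be illustrated with a small Python sketch. It is not Ceph code: the names FileMetadata and split_into_objects, the object naming scheme, and the 4 MiB default are all assumptions made for illustration, and it uses one fixed object size per file for simplicity. The point it shows is that once a client holds the small metadata record, it can work out which object contains any byte offset without asking a metadata server again.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed default of 4 MiB, illustrative only


@dataclass
class FileMetadata:
    """Small record stored apart from the file data (hypothetical layout)."""
    name: str
    size: int
    object_size: int

    def object_count(self) -> int:
        # Ceiling division: number of objects needed to hold `size` bytes.
        return (self.size + self.object_size - 1) // self.object_size

    def object_name(self, index: int) -> str:
        # Hypothetical naming scheme: file name plus a hex object index.
        return f"{self.name}.{index:08x}"

    def locate(self, offset: int) -> Tuple[str, int]:
        """Map a byte offset to (object name, offset inside that object)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside file")
        index, inner = divmod(offset, self.object_size)
        return self.object_name(index), inner


def split_into_objects(
    name: str, data: bytes, object_size: int = OBJECT_SIZE
) -> Tuple[FileMetadata, Dict[str, bytes]]:
    """Split file data into named objects and return the metadata separately."""
    meta = FileMetadata(name=name, size=len(data), object_size=object_size)
    objects = {
        meta.object_name(i): data[i * object_size:(i + 1) * object_size]
        for i in range(meta.object_count())
    }
    return meta, objects
```

For example, split_into_objects("report.dat", payload) returns the metadata record and a dictionary of objects; meta.locate(offset) then resolves any offset locally, which is why the metadata lookup happens only once per file access in this model.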

Ceph is an emerging technology designed to meet rapidly expanding storage demands, including those of Big Data. It is a distributed storage system designed to be massively scalable and self-managing, with no single point of failure. Ceph is unusual in offering block storage, object storage, and a POSIX-compliant file system all in one. Able to scale seamlessly from gigabytes to exabytes and beyond, Ceph is designed to handle extreme workloads, such as tens of thousands of clients simultaneously accessing the same file, a usage scenario that brings typical incumbent storage systems to their knees. Ceph uses the CRUSH algorithm for efficient, scalable and highly specific data placement: clients compute where data lives instead of looking it up in a central table. Ceph runs on commodity hardware with intelligent storage nodes. Because it is open source, Ceph is free to use and can be easily integrated into existing architectures.
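The idea of computed, table-free data placement can be sketched in a few lines of Python. This is not the CRUSH algorithm, which also accounts for device weights and the physical hierarchy of the cluster; it is a much simpler stand-in, rendezvous (highest-random-weight) hashing, chosen because it shares the key property: any client holding the same node list derives the same placement on its own. The function names, node names, and default replica count are assumptions for illustration.

```python
import hashlib
from typing import List, Sequence


def _score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: Sequence[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for an object, with no central lookup.

    Rendezvous hashing: rank every node by its score for this object and
    keep the top `replicas`. A simplified stand-in, not CRUSH itself.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: _score(object_name, n), reverse=True)
    return ranked[:replicas]
```

Calling place("report.dat.00000002", ["node-a", "node-b", "node-c", "node-d", "node-e"]) returns three distinct node names, and it returns the same three on every client. Removing a node that was not chosen leaves the placement unchanged, so only objects that lived on a lost node need to move.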

Originally created as an open-source project by DreamHost co-founder Sage Weil, Ceph is making waves in the industry: its team is working closely with Dell and contributing to the OpenStack project. The Ceph client was merged into the mainline Linux kernel in 2010 (version 2.6.34). The Ceph team continues to improve the entire storage system, and it recently announced a project roadmap explaining its vision for the future of Ceph, including Hadoop integration and full RADOS Block Device support within OpenStack.

As the amount of data being generated and stored in the enterprise continues to grow, IT professionals will be forced to abandon crumbling legacy storage systems that simply cannot handle today’s data needs. The next generation of distributed, unified storage systems is the answer to the enterprise’s Big Data needs. When evaluating storage systems, companies should look for massive scalability (to the exabyte level and beyond), self-management for lower management costs, and reliability high enough to give the organization ready access to all of its expanding pool of data.
