Where to Deduplicate, and How to Keep it Simple


– Eric Bassier, Director of Product Marketing for Quantum’s Data Protection solutions

Deduplication is now widely recognized as a proven technology in the datacenter. It seems to be cropping up everywhere, from flash arrays to backup applications and, of course, disk backup appliances. There’s no end in sight for structured and unstructured data growth, and as technologies like deduplication proliferate, complexity tends to rise along with the challenge of keeping it in check. A good first step is to recognize where deduplication can best be applied. Here are a few considerations.

Not all deduplication is created equal

There are plenty of algorithms designed to reduce data, including compression algorithms and various ways of deduplicating redundant bits or blocks so that less data is written to disk. The amount of data reduction can vary dramatically, from a 50% reduction (a 2:1 ratio) all the way up to a 99% reduction (100:1) or more using variable block deduplication. In fact, I have seen customers achieve deduplication ratios of over 100:1 on their particular mix of production data. The benefit of deduplication is two-fold: (1) it reduces the data stored on disk, and (2) it reduces network traffic (LAN or WAN). A 2x or 10x difference in data reduction can therefore have a very material impact on storage and network costs.
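The relationship between a reduction ratio and a percentage is easy to misjudge, so here is a small arithmetic sketch in Python. The function names and the 100 TB figure are illustrative assumptions, not taken from any product.

```python
def reduction_percent(ratio: float) -> float:
    """Percent of data eliminated for an N:1 reduction ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 (1:1 means no reduction)")
    return (1 - 1 / ratio) * 100


def stored_tb(logical_tb: float, ratio: float) -> float:
    """Capacity actually written to disk for a given amount of logical data."""
    return logical_tb / ratio


# Hypothetical example: 100 TB of backup data at several reduction ratios.
for ratio in (2, 10, 20, 100):
    print(
        f"{ratio}:1 -> {reduction_percent(ratio):.1f}% reduction, "
        f"{stored_tb(100, ratio):.1f} TB stored per 100 TB"
    )
```

Note how the percentages bunch up at the high end: moving from 10:1 to 20:1 only changes the percentage from 90% to 95%, yet it halves the disk and bandwidth needed.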

No matter where it’s deployed, deduplication uses compute resources

All data reduction algorithms have one thing in common: they use processing power to run the algorithm and to keep track of the various bits of data. If a user turns on deduplication for primary storage, flash or SSD, that device spends processing power on deduplication that it can’t spend on other tasks, such as serving the clients, applications and users of that storage. Processing power is consumed not only while data is initially being deduplicated: every algorithm also needs some sort of defragmentation or disk space reclamation process to clean up the pool of bits and blocks. That process requires processing power too, and while the device is performing it, it won’t be performing other tasks.
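To make those two costs concrete, here is a toy Python sketch of a deduplicating store. It is not how any particular product works: the `ChunkStore` class, its method names, and the use of fixed-size 4 KiB blocks with SHA-256 fingerprints are all assumptions chosen for brevity. It shows that every write pays for hashing and bookkeeping, and that freeing space takes a separate reclamation pass.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: one copy per unique block, plus reference counts."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes (stored once)
        self.refcount = {}  # fingerprint -> number of references from backups
        self.backups = {}   # backup name -> ordered list of fingerprints

    def write(self, name, data, block_size=4096):
        if name in self.backups:
            raise ValueError(f"backup {name!r} already exists")
        recipe = []
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            key = hashlib.sha256(block).hexdigest()  # CPU cost on every write
            if key not in self.blocks:
                self.blocks[key] = block             # only new blocks use disk
            self.refcount[key] = self.refcount.get(key, 0) + 1
            recipe.append(key)
        self.backups[name] = recipe

    def read(self, name):
        return b"".join(self.blocks[key] for key in self.backups[name])

    def expire(self, name):
        """Drop a backup; its blocks stay on disk until reclaim() runs."""
        for key in self.backups.pop(name):
            self.refcount[key] -= 1

    def reclaim(self):
        """Separate clean-up pass: free unreferenced blocks, return bytes freed."""
        dead = [key for key, count in self.refcount.items() if count == 0]
        freed = 0
        for key in dead:
            freed += len(self.blocks.pop(key))
            del self.refcount[key]
        return freed


store = ChunkStore()
monday = b"A" * 4096 + b"B" * 4096
tuesday = monday + b"C" * 4096
store.write("mon", monday)
store.write("tue", tuesday)

store.expire("mon")
freed_after_mon = store.reclaim()  # nothing freed: "tue" still uses A and B
store.expire("tue")
freed_after_tue = store.reclaim()  # all three unique blocks can now go
print(freed_after_mon, freed_after_tue)
```

The point of the sketch is the shape of the work, not the numbers: hashing happens on the write path, and the reclamation scan is extra work that competes with whatever else the device is doing.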

Purpose-built backup deduplication appliances simplify a storage administrator’s tasks by reducing the amount of data that needs to be backed up and retained, and by cutting the bandwidth needed to replicate the deduplicated data over the wire to another deduplication appliance for disaster recovery. However, the return on investment (ROI) can vary greatly among dedupe appliances depending on which deduplication approach is used. Some deduplication approaches can become surprisingly complex and hard to manage as the amount of data to be backed up grows.

Deduplication is about more than just data reduction: it’s about data protection across sites and in the cloud

Deduplication was originally designed as a technology for backup and disaster recovery, reducing or even eliminating tape for backup. It’s a great fit for backup, since backup data sets contain a lot of redundant data over time. But deduplication has enabled much more than data reduction: it’s becoming a fundamental cloud technology. Direct integration of deduplication with the cloud allows simple, low-cost off-site disaster recovery protection.

Variable block deduplication reduces disk storage, but also dramatically reduces network bandwidth, since only deduplicated data is replicated. This means data can be replicated between sites, as well as to and from the cloud in a highly efficient way to minimize network traffic and costs.
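To illustrate why variable block deduplication cuts replication traffic, here is a minimal Python sketch. It is an assumption-laden toy, not any vendor’s algorithm: the chunker uses a simple gear-style rolling hash to pick content-defined boundaries, and `chunk`, `fingerprint`, `replicate`, the size parameters and the sample payloads are all hypothetical. The second “backup” inserts a few bytes at the front, which would shift every fixed-size block, yet most variable-size chunks are recognized as already present on the remote side.

```python
import hashlib
import random

# One pseudo-random 64-bit value per possible byte (a "gear" table).
_table_rng = random.Random(2024)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data, avg_bits=8, min_size=64, max_size=1024):
    """Split data into variable-size chunks at content-defined boundaries."""
    cut_mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # rolling hash over recent bytes
        size = i - start + 1
        # Cut where the hash matches the pattern, or force a cut at max_size.
        if (size >= min_size and (h & cut_mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


def replicate(data, remote_store):
    """Send only the chunks the remote site lacks; return bytes sent."""
    sent = 0
    for block in chunk(data):
        key = fingerprint(block)
        if key not in remote_store:
            remote_store[key] = block
            sent += len(block)
    return sent


# Hypothetical payloads: 64 KiB of data, then the same data with a small insert.
_payload_rng = random.Random(7)
monday = bytes(_payload_rng.getrandbits(8) for _ in range(64 * 1024))
tuesday = b"new header" + monday

remote = {}
first = replicate(monday, remote)
second = replicate(tuesday, remote)
print(f"first replication sent {first} bytes, second sent {second} bytes")
```

Because chunk boundaries depend on the data itself rather than on byte offsets, the boundaries realign shortly after the inserted bytes, and only the chunks around the change cross the wire.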

Sound Advice

As users deploy deduplication in these types of multi-site, multi-tier deployments, data reduction algorithms that are “good enough” just don’t cut it. More efficient algorithms can drive a dramatically better ROI, with shorter payback periods.

By eliminating the daily complexity of managing storage, deduplication appliances simplify a storage administrator’s workload, so they can spend time on higher value activities. Intelligently engineered deduplication appliances offer some key attributes:

1) Scale with Simplicity: It’s important to have the ability to scale seamlessly within a single platform across entry and midrange applications. Scalability should not come at the expense of data center footprint, however.

2) Ease of use: Customers increasingly value appliances they can install and upgrade by themselves. A simple, intuitive wizard-based setup and management function can go a long way toward that end. The ability to scale through “pay-as-you-grow” licensing in capacity-based storage increments is also desirable. The ability to monitor and manage dedupe appliances through a “single pane of glass” is becoming a “must have” feature, as is the ability to perform system diagnosis and monitor system health to optimize operations.

The best place to deploy deduplication is STILL on purpose-built appliances designed specifically for the task. These devices should have management features that make it easier to manage deduplication and replication in multi-site, multi-tier deployments that are increasingly going to involve cloud storage in an effort to minimize OPEX.

With these thoughts in mind, perhaps the most underrated quality in a deduplication solution is simplicity. In deduplication, simplicity means reducing a customer’s backup burden so they can focus on managing their data. A good design is intelligently engineered, intuitive, and highly adaptive.

Eric Bassier is Director of Data Center Product Marketing at Quantum.
