– Frederic Renard, vice president of marketing at Arkeia Software (http://www.arkeia.com/), says:
Source-side deduplication of backups, as featured in Arkeia Network Backup, makes backups faster and cheaper.
Deduplication is a data reduction technique that divides files into blocks. With deduplication, backup sets are smaller and require less disk space, because any block that is present in multiple files is stored only once. Less storage means lower cost.
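To make the idea concrete, here is a minimal sketch of block-level deduplication. It is an illustration only, not Arkeia's implementation: the fixed 4 KB block size, the use of SHA-256 as the block fingerprint, and the names `split_blocks`, `dedup_store`, and `restore` are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's contents into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_store(files: dict[str, bytes]):
    """Store each distinct block once and keep a per-file list of block hashes."""
    store: dict[str, bytes] = {}         # fingerprint -> block contents
    recipes: dict[str, list[str]] = {}   # file name -> ordered fingerprints
    for name, data in files.items():
        fingerprints = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # a repeated block is not stored again
            fingerprints.append(digest)
        recipes[name] = fingerprints
    return store, recipes


def restore(name: str, store: dict[str, bytes], recipes: dict[str, list[str]]) -> bytes:
    """Rebuild a file by concatenating its blocks in order."""
    return b"".join(store[d] for d in recipes[name])
```

In this sketch, two files that share a 4 KB block contribute that block to the store only once, while each file can still be rebuilt in full from its list of fingerprints.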
For backup, deduplication can be performed either on the computer being protected or on the backup server that manages the backup copy. Deduplication first appeared on the backup server. This centralized approach was easier to deploy, and vendors of these “target-side” deduplication solutions (e.g. Data Domain) were the first to gain market traction.
Source-side deduplication means that the data reduction takes place on the computer being backed up, *before* the data are sent over the network. The backup agent only sends new (heretofore unknown) blocks of files to the backup server. This reduction in data sent over the network makes backups faster.
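The exchange between agent and server can be sketched as follows. This is a simplified model under stated assumptions, not the protocol of any particular product: the `BackupServer` class, its `missing` and `upload` methods, the `backup` function, and the 4 KB block size are hypothetical, and the "network" is just a method call.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class BackupServer:
    """Hypothetical in-memory stand-in for a backup server's block store."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, digests) -> set[str]:
        """Report which fingerprints the server has not seen yet."""
        return {d for d in digests if d not in self.blocks}

    def upload(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def backup(data: bytes, server: BackupServer) -> int:
    """Back up `data` from the source side; return block payload bytes sent."""
    # Fingerprint every block locally, before anything crosses the network.
    local = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        local[hashlib.sha256(block).hexdigest()] = block

    # Send only the blocks the server does not already hold.
    sent = 0
    for digest in server.missing(local):
        server.upload(digest, local[digest])
        sent += len(local[digest])
    return sent
```

In this model, backing up a file whose blocks are all already known to the server sends no block data at all; only the fingerprints would need to travel, which the sketch does not count.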
Source-side deduplication’s reduction of network bandwidth and storage needed for backup sets represents a huge advance over traditional backup solutions, including those that use target-side deduplication. Today, for backup, source-side deduplication represents the gold standard.
Deduplication represents a major shift in the way data are managed in the enterprise. The mathematical foundation of deduplication was established in the late 1980s. The first applications were delivered in the late 1990s, but widespread adoption didn’t begin until the mid-2000s. Today, according to industry analyst firm Enterprise Strategy Group, fewer than 30% of mid-market companies use backup software products that leverage data deduplication. Penetration of the technology is expected to double by the end of 2012.
Data volumes are growing, especially with the adoption of virtualized environments. Cloud storage can reduce costs, but the challenge is getting data to the cloud. Finally, the desire to run a “24-hour shop” puts increasing pressure on IT managers to shorten backup windows.
Virtualized environments generate large numbers of virtual machines. While each machine needs to be protected, most virtual machines share common blocks with other virtual machines. Deduplication is very effective at reducing the volume of data that must be backed up. This data reduction accelerates backups and reduces storage costs.
Cloud storage is a great alternative to magnetic tape for moving backup sets off-site. Off-site storage of backup sets is essential to protect against natural disasters, theft, or other loss of backup media. The challenge for cloud storage is the cost of the wide-area network (WAN). Deduplication allows a WAN link of a given bandwidth to move the equivalent of 5x or 10x more backup data over the course of an hour.
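The arithmetic behind that claim is straightforward. The sketch below assumes a 10 Mbit/s WAN link, a figure chosen purely for illustration, and applies the 5x and 10x reduction ratios mentioned above. The function name `protected_gb_per_hour` is hypothetical, and protocol overhead is ignored.

```python
def protected_gb_per_hour(link_mbps: float, dedup_ratio: float = 1.0) -> float:
    """Gigabytes of source data protected per hour over a link of `link_mbps`.

    `dedup_ratio` is source data divided by data actually sent (1.0 = no dedup).
    """
    bytes_per_hour = link_mbps * 1e6 / 8 * 3600  # raw link capacity in bytes
    return bytes_per_hour * dedup_ratio / 1e9


LINK_MBPS = 10  # assumed link speed, for illustration only

for ratio in (1, 5, 10):
    gb = protected_gb_per_hour(LINK_MBPS, ratio)
    print(f"{ratio:>2}x reduction: {gb:.1f} GB of source data per hour")
```

Under these assumptions the link carries 4.5 GB per hour without deduplication, and the equivalent of 22.5 GB or 45 GB of source data per hour at 5x or 10x reduction.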
Businesses want their IT systems to be operational as many hours of the day as possible. This desire for shorter backup windows competes with the natural growth of data in the data center. Source-side deduplication for backups is one of the best ways to protect more data in less time. Shorter backup windows mean that systems are available more hours of the day.
Data center managers can benefit from source-side deduplication in several ways. We’ve already seen how source-side deduplication can make the backup process faster, shortening backup windows, and reduce the volume and cost of required storage.
In addition to on-site backup storage, which is important for rapid recovery of data, IT managers must store backup sets off-site to protect against a major disaster. Historically, the solution has been tapes transported by truck. For large volumes of data, trucks are still the most cost-effective solution. A truck filled with tapes represents very high bandwidth.
However, for smaller volumes of data, like those of a small company or a branch office, backup to the cloud can be a cost-effective alternative. The traditional obstacle to the adoption of cloud storage has been the cost of bandwidth. High-bandwidth WAN links can be very expensive.
Deduplication puts cloud storage within reach. Deduplication of backups reduces the volume of data to be transferred over the WAN, and hence the cost of bandwidth. Typically, small companies use public “cloud storage” provided by their hosting provider as a backup destination. Companies with remote offices typically replicate backups to a company datacenter in another city.