How to Avoid One of the Biggest Silent Killers of SAN Performance

– Brian Morin, Senior VP, Global Marketing, Condusiv Technologies, says:

Have you ever felt like your SAN isn’t performing like it did “out of the box” and that performance has somehow degraded over time? Perhaps you’ve dismissed it with assumptions that your SAN performance hasn’t actually degraded, but rather you’re just hitting throughput or IOPS ceilings due to an increase in performance pressure related to additional workload and/or users. However, if you’ve been feeling like your SAN performance has degraded, you’re right. It has. SAN performance degrades over time. It’s a “mileage-will-vary” kind of discussion on how much performance has degraded in your specific environment, but when spending hundreds of thousands of dollars on SAN subsystems, giving back performance to I/O inefficiencies is not something many organizations can afford. Nor can they often afford the brute force approach of simply overcoming those I/O inefficiencies with another premature forklift upgrade to the SAN environment or just throwing more flash at the problem.

As much as the “keep throwing hardware at the problem” approach can medicate the symptoms of I/O inefficiency, it can’t solve the root causes of performance degradation. Identifying I/O inefficiencies, and how much they affect your real-world workloads, starts with understanding the culprits.

One of the biggest culprits is an “I/O tax” of sorts that occurs in Windows server environments connected to SAN storage. Since the operating system is abstracted from the physical layer, the actual blocks are physically managed by RAID controllers and other block management technologies within the SAN itself. When a volume or LUN presents itself to Windows, it is presented as a logical disk software layer. Windows is in control of this logical disk. It determines what data is written to which address or addresses at the logical layer. The SAN simply sees the allocations, receives the data, and determines where to physically write and manage those blocks to disk or SSD.

Here’s where one of the biggest silent killers of SAN performance begins to take shape. As the relationship between I/O and data begins to erode at the logical layer, the average KB per I/O operation begins to drop, which inflates the IOPS requirement for any given file or workload. In other words, it takes more time to move the same amount of data. Instead of requiring a single, contiguous I/O to process a unit of data, it could very well require eight split I/Os to process the whole file, or even hundreds of I/Os. In fact, the Windows Performance Monitor utility reports split I/O directly through its Split IO/Sec counter, so you can see how much of your I/O is being split. Here’s why it happens:

Unfortunately, Windows isn’t aware of file sizes as data is received from the application, so it takes a “one-size-fits-all” approach. Instead of looking for the best allocation for a unit of data at the logical layer, it just fills the next available free address, no matter how large or small that address is. If the address is too small to hold the full file, Windows “splits” the file: it fills that logical address, then continues splitting until the whole file has been allocated across as many addresses as it takes. Since each address at the logical layer requires its own dedicated input/output operation, a file that could have been written or read as a single I/O ends up being written or read with multiple I/Os. The more I/Os required to move a unit of data, the longer it takes to process that data. As time goes on, the relationship between data and logical free space erodes even further until much of the I/O being processed is tiny, fractured I/O. This surplus of increasingly small I/O creates unnecessary overhead for the whole infrastructure, but it harms SAN performance the most, since more IOPS are now required to process any given workload or file.
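To make the splitting behavior concrete, here is a minimal toy model of it in Python. It is only a sketch: the function names, the free-space layout, and the 64 KB file size are all hypothetical, and it does not reproduce how NTFS or any Condusiv product actually allocates space. It simply contrasts "fill the next available free address" with an allocation that knows the file size up front.

```python
def allocate_next_available(free_extents_kb, file_size_kb):
    """Fill free extents in logical-address order until the file is placed.

    free_extents_kb: sizes (KB) of free logical extents, in address order.
    Returns the fragment sizes; in this toy model each fragment costs one I/O.
    """
    fragments = []
    remaining = file_size_kb
    for extent_kb in free_extents_kb:
        if remaining == 0:
            break
        if extent_kb <= 0:
            continue  # skip empty extents
        used = min(extent_kb, remaining)
        fragments.append(used)
        remaining -= used
    if remaining > 0:
        raise ValueError("not enough free space for the file")
    return fragments


def allocate_size_aware(free_extents_kb, file_size_kb):
    """Use one extent that holds the whole file if any exists (hypothetical)."""
    if any(extent_kb >= file_size_kb for extent_kb in free_extents_kb):
        return [file_size_kb]  # one contiguous fragment, one I/O
    return allocate_next_available(free_extents_kb, file_size_kb)


# Hypothetical free-space map: many small gaps ahead of one large free region.
free_map = [8, 4, 16, 4, 8, 8, 8, 8, 64]

split = allocate_next_available(free_map, 64)
whole = allocate_size_aware(free_map, 64)

print(len(split), "I/Os, average", 64 / len(split), "KB per I/O")  # 8 I/Os, 8.0 KB
print(len(whole), "I/O, average", 64 / len(whole), "KB per I/O")   # 1 I/O, 64.0 KB
```

With this layout the same 64 KB file costs eight I/Os under the next-available policy and a single I/O when the allocation accounts for file size, which is the gap the paragraph above describes.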

The unintended consequence of this phenomenon is that administrators are led to believe they have an IOPS problem and need to raise their IOPS ceiling, when in fact if they could solve this inefficiency and raise the average block size, or amount of data carried with each I/O operation, they would increase throughput, be far less IOPS dependent, and immediately process more data in less time on existing systems.
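The arithmetic behind that claim is simple enough to sketch: throughput is roughly IOPS multiplied by the average I/O size. The Python helpers below are illustrative only; the function names and every number are hypothetical, and a real SAN is also bounded by latency, queue depth, and bandwidth limits. The last helper shows one way to turn two Performance Monitor readings (Split IO/Sec and Disk Transfers/sec) into a split percentage.

```python
def required_iops(throughput_mb_s, avg_io_kb):
    """IOPS needed to sustain a given throughput at a given average I/O size."""
    return throughput_mb_s * 1024 / avg_io_kb


def throughput_mb_s(iops, avg_io_kb):
    """Throughput (MB/s) delivered at a given IOPS rate and average I/O size."""
    return iops * avg_io_kb / 1024


def split_io_percent(split_io_per_sec, disk_transfers_per_sec):
    """Share of I/O that is split, from two counter readings (same interval)."""
    if disk_transfers_per_sec == 0:
        return 0.0
    return 100.0 * split_io_per_sec / disk_transfers_per_sec


# Same 100 MB/s workload, two hypothetical average I/O sizes.
print(required_iops(100, 64))       # 1600.0 IOPS at 64 KB per I/O
print(required_iops(100, 8))        # 12800.0 IOPS at 8 KB per I/O

# Same hypothetical 10,000 IOPS ceiling, two average I/O sizes.
print(throughput_mb_s(10_000, 8))   # 78.125 MB/s
print(throughput_mb_s(10_000, 64))  # 625.0 MB/s

# Hypothetical counter sample: 300 split I/Os out of 1,200 transfers per second.
print(split_io_percent(300, 1200))  # 25.0
```

In this example, dropping the average I/O size from 64 KB to 8 KB multiplies the IOPS requirement for the same workload by eight, without any increase in the amount of data moved.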

If you refer to the “vSphere Monitoring and Performance Guide” for ESXi 5.5, VMware repeatedly suggests “defragging” the file system to keep this phenomenon from occurring so you can reclaim SAN performance. However, since a defragmentation utility should never be run on a “live” production SAN, it can be a very tedious and time-consuming process to migrate data, take the volume offline, defrag, then bring it back online.

Before you think this is a problem you just have to live with, there is at least one company that has done something about it – Condusiv® Technologies. They provide a fragmentation prevention solution to stop logical disk fragmentation before it occurs, so a defrag process never needs to be run. Their solution feeds file size intelligence to Windows to help it make the perfect allocation at the logical layer so when the SAN device receives I/O requests, it only processes clean, contiguous I/O, enabling the SAN and server to process more data in less time. Their latest Diskeeper® 15 Server software solution provides this benefit for physical servers connected to SAN while their V-locity® I/O reduction software provides this same engine for virtual servers while also including server-side caching technology to further reduce I/O strain to the SAN.

# # #

Brian Morin, Senior Vice President, Global Marketing, Condusiv Technologies

Brian is Senior Vice President, Global Marketing, responsible for the corporate marketing vision by driving demand and awareness worldwide. Efforts over the last year led to growing adoption of V-locity®, which has quickly amassed over 1,000 new customers looking to accelerate their virtual environment with a 100% software approach.

Prior to Condusiv, Brian served in leadership positions at Nexsan that touched all aspects of marketing, from communications to demand generation, as well as product marketing and go-to-market strategies with the channel. Brian notably steered rebranding efforts and built the demand generation model from scratch as an early marketing automation adopter. Growth led to the successful acquisition by Imation.

With 15+ years of marketing expertise, Brian has spent recent years on the forefront of revenue marketing models that leverage automation for data-driven determinations. His earlier background has roots on the agency side as creative director, helping companies build brands and transition to online engagement.

– See more at: http://www.condusiv.com/Company/Leadership/#sthash.EXyPyTAC.dpuf
