Mike Workman
 


April 02, 2008

Changing the Industry: QoS - The Simple View

One of Pillar’s innovations is extending Quality of Service (QoS) concepts, often seen in networking, to storage. When we first introduced our QoS, we were the only company that allowed storage administrators to set the priority levels of LUNs and filesystems relative to one another. And in offering QoS as an inherent, system-wide capability rather than a chargeable, bolt-on feature, we are still the only one.

IT folks used to avoid sharing a storage pool among applications because it was very difficult to predict the interference that might occur when those applications share storage resources. Without QoS, reads and writes are handled “first-come, first-served.” Depending on the applications sharing the storage, this can produce very bad results; hence sharing was simply avoided and given a bad name: contention. This, in turn, has led to an industry average of less than 40% utilization. Try to imagine another industry where the buyer is perfectly OK with a 40% yield. Would you go to the grocery store and buy the strawberries whose label reads... “60% of these strawberries are not edible and may cause contention with your digestive tract!” But I digress. Without sharing and contention, the whole concept of a “storage pool” becomes moot.

Pillar’s QoS allows certain applications to take preference over others. There are five priority levels if 15K RPM Fibre Channel disk is present, and four if not. This eliminates most of the pitfalls of sharing, so storage utilization can improve drastically. At customer sites we see up to 80% utilization without a performance penalty!

A standard trick people use to try to achieve the same result without QoS is to put LUNs and file systems on separate spindles. This helps, but not much, because it does nothing about all the other resources that drastically affect system performance: CPU utilization, cache (how much, if any), the network, and the layout of data on the disk platters (which Pillar controls). Furthermore, isolating spindles often causes a huge waste: stranded storage. The number of spindles determines the IO rate, so unless you are very lucky, buying enough spindles for the IO rate you need nearly always gives you far more capacity than you really need, because today’s 300GB 15K Fibre Channel drive is no faster than its 9GB grandfather was 8 years ago. On a standard system, this is utilization hell. Yet you still use the same metric when buying storage ($/GB) without taking into account that your utilization rates are decreasing. On Pillar Axiom, this space is there to be used, and actualized $/GB is half that of our competition!
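The stranded-storage point is simple arithmetic: an isolated LUN needs enough spindles for its IO rate, and bigger drives add capacity per spindle without adding IO. The sketch below assumes a hypothetical 180 IOPS per 15K spindle and a hypothetical workload needing 3,600 IOPS and 500 GB; these are illustrative numbers, not measurements from Pillar or any drive vendor.

```python
import math

# Hypothetical figure for illustration only (not measured vendor data):
# per-spindle random IO rate, assumed roughly equal for old and new 15K drives.
IOPS_PER_15K_DRIVE = 180


def spindles_needed(required_iops: int, required_gb: float, drive_gb: float,
                    iops_per_drive: int = IOPS_PER_15K_DRIVE) -> int:
    """Spindles an isolated LUN needs: enough for its IO rate AND its capacity."""
    for_iops = math.ceil(required_iops / iops_per_drive)
    for_capacity = math.ceil(required_gb / drive_gb)
    return max(for_iops, for_capacity)


def utilization(required_gb: float, drive_gb: float, spindles: int) -> float:
    """Fraction of the purchased raw capacity that actually holds data."""
    return required_gb / (drive_gb * spindles)


if __name__ == "__main__":
    workload_iops, workload_gb = 3600, 500  # hypothetical application on dedicated spindles
    for drive_gb in (9, 300):
        n = spindles_needed(workload_iops, workload_gb, drive_gb)
        u = utilization(workload_gb, drive_gb, n)
        print(f"{drive_gb:>3} GB drives: {n} spindles, {u:.0%} utilized")
```

Under these assumptions, the workload nearly fills 56 of the 9GB drives, but with 300GB drives it still needs 20 spindles for its IO rate and uses only about 8% of the capacity it bought; the rest is stranded.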

Let’s list a few fun facts about Pillar’s QoS on Axiom:

1. The layout of data on disk is part of it, but not all of it. We do put higher-priority LUNs and filesystems on the outer tracks of the disk, because the data rate and data density (sectors per track) are higher there, yielding better performance.

2. QoS is not just about where the data is on the disk. In fact, placing data on the disk platters without a prioritized queue manager will often make things worse, not better. Here’s a tip: if someone comments on Axiom QoS, ask them what a “Prioritized Queue Manager” is. If they don’t know, but still feel comfortable talking about Pillar’s QoS, they are a boob and/or you are listening to a load of crap.

3. Being Application-Aware means knowing that some applications need more cache than others, while still others run better without it; allocating cache accordingly is part of our QoS.

4. Letting test applications hog resources that production applications need is a bad idea. Instead of buying separate systems, use QoS to protect your resources according to business priority.

5. If you think our QoS doesn’t work when many servers share the storage pool, then we need to talk. The whole point of QoS is to differentiate between applications with varying service levels. In fact, Pillar guarantees minimum service levels so that applications and hosts don’t time out just because higher-priority stuff is contending with lower-priority stuff. If you want to starve applications, your bolted-on version of QoS has to come from the Northeastern US.
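Items 2 and 5 both hinge on a prioritized queue manager that still guarantees lower-priority work a minimum share. The sketch below is not Pillar’s implementation; it is a plain weighted round-robin scheduler, assuming five hypothetical level names and weights, that shows how higher-priority IO can get most of the dispatch slots without starving anyone.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Tuple


class PrioritizedQueueManager:
    """Weighted round-robin over priority levels (illustrative sketch only).

    Each round, a level may dispatch up to `weight` queued requests, so a level
    with weight >= 1 always gets some service: a minimum service level.
    """

    def __init__(self, weights: Dict[str, int]):
        if any(w < 1 for w in weights.values()):
            raise ValueError("every level needs a weight of at least 1 to be guaranteed service")
        self.weights = dict(weights)  # insertion order = service order within a round
        self.queues: Dict[str, Deque[str]] = {level: deque() for level in weights}

    def submit(self, level: str, request: str) -> None:
        self.queues[level].append(request)

    def pending(self) -> int:
        return sum(len(q) for q in self.queues.values())

    def dispatch(self) -> Iterator[Tuple[str, str]]:
        """Yield (level, request) in service order until every queue is empty."""
        while self.pending():
            for level, weight in self.weights.items():
                q = self.queues[level]
                for _ in range(min(weight, len(q))):
                    yield level, q.popleft()


if __name__ == "__main__":
    # Hypothetical level names and weights, highest priority first.
    qm = PrioritizedQueueManager({"premium": 8, "high": 4, "medium": 2, "low": 1, "archive": 1})
    for i in range(20):
        qm.submit("premium", f"p{i}")
    for i in range(2):
        qm.submit("low", f"l{i}")
    order = [req for _, req in qm.dispatch()]
    print(order.index("l0"), order.index("l1"))
```

With 20 premium requests queued ahead of two low-priority ones, this sketch dispatches the first low-priority request 9th, rather than 21st as it would under first-come, first-served or strict priority. Requiring every weight to be at least 1 is what turns the minimum service level into a guarantee.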