Mike Workman
 


April 02, 2008

Changing the Industry: QoS - The Simple View

One of Pillar’s innovations is the extension of Quality of Service concepts, often seen in networking, to storage. When we first introduced our QoS, we were the only company that allowed storage administrators to set priority levels of LUNs and filesystems relative to others. And we are still the only one that builds QoS into the whole system rather than selling it as a chargeable, bolt-on feature.

IT folks used to avoid sharing a storage pool amongst applications because it was very difficult to predict the interference that might occur between various applications when they share storage resources. Without QoS, there is a “first-come, first-served” approach to reading or writing. Depending on the applications sharing the storage, this can produce very bad results, hence sharing was simply avoided, and given a bad name: contention. This, in turn, has led to an industry average of less than 40% utilization. Try to imagine another industry where the buyer was perfectly OK with a 40% yield. Would you go to the grocery store and buy the strawberries where the label reads “60% of these strawberries are not edible and may cause contention with your digestive tract!”? But I digress. Without sharing and contention, the whole concept of a “storage pool” becomes moot.

Pillar’s QoS allows certain applications to take preference over others. There are five levels of priority if 15K RPM Fibre Channel disk is present, four if not. Essentially, most of the pitfalls of sharing are eliminated, and thus efficiency or storage utilization can drastically improve. At customer sites we see up to 80% utilization without performance implications!

A standard trick people use to try to effect the same result without QoS is to define LUNs and file systems on separate spindles. This helps, but not much, because it says nothing about all the other resources which drastically impact the performance of the system. These include CPU utilization, cache (how much, if any), network, and Pillar’s disk platter layout. Furthermore, isolating spindles often causes a huge waste: stranded storage. The number of spindles determines IO rate. And, unless you are very lucky, this nearly always implies far more capacity than you really need, given that today’s 300GB 15K Fibre Channel drive is no faster than its 9GB grandfather was 8 years ago. On a standard system, this is utilization hell. Yet you still use the same metrics when buying storage ($/GB) without taking into consideration that your utilization rates are decreasing. On Pillar Axiom, this space is there to be used, and actualized $/GB is half that of our competition!
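To put rough numbers on the stranded-storage and $/GB argument, here is a small Python sketch. The workload, the IOPS-per-drive figure, and the list price are invented for illustration and are not Pillar or competitor data; only the 40% and 80% utilization figures and the 300GB drive size come from the text above.

```python
import math

def spindles_for_iops(target_iops, iops_per_spindle):
    """Drives needed to reach an IO rate (ignores RAID overhead and cache effects)."""
    return math.ceil(target_iops / iops_per_spindle)

def stranded_gb(spindles, gb_per_spindle, needed_gb):
    """Capacity bought only to get enough spindles, beyond what the application needs."""
    return max(0, spindles * gb_per_spindle - needed_gb)

def actualized_cost_per_gb(list_cost_per_gb, utilization):
    """Cost per GB actually used, given the fraction of purchased capacity in use."""
    return list_cost_per_gb / utilization

# Hypothetical workload: 3,600 IOPS over 1,000 GB of data, on 300 GB drives
# that are assumed (not measured) to deliver about 180 IOPS each.
drives = spindles_for_iops(3600, 180)
wasted = stranded_gb(drives, 300, 1000)

# Same hypothetical list price of $10/GB at the two utilization rates from the text.
at_40 = actualized_cost_per_gb(10.0, 0.40)
at_80 = actualized_cost_per_gb(10.0, 0.80)

print(f"{drives} drives, {wasted} GB stranded if those spindles are isolated")
print(f"actualized $/GB: {at_40:.2f} at 40% vs {at_80:.2f} at 80% utilization")
```

With these made-up inputs the IO rate forces 20 drives for 1,000 GB of data, and doubling utilization from 40% to 80% halves the cost per GB actually used, which is the arithmetic behind the "half" claim.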

Let’s list a few fun facts about Pillar’s QoS on Axiom:

1. It is not all about the layout of data on disk. We do put higher priority LUNs and filesystems on the outside of the disk because data rate and data density (sectors per track) are higher there, yielding better performance.

2. QoS is not just about where the data is on the disk. Actually, placing data on the disk platters without a prioritized queue manager will often make things worse, not better. Here’s a tip: If someone comments on Axiom QoS, ask them what a “Prioritized Queue Manager” is. If they don’t know, but still feel comfortable talking about Pillar’s QoS, they are a boob and/or you are listening to a load of crap.

3. Part of being Application-Aware implies knowing that some applications need more cache than others, while still others run better without it; this is part of our QoS.

4. Allowing some test applications to hog up resources from production applications is a bad idea. Instead of buying separate systems, use QoS to protect your resources according to business priority.

5. If you think our QoS doesn’t work when many servers share the storage pool, then we need to talk. The whole point of QoS is to differentiate between applications with varying service levels. In fact, Pillar guarantees minimum service levels so applications and hosts don’t time out just because higher priority stuff is in contention with lower priority stuff – if you want to starve applications, your bolted-on version of QoS has to come from the Northeastern US.
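For readers who have never seen one, here is a toy sketch of what a prioritized queue manager with a minimum service level might look like. It is not Pillar’s implementation: the class name, the `max_skips` parameter, and the skip-counting rule are all assumptions made up for this example. The only things taken from the text are the five priority levels and the idea that low-priority work must not be starved.

```python
from collections import deque

class PrioritizedQueueManager:
    """Toy scheduler: strict priority plus a starvation guard (hypothetical design)."""

    def __init__(self, levels=5, max_skips=4):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest priority
        self.skips = [0] * levels    # times each waiting queue has been passed over
        self.max_skips = max_skips   # serve a queue once it has been skipped this often

    def submit(self, level, request):
        self.queues[level].append(request)

    def _pick_level(self):
        waiting = [i for i, q in enumerate(self.queues) if q]
        if not waiting:
            return None
        # Queues that have waited too long go first, so low priority still progresses.
        starved = [i for i in waiting if self.skips[i] >= self.max_skips]
        return starved[0] if starved else waiting[0]

    def dispatch(self):
        level = self._pick_level()
        if level is None:
            return None
        for i, q in enumerate(self.queues):
            if i == level:
                self.skips[i] = 0
            elif q:
                self.skips[i] += 1
        return self.queues[level].popleft()

# Ten production requests at the top priority, two test requests at the bottom.
qm = PrioritizedQueueManager(levels=5, max_skips=4)
for n in range(10):
    qm.submit(0, f"prod-{n}")
for n in range(2):
    qm.submit(4, f"test-{n}")

order = [qm.dispatch() for _ in range(12)]
print(order)  # test-0 is served fifth and test-1 tenth; the rest is production I/O
```

Production gets most of the service, but the test requests are never pushed to the very end of the line. A first-come, first-served queue gives no such control, and a strict-priority queue without the guard would make the test requests wait for all ten production requests.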

March 26, 2008

Changing the Industry – Software Licenses

So let’s say you own a storage system and you want to triple its capacity. Well, unfortunately for you that means another system, because it is very likely that your system doesn’t support another 16 storage shelves or if you added them the performance of the system would go into the crapper.

So you buy another system instead of expanding the old one.  At first this may not seem so bad, as you need the storage shelves anyway, so adding them to an old system versus getting a new shiny one doesn’t seem so bad….until you have to buy WAFL, ONTAP, NFS and CIFS protocols, all over again. Wait a minute, can’t I just put those shelves on my old system – I already bought that stuff!

Well, no you can’t, Mr. Network Appliance Guy. You could upgrade platforms, and then get that performance, as the new one has a bigger back-end so you can attach the disk you want. Well, not only do you need a forklift for that upgrade, but you will pay more for the software again, because they charge more for the same software on the bigger platform than when it is loaded onto the smaller one. Get ya either way.

Unless you own an Axiom. With Axiom you can add up to 64 Bricks of SATA and/or FC disk to scale your back-end performance and capacity, as well as up to 4 Slammers to scale your front-end performance, and use the software you bought originally – same license, under the premise that it “scales”….because voilà! It does.

Not a nice change for most storage vendors, but certainly a nice change for storage customers.
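The licensing arithmetic can be sketched in a few lines of Python. Every number here is hypothetical: the shelf limit per system and the license price are placeholders chosen to show the shape of the comparison, not anyone’s actual price list.

```python
import math

def per_system_license_cost(shelves_needed, shelves_per_system, license_per_system):
    """Software cost when every additional system needs its own license set."""
    systems = math.ceil(shelves_needed / shelves_per_system)
    return systems * license_per_system

def single_license_cost(license_price):
    """Software cost when one license covers the system however far it scales."""
    return license_price

# Hypothetical: a system that tops out at 8 shelves, capacity tripled to 24 shelves,
# and a software bundle priced at 100,000 per system.
per_system = per_system_license_cost(24, 8, 100_000)
single = single_license_cost(100_000)

print(f"per-system licensing: {per_system}, single license: {single}")
```

Under these assumptions, tripling capacity triples the software bill in the per-system model and leaves it unchanged in the single-license model.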

March 21, 2008

Changing the Industry – Distributed RAID

A good friend of mine described a childhood experience, which involved the waiting room in a hospital. Although not the point of the story, it reminded me of our product. OK, I know… I am a geek, but hey, at least I am self-aware.

In hospitals 45 years ago there was one TV in an “Entertainment Room”. TVs used to be quite a luxury, and relatively expensive, so you needed to share it amongst the patients. And, of course, there was one in the waiting room. In today’s health care facilities, patients not only have their own TV, but they often have a computer of their own. Waiting rooms have Wi-Fi and of course PDA connectivity. Sharing is not necessary anymore because the economics of technology have changed considerably.

So 20 years ago RAID was invented, and it was relatively expensive. As a result, RAID controllers were designed to handle a hundred disk drives or more. Amortizing the cost of a RAID controller over lots of spindles was great, but the performance was compromised as the number of spindles increased (they taxed the RAID controllers as one would expect).

Unfortunately, one or two RAID controllers per storage system is a bit too similar to one or two TVs per hospital….nobody quite gets what they want and everyone waits for their turn much too often. Our competitors still build their storage systems in this fashion.

Pillar includes a RAID controller for every 6 drives (12 drives for FC disk) and scales to 128 RAID controllers in a single system, providing redundancy, linear scaling of performance with capacity, and unsurpassed performance under fault conditions. If a disk drive fails, we rebuild at least four times faster than our competitors when the system is busy.
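A quick back-of-the-envelope sketch in Python shows why the ratio matters. It compares a fixed pair of controllers against one controller per 6 drives, the SATA figure quoted above. The function names and the drive counts in the loop are chosen for illustration, and the model only counts drives per controller; it says nothing about actual throughput.

```python
import math

def distributed_controller_count(drives, drives_per_controller=6):
    """Controllers present when one is added for every group of drives."""
    return math.ceil(drives / drives_per_controller)

def drives_per_controller_fixed(drives, controllers=2):
    """Drives each controller must serve when the controller count never grows."""
    return drives / controllers

for drives in (24, 192, 768):
    fixed_load = drives_per_controller_fixed(drives)
    dist_count = distributed_controller_count(drives)
    dist_load = drives / dist_count
    print(f"{drives:4d} drives: fixed pair = {fixed_load:.0f} drives/controller, "
          f"distributed = {dist_count} controllers at {dist_load:.0f} drives/controller")
```

In the fixed design the load per controller grows with every shelf added; in the distributed design it stays at 6 drives per controller, and 768 drives works out to the 128 controllers mentioned above.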

If you think this isn’t a big change to the industry, try adding another 10 storage shelves to one of our large competitors’ systems. And if you expect a commensurate increase in performance, the salesperson will most likely tell you that you should buy another storage system. If you do, you will pay for all your software all over again. NAS, SAN, iSCSI protocols, Thin Provisioning, Management Software, the works.

Ka-ching! Ka-ching! Ka-ching! Isn’t it time to change this model?

And if you want to talk on that PDA or Cell phone, please go outside, you are disturbing the guests in the waiting room….

March 09, 2008

Changing the Industry

It might sound hyperbolic, but we built Pillar to change a few things in the storage industry. Our Customers know it – they experience it. But the “Industry” we are changing in large part doesn’t want to be changed. This includes just about everyone in the industry: Customers, Analysts, Reporters, but overwhelmingly the resistance comes from entrenched competitors who feed all kinds of BS to the rest of the ecosystem to prevent change.

There is a lot of money in storage, and the dilemma is in bringing about change when people (not necessarily Customers) are doing just fine, thank you. If you look over time, there have been a lot of changes brought about in the IT industry. Mainframes to Mini-computers, minis to Workstations, Workstations to PCs. Disk drives have gone from high gross margins to the high teens or low twenties. The PC business started off with high gross margins and has hovered around 18-24% now for a long time. Business executives didn’t want that to happen of course, but competition brought it about. Score one for the free market system.

So given all the change in the IT industry, what about storage? Well, yes there has been an evolution of change: Proprietary to Open, DASD to Networked, SAN to SAN+NAS and iSCSI. Of course we are describing a 40 year journey here, not 10 years as in the PC, or Workstation business.

For those of you who aren’t convinced, what margins do NetApp and EMC brag about? Well, try 60%! Is this because their businesses are so small? Storage systems and the software that runs on them, for just those two companies, are $14B per year. I should be so small (no short jokes please). So if you were running a business making 65% gross margins and your top line revenue was growing, why would you want to change?

Well, you might change if you thought you could get the jump on your competitors and steal more market share. However, the firmware and software that storage subsystems are comprised of includes in the neighborhood of 5-15M lines of code. Change that? Sure, can I work the weekend? Talk about a self-imposed barrier to change!

OK, so what do we at Pillar want to change? Here’s the short list:

  1. Purpose-built storage.
    You don’t need to support, manage and maintain 10 different storage platforms in a data center to get your job done. This is inefficient at best.
        
  2. Platinum coated hardware for reliability and performance.
    Active-active, redundant systems with modern architectures can coax more performance out of off-the-shelf hardware and desktop disk drives than the big guys want to....it eats their gross margins. This is a fact: I have worked in these companies and watched this behavior all the time.
        
  3. Scalability of typical storage systems sucks.
    When you have two RAID controllers for hundreds of spindles you are doomed. Instead of adding more spindles to an array, storage companies want you to trade-up (forklift usually) or buy another system and split the work. What happens then? You have to buy all the software again. This is a ridiculous treadmill that companies get on. There are shops with hundreds of nearly identical EMC or NetApp boxes where a modern architecture would allow 20, and would save many millions of dollars, headaches, staff attention, and space.
        
    Proof point: Pillar’s single Software license model. This scares the crap out of incumbent suppliers, and if you are a Customer, you should get excited by this.
        
    Pillar scales to 8 control units in a midrange product, and we use distributed hardware RAID to let performance scale linearly from 24 to 832 spindles on a single frame. We also use QoS to deliver consistent performance to applications at the same time allowing them to share their spindles. Everyone fought us there, but they were wrong and hence are now copying us. It makes too much damn sense.
        
    A shared, virtualized storage pool with QoS and thin provisioning encourages high utilization rates, 2-3 times the industry average: This is serious coin folks.
        
    Proof point: Pillar’s Bricks (storage enclosures) have hardware RAID, so as you add spindles you add RAID engines. Rebuild times under load are 4X better than competitors. Performance during rebuild is 2X better than competitors. Instead of fixing the problem our competitors ask you to add another disk drive that adds no capacity to your little group of drives and loses performance so you can withstand double failures in a 6 or 12 drive array. For Pete’s sake! Then they feature it!!
        
  4. Maintenance.
    This is a doozie....and a prime mover of inefficiency. First, the standard in the industry is “we’ll do that for you”, or “Don’t touch this system as it will void your warranty”. Big installations end up with EMC on staff! Well of course they do! The systems are as easy to maintain as the Space Shuttle. They are flexible, but within the confines of a 15-year-old architecture and writing bin files to configure them. (I love my PDA, but just wish I could program it in FORTRAN). Let’s face it – maintenance and support are big business for the incumbents, why make it easy?
        
    Proof point: Pillar has built in Guided Maintenance that allows users to replace drives, or almost every other component in the system if they want to. Of course if you don’t want to, we will do it for you. I can repair my sprinklers too, but I don’t want to....the choice is mine. Sometimes when I see the gardener’s bill for replacing the sprinkler my kid ran over, I rethink that one. The good news is, it doesn’t void my warranty if I do, because I would fire my gardener if he told me that....hey?
        
  5. Easing Storage Administration.
    How many Microsoft Exchange Servers, or Oracle installations do we need to configure per hour before we might just automate this task? Again, for Pete’s sake, take best practices and put them into the machine.
        
    Proof point: Pillar put policy-based, Application-Aware Storage into the market almost 3 years ago. When customers see how easy it is -- you wait and see -- competitors will start copying it just like they tried to copy our QoS. They will offer it, but they will charge you more for it instead of baking it into the array. EMC tried to copy QoS, and they often waive the charge if they are up against Pillar. Lucky for us, they did a lousy job of copying it too; free stuff that is difficult to use, and doesn’t work well, is worth exactly what you pay for it.

There’s more! For those of you who found this interesting, don’t worry, I’ll be back with more. For those of you who didn’t, please comment, and thank my daughter who ran over the sprinkler again. Cement and rebar protection just might be in order....

-Mike