Mike Workman
 

Avatar

So I just went to see Avatar – in 3D of course. Wow. It was an amazing movie to say the least. The opening took me off guard a bit. It was fine – but nothing like what I had come to expect from the buzz surrounding this movie.  But then the technology kicked in and I was astounded.


Several thoughts occurred to me during the movie. First, of course, was an understanding of why the movie drew $2B in box office in just 6 weeks. As much as I hate things that are popular becoming more popular because, well, they’re popular, this movie warrants the popularity in my opinion. Score 4 points for using the word popular 4 times in one sentence!

By the way, the appeal here is not the plot. I won’t give it away for the 9 people on the planet who haven’t seen it yet, but you will have the whole thing nailed after the first 20 minutes. Although predictable, the story is a great depiction of the Gaia hypothesis (or theory if you prefer) – that our planet is a completely connected living entity. I wholly embrace this and always have. But I digress.

The second thing I thought about was the moon. Huh? Well, it so happened that we had a gorgeous full moon out and while appreciating it this weekend, I couldn’t help but think about landing on the moon, NASA, and all that. During the movie, the thought occurred to me that far more processing power went into this 3 hour movie than a 10 year journey to the moon. Damn.

Not to be morbid, I also thought how lucky I was to get to see this in my lifetime. In other words, glad I didn’t miss this.  I think technically, the depth of the 3D computer rendering was a milestone in movie making, if not human events. Oh I suppose some people might think milestone is a bit of exaggeration – after all there have been a few spectacular movies made already that peg the processing horsepower meters. The thing is, this was especially creative and had content in it not done solely for the sake of special effects, but it blew my mind at the capability to bring to life an imagined reality far removed from our own. After a while, the movie seems real, rather than a movie. It’s kind of like watching a foreign flick with subtitles. If it is a good one, you forget about the damn subtitles and you start thinking you can hear the voices in your own language.  Is it just me?

Part of the story offended a few critics and pundits: a repeat of the big bad corporation ruining our environment out of greed. Oh for God’s sake. There have been and are plenty of real life examples of this. This is not a stretch. I am a raging capitalist for sure (just look at the books I love and recommend – Ayn Rand for crying out loud).  But just because a movie suggests this scenario doesn’t mean it is a subversive tool of the eco-fascist global warming nut-balls who have 17,000 square foot homes, use 2 megawatt hours of electrical energy annually, fly a private jet, and buy back their own carbon credits in their quest to double the $100M winnings they have made, does it?  Well if it is, I ain’t buying that story; I prefer to believe that we are capable as a species of not destroying everything we touch. Mostly.

There isn’t that much to say about the storage required to hold all the content, versions, texture files, etc. that has any wow factor. "Weta Digital in New Zealand did this, and they built an impressive render farm for this project consisting of 35,000 processor cores, 3 PB of disk storage, and 104 TB of RAM. NetApp's involvement with this movie is an honor for them for sure. Congratulations to the gang over there."

I love technology. The power this movie has to drop you into an alternate reality is a bit scary. I highly recommend seeing it in 3D and at a great theater, even IMAX if you can. Have fun.


Note: I did not include a link to a Trailer of the film – I would recommend you see the movie without viewing the trailer if you can.

January 29, 2010

Pass the Morton’s Salt


When I was quite a bit younger some really great folks at IBM gave me the opportunity to help start a Hard Disk Drive OEM business.  I was part of the Storage division in San Jose California. At the time we built proprietary, non-standard products with all custom mechanical and electrical parts.

The writing was on the wall: the future lay in using high-volume, and hence lower-cost, parts. Not only did this amortize engineering costs (NRE), but tooling and test process costs could be amortized over a much larger volume as well. The idea was – use custom parts only where they provided a distinct competitive advantage. Then, build designs that could be sold into many products, not just one.

IBM wasn’t alone in this; the rest of the world was trying to gain leverage by producing standardized components as well. Seagate was building an empire out of providing 5.25” standard form factor drives to everyone, including the IBM PC (AT back then).  But IBM had invented the disk drive, and its leadership was furious about ceding the high-volume, low-cost drives to the likes of Seagate and Conner Peripherals. Besides, it was clear that before long, the mechanical advantages of smaller form factors and advancing technologies would obsolete the “big drives” that were sold two or four spindles to the refrigerator-sized box.

The IBM AS400 group had the same idea: Build smaller drives with advanced IBM technology to sell to internal customers like the AS400 and IBM PC groups. While the AS400 came from the “custom” world, the IBM PC guys knew that they needed best-of-breed cost in all their components, and the thought of being locked in to some over-transfer-priced HDD from another division was repugnant. The Rochester team made an “almost standard” product: Little things like non-standard mounting holes rendered their drives incompatible with PCs inside or outside of IBM.

I was asked by “The Chairman” and a few San Jose execs to build an entrepreneurial program inside IBM – the goal of which was a) to build a standard form-factor and interface HDD, and b) to build one packed with enough technology like MR heads to allow even the high-end storage guys to incorporate it into a modular version of the product.  Unfortunately the IBM Rochester team was heading in a similar direction, so a political battle ensued, and after the squabble I landed in Rochester, Minnesota. As my California friends said at the time “He really must have pissed someone off to be sent to Minnesota”. From Rochester (home of the Mayo Clinic) I managed what I named Allicat – an enterprise-class drive in reliability and performance that fit industry-standard electrical and mechanical specifications.  The “Alli” in Allicat came from the Alliance of San Jose and Rochester. At 2GB, 5400 RPM, with SCSI and IPI-2 interfaces, the drive was the beginning of the OEM HDD storage team within IBM. We went from about $0 top-line revenue to about $4.6B in the next 11 years.

Disk drives today are indeed labeled as a commodity. Lots of definitions of a commodity exist, including simply something that is bought or sold. I maintain that when most of us think commodity, we think about a product that has minor differentiation against others that are adequate substitutes. Table salt for example: Nobody says “Please pass the Morton’s Table Salt”.  Instead, salt is salt, and rarely is anything but “Please pass the salt” heard at any table. Likewise, gold is gold, wheat is wheat, etc.  Differentiation of one commodity over another is usually at the fringes -- fringes which are desperately held on to by manufacturers (But when it rains, this salt still pours!).

Moving up the food chain, buyers of PCs and Servers that incorporate HDDs always make sure that their commodities include two or more sources. Same for muffin fans, chassis, cables and connectors.

What about storage arrays? Well, the more complicated the system, and the smaller the volume requirements are for a system, the less easily it is commoditized. After all, how many Golden Gate bridges are needed in the world, and how standard is the interface between the bridge and the terra firma it sits on? So the truth is, while buyers try to push arrays toward the commodity end of the spectrum, it is difficult to substitute one array for another at some level. Training, management, interoperability, and application APIs are all different enough that one vendor is much easier than three, and disparate types of arrays at some level cost the buyer money by sheer reason of their differences.

What are some of the consequences of commoditization in the storage business? Here are a few; I am sure that many of you can add to this list:

  1. Disk will continue as a commodity.
  2. SSD will become a commodity. Manufacturers will struggle valiantly but, much like in the HDD business, large OEMs will drive toward standardization and multiple sources as volumes increase. One might argue that we are nearly there already, but firmware maturity is still disparate amongst manufacturers.
  3. The number of manufacturers of SSDs will grow for a while, and eventually decline as margins force consolidation.
  4. Flash memory used in SSDs will become a commodity. Today there are still some differences but there will be a convergence.
  5. Plug-in Cache modules (PCIe based Flash Memory) will converge into a commodity. Right now many players are striving to differentiate themselves, but the pace will be fast and furious and largely decided by large-volume OEM wins.
  6. As SSDs reduce in price and increase in capacity, there will be a larger and larger substitution of SSDs for HDDs.
  7. A trend toward SSDs over HDDs will cause all storage arrays to be re-architected. Today’s arrays are not built properly for maximum utilization of the performance benefits of SSD. This will affect everybody in the business. Pillar’s advanced Axiom architecture is already under development. This will be fun.

Oh, and I like Minnesota, really. Sure it is cold, but that wasn’t the real problem. Rather, it was how long it was cold. And thank goodness for the commoditization of salt, because they use a heck of a lot of it.


January 21, 2010

Trick or Treat

I must admit I’ve been remiss in posting this year.  We’ve been off to a busy start at Pillar with the announcement of our Series 2 Slammer and everything else that sits squarely on my shoulders as the CEO…mea culpa. 

More on the Axiom 600 Series 2 soon, but I saw this yesterday and I just have to write about it:  3PAR’s new Save 50% of Your Capacity Guarantee. I can’t help it. Perhaps thinking that the “thin” story has worn thin, 3PAR has a new guarantee that goes like this:  Assuming you have over-provisioned your LUNs by 2X, and you don’t own an array that will allow you to thin provision (just about everyone’s does), you can toss out your array at capacity of C  and buy a new one with a capacity of only C/2? And 3PAR guarantees that you will only have to buy C/2 to replace what you already own (C)?

Hello? Helloooo? I am sure there is more to this, there has to be. This seems goofy at best. I’d love to read the “terms” that they reference (but don’t disclose).

How about I offer my own guarantee: Don’t buy anything from anyone and save even more! After all, if you fill up C/2, it’s not like you have more storage by buying C/2? I suppose that you have the virtue of not wasting anything. I like that.

Meanwhile, back at 3PAR’s guarantee, for equal capacity spindles, your performance will drop in half, guaranteed.  That sounds kinda crappy to me. Who needs a 50% performance drop after spending money to save space that you don’t apparently need?  And how do you increase performance with 3PAR?  Well, add more disks of course!  Seems to defeat the purpose of this guarantee doesn’t it? I mean, buy half as much, but double it to get your performance back to where it was before you started.  Earth to 3PAR Marketing, over. Come in 3PAR Marketing, are you near Uranus yet? Apparently so. Who came up with this one? I hope it wasn’t some guy eating a cinnamon roll behind the steering wheel of his car again when this idea popped up?

Oh, you need the space? For more applications and stuff? Well, then buy the same capacity you had, restore your performance (for equal spindle count), and provision the original capacity out at C/2 instead of C – this leaves C/2 of empty space no longer hogged up by the over-provisioners out there. Good. But if your new applications require more I/O than you originally had, well, you are SOL.

Wait a sec. If you’re going to migrate all your data to another system for this “technology refresh” and go through all the LUN resizing exercises, why not skip the 3PAR purchase and rearrange what you already own? After all, it is possible, and you wouldn’t have to fork over any bucks to do it.

I think this comes down to a foot in the door. It’s one-dimensional and focuses only on capacity and not performance. It’s a trick, not a treat. You can’t replace the performance of 100 spindles with 50 spindles.  It won’t work.  You also can’t buy the same 100 spindles (for performance reasons), use thin provisioning to get more space and then layer more applications on those drives.  Where do they get their performance?  You can’t cram 100lbs of stuff in a 50lbs sack.  3PAR needs to figure out a way to meet service level agreements for each LUN when they are all in contention (hint: QoS).  This will allow them to use Thin Provisioning, tap the recovered capacity and provide decent performance minimums.  As it stands, this guarantee is goofy, and in the hopes that people just listen to the story, 3PAR may end up making a quarter disappear behind their ear. Problem is, it is your quarter.
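The spindle arithmetic behind this argument can be sketched in a few lines of Python. The 1 TB drive size and the 180 IOPS per drive are illustrative assumptions, not measurements of any vendor's system, and the model is deliberately crude: aggregate random I/O is treated as simply proportional to spindle count.

```python
import math

# Illustrative assumptions, not vendor measurements.
DRIVE_TB = 1.0          # capacity of each spindle
IOPS_PER_DRIVE = 180.0  # random IOPS a single spindle can deliver


def spindles_needed(capacity_tb: float, drive_tb: float = DRIVE_TB) -> int:
    """Number of equal-capacity drives required to hold capacity_tb."""
    return math.ceil(capacity_tb / drive_tb)


def aggregate_iops(spindles: int, iops_per_drive: float = IOPS_PER_DRIVE) -> float:
    """Crude model: random I/O capability scales linearly with spindle count."""
    return spindles * iops_per_drive


old_capacity_tb = 100.0                # the array you own today (C)
new_capacity_tb = old_capacity_tb / 2  # the replacement (C/2)

old_spindles = spindles_needed(old_capacity_tb)
new_spindles = spindles_needed(new_capacity_tb)
performance_ratio = aggregate_iops(new_spindles) / aggregate_iops(old_spindles)

print(f"{old_spindles} spindles -> {new_spindles} spindles")
print(f"performance ratio: {performance_ratio:.2f}")
```

Under these assumptions the ratio does not depend on the per-drive IOPS figure at all; it is set entirely by the spindle count, which is the point of the argument.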

Here is the shocker: 3PAR’s stuff is most probably better than the old fashioned 15-20 year old array architectures, and I recommend it. This aspect of the “technology refresh” is worthwhile.

Now before you fall off your chair, I don’t recommend them over Pillar, but certainly I think the next-gen architectures are better than the stuff the legacy suppliers keep hawking.

My suggestion is that you bring them in for better reasons than a goofball 50% capacity savings guarantee. While you’re at it, bring in Pillar. We’re thin too, but we have a lot more technology than thinly striping on all disks….like QoS, distributed hardware RAID, unified NAS/SAN, single-license software model without capacity scaling taxes (like 3PAR has for example), a real guarantee of 80% utilization at expected performance levels, SSD, 2TB disks, and the choice to stripe on smaller number of spindles than “all”.

And it’s not even Halloween.


December 17, 2009

Feature or BS?

At the Gartner Data Center Show a couple weeks ago in Las Vegas, I heard some corkers that I could hardly believe. I suppose we all hear a fair amount of crap from every company including Pillar. But usually it’s within expectations – within a “norm” so to speak – and emphasizes the positives or differentiators of a system. My marketing pals would say this spin isn’t crap, and I think that is a fair viewpoint. By my definition, a corker is a deceptive statement with no regard for the underlying truth, designed to lead you to false conclusions. Upon discovery, a corker leads you to uncork the intoxicating beverage of your choice.

The RAID 10 Ingest Corker:

A guy gets up for Compellent to say all kinds of good stuff about the product, and at one point he says “All our LUNs start out at RAID 10, so when you are initially writing these LUNs, they write with the speed of RAID 10, which is fast. This is a huge advantage. Later, after it is written, the LUNs are converted to RAID 5 if you don’t need the write speed any more.”

Brilliant.  What a feature!  Of course they don’t mention that this means the LUNs take up 2X the space they should. And they don’t mention why they do this, leaving you to believe it’s just a great idea. They also don’t mention that it is not “fast” compared to competitors’ systems (nearly all, not just Pillar). Rather, it is fast compared to their system without this patch.

So why do they do this?

  1. Software RAID. If you want to have fast write speed with Software RAID running on that same (even multi-core) server with the rest of the Control Unit software, you really need to bypass the RAID calculations.  Oh sure you can make this all look transparent if you only have 20 spindles, but when you are trying to handle 200+ spindles you are looking at a serious degradation in performance with this architecture. So this isn’t a feature; it’s an architectural “patch”.  Hardware RAID makes the issue of RAID level for sustained writing far less important.
  2. One half of one gigabyte of Write Cache. This is a ridiculously small amount of memory compared to any Enterprise Class storage array. Most systems can do “bursty” writes (not sustained) very fast by using write cache.  Of course this cache is RAM backed up with battery or SuperCap and RAM/NAND Flash combo, as well as mirrored.  But the burst you can sustain is limited by the size of your write cache. Pillar supports 24-96GB of cache, of which 8-32GB is write cache: 16-64 times the write cache a Compellent system has. For most applications, this is important, and the Pillar Axiom can accommodate write bursts that are one to nearly two orders of magnitude larger than a Compellent System can.
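The cache comparison in point 2 is simple enough to check directly. This is a minimal sketch that uses only the figures quoted above (0.5 GB versus 8 to 32 GB of write cache); the variable names are mine.

```python
import math

COMPELLENT_WRITE_CACHE_GB = 0.5   # "one half of one gigabyte"
AXIOM_WRITE_CACHE_GB = (8, 32)    # low and high end of the quoted range

# How many times larger the Axiom write cache is at each end of the range.
ratios = [gb / COMPELLENT_WRITE_CACHE_GB for gb in AXIOM_WRITE_CACHE_GB]

# The same ratios expressed as orders of magnitude (powers of ten).
orders_of_magnitude = [math.log10(r) for r in ratios]

for gb, ratio, order in zip(AXIOM_WRITE_CACHE_GB, ratios, orders_of_magnitude):
    print(f"{gb} GB write cache: {ratio:.0f}x larger, about {order:.1f} orders of magnitude")
```

The ratios come out at 16x and 64x, or roughly 1.2 and 1.8 orders of magnitude, which is where "one to nearly two orders of magnitude" comes from.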

So why isn’t this a good idea? In other words, assuming a system design not hobbled by a petite write cache and software RAID built into the Control Unit (using shared memory resources and control unit to disk connection bandwidth), why wouldn’t we do this too?

  1. We don’t need to. Ingest of new data often consists of lots of sequential writing. With a proper hardware RAID implementation, RAID-5 sequential writing is efficient. Performing the ingest with RAID-10 on a Pillar Axiom is unnecessary. However, should you decide to change RAID levels with the Pillar Axiom, the change is performed quickly, even under full load. It doesn’t take days to convert one way or another.
  2. It takes IOPs to convert back to RAID 5. Wouldn’t you rather use those IOPs for your applications?
  3. Enterprises need deterministic performance. This means that the performance characteristics are always known, that they have minimums. QoS means determinism in the face of contention for resources, something you cannot get if the critical parameters of the system are slowly moving between one kind of RAID and another, between one kind of storage element and another. We opted for deterministic application performance.
  4. Storage arrays that optimize through past patterns need time (days), and they need a target that moves slowly enough.  Don’t surprise a system using this “feature” with aperiodic workloads or ones that are sporadic, last portions of a day, and expect miracles; instead expect to be disappointed.
  5. Fast moving changes in the working subset of data, or the composition of the “data at rest” are covered by cache memory. One access, optimized in the Axiom is cached at the 64KB “big block” level, immediately. The working set not cached in the servers, is cached in the Axiom, not based on days of use, but based on what is being used by applications now. This finer granularity fits application patterns more accurately than what can be done at 512KB or worse yet 2MB “superblocks”.
  6. All of the algorithms that run in the background for days do so during idle time assumed to be available every day. Axiom works 24x7 at full load, even under fault, without the assumption of hours of idle time every day.
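The claim in point 1, that RAID-5 sequential writing is efficient, follows from textbook I/O accounting, sketched below. This is not a model of the Axiom or of any Compellent system: it ignores caching, controller overhead and stripe alignment, and the 5+1 group width and 1000-block workload are assumptions chosen only for illustration.

```python
def raid10_disk_writes(data_blocks: int) -> int:
    """RAID 10: every data block is written to both halves of a mirror."""
    return 2 * data_blocks


def raid5_small_write_ios(data_blocks: int) -> int:
    """RAID 5 read-modify-write: read old data and old parity,
    then write new data and new parity (4 disk I/Os per block)."""
    return 4 * data_blocks


def raid5_full_stripe_writes(data_blocks: int, data_drives: int) -> int:
    """RAID 5 full-stripe write: parity is computed from the new data alone,
    so each stripe costs its data blocks plus one parity block."""
    full_stripes, remainder = divmod(data_blocks, data_drives)
    if remainder:
        raise ValueError("data_blocks must be a whole number of stripes")
    return data_blocks + full_stripes


DATA_DRIVES = 5   # assumed 5+1 RAID-5 group
BLOCKS = 1000     # assumed sequential ingest, in stripe-unit blocks

mirror_writes = raid10_disk_writes(BLOCKS)
small_write_ios = raid5_small_write_ios(BLOCKS)
full_stripe_writes = raid5_full_stripe_writes(BLOCKS, DATA_DRIVES)

print(f"RAID 10 writes:           {mirror_writes}")
print(f"RAID 5 small-write I/Os:  {small_write_ios}")
print(f"RAID 5 full-stripe writes: {full_stripe_writes}")
```

Random small writes are where RAID 5 pays its penalty. For sequential ingest that fills whole stripes, RAID 5 actually puts fewer blocks on disk than a mirror does, which is why starting every LUN at RAID 10 is not needed when parity is not the bottleneck.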

I don’t mind a vendor patching around a shortcoming in their system. This is normal for engineers trying to make the best out of what they have. After all, it is cheaper to use software RAID and not provide for scalability beyond a hundred spindles, and much cheaper to use one tenth to one hundredth of the cache memory.

I do mind blatantly featuring something like this particular corker as if it were of value to the customer over competitive systems with Enterprise designs.


Post Script

Maybe I’m being too harsh. Maybe the Compellent guy didn’t even know the underlying reasons for the patch. There’s a fantastic paper and subsequent book on the topic of bullshit – On Bullshit by Harry Frankfurt, published by Princeton University Press. I highly recommend it, as it is informative, and fun. We all use terms like humbug, bullshit, corker, and malarkey and Frankfurt makes a great study of the subject, exploring the differences between bullshit and lying. This is a real and scholarly piece of work and not a joke, although I cannot help laughing out loud seriously discussing the topic of bullshit. For a great audio reading of this, check here.

December 02, 2009

SSDs Anyone?

At Pillar, we got more than our fair share of attention with the introduction of SSD to our storage pool.  I believe this was because we took a different approach than “the big guys”.  We introduced SSD in our enhanced function SATA storage shelf rather than FC.

We made this choice to agglomerate more capacity and RAID protection with lower overhead at a lower cost per IOP than our competitors.

Most FC interface SSDs have higher capacity at over twice the cost per TB of the SATA version. Single SSD performance isn’t interesting because at a minimum you have to supply a parity drive and a hot spare. So you get one drive’s capacity for the price of three, a horrific tradeoff in the competitor’s product that is already astronomically pricey.  The Pillar approach using 12 drive groups with two parity drives and one hot spare affords a much better option, especially since the performances add up in our configuration (13 drives give 12 drives of performance, the hot spare doesn’t help).  Thus we can sell an SSD brick at a list price equal to that of 4 SSD drives from EMC (which give you only 2 drives worth of capacity, and performance). 
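The capacity side of this tradeoff can be put in numbers. This is a rough sketch under stated assumptions: the drive counts for the small configurations come from the paragraph above, while the brick layout (13 drives, with the two parity drives counted among the twelve and one hot spare, leaving 10 drives of data) is my reading of the description and should be treated as an assumption.

```python
from fractions import Fraction


def usable_fraction(total_drives: int, parity_drives: int, hot_spares: int) -> Fraction:
    """Share of purchased drives that actually holds user data."""
    data_drives = total_drives - parity_drives - hot_spares
    if data_drives <= 0:
        raise ValueError("configuration leaves no drives for data")
    return Fraction(data_drives, total_drives)


# One data drive plus a parity drive and a hot spare: "one for the price of three".
minimal_config = usable_fraction(total_drives=3, parity_drives=1, hot_spares=1)

# Four drives yielding two drives' worth of capacity (assumed 1 parity + 1 spare).
four_drive_config = usable_fraction(total_drives=4, parity_drives=1, hot_spares=1)

# Assumed brick layout: 13 drives, 2 parity, 1 hot spare.
brick_config = usable_fraction(total_drives=13, parity_drives=2, hot_spares=1)

for name, frac in [("3-drive minimum", minimal_config),
                   ("4-drive set", four_drive_config),
                   ("13-drive brick", brick_config)]:
    print(f"{name}: {frac} usable ({float(frac):.0%})")
```

The fixed cost of parity and sparing is spread over more drives in the larger group, so the usable share climbs from a third or a half to roughly three quarters.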

Furthermore, for every 12+1 drives, you get a pair of high performance RAID controllers that can rebuild an SSD in 19.2 minutes under heavy load!!  Try rebuilding a 200GB EMC drive under heavy system load in 19 minutes. Hmmmm. Can you say more like many hours? A day? Depends on how many other SSDs you have on the same system sucking up the control unit’s bandwidth doesn’t it?

Add a few SSDs on an FC loop architecture like the competitors do and the system’s hard disk performance quickly grinds to a standstill. At the same time, performance under fault can be devastating. Besides, without QoS, SSDs will be in huge contention for IOPs relative to the FC or SATA disks in the array, so unless you dedicate hardware to them you will not get that performance reliably.

Personally, I am thrilled to be sitting on a system based on distributed hardware RAID to support SSDs in the storage pool.


November 14, 2009

Storage XIV MMIX Take II in MMX

Nothing communicates more clearly than a pile of Roman numerals. 

In a previous post I mentioned IBM’s Storage XIV.  Perhaps it was hyperbolic to say it’s not much of a product and to rate its SAN function as lackluster. It wasn’t just because “we have the technology” to use a green “unimpressed” face. Everyone else was pretty much green and happy by virtue of having a reasonable product. In summary, here’s why I just couldn’t bring myself to give a hearty happy face to the XIV:

[Table dated Nov. 13, 2009; image not preserved]

I mean seriously, is everyone looking at this thing trying not to laugh? I feel like I am being mean here, but where’s the silver lining in this baby?

Someone told me they call it Storage XIV because it took them 14 tries to get it right. This is not credible; there couldn’t possibly be 13 worse instantiations of this thing.  I personally think IBM should have waited until take XX or so. But then the designers would have had to graduate from the 20th Talpiot class instead of the 14th.

The way they market it is hysterical. They call it “unified storage.” Unified? How so?  It’s SAN-only. They call it “green.” Green?? Really? You have to power up (and cool down) 180 TBs worth of drives to get 73 TBs of capacity. That’s just 44% of the raw capacity. That’s so far from green, it’s brown.  They say it’s “designed to eliminate the complexity of administration and management” yet they won’t even let you replace a failed disk. Page 35 of the redbook states “These operations [replacing a hard drive] must be performed by the IBM XIV technician.” Yeah, make a system so complex that normal IT staff can’t maintain it without using IBM technicians and then say it’s simple and easy to administer. 
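Taking the capacity figures quoted above at face value, the usable share is a one-line calculation. With 73 TB usable out of 180 TB raw it comes to roughly 41%; a 44% share would correspond to about 79 TB usable. The sketch below only restates that arithmetic and assumes nothing about how the XIV lays out its data.

```python
RAW_TB = 180.0     # raw capacity that has to be powered and cooled
USABLE_TB = 73.0   # usable capacity as quoted

# Fraction of the raw capacity that is available for data.
usable_share = USABLE_TB / RAW_TB

# Usable capacity that a 44% share of the same raw capacity would imply.
usable_tb_at_44_percent = 0.44 * RAW_TB

print(f"usable share: {usable_share:.1%}")
print(f"44% of {RAW_TB:.0f} TB raw: {usable_tb_at_44_percent:.1f} TB")
```

Either way, well over half of the spinning capacity is overhead.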

What a load. If this is “marketing spin” it is world class: totally deceptive.

There isn’t much more to say on the XIV, except perhaps to ask if the next generation will improve on this embarrassing situation. Well, I would imagine they will replace their Ethernet back-end as I have heard it is the source of many of their field problems – can we all guess the XV or XIV II  or XIV Take II will be Infiniband? 

I am sorry for the IBM’ers who inherited this thing. I am sorry for the guys at IBM who have to watch marketing and sales try to pawn XIV off as a reasonable storage product. It makes me furious. There are a hell of a lot of great engineers and managers at IBM that know better. Damn it, they helped invent this industry, and the jackasses in Armonk or wherever that decided it was a good idea to go this direction should be ashamed of themselves.

I suppose IBM wants to displace the OEM LSI product with their own stuff. While LSI makes a far better product than the XIV (so does everyone else), the business forces of IBM including the ability to price it to win can cause a large-scale adoption of the thing. Yikes.
