
July 29, 2007

Episode 50: Storage Technology - Part 1

I blog a good deal about information technology and technology management, but I have yet to give my basic impression of where core storage technologies themselves are headed.

First, I know in advance that this post is going to create a firestorm of opinions, and I know that there will be disagreement. I think it is important to note that EMC, as a customer-centric company, will work now and always to offer the technologies and solutions demanded by the marketplace. Don’t look for us to stop building products just because I have a prediction. The actions we take are based upon demand, not prognostication.

Before I can discuss storage needs, however, I first need to give you my take on the evolving requirements around data and information. Data is the “customer” of storage, so to understand where storage is going, you start with data.

I believe that the “data” world will continue to divide itself into 2 distinct types – often previously called structured and unstructured data. But it is not quite that simple anymore because organizations must move more and more to add some “structure” into their unstructured data to make it usable. So, in effect, all data and information is going to become more “structured.” These terms are no longer good monikers to define the data types.

Rather, I believe the bifurcation of data will be based more and more on the need for what I call “single transaction latency.” OLTP systems have this requirement today, and transactional performance remains a paramount attribute for the associated storage systems. Single transaction latency is critical because most OLTP systems operate off of a single relational database (for consistency). In this case, total bandwidth and IO capacity are typically secondary to latency. You can view these systems as superhighways with a single toll booth: the performance of the toll booth (IO latency) drives system performance.

In contrast, the bulk of the remaining information (estimated at 70%+ today and growing to 95% by 2010) will fall into the “other” category – I will just call this “Web” data. As I noted, the defining difference in this data is that single transaction latency is not the most critical factor. Take, for example, a search on the Web. Any search you or I do may take half a second. Does it matter that much if it takes 0.45 sec or 0.55 sec? Not really. Since the searches from many people can be run in parallel, the need here is aggregate performance. On the superhighway, you can have slower toll booths, but they are not the bottleneck as long as you have enough of them.
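To put rough numbers on the toll booth analogy, here is a minimal Python sketch. The function names and the 5 ms OLTP service time are my own illustrative assumptions; only the half-second search figure comes from the example above. It ignores queueing and simply treats each booth as serving one request at a time.

```python
def throughput_per_sec(booths: int, service_time_sec: float) -> float:
    """Requests completed per second when each booth serves one request at a time."""
    return booths / service_time_sec


def per_request_latency_sec(service_time_sec: float) -> float:
    """Latency seen by a single request (queueing ignored).

    Adding booths does not change this number; only a faster booth does.
    """
    return service_time_sec


# "OLTP" case: one toll booth, so it has to be fast (assumed 5 ms per transaction).
oltp_throughput = throughput_per_sec(booths=1, service_time_sec=0.005)
oltp_latency = per_request_latency_sec(0.005)

# "Web" case: 100 slow booths working in parallel (0.5 s per search).
web_throughput = throughput_per_sec(booths=100, service_time_sec=0.5)
web_latency = per_request_latency_sec(0.5)

print(f"OLTP: {oltp_throughput:.0f} req/s at {oltp_latency * 1000:.0f} ms each")
print(f"Web:  {web_throughput:.0f} req/s at {web_latency * 1000:.0f} ms each")
```

Under these assumptions both systems handle about 200 requests per second, yet the latency of a single request differs by a factor of 100. That is the split I am describing: one side has to buy a faster booth, the other can simply add more booths.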

The change that I believe we will see is that the “unstructured” data needs to become more “structured.” Clearly, using a classic relational database is not the answer. Isolating Web data within a database application would be far too constraining. The “structure” will come from tagging, indexing, metadata, and object structures with defined ontologies.

We recently acquired a company called XHive that builds some great technology to aid in our efforts. Essentially, XHive builds XML database technology. This provides a way to structure data in a more relational way while avoiding the constraints of using a proprietary db structure. Since the data and metadata are maintained in XML, there is no lock-in to any particular application either.
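As a rough illustration of what “structure through tagging and metadata” can look like, here is a small Python sketch using only the standard library. It does not use or represent XHive's product or API; the element names (document, metadata, author, tag, body) and both helper functions are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET


def wrap_with_metadata(body: str, tags: list[str], author: str) -> str:
    """Wrap free-form content in XML that carries its own metadata."""
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "author").text = author
    for tag in tags:
        ET.SubElement(meta, "tag").text = tag
    ET.SubElement(doc, "body").text = body
    return ET.tostring(doc, encoding="unicode")


def find_by_tag(xml_docs: list[str], tag: str) -> list[str]:
    """Return the body of every document whose metadata includes the given tag."""
    hits = []
    for raw in xml_docs:
        doc = ET.fromstring(raw)
        if any(t.text == tag for t in doc.findall("./metadata/tag")):
            hits.append(doc.findtext("body"))
    return hits


docs = [
    wrap_with_metadata("Q2 results memo", ["finance", "quarterly"], "alice"),
    wrap_with_metadata("Team offsite photos", ["social"], "bob"),
]
print(find_by_tag(docs, "finance"))
```

The content itself stays free-form, but any XML-aware tool can query the metadata, which is the sense in which the data avoids being locked in to one application.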

Within these data types, there are an almost infinite number of other performance, reliability, and information requirements that will continue to drive tiered storage and Information Lifecycle Management needs. So why did I zero in on only one attribute to define the data types? The reason is simple: for OLTP applications, the transaction latency need drives the optimized storage architecture. For “Web” data, the architecture is driven more by aggregate system requirements.

While there is clearly an almost infinite number of data types and requirements, the first premise is that storage architectures will, for the foreseeable future, need to address these two fundamental needs for data. In the past these were characterized as “structured” and “unstructured” data; I now consider them more appropriately labeled “OLTP” and “Web” data.

In the next blogs, I will discuss the use of core storage technologies and the future state of information availability.

Mark…

 

TrackBack


Listed below are links to weblogs that reference Episode 50: Storage Technology - Part 1:

» There Are Two Kinds… from Stephen Foskett, Pack Rat
There are two kinds of people in this world: those who believe there are two kinds of people in this world, and those who don't. This week, EMC's Chief Development Officer, Mark Lewis, posted a thoughtful blog episode...

Comments

I like your blog, it’s always fun to come back and check what you have to tell us today.

Hello from Spain.

I came across this blog because I was searching for information to know more about EMC.

Quite an interesting article, although my knowledge of storage systems is very limited and I don't understand some of the abbreviations you use.

I am applying right now for a job (a junior position) at EMC and I think it is very interesting to be able to read a blog from the "Executive Vice President and Chief Development Officer" of the company.

It gives you a very different perspective on the company and its business. Very good initiative.

I will keep visiting this blog :)

Sorry to disappoint you ... I'm not disagreeing but very much agreeing with you. What's more, quite a lot of OLTP workload can and should be offloaded into Business Intelligence / Data Warehousing. It makes no sense to run big reports against the transactional DB. And many of those reports should be 'printed' to PDF and then accessed in the frozen state, e.g. end of quarter P&L, etc.

Wonder when will enterprises start having nano-GOOGLE as their core? The appliances are being marketed, but I'm not seeing organisations taking up what would appear to be a no-brainer.

Ok.... is EMC going to buy Google or vice versa?????
