
August 22, 2007

Episode 52: Data Integrity in Real Life

I think blogs were probably first invented by travelers as a way to relieve some frustration. I guess it is understandable, but I am still amazed at the frustration associated with travel. Here is my story of the day.

I get up early this morning for a trip from Boston to San Francisco. I have my travel arrangements in hand from our travel service confirming my reservations. I have the Email from my airline confirming my flight with a confirmation number and seat assignment and even asking if I would like to “check in online.” Wow – the era of modern technology.

OK, with all of this data being sent to me, you would think I should feel reasonably confident that I actually have a seat for this flight – right?

I get to the airport and the little “kiosk” kicks me out with one of those “we are unable to process your request” messages. I always wonder who they are referring to when they say “we” but I will let that one go. Why be so cryptic? How about something simpler like “hey moron – you have no seat – go away”? So I must seek out human assistance to sort this out. One trick I learned long ago is that it is usually quicker to call for help, especially on these morning flights where there are 2 people staffing a check-in counter with 100 people in line.

So I call the reservations desk. First, of course, I must go through my favorite game of “get a human on the line.” This is an art in itself and fodder for another blog entry but, after a few buttons and some verbal banter with the voice recognition system, I am in! It feels like a game to me; if you are not careful, you can get lost in “the abyss of automation.”

A very nice-sounding person gets on the line and I explain my situation. She looks up the record and says “well the problem is that you don’t have a ticket for this flight.” Her explanation is that the travel agent didn’t “re-ticket” a schedule change I had made. Now I know that if I call the ticket agent, they will claim that the airline did something wrong and this will get me nowhere, so I try a different approach. After securing a seat on the flight (and, I guess, paying a lot more for ticketing 1 hour prior to departure) I asked what I thought was a simple and logical question: “how can I verify in the future that I actually do have a ticket?”

“What do you mean?” she asked.

“I mean is there any way to know that I have a ticket to fly on your airline?” I asked.

“Well, we do send you a confirmation and you can always go online and check,” she answered.

“I know,” I said. “And you even sent me an email with a confirmation number and a link to check-in when I didn’t even have a ticket. Clearly a confirmation number is meaningless.” 

Obviously, she was not going to be able to solve this herself but I hoped that she understood the irony in what she was saying. I am actually not sure she ever did.

This really is an information integrity problem. One interesting thing she did tell me was that the automated Email system that sends me the offer to check-in online only runs against the “reservation data” and not the “ticket data.” The bad news is that when I go to actually check in, they do check for a ticket. 
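To make the gap concrete, here is a minimal Python sketch of the cross-check the Email job could have done. I have no idea how the airline's systems are actually built, so everything here is an assumption for illustration: the `Reservation` record, the `split_by_ticket_status` helper, and the sample confirmation numbers are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical reservation record; the field names are illustrative."""
    confirmation: str
    passenger: str
    flight: str


def split_by_ticket_status(reservations, ticketed_confirmations):
    """Split reservations into those backed by a ticket and those that are not.

    Only the first group should be offered online check-in; the second
    group should get a warning instead of a cheerful confirmation.
    """
    ticketed = set(ticketed_confirmations)
    ok = [r for r in reservations if r.confirmation in ticketed]
    unticketed = [r for r in reservations if r.confirmation not in ticketed]
    return ok, unticketed


# Made-up sample data: two reservations, only one of which was ticketed.
reservations = [
    Reservation("ABC123", "M. Traveler", "BOS-SFO"),
    Reservation("XYZ789", "A. Flyer", "BOS-SFO"),
]
ticketed_confirmations = ["XYZ789"]

ok, unticketed = split_by_ticket_status(reservations, ticketed_confirmations)

for r in ok:
    print(f"{r.confirmation}: send check-in offer")
for r in unticketed:
    print(f"{r.confirmation}: no ticket on file, warn the passenger")
```

The point is not the code itself but the join: the check-in offer is derived from both data sources, the same two sources the kiosk consults later.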

Today we have so many new systems trying to use information and, often, assumptions are being drawn from that data that are simply not accurate. As I have found several times now, I can call or go online to confirm travel and receive confirmation numbers, times, and even seat assignments with literally no warning that I really don’t have a ticket to fly.

Systems are only as good as their information sources and, clearly, this is a major driver for a more information-centric approach to IT. In most organizations today, too much information remains trapped behind monolithic applications, which makes this task even more difficult. Success at leveraging information depends heavily on the ability to access timely, consistent, and accurate data.

I doubt that IT can cure all of the travel frustrations, but ensuring information reliability and integrity could go a long way towards eliminating at least a few frustrating times.

Mark…

August 05, 2007

Episode 51: Storage Technology - Part 2

OK, so down to business. Here are my top 5 inflection points for storage technology in the next 3-5 years.

  1. Offline Storage (Tape) becomes extinct for most uses
  2. Flash becomes a viable Tier 1 storage option
  3. High Capacity/Low Cost Disk becomes the principal “bulk storage” medium
  4. FCoE (Fibre Channel over Ethernet) SANs become the FC evolution path for OLTP storage and enterprise data centers
  5. Web Storage Applications move away from SCSI and File System protocols and become connected principally via “Object” protocols (e.g. SOAP, REST). 

Here are some deeper thoughts on each of these topics.

1. Offline Storage becomes extinct for most uses 

Yes, I know, I know – folks have been saying “tape is dead” for so many years that it has become the Chicken Little story of IT (right next to the death of the mainframe). One point to remember is that, even when the change is evident, existing technologies do not simply fall off the planet. We can still buy VCRs, right? Yet I would hardly consider VCRs relevant to the video market. This is where tape is headed.

There are so many factors here. The cost and availability of network bandwidth, the cost of the “people” side of tape storage and handling, disk cost declines, multi-site DR, data de-duplication and many more factors will drive this change. Suffice it to say I think this one is happening now and will accelerate going into next year. Sure, it will still take time to transition but, just because it is a long trip, it doesn’t mean we don’t know where we are headed.

Short of a few things, there is just not that much that we really need to store in caves. You could argue that some information is replicated so much that there is no need to back it up at all.

2. Flash becomes a viable Tier 1 storage option  

Folks are starting to talk more and more about this one. Flash is fast (relative to disk), especially in terms of read latency. The technology is power efficient, and the costs are plummeting. As the total read/write cycle limits improve, expect to see flash begin to play a role as a Tier 1 storage technology especially for OLTP applications (see the previous blog if you don’t understand why).

3. High Capacity/Low Cost Disk becomes the principal “bulk storage” medium

If you can imagine flash growing for the very low latency applications and offline (tape) shrinking, where does all that data go? I believe it goes to very basic low-cost disk. Performance is not an issue for most applications, and the data can be spread across drives both to utilize the capacity and to reduce the effect of individual disk performance limits. These drives will also be designed to facilitate power saving features.

With the technology and power costs/limitations of moving to higher performance drives (power consumption grows as the cube of rotational speed), I believe that we will achieve performance more effectively in two ways. Flash will be used for OLTP applications, and simple replication will be used for Web Storage Applications. This will make high-capacity power-optimized disk drives an ideal fit for most data. 
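As a quick back-of-the-envelope check on that cube relationship, here is a small Python sketch. It takes the cubic scaling stated above at face value; the 7,200 and 15,000 RPM figures are just illustrative drive speeds, not numbers from any particular product.

```python
def relative_power(rpm_new: float, rpm_base: float) -> float:
    """Power ratio between two spindle speeds, assuming power ~ rpm ** 3."""
    return (rpm_new / rpm_base) ** 3


# Illustrative comparison: a 15,000 RPM drive against a 7,200 RPM drive.
ratio = relative_power(15_000, 7_200)
print(f"{ratio:.1f}x")  # prints "9.0x"
```

Under that assumption, roughly doubling the rotational speed costs about nine times the power, which is why chasing performance with faster spindles gets expensive quickly.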

4. FCoE SANs become the FC evolution path for OLTP storage

While there is always discussion about the future of SANs, I believe they are alive and well and will carry forward, over time, with a physical (but not protocol) consolidation to Ethernet. There have been a number of contenders, namely iSCSI and InfiniBand, which have tried to challenge SANs. While each has made inroads, each also has other significant technical and commercial limitations that will prevent mainstream adoption. FCoE allows the consumer to build one physical infrastructure for both their IP and FC needs, and it offers two additional key advantages. First, running on native Ethernet allows FC to run “at speed” with all of the present capabilities maintained. Second, it can be connected to an existing FC SAN without a complex gateway. Data can travel without protocol change or significant latency across the two mediums.

While there are several options out there to take a bite out of the traditional SAN market, I believe performance with compatibility will give FCoE the edge.

5. Web Storage Applications move away from SCSI and File System protocols and become connected principally via “Object” protocols (e.g. SOAP, REST). 

This is, by far, the most significant and long-term inflection point, but it is also one that I am confident will happen. In fact, it is happening right now with Web 2.0 applications and with almost any application that uses an Enterprise Content Management (ECM) platform. When an application accesses information through EMC Documentum or a service like Amazon S3, it is using this “object” principle.

A big part of moving forward with the concept of Web 2.0 or, specifically, Information 2.0 is the simple need to decouple information from applications. To do this, we must explicitly change how we package the information so that it can be tiered, protected, and secured independently from any single application service. Obviously, interacting at the block or even file level is just not going to meet this need. 
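For readers who think better in code, here is a toy Python sketch of what I mean by the “object” principle. It is not the Documentum or Amazon S3 API; the `ObjectStore` class, its methods, and the metadata keys (`tier`, `retention_days`) are hypothetical names invented for this illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """The information plus the metadata that travels with it."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store: callers use opaque IDs, never blocks or paths."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data with its metadata and return a generated object ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        """Look up an object by ID."""
        return self._objects[object_id]

    def set_policy(self, object_id: str, **metadata) -> None:
        """Change tiering or protection without touching any application."""
        self._objects[object_id].metadata.update(metadata)


store = ObjectStore()
oid = store.put(b"itinerary BOS-SFO", tier="bulk-disk", retention_days=365)

# Storage-side decision: move the object to a faster tier.
store.set_policy(oid, tier="flash")

obj = store.get(oid)
print(obj.metadata)
```

The application only ever holds the ID. Where the bytes live and how they are protected is described by metadata attached to the object, so it can change independently of any single application.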

In the next part, I will give my take on the evolution of information protection and availability.

Mark…