
July 29, 2007

Episode 50: Storage Technology - Part 1

So, I blog a good deal on information technology and technology management but I have yet to give my basic impression of what will happen with the future of core storage technologies themselves.

First, I know in advance that this post is going to create a firestorm of opinions and I know that there will be disagreement. I think it is important to note that EMC, as a customer-centric company, will work now and always to offer the technologies and solutions demanded by the marketplace. Don’t look for us to stop building products just because I have a prediction. The actions we take are based upon demand, not prognostication.

Before I can discuss the storage needs, however, I first need to give you my take on the evolving requirements around data and information. Data is the “customer” of storage so to understand where storage is going, you start with data.

I believe that the “data” world will continue to divide itself into 2 distinct types – often previously called structured and unstructured data. But it is not quite that simple anymore because organizations must move more and more to add some “structure” into their unstructured data to make it usable. So, in effect, all data and information is going to become more “structured.” These terms are no longer good monikers to define the data types.

Rather, I believe the bifurcation of data will be more and more based on the need for what I call “single transaction latency.” OLTP systems have this requirement today and transactional performance remains a paramount attribute for the associated storage systems. Single transaction latency is critical because most OLTP systems operate off of a single relational database (for consistency). In this case, total bandwidth and IO capacity are typically secondary to latency. You can view these systems like superhighways with a single toll booth - the performance of the toll booth (IO latency) drives system performance.

In contrast, the bulk of the remaining information (estimated at 70%+ today and growing to 95% by 2010) will fall into the “other” category – I will just call this “Web” data. As I noted, the defining difference in this data is that single transaction latency is not the most critical factor. Take, for example, a search on the Web. Any search you or I do may take ½ a second. Does it matter that much if it takes 0.45 sec or 0.55 sec? Not really. Since the searches from many people can be run in parallel, the need here is aggregate performance. On the superhighway, you can have slower toll booths but they are not the bottleneck as long as you have enough of them.
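To put rough numbers on the toll booth picture, here is a minimal sketch in Python. It is a back-of-the-envelope model, not a measurement: the function names, request counts, latencies, and booth counts are all made up for illustration, and it assumes every request takes the same time and that the booths work perfectly in parallel.

```python
import math


def single_booth_time(n_requests, latency_s):
    """Total time when every request must pass through ONE booth in turn
    (the OLTP case: one consistent database serializes the work)."""
    return n_requests * latency_s


def parallel_booths_time(n_requests, latency_s, n_booths):
    """Total time when requests are spread evenly over n_booths that
    run side by side (the "Web" case). Each booth handles its share
    one request at a time."""
    requests_per_booth = math.ceil(n_requests / n_booths)
    return requests_per_booth * latency_s


def throughput(n_requests, total_time_s):
    """Aggregate requests completed per second."""
    return n_requests / total_time_s


# Hypothetical workloads, chosen only to illustrate the argument.
oltp_time = single_booth_time(1000, 0.005)           # one fast booth
web_time = parallel_booths_time(1000, 0.5, 100)      # 100 slow booths
slower_web = parallel_booths_time(1000, 0.55, 100)   # booths 10% slower
more_booths = parallel_booths_time(1000, 0.55, 125)  # slower, but more of them

for label, t in [("OLTP, 1 booth", oltp_time),
                 ("Web, 100 booths", web_time),
                 ("Web, slower booths", slower_web),
                 ("Web, slower but 125 booths", more_booths)]:
    print(f"{label}: {t:.2f} s total, {throughput(1000, t):.0f} requests/s")
```

In this toy model the single booth can only go faster if its latency drops, while the slower Web booths recover (and exceed) their original aggregate rate simply by adding booths.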

The change that I believe we will see is that the “unstructured” data needs to become more “structured.” Clearly, using a classic relational database is not the answer. Isolating Web data within a database application would be far too constraining. The “structure” will come from tagging, indexing, metadata, and object structures with defined ontologies.

We recently acquired a company called XHive that builds some great technology to aid in our efforts. Essentially, XHive builds XML database technology. This provides a way to structure data in a more relational way while avoiding the constraints of using a proprietary db structure. Since the data and metadata are maintained in XML, there is no lock-in to any particular application either.
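As a rough illustration of the idea, the sketch below keeps free-form text alongside its tags in plain XML and queries it with Python's standard library. This is not XHive's product or API; the element names (catalog, document, keyword, body), the ids, and the helper function are all hypothetical, chosen only to show how metadata can add structure without a proprietary schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: unstructured bodies plus structured keyword tags.
CATALOG = """
<catalog>
  <document id="memo-1">
    <keyword>storage</keyword>
    <keyword>latency</keyword>
    <body>Free-form notes about toll booths and IO.</body>
  </document>
  <document id="memo-2">
    <keyword>licensing</keyword>
    <body>Free-form notes about open source licenses.</body>
  </document>
  <document id="memo-3">
    <keyword>storage</keyword>
    <body>Free-form notes about tiered storage.</body>
  </document>
</catalog>
"""


def ids_with_keyword(xml_text, keyword):
    """Return the ids of documents tagged with the given keyword.

    Uses the limited XPath support in ElementTree: the predicate
    [keyword='...'] matches documents that have a <keyword> child
    whose text equals the value. The keyword is assumed to be a
    simple word with no quote characters.
    """
    root = ET.fromstring(xml_text.strip())
    matches = root.findall(f".//document[keyword='{keyword}']")
    return [doc.get("id") for doc in matches]


print(ids_with_keyword(CATALOG, "storage"))
```

Because the catalog is ordinary XML text, any other XML-aware tool could read the same tags, which is the "no lock-in" point made above.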

Within these data types, there are an infinite number of other performance, reliability, and information requirements that will continue to drive tiered storage and Information Lifecycle Management needs. So why did I zero in on only one attribute to define the data types? The reason is simple: for OLTP applications, the transaction latency need drives the optimized storage architecture. For “Web” data, architecture is driven by more aggregate system requirements.

While there is clearly an almost infinite number of data types and requirements, my first premise is that storage architectures will, for the foreseeable future, need to address these two fundamental needs for data. In the past these were characterized as “structured” and “unstructured” data; I now consider them more appropriately labeled “OLTP” and “Web” data.

In the next blogs, I will discuss the use of core storage technologies and the future state of information availability.

Mark…

 

July 21, 2007

Episode 49: Don't Get Comfortable

Everyone likes comfort - a favorite chair or a comfortable bed. It is a good thing - except in business, especially technology.

I am reminded of Andy Grove's famous mantra "only the paranoid survive." For me, my mantra is similar - "don't ever get too comfortable with your business."

EMC is now, by any ranking, a "big" company - and a big software company (7th largest in the world the last time I checked) as well. The challenge with being big is simply to never get too comfortable and realize that past successes do not guarantee future success. 

It can be hard - the industry is littered with companies, once strong, now gone, mostly because they simply got too comfortable with their technology and products.

Clearly innovation is important in most businesses today but, oftentimes, it is less about pure innovation and more about simply being responsive to the natural innovation that can occur in the industry.

Take, for example, the internet. Clearly it has changed so many things in our daily lives and transformed how we get information and how we communicate. At the core, there was innovation necessary to make the communication possible but the real growth and change happened as companies exploited these capabilities. Most of the companies that had a hand in “inventing” the internet have not profited nearly as much as those who were able to exploit it.

Building value is as much about execution as it is about vision. As I look at the industry trends and potential out there today – there are not a lot of “secrets.” The landscape and technologies are there – in plain view. I go to conferences and everyone is sharing thoughts and ideas. Customers will openly discuss their issues and pain points. 

So with all that, you would think it would be easy for big companies to use their vast resources to quickly respond to customer needs and adopt new ideas and technologies but it is never that easy. Companies get comfortable with their products and philosophies and, since these strategies have been successful in the past, there is often a misguided belief that past success will predict future performance. In the changing face of technology and business, getting comfortable is the worst thing a company can do.

Hopefully everyone both inside and outside of EMC can see that I for one am not going to let us get comfortable with the successes of the past. While we will continue to use acquisition as a part of our strategy to offer the most innovative products and technologies, we will also continue and expand our internal efforts to yield new products and solutions to address changing needs and leverage new technologies. 

In future blogs – I will talk more about these efforts and more of the exciting things going on here at EMC. And one thing is certain - we are not going to get comfortable!

Mark…

 

 

 

July 08, 2007

Episode 48: Did the FSF go too far with GPL v3?


The recent publication of version 3 of the Free Software Foundation’s (FSF) GNU General Public License (GPL) (http://www.fsf.org/news/gplv3_launched) was an event that probably was not even noticed by most users or SW companies. The fact is, however, there are some changes here with the new version (v3) that could fundamentally diminish the present benefits the industry enjoys from the open source paradigm.

Some Background

First, let me clarify what I mean by “Open Source”. At the highest level I believe “Open Source” is a development process whereby a community of developers collaborate, innovate and share code to produce software. Of special importance is that each community determines its own business models and software licensing policies. Open Source is often free to license but that does not imply, of course, that there are no costs to deploying this software. Companies and individuals generally make money in other ways – usually by providing services and support. While I said the code is “often” free, I do also believe that if a community wants to allow its software to be licensed for money then it should be able to do so, even if the code is freely available. You might think that this would stifle innovation, but it actually does the opposite.

Caveats and incentives have always existed with most open source licenses to ensure that code that directly enhances a given open source project would always be “given” back to the community. This is clearly desirable to maintain consistent functionality and reliability and to drive competitiveness (of the open source project). This works well for almost everyone. It gives companies the ability to contribute to the “core” while innovating in their particular area or application. The company can add value and maintain the right to sell its own applications.

About The GPL

There are many licenses that can be used for open source projects; over 50 are listed by the Open Source Initiative alone (http://www.opensource.org/licenses/alphabetical). One of the most popular is the GPL. The GPL was first published in February, 1989, with version 2 released in June, 1991.

I don’t want to go into a lengthy history of GPLv2 here, but suffice it to say that by a combination of intent and accident, GPLv2 created a very broad software community of diverging political and economic beliefs. GPLv2 created a world in which not only is core software technology actively maintained and improved by its users, but also one in which all sorts of proprietary innovation beyond the core software technology is leveraged based on that core technology.

The results of these efforts, I believe, have been to accelerate innovation. Companies have been able to focus their “proprietary” efforts on specialized functions while sharing from (and contributing to) core technology via open source and collaborative efforts.

The net result has been broad sharing of development costs and efforts across large communities that benefits all users of the core software in terms of cost, features, and compatibility across systems.

So What is the Big Deal About GPL v3?

There are three primary areas that concern me about GPLv3:

  1. New restrictions on combining open source and proprietary code
  2. Incompatibility between GPL v2 and GPL v3
  3. Patent licensure

New restrictions on combining open source and proprietary code

The newly aggressive provisions about what code combinations must be covered by GPLv3 create cumbersome issues for anyone, proprietary or open, who wants to combine code under another license with GPLv3. GPLv3 provided only an eleventh-hour one-way compatibility with Apache. So Apache code can be combined with GPLv3 code under the GPLv3 license but not under the Apache license. This is not a gift to the open source community.

For developers who use GPL code for “nuts and bolts” and innovate around it, GPLv3 is harder to work with than GPLv2 and much, much harder than Apache or BSD. It isn't impossible - just the equivalent of a full employment contract for many lawyers. This is not a direction I see as useful; it’s just added confusion.

Also, by clamping down on the allowed means of mixing software, GPLv3 rules out some of the current types of proprietary innovation, thereby removing some existing incentives to contribute to the maintenance and improvement of open source software. This sacrifices investment in open source software in order to pursue political principles, and I wonder whether that tradeoff is the wisest move.

Incompatibility between GPLv2 and GPLv3

GPLv3 is a new license in some important respects and not a graceful evolution of GPLv2. The two are incompatible, so there’s a definite “my way or the highway” feel. All of the projects that are currently licensed under GPLv2 must decide whether a migration to v3 is worthwhile. As observers, we have no way of knowing if or when that will happen, or how long it might take.

In the interim, this incompatibility restricts combinations of GPLv2 code with GPLv3 code. Open source developers and distributors must carefully track the GPL licensing of software that passes through their hands, as GPLv3 prohibits distribution of some GPLv2/GPLv3 code combinations that are acceptable when all the code is under GPLv2. This expenditure of effort by open source participants is an unfortunate side effect of GPLv3’s incompatibility with GPLv2, and it may rise to the level of rewriting code that cannot be re-licensed.

Patent Licensure

With GPL v3 the FSF attempted to clarify issues related to software patents, but writing tactical patent provisions into a new license is not a good strategic move. Two examples of tactical provisions are the grandfathering of the Microsoft/Novell agreement, and the special exemption of broad patent cross-licensing from the requirement that patent licenses accompany the code. Only a few very large companies, such as IBM, make money from the practice of broad patent cross-licensing, so why should this provision be in there?

Sooner or later, the decision to have GPLv3 pick and choose among the means of exploiting software patents will rebound to the license’s detriment; having a fundamental software license become a tactical weapon in software patent battles is not a good strategic move because it risks that software license becoming a shaky foundation.

Summary – A Step Backwards?

It’s too early to tell whether the FSF went too far with GPL v3, but I’m starting to think so. I’m worried about the heavy-handed stance on mixed uses, I’m concerned about the confusion over license compatibility, and I think adding tactical patent provisions will degrade the value of the license over time. All of these changes could ultimately undermine the level of collaborative innovation that has happened under the present licensing systems. Ultimately the success of GPL v3 rests in the hands of the open source community, so let’s see whether they adopt it or not.

Mark…

July 03, 2007

Episode 47: Infotainment

As I deliver a lot of speeches, sometimes I feel like more of an entertainer than what I really am (which would take too long to define). It often seems like I am "on the circuit" so to speak. I feel like one week I will open my calendar and it will say, you are doing the Forum on Monday, hitting Leno Tuesday and then a 2 week run at Caesar’s in Vegas!

When it comes right down to it, any speaker, educator, or sales person needs to "entertain" to some degree. Ever notice how the teachers who could make learning interesting were the ones you remember (as well as the subject matter)? It is no different for any of us.

I often watch certain "star CEOs" and executives present horribly and I wonder how much more impact they could have if they "jazzed" things up a bit. I don't much listen to this type of person any more as they are simply reciting from their market message without passion or added value.

There is clearly a place for messaging but, if that's all the CEO is going to say, then they should hire an entertainer to deliver it. At least the audience would have more fun.

I assume that if anyone wants to know about the products or services we offer or our key value propositions they can go to www.emc.com. This blog connection is to explain why we are doing things and give a personal take on “stuff” - and hopefully do it in a way that is fun - infotainment if you will...

Mark…