Archive

Archive for April, 2009

Paradigm Perturbations

April 23rd, 2009 Alan Atkinson No comments

Once upon a time (about 18 months ago actually) data protection was considered one of the most boring areas of the storage market.  If ever there was an area ripe for change (in fact, ripe for an entire paradigm shift), it was backup.  Well, ask and ye shall receive.  Data protection is now the most dynamic area in storage today.

In the interest of brevity, I’ll confine this discussion to just backup (although there’s a lot happening in replication too). First, let’s start with tape. For most companies, tape now comes in two very distinct flavors: real (the old-fashioned kind) and virtual (really a disk library). For the most part, the virtual kind of tape is eventually written out to real tape for vaulting, cost and long-term storage purposes.

Secondly, there’s deduplication. Deduplication comes in many flavors, but the net effect is that less data is stored (and sometimes less data is moved over the wire as well). Deduplication is complicated because not all data de-dups well. It’s good to know which data does and which data doesn’t (by the way, this often depends on the deduplication solution being used).

Thirdly, there’s virtualization. Virtualization is not a data protection technology; however, it is the impetus behind this inflection point for backing up data. Virtualization basically destroys the old-fashioned “back everything up to tape every night” backup strategy. Why? For starters, you take the most I/O-intensive process in your whole IT operation, backup, and layer it on top of the worst technology available for I/O performance, virtualization. Also, in a virtualized environment, there is a lot more data because there are a lot more servers. Additionally, this data has a ton of redundancy. Lastly, virtual servers raise a huge number of configuration issues. It’s not as simple as the old days when a server was really a server, and it was backed up to a physical tape. If you don’t get this right, recovery can be unbelievably fun (NOT!), e.g., sorting through tapes to figure out what data was where on a given day.

Enter Data Protection Management (DPM)…

DPM has been one of the fastest growing sectors in storage software for all the reasons stated above. Put another way, backup is too expensive, too risky and too hard to manage properly. These are the problems that DPM solves. Most DPM products are focused on backup/recovery today; however, this is changing rapidly. Vendors in the space are hearing from their customers that they want to manage the entire stack of data protection technologies, including replication, as simply, cheaply and with as little risk as possible. Fundamentally, customers are telling me that they want to be able to apply one service level to critical production data, another to email and a third to less important generic user data. These SLAs cover everything from recovery time objectives (RTO) to retention periods. Customers want to be able to leverage old and new technologies (traditional tape backup and deduplication, for example) to create an efficient, cost-effective data protection environment that meets their business requirements for availability. Using DPM, you not only guarantee efficiency and utilization levels (thus eliminating the cost of purchasing more capacity than is needed), you also reduce risk and manage the configuration changes that occur in today’s hybrid virtual/physical environments. Customers have told me that they are seeing payback of 12 months or less in real, hard dollars after implementing a DPM solution. That’s real money. As an added bonus, they also sleep better at night knowing that their data protection policies (especially policies associated with recoverability) have been rigorously enforced and that they can cleanly demonstrate this via flexible reports to anyone who cares. To me, DPM goes hand in hand with all the new, emerging data protection technologies. Given the payback period, one might wonder why every company hasn’t implemented DPM. The surprising fact is that an ever-increasing percentage have implemented DPM and are already reaping the benefits.
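
To make the idea of per-class service levels a little more concrete, here is a minimal sketch of how such a policy might be written down and checked. It is not taken from any DPM product: the class names, the RTO and retention figures, and the `BackupJob` and `check` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    rto_hours: float      # recovery time objective
    retention_days: int   # how long copies must be kept


# Illustrative values only; real SLAs come from the business, not from IT.
SERVICE_LEVELS = {
    "production": ServiceLevel(rto_hours=1, retention_days=90),
    "email": ServiceLevel(rto_hours=8, retention_days=365),
    "user_data": ServiceLevel(rto_hours=48, retention_days=30),
}


@dataclass(frozen=True)
class BackupJob:
    data_class: str
    tested_restore_hours: float     # result of the last restore test
    configured_retention_days: int  # what the backup tool is set to keep


def check(job):
    """Return a list of human-readable SLA violations for one job."""
    sla = SERVICE_LEVELS[job.data_class]
    issues = []
    if job.tested_restore_hours > sla.rto_hours:
        issues.append(
            f"restore took {job.tested_restore_hours}h, RTO is {sla.rto_hours}h"
        )
    if job.configured_retention_days < sla.retention_days:
        issues.append(
            f"retention is {job.configured_retention_days}d, "
            f"policy requires {sla.retention_days}d"
        )
    return issues


email_issues = check(BackupJob("email", 6, 365))
production_issues = check(BackupJob("production", 3, 30))
```

In this sketch the email job meets its service level while the production job misses both its RTO and its retention requirement, which is the kind of exception a report would surface.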

Posted by Alan Atkinson

Information Classification – IT’s Hardest Job

April 16th, 2009 Steve Kenniston No comments

I have decided that information today is like a group of friends. If you look at my LinkedIn page or my Facebook page you see that I have over 600 connections and over 180 friends respectively. What does this really mean? Obviously I don’t stay in touch with all of these people. So why do we have these connections? I think it is because we believe that in the future, each one of these connections will offer some kind of value to us. It may be that they will be a friend to us, they may share common experiences to help us through a personal issue, or they may help us find a mate or even a job. We just don’t know, so we hang on to the connection.

This is not unlike information. We are all tired of hearing that “data is growing at an exponential rate” but we never look at why. It is simple. We believe that ‘someday’ we may need that ‘valuable’ piece of content so we better not delete it. More importantly, the people who are accountable for managing that data (IT) are usually one step removed from the ‘value’ discussion, so rather than delete anything and be responsible for “losing data” they save and protect everything.

Recently I spent 4 hours on my Facebook page ‘categorizing’ my friends. I created a number of categories, friends from high-school, friends from college, colleagues from work (current), colleagues from work (past), industry connections and relatives. As you can imagine there are some friends that belong in more than one category – so how do I choose which one they should go in? Also, what happens if I change jobs? Where do the ‘colleagues (work)’ friends go? When do I move them? Do I remember to move them?

I have often said when presenting to customers, “EMC can help you with all aspects of your data except for one thing. EMC will never know the value of a piece of your content to you. You have to tell us, and then we can manage it properly.” Typically when customers hear that statement, they agree, but they also agree that the process of classifying data is a daunting task. You can see the challenge of just organizing friends in Facebook. There are so many permutations of how data can be classified that IT chooses the path of least resistance: store and protect everything.

While storing and protecting everything is easy, it also hits at the three biggest challenges IT is faced with: cost, complexity and compliance. These three vulnerabilities are the toughest to balance because not only are they important in their own right, they are also interdependent. As data grows, it becomes harder to protect, which means IT either needs to spend more money or be out of compliance.

The cycle is only broken when new processes are introduced. These processes are part of a key message when it comes to data protection: assess (classify), archive, backup, manage. The cycle can be broken only when customers accept that keeping cost, complexity and compliance in check requires a new process. Once new processes are in place, the data center can become more efficient.

Consider this analogy: In July 1936 Henry Phillips received a patent on a new type of screw and screwdriver he had invented. This new “technology” changed the world of mass production and machine repair.

He didn’t set out to make hand tools easier to use; he was trying to solve an industrial problem. The new screw and screwdriver were designed for use with power tools, and more specifically power tools on an assembly line.

The slot in the screw allowed the tool to seat itself automatically when contact was made, which saves a second or two. If you have hundreds or thousands of screws, as in cars or airplanes, that saves a great deal of time.

In 1938 Henry was able to get the American Screw Company to spend $500,000 to develop a manufacturing process around the new screw. By 1940 nearly all of the American manufacturers had switched to the new process and the new screws. It made the assembly of military aircraft and jeeps much more efficient. Having these vehicles made faster and more efficiently contributed to a competitive advantage.

So, it’s like I say when talking to customers: “The hardest thing to change in the data center is not technology, it is process.” Once the psychological inertia of dealing with a new process is overcome, then progress can be made.

Once customers start to classify their information (assign value to it), they can begin to archive their ‘old’ data. This will still provide them access to it, just not as quickly. Once this data is removed from the backup stream, backups will run much more efficiently. Additionally, deploying new technologies such as deduplication for specific data types (identified during a proper classification effort) allows IT to back up those data types in specific areas more efficiently and at much lower cost. Now that all the work has gone into establishing a new set of processes, IT will want to continue to manage them to ensure that all the hard work delivers tangible business benefits. New processes can help IT attack cost, complexity and compliance, but it all starts with information classification.

Posted by Steve Kenniston

Betamax Redux

April 9th, 2009 Steve Kenniston 6 comments

I often joke with customers that when my friends were growing up they would dream of being a professional baseball player or a rock star, and I used to dream of becoming a data protection technologist. Recently I read something very profound in Chuck Hollis’s internal EMC blog. Chuck said, “Decide what you’re passionate about …and write about it… it is hard to write about stuff you don’t care about.” I am passionate about data protection. Not because data protection is “cool” or anything, but because it is one of the most important practices in the data center. It is also one of the most challenging practices in the data center, and it involves not just technology but people and process as well. I had an old boss once who said, “Where there is chaos, there is cash,” and given the fact that the data protection market is a $10B market, I would say he was correct. I have started this blog along with my colleagues because we truly believe in what we do, who we work for, the challenges we solve and the benefits we bring to our customers’ challenging world of data protection. We write because we are passionate about data protection, not because we are being paid to.

Something I read a while ago in Tony Assaro’s blog, Leaders Dilemma as well as Setting the Record Straight, really got me charged up, but I wasn’t sure how I wanted to comment. Tony, you see, writes for money (not passion), which means he has to write ‘for’ the company that is paying him and at the same time spend time ‘Manufacturing Confusion’ in the market. (Sorry Tony, I liked you better as an analyst, when you heard all the vendors’ product messages and would form an opinion about what was really going on in the market.) What I am referring to are the comments specifically about “EMC is the one big player going after this market in earnest with three different products (which will confuse the market and themselves)”. Quite frankly, EMC’s philosophy and message to its customers regarding data deduplication isn’t confusing at all. In fact, when I speak with our customers, they believe we have one of the more thoughtful and consistent messages around this topic. So in an effort to educate, let me share EMC’s data deduplication philosophy and how EMC will take backup, beyond. EMC will:

  1. Provide deduplication as a pervasive & architecturally consistent service
  2. Coordinate deduplication throughout combinations of data storage and data movement
  3. Deduplicate at the highest level of abstraction
  4. Deduplicate as close to the source as practical

When these values are leveraged, the entire spectrum of data protection morphs into methods that will be used to protect data well into the future.

Back to the subject of the blog. Data Domain will continue to sell good products to customers. Data Domain will continue to refine their existing technology to meet customers’ demands. But they will do this at the expense of real innovation. Remember, the hardest thing to change in IT is process, not technology. Backing data up to disk targets is nothing new, and now backing data up to disk devices that perform deduplication is not innovative either. However, the paradigm of using traditional backup software to move full files across an expensive network is beginning to evolve. It MUST evolve, and when it does, what happens to the companies that have interesting features that are just one small morsel in the food chain? If you don’t own any significant IP in the extended process that is data protection, then you will be left out of the backup buffet. And as Maslow would say, “If all you have is a hammer then everything looks like a nail.”

EMC has taken a leadership position in the data deduplication space not because we offer multiple products but because of the way we look at technology. Data deduplication is made up of different components:

  1. Data ‘chunking’
  2. Compression / Encryption
  3. Assign Content ID
  4. Store

The goal is to be able to leverage these components across multiple storage platforms, providing deduplication at the highest level of abstraction possible and as close to the source as practical, based on the requirements of the application. Preserve the content by deduplicating content instead of data. The objective, over time, is to provide deduplication as a pervasive and architecturally consistent service across EMC’s entire storage portfolio. When you do this the entire paradigm of protecting information evolves, and this is why EMC is the leader in data deduplication. Not because we have 3 (or however many) products, but because of the way in which we look at data deduplication.
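
For readers who have not seen these components side by side, here is a minimal sketch of the four steps (chunk, compress, assign a content ID, store). It is a toy illustration and not EMC’s implementation: the fixed 4 KB chunk size, SHA-256 as the content ID, zlib for compression, the in-memory dictionary, and the `DedupStore` name are all assumptions made for the example, and encryption is left out. Hashing the uncompressed chunk, so that the ID does not depend on compressor settings, is likewise just one possible choice.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed size; real products may chunk differently


class DedupStore:
    """Toy content-addressed store that keeps each unique chunk once."""

    def __init__(self):
        self.chunks = {}  # content ID -> compressed chunk bytes

    def write(self, data):
        """Store data; return the ordered content IDs needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]         # chunk the data
            content_id = hashlib.sha256(chunk).hexdigest()   # assign content ID
            if content_id not in self.chunks:                # new chunks only
                self.chunks[content_id] = zlib.compress(chunk)  # compress, store
            recipe.append(content_id)
        return recipe

    def read(self, recipe):
        """Rebuild the original bytes from a list of content IDs."""
        return b"".join(zlib.decompress(self.chunks[cid]) for cid in recipe)


store = DedupStore()
monday = b"A" * 8192 + b"B" * 4096    # three chunks, two of them identical
tuesday = b"A" * 8192 + b"C" * 4096   # only the last chunk is new
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
unique_chunks = len(store.chunks)
```

Two backups totalling six chunks end up as three stored chunks here, and either backup can still be rebuilt from its list of content IDs with `store.read(...)`.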

At the end of the day, EMC has over 2000PB of deduplicated data under protection utilizing both source and target based deduplication solutions. And I would venture to estimate that if you include NetWorker, RecoverPoint, etc., EMC has exabytes of data under protection. EMC has a long history of changing with the times, listening to its customers, investing in new technologies and protecting customers’ data the way they want and need it to be protected. That is taking backup, beyond.

Posted by Steve Kenniston

Don’t forget to Archive

April 2nd, 2009 Rob Emsley No comments

Hello, my name is Rob and I’ve been recovering for many years. Recovering data, that is :-)

Before considering many of the new innovations to help improve backup I suggest that you look at implementing an archive first. This will reduce your primary storage usage dramatically and make backup easier.

At EMC we started archiving our employee e-mail at the start of 2007. Personally, this meant no more management of PST files. Management involved creating PST files on my notebook, manually moving e-mails and then performing my own backups to ensure that I always was able to recover. Basically, I was my own backup administrator.

Today EMC announced EMC SourceOne, a new family of products for archiving, e-discovery and compliance.

  • EMC SourceOne Email Management archives e-mail from Microsoft Exchange and IBM Lotus Notes/Domino as well as SMTP and instant messages to improve operational efficiency of messaging systems, reduce production, storage and backup costs and enhance message retrieval and system recoveries.
  • EMC SourceOne Discovery Manager provides high volume discovery search and collection for e-mail archived by SourceOne Email Management. It can quickly find, safely hold, efficiently cull and defensibly produce archived e-mail in response to legal/regulatory notice and/or corporate policy complaint. Discovery Manager is built around a legal matter or case metaphor and supports secure authorized investigator access, defensible collection results and chain of custody.
  • EMC SourceOne Discovery Collector is an indexing appliance that automates the in-house identification, collection, preservation, and policy management of unstructured content that resides on data sources such as desktops, laptops, common Internet file systems (CIFS) and network file systems (NFS), network attached storage, Microsoft Exchange, SharePoint and other content management repositories.

I would describe EMC SourceOne Email Management as a 2nd generation e-mail archiving product, delivering an architecture capable of supporting even the most demanding requirements, especially as E-mail continues to be a critical application for most customers.  SourceOne components can be deployed on just a single server or distributed across multiple physical or virtual servers. To support the EMC user community the new product is being implemented on a VMware ESX infrastructure which will allow for easy configuration changes.

No more e-mail backups for me as all the messages I keep are either stored on our Exchange servers or  archived onto our EMC Centera storage.  One less thing for me to worry about.

Posted by Rob Emsley
