Archive

Archive for the ‘Archive’ Category

Data Protection Management from ‘Nice to Have’ to ‘Need to Have’

December 15th, 2009 Steve Kenniston No comments

Data protection management has come a long way in the past decade.  More importantly, the features and functionality that are in products these days, and that customers have come to expect, are no longer ‘nice to have’ features in the data center; they are ‘need to have’ features.

Additionally, the term ‘data protection’ is morphing every day and has different meanings to different people.  Questions like ‘is replication data protection?’ or ‘is archive data protection?’ or ‘is DR / BC a function of protection?’ are now common in IT circles.  Each in their own right is a methodology for protecting information or has some play in the grand scheme of data protection.  The reality is, much like every answer in IT, the answer to these questions is ‘it depends’.  Data Protection has many different definitions, which start to expand the scope of what it actually is and more importantly, how it is managed cost effectively across the whole environment.

It is this expanding scope of data protection  where data protection management tools come into play, and the more flexible and granular the tool, the more effective.  It is hard to have good data protection capabilities without having insight to the environment.  First, understanding what type of data lives in the environment, where it is, how it is used and some characteristics about its age or its access frequency helps to determine how to best protect the information.  This is where a data protection management tool that provides some insight to the file system adds a great deal of value.

Next, if archive is a part of data protection (and I would argue that a functional archive, when used properly, is) then a data protection management tool that provides insight into the data in the archive can also help manage the overall protection process within the greater environment.  Knowing whether the data in the archive is actually being accessed or whether it can be deleted (unless stored for compliance purposes) can help to control archive costs.

If replication is a part of the overall data protection scheme, a data protection management tool that provides insight to this process can also add a great deal of value.  Identifying if links are up, if data is moving between sites and if the data is available, accessible and meets my recovery point objectives at the remote site can ease the concern of recoverability in the event of a disaster.
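To make the replication check concrete, here is a minimal sketch in Python of the kind of test a management tool might run.  The link names, timestamps and the 15 minute RPO are made-up values for illustration, and `meets_rpo` and `replication_report` are hypothetical helpers, not part of any product.

```python
from datetime import datetime, timedelta


def meets_rpo(last_replicated_at, now, rpo):
    """Return True if the newest replicated copy is no older than the RPO."""
    return (now - last_replicated_at) <= rpo


def replication_report(links, now, rpo):
    """For each link, report whether it is up and whether its lag is within the RPO."""
    report = {}
    for name, (link_up, last_replicated_at) in links.items():
        lag = now - last_replicated_at
        report[name] = {
            "link_up": link_up,
            "lag_minutes": lag.total_seconds() / 60,
            # A link that is down cannot be trusted to meet the RPO.
            "meets_rpo": link_up and meets_rpo(last_replicated_at, now, rpo),
        }
    return report


# Illustrative data only: two links, one recently replicated, one three hours behind.
now = datetime(2009, 12, 15, 12, 0)
links = {
    "site-a->site-b": (True, datetime(2009, 12, 15, 11, 50)),
    "site-a->site-c": (True, datetime(2009, 12, 15, 9, 0)),
}
report = replication_report(links, now, rpo=timedelta(minutes=15))
```

A real tool would pull the link state and last replication time from the replication product itself; the point is only that "meets my RPO" reduces to comparing replication lag against a target.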

And finally, a good tool provides as much information as possible, such as deduplication rates, tape growth and disk growth (in disk-based backup targets, including deduplication targets), as well as true analytics into the backup environment to help make decisions as to when to switch from a tape-based solution to a disk-based solution.  These analytics need to be in-depth enough to show what happens if some data that is being protected with traditional backup technologies is moved to a next generation solution, such as source-based deduplication: what effect will it have on the overall backup environment, will it help to better control costs, will it help to improve SLAs?

At a higher level, customers are telling me that they no longer want to manage backup, they just want it to work and they want proof it is working.  As customers move to a more virtualized IT infrastructure, they find that they are being forced to rearchitect their data protection environment and they are now looking to solutions that elevate the process.  IT is looking for tools to make their environment “data protection aware.” As virtual machines are added to the environment, IT wants them to be protected automatically and wants notification if they are not, so it can mitigate any risk, and let’s face it, backup is all about risk mitigation.  Backup is insurance.  Wouldn’t it be nice if your insurance company had deeper insight into all the cars / drivers in your family, told you on a monthly basis when your teenager was speeding, and told you that your premiums were going to go up if they didn’t start driving the speed limit, before they got the ticket and your premiums increased?

Any tool that IT invests in for a common process, data protection in this case, needs to be flexible enough to allow IT to manage as much of the overall process as possible from a single pane of glass.  Good data protection management tools need to provide IT as much visibility into the overall data protection environment as possible in order to help make good decisions about which data protection technologies should be invested in, so that IT can meet its overall SLAs and hence business objectives.

There is no sense spending a great deal of money on rearchitecting a backup environment if there is no insight into the success of the new architecture.  Sooner or later, management needs to have the pretty graphs that prove to someone that the right decisions are being made when it comes to protecting information, or when it comes to how much is spent on data protection or if the SLAs can be met.  Not having a good data protection management tool, and spending too much on new data protection architectures while not meeting your SLAs, could lead to an RGE (resume generating event).  Data protection management tools today are a need to have, not a nice to have.  Make the investment and put your data protection environment back on the Road to Recovery.

Comprehensive Capacity Optimization – Deduplication 2.0

October 7th, 2009 Steve Kenniston No comments

Technology is great, isn’t it?  When someone thinks they have a new idea on the same old technology foundation they call it “X 2.0”.  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 and it is my belief that the proverbial boat is being missed (since we are using water analogies).  I have been watching these guys hash it out for the past few weeks and decided I have to jump in.  I find the real value of these conversations is the value to the end user.  At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term (like deduplication 2.0); what does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center.  We should focus on the implications of this new generation of deduplication – ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization.  The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies/capabilities that are available to customers today.  Originally these deduplication technologies were used primarily for backup purposes but slowly, deduplication is making its way into primary storage. Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC.  Why only static data?  Static data is data that isn’t used frequently (that doesn’t mean it’s not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those of active data. Remember: nothing in IT is free.  If I deduplicate data, then in order to use it I must ‘rehydrate’ it, and thus there is a performance implication, so I want to be careful where I deduplicate data so as not to inhibit performance on production data.

Dr. Dedupe and Tom allude to Deduplication 2.0 moving beyond backup storage and into primary storage.  While deduplication in primary storage is technically possible, it is important that customers understand two important points:

1) Performance: whatever I do to deduplicate (I prefer ‘optimize’) capacity in order to save space, I must ‘undo’ in order to use the data.  If I set a policy that says any data that is 30 days old can be ‘optimized’, I need to be sure that data 30 days old is not active or I could pay a substantial performance penalty when using this data.  I may instead set a policy of ‘any data that hasn’t been touched in 30 days can be optimized’.  I would just want to make sure that there is no scenario where, at the end of a quarter let’s say, I would need to rehydrate all data in order to run some report.

2) Comprehensive and cumulative deduplication throughout my storage tiers.  What do I mean?  If I compress and single instance (deduplicate) data on my primary storage utilizing one set of deduplication technologies, say single instancing and compression algorithms, and then I backup this data using sub-file deduplication, a separate set of algorithms, then what I am left with are two separate sets of deduplicated data silos, and no one wins in this scenario.
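To illustrate what ‘single instance and compress’ means, and why reading the data back involves a ‘rehydrate’ step, here is a toy sketch in Python.  It is an illustration only, using whole-file SHA-256 hashing and zlib compression; it is not how any particular product implements deduplication, and `SingleInstanceStore` is a hypothetical name.

```python
import hashlib
import zlib


class SingleInstanceStore:
    """Toy store: identical content is kept once, and kept compressed."""

    def __init__(self):
        self.blobs = {}  # content digest -> compressed bytes
        self.names = {}  # file name -> content digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # First time this content is seen: compress and store it once.
            self.blobs[digest] = zlib.compress(data)
        # Every name pointing at the same content shares one stored blob.
        self.names[name] = digest

    def get(self, name):
        # 'Rehydration': decompress on every read, which costs CPU time.
        return zlib.decompress(self.blobs[self.names[name]])

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())
```

If a second device in the stack uses a different scheme, say sub-file chunks with its own hashes, it cannot reuse these digests, so the data has to come out through `get` and be deduplicated all over again.  That is the silo problem described in point 2.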

It is important, no matter what deduplication technology you decide to use, that you can actually leverage the data stored in the deduplication device and that as data moves from device to device it doesn’t need to be rehydrated before it is moved.

A great use case of capacity optimization in primary storage is how EMC evolved the Celerra product this year.  Through a policy, let’s say any data that is older than 30 days, is compressed and stored as a single instance, with users seeing as much as 30% to 50% storage savings.
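A policy like ‘hasn’t been touched in 30 days’ can be sketched in a few lines of Python.  This is only an illustration of the idea, not the Celerra implementation: the function name and the 30 day default are assumptions, and it relies on file access times, which some file systems do not update (for example when mounted with noatime), so it also looks at the modification time.

```python
import os
import time

DAY = 24 * 60 * 60  # seconds in a day


def optimization_candidates(root, min_idle_days=30, now=None):
    """Return files under root not accessed or modified in min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # 'Touched' here means read or written, whichever is more recent.
            last_touched = max(st.st_atime, st.st_mtime)
            if last_touched < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Running this against a file share before switching the policy on gives a rough idea of how much data would be optimized, and whether anything on the list is something you expect to need at quarter end.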

The real goal of Deduplication 2.0, and I think Dr. Dedupe alluded to this in his post “The Dedupe 2.0 Pundits Are Still Swimming in Lake 1.0”, is that customers win when deduplication technology is a part of the core system or file system, when I no longer need to rehydrate data as I move it from primary storage to secondary storage.  If each storage device in the ‘stack’ understands the language of the device in the stack ahead of it and the ‘deduplication’ or file system is coordinated and cumulative from device to device, then the customer is the winner.  This pertains to primary storage, backup storage and archive storage.  Never having to rehydrate data allows for more efficiency and a reduced tax on devices, which can save the end user money.

Tom Cook, CEO of Permabit points out in his blog post “Dedupe 1.0 vs. Dedupe 2.0: The debate ensues” that the only value to deduplication for primary storage is to move your data to a deduplicated archive which allows you to store data, efficiently, long term which I agree with, but as we have seen, not that practical.  Why? Because at the end of the day, the costs to manage storage are going up, up, up and the costs to buy storage are going down, down, down.  End users (NOT IT) are generally lazy or should I really say, just too busy to manage this storage.  In order to properly archive data, you need to have a policy that tells you what to move and when to move it.  IT can make all the recommendations in the world about the value of archive, but if users or really, lines of business managers don’t tell IT what data is important and what can be archived, then IT doesn’t really have a choice, which makes the premise of moving data to an archive, deduplicated or not – moot.

The real issue is balancing capacity optimization (to what granularity you deduplicate data) against performance on the appropriate tier of data, given that deduplication will happen on all tiers of storage.  The higher the performance requirements (tier 1) the less ‘optimized’ I make the data, the lower the performance requirements (tier x, archive) the more optimized I make the data.  The benefits to the customer are that I can A) optimize data, consistently among each of its devices, and B) it can be cumulative from device to device, removing silos of deduplicated data across the stack.

For more on tiered dedupe, read my Betamax Redux blog post on EMC’s vision for deduplication and hopefully this will put you on a high performance ‘Road to Recovery’.

A Data Protection Reference Architecture – The Final Chapter

September 1st, 2009 Steve Kenniston 2 comments

The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues helping to balance, cost, complexity and compliance.

This diagram can also be used to do a couple of other things.  It can help you begin to classify your data and align your  data to your business objectives.  It also lets you begin to identify what data or data services in your environment that may be more important to you than others and based on this help you to choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs.  The challenge comes with managing multiple solutions in an effort to meet your business objectives.  While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out EMC’s NetWorker is able to do this, centralizing your data protection infrastructure  for ease of management.  It allows you to manage traditional backup, source based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored.  Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology for another is the right answer, but what I am saying is that if you’re looking to have all the feature functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there.  Additionally, the underlying image of the triangle represents data protection management.  Putting all the new technology in place is one thing, managing it, and ensuring you are now meeting your business needs is another.  EMC’s Data Protection Advisor can help here as well.

This diagram can help customers lay out a new, better data protection schema for their environment and start thinking about data protection a bit more strategically versus tactically.  It can also help vendors speak to customers about how they should look at their environment in order to identify specific challenges and the means they need to alleviate these challenges, taking backup beyond.

A Data Protection Reference Architecture – Part 2

August 20th, 2009 Steve Kenniston No comments

Archive

The most fundamental part of developing a good data protection architecture starts at the base of the triangle with Archive.  Archive is often an overlooked component of data protection – it’s not just for regulated businesses anymore.  Archive essentially gives users 100% data deduplication efficiency.  What I mean by this is that you have the ability to remove ‘stale’ data (and by ‘stale’ I don’t mean unimportant data, I just mean data that is not accessed frequently) completely from your backup stream so you don’t continue to back it up.  Let’s face it; the two most important commodities in backup are time and capacity.  The two are interdependent.  The more capacity you have, the longer it takes to back up and the more money it costs to store.  The longer it takes you to back up, the less likely you are to be meeting your business objectives.  Data capacities aren’t shrinking, they are growing.  According to the latest IDC data, capacity is growing at a staggering pace of 65% year over year and the digital pack rat in all of us is too afraid to get rid of anything, compromising backup windows and hence the business.  By archiving data that hasn’t been touched in some period of time and removing it from the backup stream, you can relieve some of the pressure on your backups and possibly not have to make any significant changes to your backup infrastructure.
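To put rough numbers on this, here is a back-of-the-envelope sketch in Python.  Only the 65% growth rate comes from the IDC figure above; the helper names, and any starting capacity, throughput or stale fraction you feed in, are hypothetical values for illustration.

```python
def projected_capacity_tb(current_tb, years, annual_growth=0.65):
    """Capacity after compounding the annual growth rate for a number of years."""
    return current_tb * (1 + annual_growth) ** years


def full_backup_hours(capacity_tb, throughput_tb_per_hour):
    """Time to stream the whole capacity once at a fixed throughput."""
    return capacity_tb / throughput_tb_per_hour


def capacity_after_archive_tb(capacity_tb, stale_fraction):
    """Capacity left in the backup stream once stale data is archived out of it."""
    return capacity_tb * (1 - stale_fraction)
```

For example, 10 TB growing at 65% a year is about 16.5 TB after one year and a little over 27 TB after two, so a full backup that fits the window today will not fit for long at the same throughput.  Archiving half of that capacity out of the backup stream roughly halves the time a full backup takes, without touching the backup infrastructure.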

Also, you don’t have to backup to a special purpose device or appliance for archive.  You can archive data to any file system.  I would keep in mind however, that you want to archive to a platform that can keep costs low.  Remember this data is not unimportant, just not highly used.  Take into account your RTO and store the data on the most cost effective platform possible that also aligns to the business objectives.  This may be tape, it may be optical or it may be disk.  If it is disk, you want to store it on disk that is optimized for this type of data, optimized for capacity (deduplication, compression, single instancing), has low power and cooling costs, can replicate for availability and is highly reliable.  You will also want to make sure that it is integrated to some extent with an application that lets you find the data pretty quickly when you need it and put you further down the Road to Recovery.

In my next post we will talk about what I call the ‘fat middle’.  In this area almost all of the data has a 24 hour RPO, and this is where traditional and next generation backup applications play.  There are many use cases for data protection in this area and RTOs tend to drive the medium to which data is backed up (disk or tape).  Stay tuned for Part 3.

A Data Protection Reference Architecture – Part 1

August 14th, 2009 Steve Kenniston No comments

This blog will have multiple parts.  I will introduce my view of a data protection reference architecture and the next few blog posts will talk to components of that architecture.

The other day I had a very interesting conversation with a colleague of mine in Australia.  He was looking for a data protection reference architecture that he could use to speak to his customer.  As you can imagine, having this conversation over the phone could prove to be a challenge.  When the conversation began, my fear was he was looking for an ‘architecture’ diagram that included data protection appliances, backup servers, disk libraries, tape libraries and backup agents.  I quickly realized that this is an impossible conversation to have with him without knowing:

A)     the customer’s environment or challenges

B)      the customer’s business objectives

I find that most vendors don’t know A or B when speaking to a customer about their data protection ‘issues’, but they really should.  Having a more thoughtful conversation with customers in a consultative fashion is more relevant to customers in understanding their challenges and helping to align these challenges to the best possible solution.

I started my conversation with the diagram shown below (Figure 1).  A simple triangle divided horizontally into 4 segments and the middle two segments divided vertically in half.  Each segment represents different business objectives within a company.  As you go around the triangle, you can see that there are different technologies and different methodologies for attacking data protection challenges, which is why there is no longer a “one size fits all” approach when it comes to protecting data today. Let’s face it; the two most important commodities in backup are time and capacity.  One of the primary drivers behind the type of protection that is used is the Recovery Point Objective or RPO.  Different technologies provide different RPOs, each has a different price point, and there are different processes that can be applied to address RPOs.

Figure 1

Having a conversation specific to this diagram can have a tremendous amount of value on a number of fronts, including aligning technology needs with business objectives, highlighting critical pain points, and beginning a roadmap that helps implement data protection technology based on business needs and budget and puts you on the Road to Recovery.

The next post will cover the foundation of the triangle – Archive.

What Happened in Vegas, Stayed in Vegas

June 21st, 2009 Steve Kenniston No comments

Well, until now.  This is an interesting story about archiving and how it could have, but didn’t, help a friend of mine.

Often, when speaking with customers, I talk to them about the 4 fundamental principles with regard to data protection:

  1. Assess
  2. Archive
  3. Backup
  4. Manage

The assessment phase is a multi-dimensional phase.  It’s about people, process and technology.  Like with most things, the technology piece is the easy piece.  EMC has tools that allow us to scan file systems, databases and email systems and report back a litany of information including but not limited to:

  • Number of files
  • Age of files
  • Volume of data
  • Owner of the data
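As a rough illustration of what such a scan reports, here is a minimal Python sketch that walks a directory tree and collects the four items above.  It is a stand-in for illustration only, not the EMC tooling; the `assess` function is hypothetical, and the owner is reported as the numeric user id that the operating system exposes (which is only meaningful on POSIX systems).

```python
import os
import time
from collections import defaultdict


def assess(root, now=None):
    """Summarize file count, data volume, oldest file age and bytes per owner."""
    now = time.time() if now is None else now
    summary = {
        "file_count": 0,
        "total_bytes": 0,
        "oldest_age_days": 0.0,
        "bytes_by_owner": defaultdict(int),  # numeric uid -> bytes owned
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            summary["file_count"] += 1
            summary["total_bytes"] += st.st_size
            # Age is measured from the last modification time.
            age_days = (now - st.st_mtime) / 86400
            summary["oldest_age_days"] = max(summary["oldest_age_days"], age_days)
            summary["bytes_by_owner"][st.st_uid] += st.st_size
    return summary
```

The numbers are the easy part; deciding what they mean for the business is where the conversation with the line of business managers starts.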

Once EMC passes the information to the customer about their data, the real hard work begins.  Armed with the information, IT now has to go and speak to line of business managers in order to determine the value of the data, and how data of a specific value needs to be managed and protected.  The problem is line of business managers want everything saved forever, until IT tells them what the bill would be.  IT begins to describe the different ‘classes’ of service capabilities and line of business managers, who don’t really care about the details (not because they don’t care, they are just too busy), finally say “Just give me the highest level of protection I can get for the least amount of money.”  IT now does the best they can to align their perceived value of the data, to the most appropriate backup and archive capabilities they have.

Now, in Vegas, I think we can all agree that the video surveillance has a ton of value to the stakeholders of the hotels and casinos.  With the amount of debauchery that takes place in Vegas and the amount of money that is ‘rolling’ around Vegas, it is important to ‘know what is going on’ and to make sure all situations can be handled as efficiently as possible.  This is where video surveillance comes into play, and the more you ‘save’ on high speed disk, the easier it is to get to the truth or solve the mystery.

The exception is that this data is not available for just any general purpose.  Case in point.  A good friend of mine, let’s call him ‘Josh’, was running around Vegas one evening having a grand time.  He and some friends ran into a group of young ladies and had a great time seeing the sights of Vegas for the rest of the evening.  As the night was winding down and people were going back to their hotels, Josh, being a very nice guy, decided to ensure his ‘date’ made it back to her hotel safely.  He rode with her in the cab and then walked her to her hotel room.  Now, if any of you have been to Vegas, you know that from the cab stand to the room can be a mile and you will take one of several elevators and walk down one of many corridors to a hotel door that looks exactly like the other 3500 in the building.

The young lady asked Josh in to talk and to say good night and as time went past, they talked all night until they fell asleep.  Josh, having to catch a flight the next afternoon, and not wanting to wake anyone, decided to quietly leave early in the am.  Josh then took a cab back to his hotel and when he went to pay the cab driver, he realized that his wallet was gone.  After calling all the places they had been the night before, Josh was convinced that he had left / lost the wallet in the hotel room of the young lady and decided to call her.  First problem.  He didn’t know the room number.  He didn’t even remember the floor she was on.  Josh went back to the hotel and started to go up and down the elevator and walk down the halls looking for anything that looked familiar so he could knock on the door and ask if he had lost his wallet in the room.  After a few hours of walking the halls, he had his first great idea: instead of walking throughout the hotel, how about calling every room?  As he started doing that, he realized he still had about 2500 more rooms to call and with his cell running out of juice and not wanting to be a spectacle in the lobby he had his second brilliant idea.  Let’s ask the security department if he can have a look at the video surveillance to see if they can tell him which floor he went to the night before and what hallway he walked down so he could, perhaps, more easily find his wallet.

Well, the security department was less than sympathetic to Josh’s request (I would bet they get this question a lot).  In fact, the security department would not even comment on whether or not they had video cameras covering the different areas of the hotel, for ‘security reasons’.  (Reminds me of a time when I worked at VERITAS and we sold some software to Bank of NY who told us to not divulge what they had purchased because they considered this piece of technology a competitive edge.)

Defeated, Josh left his name with the hotel and went back to his own hotel.  It had been over 7 hours of searching and it was now just moments before checkout and his trip to the airport.

Just goes to show you, having the data doesn’t always put you on the Road to Recovery.

(BTW: Josh got a call on the way to the airport, the hotel ‘found’ his wallet and would be mailing it to him.  What a relief.)

Categories: Archive, EMC

Don’t forget to Archive

April 2nd, 2009 Rob Emsley No comments

Hello, my name is Rob and I’ve been recovering for many years.  Recovering Data that is:-)

Before considering many of the new innovations to help improve backup I suggest that you look at implementing an archive first. This will reduce your primary storage usage dramatically and make backup easier.

At EMC we started archiving our employee e-mail at the start of 2007. Personally, this meant no more management of PST files. Management involved creating PST files on my notebook, manually moving e-mails and then performing my own backups to ensure that I always was able to recover. Basically, I was my own backup administrator.

Today EMC announced EMC SourceOne, a new family of products for archiving, e-discovery and compliance.

  • EMC SourceOne Email Management archives e-mail from Microsoft Exchange and IBM Lotus Notes/Domino as well as SMTP and instant messages to improve operational efficiency of messaging systems, reduce production, storage and backup costs and enhance message retrieval and system recoveries.
  • EMC SourceOne Discovery Manager provides high volume discovery search and collection for e-mail archived by the SourceOne Email Management. It can quickly find, safely hold, efficiently cull and defensibly produce archived e-mail in response to legal/regulatory notice and/or corporate policy complaint. Discovery Manager is built around a legal matter or case metaphor and supports secure authorized investigator access, defensible collection results and chain of custody.
  • EMC SourceOne Discovery Collector is an indexing appliance that automates the in-house identification, collection, preservation, and policy management of unstructured content that resides on data sources such as desktops, laptops, common Internet file systems (CIFS) and network file systems (NFS), network attached storage, Microsoft Exchange, SharePoint and other content management repositories.

I would describe EMC SourceOne Email Management as a 2nd generation e-mail archiving product, delivering an architecture capable of supporting even the most demanding requirements, especially as E-mail continues to be a critical application for most customers.  SourceOne components can be deployed on just a single server or distributed across multiple physical or virtual servers. To support the EMC user community the new product is being implemented on a VMware ESX infrastructure which will allow for easy configuration changes.

No more e-mail backups for me as all the messages I keep are either stored on our Exchange servers or  archived onto our EMC Centera storage.  One less thing for me to worry about.

Posted by Rob Emsley

Categories: Archive

Road to ‘Data’ Recovery – 12 Steps

March 25th, 2009 Steve Kenniston 2 comments

Hi, my name is Steve and I have a recovery problem.  Well, a data recovery problem that is.  So, I think it is about time that I apply the ‘12 steps’ to help me with my data recovery problem.

Step 1 – It is time that I admit that I am powerless over my backup environment and my data protection world is unmanageable.

Step 2 – I have come to believe that there is a Technology greater than I that can help me restore (my sanity).

Step 3 – I have made a decision to put our company’s data and the process of recovery into the hands of a true data protection specialist.

Step 4 – I have helped to create a classified inventory of our company’s data.

Step 5 – I will admit to our CEO that I have failed at 63% of my recovery attempts costing the business $MMs.

Step 6 – I am prepared to have the new data protection administrator remove all of my defective technologies.

Step 7 – I will humbly ask ‘him’ to remove all of my failed processes.

Step 8 – I must make a list of all the people I have been unable to recover data for and be willing to try to restore their lost information.

Step 9 – I must make amends to all the people I have been unable to recover data for.

Step 10 – I must continue to take an inventory of all the tapes we have and promptly convert them to a newer technology to enable faster recovery.

Step 11 – I will seek out best of breed technology, partners and vendors to improve our company’s capabilities for daily operational recovery.

Step 12 – Having had this spiritual awakening as a result of these steps, I will carry this message to all IT administrators who are challenged with data recovery issues.

I believe that by following these 12 steps, I will have put our company back on… the Road to ‘Data’ Recovery.

Posted by Steve Kenniston

Road to Recovery

February 14th, 2009 Steve Kenniston No comments

Our domain, Backup & Beyond, was the tagline for Avamar Technologies, a company EMC acquired in November of 2006.  This tagline was very fitting from a data protection standpoint because Avamar utilized a traditional client / server architecture to protect data but with a twist.  Avamar utilizes a more intelligent client side agent that provides source based, variable block deduplication to enable the most efficient backups available in the market for more than 80% of a data center’s data.  Avamar also leverages this same technology to replicate this data between disk based backup targets, thereby dramatically reducing the reliance on tape.  This new technology, which has enabled new processes, is taking backup beyond.

The title of our blog, Road to Recovery – well, like every good title it is a play on words and trust me, as with every title it took us a while to come up with it.  That said, the industry has been talking about the fact that backup is really about recovery.  The same can be said for other data protection tools.  This is why our goal is to talk about methodologies (technologies and processes) that help you to recover data.  When IT professionals are polled, they often say that data protection (backup) is still the number one issue they have in the data center.  We say it is time to step up and admit it and start the ‘Road to Recovery’ when it comes to your data protection environment.

Let us know what your challenges are.  We are here for you as your support system, and we welcome your comments and questions.
