Posts Tagged ‘protection’

Data Protection Management from ‘Nice to Have’ to ‘Need to Have’

December 15th, 2009 Steve Kenniston No comments

Data protection management has come a long way in the past decade. More importantly, the features and functionality found in today’s products, and which customers have come to expect, are no longer ‘nice to have’ features in the data center; they are ‘need to have’ features.

Additionally, the term ‘data protection’ is morphing every day and has different meanings to different people.  Questions like ‘is replication data protection?’ or ‘is archive data protection?’ or ‘is DR / BC a function of protection?’ are now common in IT circles.  Each in their own right is a methodology for protecting information or has some play in the grand scheme of data protection.  The reality is, much like every answer in IT, the answer to these questions is ‘it depends’.  Data Protection has many different definitions, which start to expand the scope of what it actually is and more importantly, how it is managed cost effectively across the whole environment.

It is this expanding scope of data protection where data protection management tools come into play, and the more flexible and granular the tool, the more effective it is. It is hard to have good data protection capabilities without insight into the environment. First, understanding what type of data lives in the environment, where it is, how it is used and some characteristics of its age or access frequency helps to determine how best to protect the information. This is where a data protection management tool that provides some insight into the file system adds a great deal of value.

Next, if archive is a part of data protection (and I would argue that a functional archive, when used properly, is) then a data protection management tool that provides insight into the data in the archive can also help manage the overall protection process within the greater environment. Knowing whether the data in the archive is actually being accessed or whether it can be deleted (unless stored for compliance purposes) can help to control archive costs.

If replication is a part of the overall data protection scheme, a data protection management tool that provides insight to this process can also add a great deal of value.  Identifying if links are up, if data is moving between sites and if the data is available, accessible and meets my recovery point objectives at the remote site can ease the concern of recoverability in the event of a disaster.
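The replication checks described above (is the link up, and is the remote copy recent enough to meet the recovery point objective) can be sketched in a few lines. This is only an illustration: the site names, the `link_up` and `last_replicated_at` fields and the one-hour RPO are all assumptions made for the example, not the output of any particular product.

```python
from datetime import datetime, timedelta


def meets_rpo(last_replicated_at, now, rpo):
    """Return True if the newest replicated copy is no older than the RPO."""
    return (now - last_replicated_at) <= rpo


def replication_report(sites, now, rpo):
    """Summarize link state and RPO compliance for each remote site.

    `sites` maps a site name to a dict with two hypothetical fields:
    'link_up' (bool) and 'last_replicated_at' (datetime).
    """
    report = {}
    for name, status in sites.items():
        if not status["link_up"]:
            report[name] = "LINK DOWN"
        elif meets_rpo(status["last_replicated_at"], now, rpo):
            report[name] = "OK"
        else:
            report[name] = "RPO MISSED"
    return report


# Hypothetical sample data for three remote sites.
now = datetime(2009, 12, 15, 12, 0)
sites = {
    "dr-site-a": {"link_up": True, "last_replicated_at": now - timedelta(minutes=10)},
    "dr-site-b": {"link_up": True, "last_replicated_at": now - timedelta(hours=3)},
    "dr-site-c": {"link_up": False, "last_replicated_at": now - timedelta(minutes=5)},
}
print(replication_report(sites, now, rpo=timedelta(hours=1)))
```

With this sample data, site A is within the one-hour RPO, site B has fallen behind and site C is flagged because its link is down, which is the kind of at-a-glance answer a management tool should give.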

And finally, a good tool provides as much information as possible, such as deduplication rates, tape growth and disk growth (in disk-based backup targets, including deduplication targets), as well as true analytics into the backup environment to help decide when to switch from a tape-based solution to a disk-based solution. These analytics need to be in-depth enough to show what happens if some data that is protected with traditional backup technologies is moved to a next-generation solution such as source-based deduplication: what effect will it have on the overall backup environment, will it help to better control costs, and will it help to improve SLAs?

At a higher level, customers are telling me that they no longer want to manage backup; they just want it to work and they want proof it is working. As customers move to a more virtualized IT infrastructure, they find that they are being forced to rearchitect their data protection environment and they are now looking to solutions that elevate the process. IT is looking for tools to make their environment “data protection aware.” IT wants virtual machines to be protected automatically as they are added to the environment, and wants notification when they are not, so it can mitigate any risk. And let’s face it, backup is all about risk mitigation. Backup is insurance. Wouldn’t it be nice if your insurance company had deeper insight into all the cars and drivers in your family, told you each month when your teenager was speeding, and warned you that your premiums would go up unless they started driving the speed limit, before they got the ticket and your premiums increased?
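The “tell me when a virtual machine is not protected” idea comes down to comparing two lists. Here is a minimal sketch, assuming a tool can obtain the VM inventory and the set of VMs covered by a backup policy; the function and VM names are hypothetical.

```python
def unprotected_vms(inventory, protected):
    """Return VMs present in the inventory but missing from backup, sorted."""
    return sorted(set(inventory) - set(protected))


# Hypothetical inventory: two VMs were added without a backup policy.
inventory = ["web-01", "web-02", "db-01", "test-07"]
protected = ["web-01", "db-01"]

for vm in unprotected_vms(inventory, protected):
    print(f"WARNING: {vm} has no backup policy")
```

The comparison itself is trivial; the value of a management tool is in collecting both lists automatically and raising the warning before a restore is ever needed.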

Any tool that IT invests in for a common process, data protection in this case, needs to be flexible enough to allow IT to manage as much of the overall process as possible from a single pane of glass. Good data protection management tools need to give IT as much visibility into the overall data protection environment as possible, in order to help make good decisions about which data protection technologies to invest in and so help IT meet its overall SLAs and hence its business objectives.

There is no sense spending a great deal of money on rearchitecting a backup environment if there is no insight into the success of the new architecture. Sooner or later, management needs to have the pretty graphs that prove to someone that the right decisions are being made when it comes to protecting information, how much is spent on data protection and whether the SLAs can be met. Not having a good data protection management tool, and spending too much on new data protection architectures while not meeting your SLAs, could lead to an RGE (resume generating event). Data protection management tools today are a need to have, not a nice to have. Make the investment and put your data protection environment back on the Road to Recovery.

Comprehensive Capacity Optimization – Deduplication 2.0

October 7th, 2009 Steve Kenniston No comments

Technology is great, isn’t it? When someone thinks they have a new idea on the same old technology foundation they call it “X 2.0”. I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0, and it is my belief that the proverbial boat is being missed (since we are using water analogies). I have been watching these guys hash it out for the past few weeks and decided I have to jump in. I find the real value of these conversations is the value to the end user. At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term (like deduplication 2.0); what matters is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center. We should focus on the implications of this new generation of deduplication, ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization. The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies and capabilities available to customers today. Originally these deduplication technologies were used primarily for backup purposes, but slowly deduplication is making its way into primary storage. Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC. Why only static data? Static data is data that isn’t used frequently (that doesn’t mean it’s not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those for active data. Remember: nothing in IT is free. If I deduplicate data, I must ‘rehydrate’ it in order to use it, and that has a performance implication, so I want to be careful where I deduplicate data so as not to inhibit performance on production data.

Dr. Dedupe and Tom allude to Deduplication 2.0 moving beyond backup storage and into primary storage.  While deduplication in primary storage is technically possible, it is important that customers understand two important points:

1) Performance: whatever I do to deduplicate (I prefer the term ‘optimize’) capacity in order to save space, I must ‘undo’ in order to use the data. If I set a policy that says any data that is 30 days old can be ‘optimized’, I need to be sure that data 30 days old is not active or I could pay a substantial performance penalty when using this data. I might instead set a policy of ‘any data that hasn’t been touched in 30 days can be optimized’. I would just want to make sure that there is no scenario where, at the end of a quarter let’s say, I would need to rehydrate all data in order to run some report.

2) Comprehensive and cumulative deduplication throughout my storage tiers. What do I mean? If I compress and single instance (deduplicate) data on my primary storage using one set of deduplication technologies, say single instancing and compression algorithms, and then I back up this data using sub-file deduplication, a separate set of algorithms, then what I am left with is two separate silos of deduplicated data, and no one wins in this scenario.
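The ‘hasn’t been touched in 30 days’ policy from point 1 can be sketched as a simple selection over last-access times. This is a minimal illustration, not how any specific array implements it: the file list, the paths and the `exclude` parameter (used here to keep quarter-end reporting data out of the optimized set) are assumptions for the example.

```python
from datetime import datetime, timedelta


def select_for_optimization(files, now, idle_days=30, exclude=()):
    """Pick files idle for at least `idle_days`, skipping excluded prefixes.

    `files` is an iterable of (path, last_access) pairs. `exclude` lists
    path prefixes that should stay unoptimized, for example data that a
    quarter-end report will read in bulk.
    """
    cutoff = now - timedelta(days=idle_days)
    selected = []
    for path, last_access in files:
        if any(path.startswith(prefix) for prefix in exclude):
            continue
        if last_access <= cutoff:
            selected.append(path)
    return selected


# Hypothetical file records: (path, last access time).
now = datetime(2009, 10, 7)
files = [
    ("/home/a/notes.txt", datetime(2009, 8, 1)),      # idle, eligible
    ("/finance/q3/ledger.db", datetime(2009, 7, 1)),  # idle, but excluded
    ("/home/b/draft.doc", datetime(2009, 10, 1)),     # recently used
]
print(select_for_optimization(files, now, exclude=("/finance/",)))
```

The exclusion list is the important design choice: age alone says nothing about data that sits idle for months and is then read all at once, which is exactly the rehydration penalty point 1 warns about.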

It is important, no matter what deduplication technology you decide to use, that you can actually leverage the data stored in the deduplication device and that as data moves from device to device it doesn’t need to be rehydrated before it is moved.

A great use case of capacity optimization in primary storage is how EMC evolved the Celerra product this year. Through a policy, data (let’s say any data older than 30 days) is compressed and stored as a single instance, with users seeing as much as 30% to 50% storage savings.

The real goal of Deduplication 2.0, and I think Dr. Dedupe alluded to this in his post “The Dedupe 2.0 Pundits Are Still Swimming in Lake 1.0”, is that customers win when deduplication technology is a part of the core system or file system, when I no longer need to rehydrate data as I move it from primary storage to secondary storage. If each storage device in the ‘stack’ understands the language of the device in the stack ahead of it, and the ‘deduplication’ or file system is coordinated and cumulative from device to device, then the customer is the winner. This pertains to primary storage, backup storage and archive storage. Never having to rehydrate data allows for more efficiency and a reduced tax on devices, which can save the end user money.

Tom Cook, CEO of Permabit, points out in his blog post “Dedupe 1.0 vs. Dedupe 2.0: The debate ensues” that the only value of deduplication for primary storage is to move your data to a deduplicated archive, which allows you to store data efficiently for the long term. I agree with this in principle but, as we have seen, it is not that practical. Why? Because at the end of the day, the costs to manage storage are going up, up, up and the costs to buy storage are going down, down, down. End users (NOT IT) are generally lazy or, should I really say, just too busy to manage this storage. In order to properly archive data, you need to have a policy that tells you what to move and when to move it. IT can make all the recommendations in the world about the value of archive, but if users (or really, line-of-business managers) don’t tell IT what data is important and what can be archived, then IT doesn’t really have a choice, which makes the premise of moving data to an archive, deduplicated or not, moot.

The real issue is balancing capacity optimization (the granularity at which you deduplicate data) against performance on the appropriate tier of data, given that deduplication will happen on all tiers of storage. The higher the performance requirements (tier 1), the less ‘optimized’ I make the data; the lower the performance requirements (tier x, archive), the more optimized I make the data. The benefits to the customer are that A) data is optimized consistently across all of its devices, and B) the optimization is cumulative from device to device, removing silos of deduplicated data across the stack.

For more on tiered dedupe, read my Betamax Redux blog post on EMC’s vision for deduplication and hopefully this will put you on a high performance ‘Road to Recovery’.

The Side Effects of Backup on Server Virtualization

September 14th, 2009 Steve Kenniston 2 comments

Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side effects.  You need to be careful of what the Witch Doctor orders.

When I speak with customers who have aggressively implemented a virtual server infrastructure, 9 out of 10 will tell me that they underestimated the effect that virtualization would have on their backups and backup process, and how backup, when not considered during the virtual server assessment and planning process, might actually make virtualization less of the magic potion they had hoped for. So what is the issue? Backup is a virtualization bottleneck, and without addressing it you may not be able to obtain the server consolidation ratios you had been expecting, which can have a negative effect on your virtual server TCO and ROI.

This is a timely discussion as VMworld has just concluded.  VMware users flocked to VMworld looking for best practices when it comes to implementing virtual server technology.  Because virtualization allows IT to reduce the overall physical hardware infrastructure, users will be looking at how to maximize their server consolidation ratios (get as many virtual servers on a physical server as they can and still provide good application performance).

I often hear that companies assess their environments by looking at the production applications on their physical server environment, identifying their workloads and translating that into some consolidation ratio of physical servers to virtual servers. I also hear, from these same customers, that backup was never taken into consideration during the assessment phase when trying to identify the best possible consolidation ratios. These customers implement their new virtual server environments, install the backup agent they had previously been using for physical server backups and attempt to back up their virtual servers, and they find that they are only able to protect 50% to 60% of the new environment. Why?

Let’s look at the physics. Let’s say your virtualization ratio is 12 virtual servers to 1 physical server. Twelve physical servers back up with 12 NIC cards, 12 CPUs, 12 memory ‘chunks’, etc. When you moved these 12 physical servers into the virtual world and put them on one physical server, did you put 12 NIC cards in the new physical server? Did you put 12 CPUs in the new server? Do you have 12x the memory? Chances are, probably not. However, the amount of data to protect didn’t change, did it? So how could one expect that backup, which is I/O, memory and CPU intensive, would perform well in a virtual world?

Figure 1 below shows how, when you back up 12 servers, the resource drain on each server is roughly 25% (per system during a full backup). When you virtualize these 12 servers onto one or two physical servers, your physical system utilization shoots up to 80%+. This utilization can be so dramatic that it actually affects the number of virtual servers you can have on these systems, which can ruin your virtual server TCO / ROI.

Figure 1

Simple math dictates that unless you have all the same resources on your new physical server as you did on all your physical servers before the consolidation, you won’t get the same backup performance. I have spoken with customers who aimed for a 25 to 1 virtual to physical server consolidation but were only able to get a 15 to 1 consolidation ratio in reality, because their backup application couldn’t handle 25 virtual servers on one physical server without leaving some unprotected.
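To make the simple math concrete, here is a toy model. It assumes backup loads add up linearly and expresses everything as a percentage of one pre-consolidation server; the 25% per-server drain comes from the discussion above, while the host sized at 200% (two old servers’ worth of resources) is an assumption picked purely for illustration.

```python
def backup_demand_pct(num_vms, per_vm_load_pct):
    """Total load if every VM backs up at once, as % of one old server."""
    return num_vms * per_vm_load_pct


def max_concurrent_backups(host_capacity_pct, per_vm_load_pct):
    """How many VM backups fit on the host at the same time."""
    return host_capacity_pct // per_vm_load_pct


# 12 virtual servers, each needing roughly 25% of an old server to back up.
print(backup_demand_pct(12, 25))        # 300
# Assumed host with the resources of two old servers (200%).
print(max_concurrent_backups(200, 25))  # 8
```

In this model the 12 simultaneous backups ask for three old servers’ worth of resources while the host offers two, so only 8 can run at once, and that is before any production workload is counted. Real systems will not scale this neatly, but the direction of the result is the point.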

People could argue that if you properly schedule each virtual machine to back up in a window when none of the other systems are backing up, then perhaps you could get by with traditional backup. The flip side is that IT has been telling me they don’t want to manage the backup process any more than they have to. So how do you ‘fix’ this problem?

The issue is that backup is a very I/O-intensive application, so there is only one way to fix the problem. You need to reduce the amount of I/O generated and sent through the physical devices that house the virtual servers during backup. Virtual servers were designed to provide a lot of benefits, but high I/O capability is not one of them. (This is okay; every technology implementation has its tradeoffs. When the positives outweigh the negatives, especially in a substantial way, as they do with virtual servers, you usually have a paradigm shift, and this is what we are seeing with virtual servers.)

So how do you change the I/O pattern of backup? You do so by decreasing the amount of data that uses the shared resources during backup. There are a couple of ways to do this. One way is to leverage the storage array and snapshot the data. Snapshots allow you to make copies of virtualized server data, mount the snapshot on a proxy host and offload the backups from the physical server that houses the virtual servers. The downsides are:

1)      This becomes a new set of processes to manage unlike traditional backup processes

2)      You need extra storage capacity with this solution

3)      You will need to manage another physical server (proxy server)

4)      You will need more backup agents from your backup software provider

The most efficient way, however, is to take advantage of a new backup software application that leverages data reduction (data deduplication) on the client.  Your processes stay the same, there is no need for additional primary storage hardware and by leveraging a ‘smarter’ backup client, you will reduce the I/O tax on your physical server devices and thereby have the ability to maximize your TCO / ROI for your new virtual server environment.
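The idea behind data reduction on the client can be sketched as follows: the client splits data into chunks, hashes each chunk and sends only the chunks the backup server has not already stored. This is a deliberately simplified illustration, assuming fixed-size 4 KB chunks and an in-memory index; the function names are hypothetical and real products may chunk, index and transfer data quite differently.

```python
import hashlib


def chunk(data, size=4096):
    """Split bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, server_index, size=4096):
    """Client-side deduplicated backup.

    Returns (recipe, new_chunks): the recipe is the ordered list of chunk
    hashes needed to rebuild the data, and new_chunks holds only the chunks
    the server has not seen before. `server_index` is a set of known hashes
    and is updated in place.
    """
    recipe, new_chunks = [], {}
    for piece in chunk(data, size):
        digest = hashlib.sha256(piece).hexdigest()
        recipe.append(digest)
        if digest not in server_index:
            server_index.add(digest)
            new_chunks[digest] = piece
    return recipe, new_chunks


# Two backups that share most of their content.
server_index, store = set(), {}
first = b"A" * 8192 + b"B" * 4096
second = b"A" * 8192 + b"C" * 4096

recipe1, sent1 = backup(first, server_index)
store.update(sent1)
recipe2, sent2 = backup(second, server_index)
store.update(sent2)

print(len(sent1), "chunks sent for the first backup")
print(len(sent2), "chunks sent for the second backup")

# A restore rebuilds the data from the recipe and the stored chunks.
restored = b"".join(store[h] for h in recipe2)
assert restored == second
```

Because the second backup repeats most of the first, only the changed chunk crosses the network, and that reduction in data moved through the shared physical host is what lowers the I/O tax during backup.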

Additionally, a number of these technologies have additional offerings that truly make them next generation. Backup licensing is slowly moving to a capacity-based license model. One great feature of these new products is the fact that there is no charge for clients or agents. This allows you to create a virtual server template with the backup agent embedded within it. You no longer have to worry about proliferating backup clients and then paying for all those clients when it is time to ‘true up’ with your backup software vendor. Data deduplication technologies also offer the ability to replicate the backup data efficiently to disk at a remote site, so you can develop a more efficient disaster recovery plan that reduces the reliance on tape and increases your overall operational efficiency.

Regardless of which path you choose, each requires IT to rethink their backup strategies when it comes to protecting virtual server environments.

I encourage you to do two things as you consider moving to a virtual server infrastructure:

1)      Make sure you are thinking about data protection when architecting your new virtual server environment

2)      Check out some of the new technologies and best practices offered by vendors for protecting virtual servers.

Hopefully this will help put your virtual server world back on the Road to Recovery!

A Data Protection Reference Architecture – Part 4

August 27th, 2009 Steve Kenniston No comments

Business Critical Applications

The tip of the triangle focuses on the applications (or data) that drive your business. These are the applications that, should they go down for any length of time, cost you money. The recovery of this information, in the event of a ‘disaster’, needs to be very fast (RTO in minutes) and the data can’t be very ‘old’ when it is recovered (short RPO, less than 24 hours). Typically, the technologies used for these types of applications are replication (synchronous or asynchronous) or continuous data protection (CDP). These technologies ensure that recovery at the alternate location is instant (or near instant) and / or give users the ability to pick the point in time they want to recover to, in order to ensure no data loss and to bring up the applications as quickly and accurately as possible. This category, much like the rest of them, has the same disclaimer: ‘one size (product) does not fit all’. The value of the data in this tier, and the risk to the business if this data is unavailable, drive the technology and spend in this part of the triangle. Keep in mind, the right technology (don’t choose CDP if you need an active remote file system) gives you the best recovery (RPO) for your business needs and can keep you on the Road to Recovery.

EMC World Kicks Off with Clouds and Virtualization

May 18th, 2009 Steve Kenniston No comments

EMC World kicked off this morning, first with a presentation from yours truly on Data Deduplication 2.0 – Comprehensive Capacity Optimization. We discussed how data deduplication 1.0 is morphing into all areas of EMC’s storage ecosystem in order to optimize capacity everywhere. I talked about how data deduplication, as well as single instancing and compression, are technology components that will help EMC achieve this goal.

Next Joe Tucci spoke in his keynote about how data deduplication as well as compression are key technologies for the data center of the future and how these technologies will aid in delivering a more efficient cloud computing strategy.  Not only will these technologies help in building out a cloud infrastructure, they will also help to protect a cloud infrastructure (which is what we are all about here).

Finally, Paul Maritz gave his keynote on how the virtual infrastructure will help to fulfill the goals of a private cloud. He also said that it is time to invest in software and people, not hardware, as VMware continues to drive value into its software to help make your data center better, smarter, stronger and faster for less.

Each of these initiatives will have an impact on how data is stored and ultimately protected. New storage services will enable more efficient storage and protection across the virtual data center and the cloud, and ultimately take backup beyond and put you on the road to recovery.

Stay tuned for more updates about the show.
