Archive

Archive for September, 2009

Bloody Backup and Archive

September 29th, 2009 Steve Kenniston No comments

Another great post from my colleague Mike Dutch.

Many users believe that their backup tapes are their archive as well. Additionally, deduplicating storage systems are driving a similar notion that a backup platform and an archive platform could be one and the same. Opinions definitely vary on this topic, so I encourage all to comment. Let’s take a deeper look…

The reason you “backup” a set of data is because you might need to recover the primary data if it becomes unavailable or corrupted. If you wanted to access a data set as it existed at a particular point in time but couldn’t, you could replace the primary data with the backup copy. (SNIA defines backup as … “A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy.”)

The reason you “archive” a data set is because you want to preserve it. It remains the primary data, but because you’ll rarely access it, you want to put it somewhere safe just in case you ever want or need to access it again. The SNIA Data Management Forum defines an archive as “a specialized repository (including the supporting processes, policies, hardware, and software) used to preserve information and data for the long-term.” The capabilities of an archive “include the ability to preserve, protect, control, maintain authenticity and integrity, accommodate physical and logical migration, and guarantee access to information and data objects over their required retention period.”

Regardless of whether archive should be used as a noun or a verb, the point is that the purpose, and therefore the lifecycle, of data in an archive repository differs from that of a backup copy. While few would disagree with this premise, I’d wager that most people believe this implies you must store and manage these copies separately. You can, but you don’t have to if you’re using a data protection solution that fully supports your business processes.

Someday, the notion of data protection will be subsumed by the notion of data storage.  If we store data, why shouldn’t we expect to get it back when we want it?  Why shouldn’t we expect to resume an application from whatever point in time we want to?  If the system can’t do this, is it really protecting my data?  This leads us to the question of what data protection is.

The SNIA definition of data as “The digital representation of anything in any form” obscures its richness (sight, sound, touch, smell, taste). After all, shouldn’t analog information such as printed books be considered data? Of course, a dictionary is not an encyclopedia and a definition should be succinct. I’ll read the SNIA definition as meaning, “Data is something that can be processed by a computer after any necessary format transformations.”

Let’s posit that data protection means assurance that data is accessible to authorized users with acceptable performance in an auditable manner. Sounds reasonable, yet this definition exceeds the usual scope of data protection. Data protection is usually measured in terms of availability metrics, that is, in terms of RPO and RTO. We also want assurance that data has not been altered or destroyed in an unauthorized manner (data integrity). And of course, we don’t want our data to be available to anyone who should not have access to it, whether it is “leaked” over a network or exposed by losing control of physical storage media. Even if an authorized change was made, the user may change their mind and want to access an earlier version of the data. Also, what about poor performance? I know I’ll find something else to do if application performance degrades to the point where I cannot remain productive. Unacceptable performance equates to unavailability. Auditable means the ability to verify who controlled what and when (to comply with GRC initiatives and provide a chain of custody).

The traditional definitions of operational recovery and disaster recovery, distinguished by the impact of the outage (whether caused by operational errors, data corruption, or hardware failures), are subsumed by this accessible-performant-compliant definition of data protection. Retention and long-term preservation of fixed content (and related metadata) within an archive repository also fall under the “ensure data is accessible” umbrella of our broad definition of data protection. Regardless of whether performance and compliance capabilities are included in your definition of data protection, they remain requirements of conducting an effective and responsible business.

Let’s get back to the main idea of this post, namely, that while it is necessary to MANAGE backup and archive data separately, it is not necessary to STORE backup and archive data separately.  Storage systems with data deduplication capabilities are one proof point. An accessible-performant-compliant definition of data protection broadens the opportunities for both resource sharing and risk reduction.  Data protection is much more than backup and archive.  It’s about keeping your business fit by ensuring its lifeblood, its data, is clean and flowing freely.
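To make the “manage separately, store together” idea a little more concrete, here is a minimal sketch. It assumes a toy deduplicating store with fixed-size chunks, and the class names (`SharedChunkStore`, `Catalog`) and retention figures are hypothetical, not taken from any product. Each catalog keeps its own retention policy, while both point into one physical pool of chunks.

```python
import hashlib


class SharedChunkStore:
    """One physical pool of deduplicated chunks, keyed by SHA-256 digest."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}

    def put(self, data):
        """Store data and return the list of chunk digests that rebuild it."""
        refs = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # identical chunks stored once
            refs.append(digest)
        return refs

    def get(self, refs):
        return b"".join(self.chunks[ref] for ref in refs)


class Catalog:
    """Management layer: its own retention policy, but no storage of its own."""

    def __init__(self, name, retention_days, store):
        self.name = name
        self.retention_days = retention_days
        self.store = store
        self.entries = {}  # label -> (day created, chunk refs)

    def save(self, label, data, day):
        self.entries[label] = (day, self.store.put(data))

    def expired(self, today):
        return [label for label, (day, _) in self.entries.items()
                if today - day > self.retention_days]


store = SharedChunkStore()
backup = Catalog("backup", retention_days=30, store=store)      # hypothetical policy
archive = Catalog("archive", retention_days=2555, store=store)  # hypothetical policy
```

Saving the same data set through both catalogs adds no extra chunks to the pool, yet the backup entry expires after its 30 days while the archive entry does not. A real system would also need reference counting before reclaiming chunks, which this sketch leaves out.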

Categories: Backup Tags:

The Side Effects of Backup on Server Virtualization

September 14th, 2009 Steve Kenniston 2 comments

Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side effects.  You need to be careful of what the Witch Doctor orders.

When I speak with customers who have aggressively implemented a virtual server infrastructure, 9 out of 10 tell me that they underestimated the effect that virtualization would have on their backups and backup process. When backup is not considered during the virtual server assessment and planning process, it can make virtualization less of the magic potion they had hoped for. So what is the issue? Backup is a virtualization bottleneck, and without addressing it, you may not be able to obtain the server consolidation ratios you had been expecting, which can have a negative effect on your virtual server TCO and ROI.

This is a timely discussion as VMworld has just concluded.  VMware users flocked to VMworld looking for best practices when it comes to implementing virtual server technology.  Because virtualization allows IT to reduce the overall physical hardware infrastructure, users will be looking at how to maximize their server consolidation ratios (get as many virtual servers on a physical server as they can and still provide good application performance).

I often hear that companies assess their environments by looking at the production applications on their physical server environment, identifying their workloads and translating that into some consolidation ratio of physical servers to virtual servers. I also hear, from these same customers, that backup was never taken into consideration during the assessment phase when trying to identify the best possible consolidation ratios. These customers implement their new virtual server environments, install the backup agent they had previously been using for physical server backups and attempt to back up their virtual servers, only to find that they can protect just 50% to 60% of the new environment. Why?

Let’s look at the physics. Let’s say your virtualization ratio is 12 virtual servers to 1 physical server. Twelve physical servers back up with 12 NIC cards, 12 CPUs, 12 memory ‘chunks’, etc. When you moved these 12 physical servers into the virtual world and put them on one physical server, did you put 12 NIC cards in the new physical server? Did you put 12 CPUs in the new server? Do you have 12x the memory? Chances are, probably not. However, the capacity you have to back up didn’t change, did it? So how could one expect that backup, which is I/O, memory and CPU intensive, would perform well in a virtual world?

Figure 1 below shows that when you back up 12 servers, the resource drain on each server is roughly 25% (per system during a full backup). When you virtualize these 12 servers onto one or two physical servers, your physical system utilization shoots up to 80%+. This utilization can be so dramatic that it actually affects the number of virtual servers you can have on these systems, which can ruin your virtual server TCO / ROI.

Figure 1


Simple math dictates that unless you have all the same resources on your new physical server as you did on all your physical servers before the consolidation, you won’t get the same backup performance. I have spoken with customers who aimed for a 25 to 1 virtual to physical server consolidation but were only able to reach 15 to 1, because their backup application couldn’t handle 25 virtual servers on one physical server without leaving some unprotected.
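The “simple math” can be sketched roughly as follows. This is a back-of-the-envelope model rather than a sizing tool: it assumes the network interface is the only bottleneck and ignores CPU, memory, disk and protocol overhead. The function names and every figure used (200 GB per server, 1 Gbps NICs, a 6-hour window) are hypothetical.

```python
def backup_hours(data_gb, nic_count, nic_gbps=1.0):
    """Hours needed to move data_gb gigabytes over nic_count NICs of nic_gbps each."""
    seconds = data_gb * 8 / (nic_count * nic_gbps)  # GB -> gigabits, then divide by Gbit/s
    return seconds / 3600


def max_vms_in_window(window_hours, per_vm_gb, nic_count, nic_gbps=1.0):
    """How many full VM backups fit through the host's NICs within the window."""
    capacity_gb = window_hours * 3600 * nic_count * nic_gbps / 8
    return int(capacity_gb // per_vm_gb)


# Before: 12 physical servers, each pushing 200 GB through its own NIC in parallel.
before = backup_hours(200, nic_count=1)

# After: the same 12 x 200 GB squeezed through one host with 2 NICs.
after = backup_hours(12 * 200, nic_count=2)

print(f"before: {before:.2f} h, after: {after:.2f} h")  # before: 0.44 h, after: 2.67 h
print(max_vms_in_window(6, per_vm_gb=200, nic_count=1))  # 13
```

Under these assumed numbers the same data takes six times longer to protect after consolidation, and a single-NIC host tops out at 13 full backups in a 6-hour window no matter how many virtual servers the CPU and memory could otherwise carry.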

People could argue that if you properly schedule each virtual machine to back up in a window when the other systems are not backing up, then perhaps you could get by with traditional backup. The flip side is that IT has been telling me they don’t want to manage the backup process any more than they have to. So how do you ‘fix’ this problem?
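For what it’s worth, the staggering argument looks something like the sketch below. The function name `stagger_backups`, the concurrency limit and the one-hour slot length are all hypothetical, and the sketch assumes every backup finishes within its slot, which is exactly the kind of thing someone then has to monitor.

```python
def stagger_backups(vm_names, max_concurrent, slot_minutes, start_minute=0):
    """Assign each VM a start time so at most max_concurrent back up at once."""
    schedule = {}
    for index, name in enumerate(vm_names):
        slot = index // max_concurrent  # VMs sharing a slot start together
        schedule[name] = start_minute + slot * slot_minutes
    return schedule


vms = ["vm%02d" % n for n in range(1, 13)]
schedule = stagger_backups(vms, max_concurrent=3, slot_minutes=60)

# Total window: last start time plus one slot.
print(max(schedule.values()) + 60)  # 240
```

Twelve virtual servers at three at a time need a four-hour window here, and every new virtual server, or any backup that overruns its slot, means revisiting the schedule.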

The issue is that backup is a very I/O-intensive application, so there is only one way to fix the problem. You need to reduce the amount of I/O generated and sent through the physical devices that house the virtual servers during backup. Virtual servers were designed to provide a lot of benefits, but high I/O capability is not one of them. (This is okay; every technology implementation has its tradeoffs. When the positives outweigh the negatives, especially in a substantial way, as they do with virtual servers, you usually have a paradigm shift, and this is what we are seeing with virtual servers.)

So how do you change the I/O pattern of backup? You do so by decreasing the amount of data that uses the shared resources during backup. There are a couple of ways to do this. One way is to leverage the storage array and snapshot the data. Snapshots allow you to make copies of virtualized server data, mount the snapshot on a proxy host and off-load the backups from the physical server that houses the virtual servers. The downsides are:

1) This becomes a new set of processes to manage, unlike traditional backup processes

2) You need extra storage capacity with this solution

3) You will need to manage another physical server (proxy server)

4) You will need more backup agents from your backup software provider

The most efficient way, however, is to take advantage of a new backup software application that leverages data reduction (data deduplication) on the client. Your processes stay the same and there is no need for additional primary storage hardware. By leveraging a ‘smarter’ backup client, you will reduce the I/O tax on your physical server devices and thereby have the ability to maximize your TCO / ROI for your new virtual server environment.
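As a rough illustration of why deduplicating on the client cuts the I/O leaving the host, here is a minimal sketch. It assumes fixed-size 4 KB chunks and a server that can tell the client which chunk digests it already holds; the function name `dedup_backup` and the `server_index` set are hypothetical, and real products typically use more sophisticated (often variable-size) chunking than this.

```python
import hashlib


def dedup_backup(data, server_index, chunk_size=4096):
    """Split data into chunks and return (recipe, payload).

    recipe  -- ordered digests needed to rebuild the data on restore
    payload -- only the chunks the server does not already hold
    """
    recipe = []
    payload = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in server_index and digest not in payload:
            payload[digest] = chunk  # new data: this is all that crosses the wire
    return recipe, payload


server_index = set()  # digests the backup server already stores

monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
recipe, payload = dedup_backup(monday, server_index)
server_index.update(payload)  # first backup sends all 3 chunks

tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # one chunk changed
recipe, payload = dedup_backup(tuesday, server_index)
print(len(payload))  # 1
```

The second backup still describes the full data set in its recipe, but only the one changed chunk has to travel through the shared NIC. The trade-off is that the hashing itself costs some CPU on the client.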

Additionally, a number of these technologies have additional offerings that truly make them next generation. Backup licensing is slowly moving to a capacity-based license model. One great feature of these new products is the fact that there is no charge for clients or agents. This allows you to create a virtual server template with the backup agent embedded within it. You no longer have to worry about proliferating backup clients and then paying for all those clients when it is time to ‘true up’ with your backup software vendor. Data deduplication technologies also offer the ability to replicate the backup data efficiently to disk at a remote site, so you can develop a more efficient disaster recovery plan that reduces the reliance on tape and increases your overall operational efficiency.

Regardless of which path you choose, each requires IT to rethink their backup strategies when it comes to protecting virtual server environments.

I encourage you to do two things as you consider moving to a virtual server infrastructure:

1) Make sure you are thinking about data protection when architecting your new virtual server environment.

2) Check out some of the new technologies and best practices offered by vendors for protecting virtual servers.

Hopefully this will help put your virtual server world back on the Road to Recovery!


A Data Protection Reference Architecture – The Final Chapter

September 1st, 2009 Steve Kenniston 2 comments

The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues, helping to balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things. It can help you begin to classify your data and align your data to your business objectives. It also lets you begin to identify which data or data services in your environment may be more important to you than others and, based on this, helps you choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs. The challenge comes with managing multiple solutions in an effort to meet your business objectives. While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management. It allows you to manage traditional backup, source-based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored. Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer. What I am saying is that if you’re looking to have all the functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there. Additionally, the underlying image of the triangle represents data protection management. Putting all the new technology in place is one thing; managing it and ensuring you are now meeting your business needs is another. EMC’s Data Protection Advisor can help here as well.

This diagram can help customers lay out a new, better data protection schema for their environment and start thinking about data protection a bit more strategically versus tactically. It can also help vendors speak to customers about how they should look at their environment in order to identify specific challenges and the means they need to alleviate these challenges, taking backup beyond.
