
Posts Tagged ‘Data Protection’

How Much Backup Capacity Does Deduplication Really Save?

November 30th, 2009 Steve Kenniston No comments

There is a lot of discussion around data deduplication for backup these days. (I wish I could deduplicate all the turkey I ate last week.) In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.” And when asked “Why?” the response was “…deduplication is too compelling to ignore.” But I say “prove it.” So I put together some backup capacity numbers for storing data on tape (non-compressed and compressed) versus storing deduplicated data (fixed block and variable block) on disk, and the numbers show dramatic savings in backup space, which translates into cost savings.

The Parameters

As with any ‘analysis’, numbers can be ‘spun’ to make them say what you want. That said, I tried to be as straightforward as possible, so let me show my methodology so you can see how my numbers were derived.

  • I charted the amount of capacity created using a retention policy of:
    • 14 Dailies
    • 4 Weeklies
    • 12 Monthlies
  • I selected 10TB of primary storage capacity
  • I did this for file system backups only
  • I charted the data for 30%, 40%, 50% and 60% primary storage growth rates
  • I charted traditional tape based backup (non-compressed)
  • I charted traditional tape based backup (compressed, 2:1)
  • I charted fixed block disk based deduplicated backup
  • I charted variable block disk based deduplicated backup (3 to 5 times more efficient than fixed block deduplication)

The Effect

The first thing to think about is the sheer number of full backup copies that must be maintained under the above retention schedule. That retention policy leads to 17.2 copies of the primary storage (12 monthlies + 4 weeklies + the equivalent of 1.2 copies from the dailies = 17.2 copies). Translation: one terabyte of primary storage becomes 17.2 terabytes of tape storage. This means backup administrators need to pay for the physical tapes as well as the offsite transport and storage costs. Now, 17.2 terabytes of tape doesn’t sound like much, but keep in mind that is for 1TB of primary capacity. Ten TB of primary capacity yields 172 TB of tape capacity. Now add in year-over-year storage growth. At 30% primary storage growth, backup storage grows 23%; at 40%, it grows 29%; at 50%, it grows 33%; and at 60%, it grows 38%.
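The copy count above reduces to simple arithmetic. Here is a minimal sketch in Python, assuming (as the text does) that the 14 dailies together add up to roughly 1.2 full copies. The function and constant names are mine, purely for illustration.

```python
# Retention schedule from the post: 12 monthlies, 4 weeklies, 14 dailies.
MONTHLY_FULLS = 12
WEEKLY_FULLS = 4
# The post counts the 14 dailies as the equivalent of 1.2 full copies.
DAILIES_AS_FULL_COPIES = 1.2


def full_copy_equivalents() -> float:
    """Number of full copies of primary storage held under the schedule."""
    return MONTHLY_FULLS + WEEKLY_FULLS + DAILIES_AS_FULL_COPIES


def tape_capacity_tb(primary_tb: float, compression_ratio: float = 1.0) -> float:
    """Tape capacity consumed for a given amount of primary storage."""
    return primary_tb * full_copy_equivalents() / compression_ratio


if __name__ == "__main__":
    for primary in (1, 10):
        print(f"{primary} TB primary -> {tape_capacity_tb(primary):.1f} TB of tape")
    print(f"10 TB primary, 2:1 compressed -> {tape_capacity_tb(10, 2.0):.1f} TB")
```

Passing a compression ratio of 2.0 models the compressed-tape case from the parameter list.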

Figure 1 below shows 10 TB of primary capacity growing at 30%, 40%, 50% and 60% along the x-axis, and the corresponding capacity of tape or disk consumed along the y-axis.

Figure 1

The graph shows that compressed backup to tape obviously yields a 50% capacity improvement over non-compressed tape as one would expect. It also reflects that fixed block deduplicated disk capacity is only about 48% more efficient than uncompressed tape storage yet variable block deduplication is 81% more storage efficient than uncompressed tape storage.

Interestingly, the chart also reveals that fixed block deduplication is 3% less efficient than compressed tape, whereas variable block deduplication is 62% more efficient than compressed tape. Typically, with the same data change rates and equivalent data sets, variable block deduplication is 3 to 5 times more efficient than fixed block deduplication.
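For anyone who wants to check how the "versus compressed tape" figures follow from the "versus uncompressed tape" figures, here is a small Python sketch. It assumes compressed tape uses exactly half the capacity of uncompressed tape (the 2:1 ratio from the parameters); the function name is hypothetical.

```python
def saving_vs_baseline(saving_vs_raw: float, baseline_saving_vs_raw: float) -> float:
    """Convert a saving measured against uncompressed tape into a saving
    measured against another baseline (here, 2:1 compressed tape)."""
    capacity = 1.0 - saving_vs_raw           # fraction of raw tape capacity used
    baseline = 1.0 - baseline_saving_vs_raw  # fraction used by the baseline
    return 1.0 - capacity / baseline


if __name__ == "__main__":
    compressed_tape = 0.50  # 2:1 compression
    for label, saving in (("variable block", 0.81), ("fixed block", 0.48)):
        relative = saving_vs_baseline(saving, compressed_tape)
        print(f"{label}: {relative:+.0%} versus compressed tape")
```

With the rounded inputs this gives 62% for variable block, matching the chart. For fixed block the rounded 48% gives about 4% worse than compressed tape rather than 3%; the difference presumably comes from rounding in "about 48%".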

The moral of the story – if you’re going to do deduplication, variable block is the way to go. From a cost perspective, there is essentially no difference in the $/TB price; however, there is much more value in the long run with variable block deduplication. Vendors typically charge a $/TB price for their deduplication solutions. The difference between fixed and variable block deduplication comes down to the capacity of data that is stored in the backups, which directly translates into costs. If you take a look at Figure 2, starting with 1TB of primary capacity growing at 25% over the course of one year, IT will need almost 2TB of backup capacity with fixed block deduplication versus less than 1TB of capacity using variable block deduplication (this assumes fixed block is 5x less efficient, based on empirical data collected in the field). The most important part of this graph is the slope of the blue and red lines. The steeper the slope (red line), the more frequently IT will need to purchase capacity to protect the given data set, and the more it will pay for deduplication software licensing. IT wants the smaller slope.

Figure 2

*Note: Some companies will position their fixed block technologies as variable block by stating that you (the user) have the ability to set the block size to whatever you want; however, once set, it stays that way for all of your data. The difference is that true variable block technologies adjust the block size on the fly, using their algorithms to ensure maximum efficiency with no management.
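To make the distinction in the note concrete, here is a small Python sketch contrasting fixed-size blocks with content-defined (variable) blocks. It is a generic illustration of the idea, not any vendor's algorithm: the window size, the mask, and the use of CRC32 as the boundary hash are all my assumptions, picked for brevity.

```python
import hashlib
import random
import zlib


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Cut wherever a hash of the previous `window` bytes has its low bits
    clear, so boundaries depend on content rather than on byte offset.
    (Illustrative only: recomputes CRC32 per position instead of rolling.)"""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def new_chunk_count(old: bytes, new: bytes, chunker) -> int:
    """How many chunks of `new` are not already stored from `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(1 for c in chunker(new) if hashlib.sha256(c).digest() not in seen)


# Hypothetical data set: 4 KB of pseudo-random bytes, then the same bytes
# with a single byte inserted at the front.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + original

if __name__ == "__main__":
    print("fixed block, new chunks:   ", new_chunk_count(original, edited, fixed_chunks))
    print("variable block, new chunks:", new_chunk_count(original, edited, variable_chunks))
```

Inserting a single byte at the front shifts every fixed block, so nearly all of them look new, while the content-defined boundaries line up again after the first block or two.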

Bang for the Buck

The most important benefit, as with most things in IT, is overall cost savings. Deduplicated disk solutions are anywhere from 2.5X to 3X more expensive than tape; however, with the overall capacity savings, there can be significant cost savings. Figure 3 is representative of the overall costs of new deduplicating disk systems and traditional tape backup systems (including tapes and off-site storage costs). I will caveat this by saying every TCO and ROI has a ton of ‘what ifs’ that factor into overall costs, including things like FTE for backup engineers and long term retention costs, but for the most part, disk systems reduce a good deal of these costs (with the exception of power and cooling) and increase the reliability, security and performance of backups and recoveries.

Figure 3

1 The chart above is based on a rough cost of $8,000 per terabyte of tape backup system costs (including media and off-site storage) and rough cost of $20,000 per terabyte of deduplicated disk backup system costs for the period of one year.  Prices will vary depending upon your configuration and these estimates do not include space, power, cooling or human costs.
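As a rough cross-check of the footnote, the per-terabyte prices can be combined with the capacity figures from earlier in the post. This is a sketch under my own assumptions: that the $/TB prices apply to backup capacity consumed, that the starting point is the 172 TB of uncompressed tape for 10 TB of primary storage, and that the 48% and 81% savings apply unchanged. Figure 3 may have been built from different inputs.

```python
# Rough one-year prices from the footnote.
TAPE_COST_PER_TB = 8_000
DEDUPE_DISK_COST_PER_TB = 20_000

# Capacity figures quoted earlier in the post.
UNCOMPRESSED_TAPE_TB = 172.0   # 10 TB primary x 17.2 copies
FIXED_BLOCK_SAVING = 0.48      # versus uncompressed tape
VARIABLE_BLOCK_SAVING = 0.81   # versus uncompressed tape


def yearly_costs(raw_tb: float = UNCOMPRESSED_TAPE_TB) -> dict:
    """Estimated one-year cost of each option for the same backup data."""
    return {
        "tape (uncompressed)": raw_tb * TAPE_COST_PER_TB,
        "tape (compressed 2:1)": raw_tb / 2 * TAPE_COST_PER_TB,
        "disk (fixed block)": raw_tb * (1 - FIXED_BLOCK_SAVING) * DEDUPE_DISK_COST_PER_TB,
        "disk (variable block)": raw_tb * (1 - VARIABLE_BLOCK_SAVING) * DEDUPE_DISK_COST_PER_TB,
    }


if __name__ == "__main__":
    for option, cost in yearly_costs().items():
        print(f"{option:<24} ${cost:,.0f}")
```

Under these assumptions, variable block disk comes out slightly below compressed tape, while fixed block disk comes out above uncompressed tape, which is why the choice of deduplication method matters so much to the cost case.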

As I stated above there are only a few factors that are involved in this very raw calculation.  There are a number of other factors involved with a backup process including WAN costs (if replacing tape with disk), remote office facilities, installation (professional services), and software and hardware maintenance to name a few.  But no matter how you look at it, disk based backup with variable block deduplication wins over tape.

Backing data up to deduplicated disk not only saves the amount of backup capacity that is used, it also has other implications for a data protection environment.  First, backing up to disk versus backing up to tape helps to reduce the reliance on tape and the inherent limitations, security concerns and reliability issues surrounding tape.  Recovery of data from disk reduces the operational costs and decreases the recovery time objective.  Additionally the reliability of disk with RAID is much higher than the reliability of tape.

New data protection technologies are evolving backup to a degree where the entire data protection process is getting easier to manage by removing multiple points of management (backup servers, media servers, tape libraries and physical tape). As backup continues to evolve, this can help simplify the overall process and:

  • Increase reliability of backups
  • Increase reliability of recoveries
  • Decrease backup times
  • Decrease the time to recover data

The Bottom Line

New challenges in protecting information are arising every day. Whether it is data growth, remote office data protection or virtualization, backup is getting harder, not easier. Data deduplication is providing backup administrators with tremendous benefits around backup processes and cost savings. It is important to keep in mind that everybody’s environment is different and utilizes different methods and processes for managing and protecting information. It is also important to take a look at your data protection environment today and understand the use cases where it is time to make new investments. I encourage you to look at new technologies to help you with emerging challenges and weigh the overall solution, including costs as well as benefits of disk based recovery. New backup technologies that leverage data deduplication can save IT a lot of money and put you back on the Road to Recovery.


Enterprise Data Protection at the Edge

November 19th, 2009 Steve Kenniston 2 comments

What does that really mean?  When I worked for Veritas, back in 1998 we acquired a company based out of Canada called TeleBackup that backed up desktop / laptops.  In 1999 Veritas acquired Seagate and the Backup Exec product which also had a desktop / laptop option.  These products were meant to eventually be integrated into the main backup applications but never were.  Additionally, a lot of that software was given away (hard to make a business on that) and for the most part,  lived on a shelf somewhere and was never installed.

In 2004 I worked for Connected Corporation (acquired by Iron Mountain), whose sole business was desktop / laptop backup. (In fact, from 2000 to 2004 I worked as an analyst for ESG covering all the vendors in the backup space and used the Connected product to back up my work laptop – and it actually saved my hide once.) While the company executed a successful exit, the business was (and probably still is) only about a $20M to $40M business.

Why do I bring this up?  There is a new reality in IT these days.  I have said it before, IT is accountable for 100% of the data created in any company, including that stored on desktop/laptops.  This means that not only do they have to provide a location to store this data but IT also needs to provide tools to protect this information and ensure that this information is highly recoverable for both business productivity purposes as well as corporate and legal governance.   This means that desktop / laptop backup is now gaining a lot more visibility in the enterprise.

However, desktop / laptop data protection is one of those areas in IT that is just a nuisance because it seems like it should be an easy problem to solve, but there are so many moving parts to it that it ends up falling by the wayside.

A successful desktop / laptop backup technology needs three very specific capabilities:

  • Integrate seamlessly with the existing backup solution in the enterprise
  • Share a common, deduplicated, back end repository
  • Have a very SIMPLE and robust end-user interface to allow for end-user restores

The desktop / laptop solutions I discussed above did not, and do not, have these capabilities.  Even though these technologies come from reputable companies, not having these three capabilities is what has led to their very low adoption.

These three capabilities are all inter-related. First, IT needs an integrated solution because they do not want to have yet another piece of software in their environment that they have to manage, especially data protection software. The fundamentals of backup are pretty simple. Install an agent on the machine you want to protect, go to the management interface of the backup application and set up a few simple rules or policies (back up this system, at this time, to this device, catalog it and finally, keep the data for ‘x’ number of days, weeks, etc.) and start protecting your data.

One challenge is that most backup products don’t have an agent that is lightweight enough to run as a client on a desktop or laptop.  This causes incredible performance degradation of the system during backups, and let’s face it, if you have a laptop, 9 times out of 10 you’re going to be working on it when the backup kicks off so you will end up shutting it down which leaves you with unprotected data.  Client side data reduction techniques help to reduce this problem.  By moving less data, they run for shorter periods of time so there is little to no end user impact.

Next, if you did have an agent that worked well enough to back up all the desktop / laptop systems, it would impede the backups of the other mission critical systems in the environment by utilizing all of the resources on the devices where the data is being backed up to. (Take a look at Architecting for Recovery for more info.) This means that IT would have to set up additional, separate devices to protect one subset of systems, leaving them with more devices to manage and making it a hassle to implement. (This is one reason why ‘cloud’ like solutions have become popular, providing fewer things to manage; however, not every company wants their data outside of their control.)

Also, if you look at the nature of data on desktops and laptops, they share a ton of common data. Why would any IT person want to back up that much data over and over again? Traditional desktop / laptop solutions don’t provide robust capabilities for reducing the amount of redundant data that needs to be protected, which translates into longer backup times and more ‘storage’ utilization (making it more costly). Deduplication allows you to implement a common repository.

Finally, the tools for end user recoverability need to be very robust. The last thing IT has time for is an increased call volume to perform data recovery for end users. This also means that data needs to be stored on disk, because end users aren’t going to load tapes to recover data, and that it needs to be stored on disk in the most efficient manner possible to save on costs.

There are a number of other nice-to-have features, but the lack of the three capabilities outlined above has limited the adoption of desktop / laptop backups. Until today there hasn’t been a good solution that met these criteria.

This week EMC | Avamar launched a desktop / laptop backup component as part of their enterprise solution.  The difference between traditional desktop / laptop solutions and the Avamar solution is that the Avamar solution is 100% integrated as a part of its enterprise backup application, storing data on disk with a high degree of efficiency leveraging single instancing and deduplication.  Additionally, clients are free and they all share a common backend repository with the enterprise backup application that is protecting other common data in the enterprise.  Finally, end-users are able to perform their own restores.  What does all this mean?  Simplicity and low cost.

The Avamar backup technology provides enormous economies of scale when extending from the enterprise to the desktop / laptop. By backing up to a single common repository utilizing global single instancing and deduplication, you NEVER back up the same data twice, no matter where the data lives.

Think about this scenario – a user creates some document, say a PowerPoint presentation.  This presentation ends up being emailed to a number of people in the company and then saved on the desktop as well as in a number of file shares (home directories) on the NAS system.  This one 1MB presentation can represent 120MB of backup disk capacity.

Now if you utilize Avamar, the process would be as follows. First, the enterprise application would back up the NAS box and may see the file 20 times. Avamar would single instance and deduplicate it such that only one instance is backed up. Next, the desktops start their backup process and see that the Avamar Data Store has already protected this data, so again, it doesn’t need to move or store any additional data. A pointer is created to let the data store know that the desktop / laptop also has the ability to recover this same file. This provides tremendous scalability. This essentially means protecting all your desktops / laptops for free.
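The "store once, add a pointer" idea can be sketched in a few lines of Python. To be clear, this is a toy content-addressed store of my own, not Avamar's implementation or API: it deduplicates whole files by SHA-256 hash only (no sub-file deduplication), and the class name, client names and paths are all hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: each unique blob is kept exactly once."""

    def __init__(self):
        self.blobs = {}    # content hash -> bytes actually stored
        self.catalog = []  # (client, path, content hash) pointers

    def backup(self, client: str, path: str, data: bytes) -> bool:
        """Record a backup; return True only if new data had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Either way, add a pointer so this client can restore the file.
        self.catalog.append((client, path, digest))
        return is_new

    def restore(self, client: str, path: str) -> bytes:
        for c, p, digest in self.catalog:
            if (c, p) == (client, path):
                return self.blobs[digest]
        raise KeyError((client, path))

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


# The scenario from the post: one 1MB presentation, 20 copies on the NAS
# and (hypothetically) 100 more on desktops, 120 copies in total.
presentation = bytes(1024 * 1024)
store = SingleInstanceStore()
for i in range(20):
    store.backup("nas", f"/home/user{i}/deck.ppt", presentation)
for i in range(100):
    store.backup(f"desktop{i}", "C:/Docs/deck.ppt", presentation)

if __name__ == "__main__":
    logical = len(store.catalog) * len(presentation)
    print(f"logical data: {logical} bytes, stored: {store.stored_bytes()} bytes")
```

Only the first backup moves and stores data; the other 119 add a catalog entry, yet every client can still restore its own copy.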

The technology is easy to manage (same client, same simple management tools), it provides a simple to navigate end user interface for self restores, and provides an integrated, single instance, deduplicated backend.

Seems like a triple play from the Avamar product and is helping to put IT back on the Road to Recovery.


Comprehensive Capacity Optimization – Deduplication 2.0

October 7th, 2009 Steve Kenniston No comments

Technology is great, isn’t it? When someone thinks they have a new idea on the same old technology foundation they call it “X 2.0”. I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 and it is my belief that the proverbial boat is being missed (since we are using water analogies). I have been watching these guys hash it out for the past few weeks and decided I have to jump in. I find the real value of these conversations is the value to the end user. At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term (like deduplication 2.0); what does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center. We should focus on the implications of this new generation of deduplication – ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization. The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies/capabilities that are available to customers today. Originally these deduplication technologies were used primarily for backup purposes, but slowly, deduplication is making its way into primary storage. Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC. Why only static data? Static data is data that isn’t used frequently (that doesn’t mean it’s not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those for active data. Remember: nothing in IT is free. If I deduplicate data, in order to use it, I must ‘rehydrate’ it, and thus there is a performance implication, so I want to be careful where I deduplicate data so as not to inhibit performance on production data.

Dr. Dedupe and Tom allude to Deduplication 2.0 moving beyond backup storage and into primary storage.  While deduplication in primary storage is technically possible, it is important that customers understand two important points:

1) Performance: whatever I do to deduplicate (I prefer ‘optimize’) capacity in order to save space, I must ‘undo’ in order to use the data. If I set a policy that says any data that is 30 days old can be ‘optimized’, I need to be sure that data 30 days old is not active, or I could pay a substantial performance penalty when using this data. I may instead set a policy of ‘any data that hasn’t been touched in 30 days can be optimized’. I would just want to make sure that there is no scenario where, at the end of a quarter let’s say, I would need to rehydrate all data in order to run some report.

2) Comprehensive and cumulative deduplication throughout my storage tiers. What do I mean? If I compress and single instance (deduplicate) data on my primary storage utilizing one set of deduplication technologies, say single instancing and compression algorithms, and then I back up this data using sub-file deduplication, a separate set of algorithms, then what I am left with are two separate sets of deduplicated data silos, and no one wins in this scenario.
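The "not touched in 30 days" policy from point 1 can be sketched as a simple scan for idle files. This is only an illustration, not any product's policy engine: it assumes "touched" means the later of the last-access and last-modification times, and the function name and defaults are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def optimization_candidates(root: str, min_idle_days: int = 30, now: float = None) -> list:
    """Return files under `root` that have been idle for at least
    `min_idle_days`, i.e. candidates for compression / single instancing."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Caveat: access times may not be updated on file systems
            # mounted with options such as noatime, so also check mtime.
            last_touched = max(info.st_atime, info.st_mtime)
            if last_touched < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Running the same scan with a later `now` value (for example, the date of the next quarter-end report) is one way to see how much optimized data a periodic job would touch, and so how much would need to be rehydrated.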

It is important, no matter what deduplication technology you decide to use, that you can actually leverage the data stored in the deduplication device and that as data moves from device to device it doesn’t need to be rehydrated before it is moved.

A great use case of capacity optimization in primary storage is how EMC evolved the Celerra product this year. Through a policy, data that is, let’s say, older than 30 days is compressed and stored as a single instance, with users seeing as much as 30% to 50% storage savings.

The real goal of Deduplication 2.0, and I think Dr. Dedupe alluded to this in his post “The Dedupe 2.0 Pundits Are Still Swimming in Lake 1.0”, is that customers win when deduplication technology is a part of the core system or file system, when I no longer need to rehydrate data as I move it from primary storage to secondary storage. If each storage device in the ‘stack’ understands the language of the device in the stack ahead of it, and the ‘deduplication’ or file system is coordinated and cumulative from device to device, then the customer is the winner. This pertains to primary storage, backup storage and archive storage. Never having to rehydrate data allows for more efficiency and a reduced tax on devices that can save the end user money.

Tom Cook, CEO of Permabit, points out in his blog post “Dedupe 1.0 vs. Dedupe 2.0: The debate ensues” that the only value to deduplication for primary storage is to move your data to a deduplicated archive, which allows you to store data efficiently for the long term. I agree with that, but as we have seen, it is not that practical. Why? Because at the end of the day, the costs to manage storage are going up, up, up and the costs to buy storage are going down, down, down. End users (NOT IT) are generally lazy or, should I really say, just too busy to manage this storage. In order to properly archive data, you need to have a policy that tells you what to move and when to move it. IT can make all the recommendations in the world about the value of archive, but if users, or really line of business managers, don’t tell IT what data is important and what can be archived, then IT doesn’t really have a choice, which makes the premise of moving data to an archive, deduplicated or not – moot.

The real issue is balancing capacity optimization (to what granularity you deduplicate data) against performance on the appropriate tier of data, given that deduplication will happen on all tiers of storage. The higher the performance requirements (tier 1), the less ‘optimized’ I make the data; the lower the performance requirements (tier x, archive), the more optimized I make the data. The benefits to the customer are that data can be A) optimized consistently on each of its devices, and B) optimized cumulatively from device to device, removing silos of deduplicated data across the stack.

For more on tiered dedupe, read my Betamax Redux blog post on EMC’s vision for deduplication and hopefully this will put you on a high performance ‘Road to Recovery’.


The Side Effects of Backup on Server Virtualization

September 14th, 2009 Steve Kenniston 2 comments

Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side effects.  You need to be careful of what the Witch Doctor orders.

When I speak with customers who have aggressively implemented a virtual server infrastructure, 9 out of 10 will tell me that they underestimated the effect that virtualization would have on their backups and backup process, and how backup, when not considered during the virtual server assessment and planning process, might actually make virtualization less of the magic potion they had hoped. So what is the issue? Backup is a virtualization bottleneck, and without addressing it, you may not be able to obtain the server consolidation ratios you had been expecting, which can have a negative effect on your virtual server TCO and ROI.

This is a timely discussion as VMworld has just concluded.  VMware users flocked to VMworld looking for best practices when it comes to implementing virtual server technology.  Because virtualization allows IT to reduce the overall physical hardware infrastructure, users will be looking at how to maximize their server consolidation ratios (get as many virtual servers on a physical server as they can and still provide good application performance).

I often hear that companies assess their environments by looking at the production applications on their physical server environment, identifying their workloads and translating that into some consolidation ratio of physical servers to virtual servers. I also hear, from these same customers, that backup was never taken into consideration during the assessment phase when trying to identify the best possible consolidation ratios. These customers implement their new virtual server environments, install the backup agent they had previously been using for physical server backups, attempt to back up their virtual servers, and find that they would only be able to protect 50% to 60% of the new environment. Why?

Let’s look at the physics. Let’s say your virtualization ratio is 12 virtual servers to 1 physical server. Twelve physical servers back up with 12 NICs, 12 CPUs, 12 memory ‘chunks’, etc. When you moved these 12 physical servers into the virtual world and put them on one physical server, did you put 12 NICs in the new physical server? Did you put 12 CPUs in the new server? Do you have 12x the memory? Chances are, probably not. However, the capacity to be backed up didn’t change, did it? So how could one expect that backup, which is I/O, memory and CPU intensive, would perform well in a virtual world?

Figure 1 below shows that when you back up 12 servers, the resource drain on each server is roughly 25% (per system during a full backup). When you virtualize these 12 servers onto one or two physical servers, your physical system utilization shoots up to 80%+. This utilization can be so dramatic that it actually affects the number of virtual servers you can have on these systems, which can ruin your virtual server TCO / ROI.

Figure 1


Simple math dictates, unless you have all the same resources on your new physical server as you did on all your physical servers before the consolidation, you won’t get the same backup performance.  I have spoken with customers who aimed to do a 25 to 1 virtual to physical server consolidation, who  were only actually able to get a 15 to 1 consolidation ratio in reality because their backup application couldn’t handle 25 virtual servers on one physical server, leaving some unprotected.
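The "simple math" can be sketched as a back-of-the-envelope model in Python. The assumptions here are mine and deliberately crude: each backup consumes a fixed 25% of one server's resources (the figure quoted above), the consolidated host has the resources of a single such server unless told otherwise, and 80% is treated as the utilization ceiling. Real workloads will not be this uniform.

```python
import math


def backup_demand(vm_count: int, per_backup_load: float, host_capacity: float = 1.0) -> float:
    """Aggregate load, as a fraction of host capacity, if every VM backs up at once."""
    return vm_count * per_backup_load / host_capacity


def max_concurrent_backups(per_backup_load: float, host_capacity: float = 1.0,
                           ceiling: float = 0.8) -> int:
    """How many backups fit under the utilization ceiling at the same time."""
    return math.floor(ceiling * host_capacity / per_backup_load)


def backup_waves(vm_count: int, per_backup_load: float, host_capacity: float = 1.0,
                 ceiling: float = 0.8) -> int:
    """Number of staggered backup windows needed to protect every VM."""
    concurrent = max_concurrent_backups(per_backup_load, host_capacity, ceiling)
    if concurrent < 1:
        raise ValueError("a single backup already exceeds the ceiling")
    return math.ceil(vm_count / concurrent)


if __name__ == "__main__":
    print(f"12 VMs at once: {backup_demand(12, 0.25):.0%} of one host")
    print(f"concurrent backups under an 80% ceiling: {max_concurrent_backups(0.25)}")
    print(f"staggered windows needed for 12 VMs: {backup_waves(12, 0.25)}")
```

In this model, twelve simultaneous backups ask for three times what the host has, so they must either be staggered into several windows or each backup must generate far less load.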

People could argue that if you properly schedule each virtual machine to back up in a window when all the other systems are not backing up, then perhaps you could get by with traditional backup. The flip side is, IT has been telling me they don’t want to manage the backup process any more than they have to. So how do you ‘fix’ this problem?

The issue is that backup is a very I/O intensive application, therefore there is only one way to fix the problem. You need to reduce the amount of I/O generated and sent through the physical devices that house the virtual servers during backup. Virtual servers were designed to provide a lot of benefits, but high I/O capability is not one of them. (This is okay, every technology implementation has its tradeoffs. When the positives outweigh the negatives, especially in a substantial way, as they do with virtual servers, you usually have a paradigm shift, and this is what we are seeing with virtual servers.)

So how do you change the I/O pattern of backup? You do so by decreasing the amount of data that is utilizing the shared resources during backup. There are a couple of ways to do this. One way is to leverage the storage array and snapshot the data. Snapshots allow you to make copies of virtualized server data, mount the snapshot to a proxy host and off-load the backups from the physical server that houses the virtual servers. The downsides are:

1)      This becomes a new set of processes to manage unlike traditional backup processes

2)      You need extra storage capacity with this solution

3)      You will need to manage another physical server (proxy server)

4)      You will need more backup agents from your backup software provider

The most efficient way, however, is to take advantage of a new backup software application that leverages data reduction (data deduplication) on the client.  Your processes stay the same, there is no need for additional primary storage hardware and by leveraging a ‘smarter’ backup client, you will reduce the I/O tax on your physical server devices and thereby have the ability to maximize your TCO / ROI for your new virtual server environment.

Additionally, a number of these technologies have additional offerings that truly make them next generation. Backup licensing is slowly moving to a capacity based license model. One great feature of these new products is the fact that there is no charge for clients or agents. This allows you to create a virtual server template with the backup agent embedded within it. You no longer have to worry about proliferating backup clients and then paying for all those clients when it is time to ‘true up’ with your backup software vendor. Data deduplication technologies also offer the ability to replicate the backup data efficiently to disk at a remote site, so you can develop a more efficient disaster recovery plan that reduces the reliance on tape and increases your overall operational efficiency.

Regardless of which path you choose, each requires IT to rethink their backup strategies when it comes to protecting virtual server environments.

I encourage you to do two things as you consider moving to a virtual server infrastructure:

1)      Make sure you are thinking about data protection when architecting your new virtual server environment

2)      Check out some of the new technologies and best practices offered by vendors for protecting virtual servers.

Hopefully this will help put your virtual server world back on the Road to Recovery!


A Data Protection Reference Architecture – The Final Chapter

September 1st, 2009 Steve Kenniston 2 comments

The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues, helping to balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things. It can help you begin to classify your data and align your data to your business objectives. It also lets you begin to identify what data or data services in your environment may be more important to you than others and, based on this, helps you to choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs. The challenge comes with managing multiple solutions in an effort to meet your business objectives. While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management. It allows you to manage traditional backup, source based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored. Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer, but what I am saying is that if you’re looking to have all the feature functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there. Additionally, the underlying image of the triangle represents data protection management. Putting all the new technology in place is one thing; managing it, and ensuring you are now meeting your business needs, is another. EMC’s Data Protection Advisor can help here as well.

This diagram can help customers lay out a new, better data protection schema for their environment and start thinking about data protection a bit more strategically rather than tactically.  It can also help vendors speak to customers about how they should look at their environment in order to identify specific challenges and the means they need to alleviate these challenges, taking backup beyond.


A Data Protection Reference Architecture – Part 3

August 24th, 2009 Steve Kenniston No comments

The ‘Fat Middle’

In the ‘fat middle’ of the triangle, as I stated last week, there are a number of ways to protect information.  I have chosen to break the middle into two categories.  The reality is, this is meant to be used as a tool for helping you lay out a strategy, so your boxes could be based on capacity and could end up in different areas of the triangle depending upon your business needs.  The thing to keep in mind is that it’s not about your environment matching these boxes exactly; it’s about making sure that all of the critical data that requires backup with a 24 hour RPO is protected.  You then align the data value in the box with the most appropriate technology to 1) solve the challenge and 2) fit best in your environment.

SMB / ROBO

First, let me clarify my terminology.  ROBO is remote office / branch office and SMB is small to medium business.  If we think about the business needs that are most important in this arena, they are:

1)      Low cost

2)      Simplicity (one tool)

3)      24 hour RPO is adequate

Small and medium businesses, as well as remote offices, need a robust data protection solution that allows them to meet their backup windows and that has the ability to recover data that is not any older than 24 hours (RPO).  The RTO drives whether the backup target is disk or tape.   Faster recoveries come from disk.  Another thing to keep in mind is that there isn’t usually a lot of technical expertise at these sites so the backup application needs to be very simple to manage.

Backup appliances or appliance-like backup technologies tend to work very well in these environments.  A self-contained, disk-based backup appliance with the ability to replicate efficiently to another site for disaster protection is a great solution for sites like these.

In the case of SMBs, they can take advantage of a single application with integrated disk that could replicate to the cloud, meeting their data protection objectives with very little cost and management overhead.  If cost is a driving factor, and the customer just wants better backup and recovery performance, moving to an appliance-based, capacity-optimized disk solution that can replicate is a viable option.  If the customer does not want to replace their existing backup solution because it is working fine for them, then moving to disk-based backup can help with most performance requirements.  (This is true for the data center as well.)  And when customers really want tape as their backup medium for getting data off-site, the management will be a bit more complex but still easily achievable.

For remote offices in large corporations, again, an appliance that IT can remotely manage and replicate efficiently back to a data center gives users at the remote site local recovery with RTOs measured in hours, as well as a DR strategy in the event there is a site level issue.

Along these lines, I have spoken to a number of customers lately who are utilizing virtual machines.  In a number of these cases, a virtual backup appliance is a great way to reduce the amount of complexity that is added to a customer’s environment yet still achieve the business requirements.

The Data Center

Next in the ‘fat middle’ is the data center.  There are many different backup challenges here.  One challenge follows the 80 / 20 rule.  Eighty percent of the data is usually unstructured data (file system) and 20% is structured (database and email).  As a general rule, the 80% of the data that is file system data is great for next generation data protection solutions such as source-based data deduplication.  Now, there are exceptions to this rule but a majority of the time source-based deduplication is the perfect fit.

A source-based deduplication solution could require that old backup agents be removed and new ones be deployed.  It may also mean that media servers are removed or repurposed.  The payoff for the extra work required to implement a source-based solution, however, is:

1)      Faster backups

2)      Less capacity stored on backup media (disk)

3)      More time freed up for the 20% of the backup environment that needs more resources

The third item in this list is very important.  As we discussed, there is no longer a ‘one size fits all’ solution for data protection.  I also mentioned that source-based deduplication is a great fit for the unstructured data in your environment.  However, for structured data, which has a high change rate, traditional backup applications typically back up the data faster than source-based deduplication.  Keep in mind: if you are in a larger data center, you have probably architected your backup infrastructure to meet the demands of your more important applications.  These applications probably back up over the SAN, may be server-less, and have likely required a good deal of time spent by IT ensuring that there are no issues with protecting them.  However, with the data growth in the environment, backups across the enterprise are running more slowly, so they are having an impact on the critical business applications.  By implementing a source-based backup solution for the 80% of the data in the environment that it suits, you offload the traditional backup application so that it can focus on the 20% of the data that may have a greater business need.
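As a rough illustration of the 80 / 20 split, the sketch below routes data sets to a backup method by type and change rate.  The `DataSet` fields, the 10% change-rate threshold and the method labels are hypothetical values chosen for illustration; they are not figures from any product, and a real environment would need its own criteria.

```python
from dataclasses import dataclass

# Hypothetical cutoff: above this daily change rate, assume a traditional
# backup application moves the data faster than source-based deduplication.
HIGH_CHANGE_RATE = 0.10


@dataclass
class DataSet:
    name: str
    kind: str                 # "file_system", "database" or "email"
    daily_change_rate: float  # fraction of the data that changes per day


def choose_backup_method(ds: DataSet) -> str:
    """Pick a backup method for one data set using the 80 / 20 rule of thumb."""
    structured = ds.kind in ("database", "email")
    if structured or ds.daily_change_rate > HIGH_CHANGE_RATE:
        return "traditional backup"
    return "source-based deduplication"


if __name__ == "__main__":
    for ds in [
        DataSet("home directories", "file_system", 0.01),
        DataSet("order database", "database", 0.30),
    ]:
        print(f"{ds.name}: {choose_backup_method(ds)}")
```

The point of the sketch is only that the decision is made per data set, not once for the whole data center.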

Another good fit for source-based deduplication is in virtual server environments.  The benefits of server virtualization are soon forgotten when it comes time to back the servers up.  The reality is, virtual servers are not designed for high I/O, and backup is the application in your environment with the highest I/O.  By leveraging source-based deduplication and removing all the redundant data at the source before it needs to be sent through the virtual server’s physical resources, you can dramatically decrease your backup bottleneck and lower your TCO with virtual servers.
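To make “removing redundant data at the source” concrete, here is a minimal sketch of the idea.  It uses fixed-size blocks and SHA-256 fingerprints purely for simplicity; it is not how Avamar or any other product works internally, and the 4 KB block size, the function name and the in-memory `server_index` are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


def backup_with_source_dedup(data: bytes, server_index: set):
    """Split data into blocks and count only blocks the server has not seen.

    Returns (recipe, bytes_sent): the recipe is the ordered list of block
    fingerprints needed to rebuild the data, and bytes_sent is how much
    block data would actually have to cross the network.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        recipe.append(fingerprint)
        if fingerprint not in server_index:
            # New block: it would be sent to the backup server and indexed.
            server_index.add(fingerprint)
            bytes_sent += len(block)
        # Known block: only the fingerprint is recorded, no data is sent.
    return recipe, bytes_sent
```

Because the duplicate check happens before anything is sent, a second backup of unchanged data sends no block data at all, which is why this approach takes pressure off the shared physical resources of a virtualized host.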

When it comes to source-based deduplication, one thing to consider is that some customers may not want to go through the process of changing over to a new data protection technology.  If this is the case, or if there is an area where source-based deduplication isn’t a good fit, disk targets such as VTL or target-based deduplication are a good way to increase backup performance over tape and reduce the capacity of data that you are storing on a daily basis.

Also, when I speak with customers these days, they want to reduce their reliance on tape more and more.  Deduplication solutions allow for very efficient appliance-to-appliance replication.  This enables customers to get data off site efficiently and store data on disk at the same cost as storing data on tape while improving operational recovery, ensuring you are on the Road to Recovery.


A Data Protection Reference Architecture – Part 2

August 20th, 2009 Steve Kenniston No comments

Archive

The most fundamental part of developing a good data protection architecture starts at the base of the triangle with Archive.  Archive is often an overlooked component of data protection – it’s not just for regulated businesses anymore.  Archive essentially gives users 100% data deduplication efficiency.  What I mean by this is that you have the ability to remove ‘stale’ data (and by ‘stale’ I don’t mean unimportant data, I just mean data that is not accessed frequently) completely from your backup stream so you don’t continue to back it up.  Let’s face it; the two most important commodities in backup are time and capacity.  The two are interdependent.  The more capacity you have, the longer it takes to back up and the more money it costs to store.  The longer it takes you to back up, the less likely you are to be meeting your business objectives.  Data capacities aren’t shrinking, they are growing.  According to the latest IDC data, capacity is growing at a staggering pace of 65% year over year and the digital pack rat in all of us is too afraid to get rid of anything, compromising backup windows and hence the business.  By archiving data that hasn’t been touched in some period of time and removing it from the backup stream, you can relieve some of the pressure on your backups and possibly not have to make any significant changes to your backup infrastructure.
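One way to get a feel for how much ‘stale’ data is sitting in the backup stream is to walk a file system and total up files that have not been touched for a while.  The sketch below is a minimal example of that idea; the function name, the idle threshold and the choice to use the later of access time and modification time are my assumptions, and access times may not be reliable on file systems mounted with options such as `noatime`.

```python
import os
import time


def find_archive_candidates(root, max_idle_days, now=None):
    """Return (path, size_in_bytes) for files untouched for max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the later of last access and last modification as
            # the moment the file was last "touched".
            last_touched = max(info.st_atime, info.st_mtime)
            if last_touched < cutoff:
                candidates.append((path, info.st_size))
    return candidates
```

Summing the sizes, for example `sum(size for _, size in find_archive_candidates("/data", 180))` with a hypothetical `/data` path and 180-day threshold, gives a rough estimate of the capacity that would leave every future full backup once that data is archived.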

Also, you don’t have to archive to a special purpose device or appliance.  You can archive data to any file system.  Keep in mind, however, that you want to archive to a platform that can keep costs low.  Remember, this data is not unimportant, just not highly used.  Take into account your RTO and store the data on the most cost effective platform possible that also aligns to the business objectives.  This may be tape, it may be optical or it may be disk.  If it is disk, you want to store it on disk that is optimized for this type of data: optimized for capacity (deduplication, compression, single instancing), with low power and cooling costs, able to replicate for availability and highly reliable.  You will also want to make sure that it is integrated to some extent with an application that lets you find the data pretty quickly when you need it and puts you further down the Road to Recovery.

In my next post we will talk about what I call the ‘fat middle’.  In this area almost all of the data has a 24 hour RPO, and it is where traditional and next generation backup applications play.  There are many use cases for data protection in this area, and RTOs tend to drive the medium to which data is backed up (disk or tape).  Stay tuned for Part 3.


Storage Switzerland

August 18th, 2009 Steve Kenniston No comments

One of the more thoughtful analysts in the industry, in my opinion, is George Crump from Storage Switzerland.  (I like the name, and George is as independent as you can get in this business.)  Yesterday I had the pleasure of briefing George on EMC’s Data Protection Vision.  I like talking with George for a couple of reasons.  First, he gets it.  What does that mean?  Read his material.  He is genuinely trying to educate IT folks on what is really important in the data center and how to address these challenges.  Next, he keeps the ‘pay for’, ‘vendor spin’ to a minimum.  George works hard to just talk about the facts of a product or industry and talk about how products can help without selling.  The reality is, we live in a great technological time.  The problem with IT is that only 50% of the problems are technology related.  The other 50% are psychological.  IT can’t just implement new technology because it’s cool or even because it really does solve a problem.  Sometimes new technology is too expensive to implement, or the solution that is currently in place had a three year amortization and you’re only two years into your product life.  Or, more importantly, the new technology may be the greatest technology at the right price but it doesn’t fit into the current IT priorities.  These are all things IT needs to work through when considering whether or not to invest in new technology.  The other thing George and I spoke about was the fact that it gets difficult to be ‘strategic’ in IT, especially in difficult economic times.  A lot of times IT just needs a Band-Aid or quick fix so it can move on to more important issues that really drive the business.  I talk about this a lot, especially when it comes to backup.  Let’s face it, it may not be what we all want to hear, but backup is not strategic to most environments.  The applications that drive the business are most important.  Backup is about risk mitigation and information availability if everything else fails.  Right, ‘if everything else fails’, and IT typically invests in technology in the front end in an effort to have as little failure as possible.  Meaning, IT doesn’t just buy JBOD with no RAID if they think the environment shouldn’t be put at that kind of risk.  So IT is already investing in some risk management up front, which drives the spend on the back end for data protection.

I wanted to say “Thanks” to George for taking the time to come in and understand the bigger strategy EMC is driving with its products in the data protection space and to talk about our existing successes with the current portfolio.  Hopefully George, as well as all of you, can see how we are helping to put customers on the Road to Recovery.


A Data Protection Reference Architecture – Part 1

August 14th, 2009 Steve Kenniston No comments

This post is the first of a multi-part series.  I will introduce my view of a data protection reference architecture, and the next few posts will cover the components of that architecture.

The other day I had a very interesting conversation with a colleague of mine in Australia.  He was looking for a data protection reference architecture that he could use to speak to his customer.  As you can imagine, having this conversation over the phone could prove to be a challenge.  When the conversation began, my fear was he was looking for an ‘architecture’ diagram that included data protection appliances, backup servers, disk libraries, tape libraries and backup agents.  I quickly realized that this is an impossible conversation to have with him without knowing:

A)     the customer’s environment or challenges

B)      the customer’s business objectives

I find that most vendors don’t know A or B when speaking to a customer about their data protection ‘issues’, but they really should.  Having a more thoughtful conversation with customers in a consultative fashion is more relevant to customers in understanding their challenges and helping to align these challenges to the best possible solution.

I started my conversation with the diagram shown below (Figure 1): a simple triangle divided horizontally into 4 segments, with the middle two segments divided vertically in half.  Each segment represents different business objectives within a company.  As you go around the triangle, you can see that there are different technologies and different methodologies for attacking data protection challenges, which is why there is no longer a “one size fits all” approach when it comes to protecting data today.  Let’s face it; the two most important commodities in backup are time and capacity.  One of the primary drivers behind the type of protection that is used is the Recovery Point Objective or RPO.  Different technologies provide different RPOs and each has a different price point, and there are also different processes that can be applied to attain RPOs.

Figure 1

Having a conversation specific to this diagram can have a tremendous amount of value on a number of fronts, including aligning technology needs with business objectives, highlighting critical pain points, and beginning a roadmap that helps implement data protection technology based on business needs and budget and puts you on the Road to Recovery.

The next post will cover the foundation of the triangle – Archive.


A Data Protection Tribute to Michael Jackson

July 7th, 2009 Steve Kenniston 6 comments

I was walking through the data center the other day when I heard one of my colleagues, MJ, “Scream”, “I wish I had some ‘Morphine’”.  Well, I have to say I was “Speechless”.  I walked over to where MJ was standing, near the tape library, and when I asked him what was wrong, he replied, “There was another backup tape ‘Jam’.”  MJ told me he had been “Working Day and Night” on a major backup problem and he was now bouncing “Off The Wall”.  He told me he was sick of dealing with traditional backup tools and just wanted to get rid of tape.  I told MJ that it was “Human Nature” to feel “Bad” in a time like this but I also told him, “You Are Not Alone”.  I said, “MJ, ‘Keep The Faith’, we all ‘Remember The Time’ when backups ran like a ‘Speed Demon’ and were ‘Unbreakable’, but that is ‘HIStory’, tape isn’t that fast any more given the amount of data we now have.”  I also told him that “We are Backup Administrators, we are ‘Invincible’ and ‘Heaven Can Wait’ for us, and while we may not have our issue fixed at the ‘Break Of Dawn’, we would ‘Come Together’ to ‘Heal The World’, or at least the datacenter” (I chuckled).  I proceeded to tell him about a revolutionary new backup concept utilizing source-based deduplication technology.  It’s “PYT”, a pretty young thing, but more importantly it’s here to stay.  EMC offers it with a product called Avamar, the most efficient variable block, source-based deduplication technology on the market that:

  • Helps to eliminate tape altogether
  • Is perfect for VMware environments
  • Protects remote offices most efficiently
  • Stems the tide of data growth on NAS platforms

Well, I thought MJ was going to give me “Trouble” for my comments.  I mean it, all of a sudden I had “Butterflies”.  I felt “Threatened” because I knew this guy could be a loose cannon when it came to trying something new; he could be “Dangerous”, he might moonwalk over to me and slap me with his glove.  Change can be scary.  But just then MJ let out a “Smile” (quite frankly I thought he was going to “Cry”) and said, “‘I Can’t Help It’, my job is ‘On The Line’ and I ‘Wanna Be Startin’ Somethin’’ soon before my boss tells me to ‘Beat It’.”  He just felt “2 Bad”.  I told him, “‘Don’t Walk Away’ and ‘Whatever Happens’, ‘Billie Jean’ and I are going to help get you out of ‘Trouble’, and together we will replace the tape infrastructure, make backups run 10x faster, provide you with tools that actually verify your backups and make your backup problems ‘Ghosts’.”

I called Billie Jean and at first she said, “‘Leave Me Alone’, ‘Why You Wanna Trip On Me’”, but I told her we needed her help, so she said she could help MJ and me.  When she asked what the trouble was, I told her that our backup environment was in shambles and that if MJ didn’t get it fixed with the right solution, they were going to put MJ on a “Carousel”, there would be “Blood On The Dance Floor” and he would end up being “Someone In The Dark” “In The Closet”.  Billie Jean hopped on the phone and called “Dirty Diana”; we are all “Just Good Friends” really.  She told her the story and when it came right down to it, it really was “Black or White”.  We needed some “Money”, “2000 Watts”, to replace the old tape libraries with the new Avamar technology and “One More Chance” to fix all of MJ’s backup issues.

I told MJ the plan; we were going to sneak past the guards (that would be simple because “They Don’t Care About Us”) and then replace the old equipment with the new equipment.  MJ asked, “’Is It Scary’ in the datacenter at night?”  I told him we would be fine, that this would not be like his “Childhood” days.  MJ just said, “I Wanna ‘Rock With You’”.  The next night we snuck into the data center like a “Smooth Criminal”.  First, we had to “Get On The Floor” the new Avamar technology.  Next we installed Avamar and it fixed our backup problem right away.  I said, “Man ‘Is It Scary’ or what?”  “Another Part of Me” was just proud of the work we had all accomplished.

The next morning we went into the office of “Little Susie” and knocked on her door (it was always closed because she liked her “Privacy”).  She was MJ’s boss and she was no “Tabloid Junkie”; she was a real “Superfly Sister”.  She said, “‘Who Is It’?”  We told her and she let us in.  We showed her some reports we had generated from another product we acquired called Data Protection Advisor.  We showed her where all the previous backups had been failing due to problems with network performance, tape libraries and not enough time to back everything up.  Then we showed her that with Avamar we were backing up data in just 1 hour with 100% success because we were seeing 99.5% duplicate data in our NAS environment, and that was why we couldn’t meet our backup windows with tape.  We also showed her that our VMware environment could go from 10 to 20 virtual servers per ESX host because backup was no longer the bottleneck keeping us from implementing more virtual guests.  Well, she was pretty happy.  She said “You Rock My World” and she was not upset that the tape environment was “Gone Too Soon” because it was a true “Heartbreak”.  I told her it was a team effort and we couldn’t have done it without the help of a lot of people including EMC.  It was a real “Thriller”.
