Archive for November, 2009

How Much Backup Capacity Does Deduplication Really Save?

November 30th, 2009 Steve Kenniston No comments

There is a lot of discussion around data deduplication for backup these days.  (I wish I could deduplicate all the turkey I ate last week.)  In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.”  And when asked “Why?” the response was “…deduplication is too compelling to ignore.”  But I say “prove it.”  So I put together some backup capacity numbers for storing data on tape (non-compressed and compressed) versus storing deduplicated data (fixed block and variable block) on disk.  The numbers show dramatic savings in backup space, which translates into cost savings.

The Parameters

As with any ‘analysis’, numbers can be ‘spun’ to make them say what you want.  That said, I tried to be as straightforward as possible, so let me also show my methodology so you can see how my numbers were derived.

  • I charted the amount of capacity created using a retention policy of:
    • 14 Dailies
    • 4 Weeklies
    • 12 Monthlies
  • I selected 10TB of primary storage capacity
  • I did this for file system backups only
  • I charted the data for 30%, 40%, 50% and 60% primary storage growth rates
  • I charted traditional tape based backup (non-compressed)
  • I charted traditional tape based backup (compressed, 2:1)
  • I charted fixed block disk based deduplicated backup
  • I charted variable block disk based deduplicated backup (3 to 5 times more efficient than fixed block deduplication)

The Effect

The first thing to think about is the sheer number of full backup copies that must be maintained when utilizing the above retention schedule.  The above retention policy leads to 17.2 copies of the primary storage (12 monthlies + 4 weeklies + the equivalent of 1.2 copies from the dailies = 17.2 copies).  Translation: one terabyte of primary storage becomes 17.2 terabytes of tape storage.  This means backup administrators need to pay for the physical tapes as well as the offsite transport and storage costs.  Now 17.2 terabytes of tape doesn’t sound like much, but keep in mind that is for 1TB of primary capacity.  Ten TB of primary capacity yields 172 TB of tape capacity.  Now add in year-over-year storage growth.  At 30% primary storage growth, backup storage grows 23%; at 40%, it grows 29%; at 50%, it grows 33%; and at 60%, it grows 38%.
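The copy count above is simple arithmetic, and it can be sketched in a few lines. This is only an illustration of the numbers in this post: the function and parameter names are made up for the example, and the 1.2 full-copy equivalent for the 14 dailies is taken as given rather than derived.

```python
def full_copy_equivalents(monthlies=12, weeklies=4, dailies_equivalent=1.2):
    """Full copies of primary storage held under the retention policy.

    dailies_equivalent is the post's figure for what the 14 dailies add up to;
    it is an input here, not something this sketch derives.
    """
    return monthlies + weeklies + dailies_equivalent


def tape_capacity_tb(primary_tb, copies):
    """Uncompressed tape capacity needed to hold `copies` full copies."""
    return primary_tb * copies


copies = full_copy_equivalents()
for primary_tb in (1, 10):
    tape_tb = tape_capacity_tb(primary_tb, copies)
    print(f"{primary_tb} TB primary -> {tape_tb:.1f} TB of tape")
```

Changing the retention counts or the daily equivalent shows quickly how sensitive the tape footprint is to the policy itself.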

Figure 1 below shows 10 TB of primary capacity growing at 30%, 40%, 50% and 60% along the x-axis and the corresponding capacity of tape or disk consumed along the y-axis.

Figure 1

The graph shows that compressed backup to tape yields a 50% capacity improvement over non-compressed tape, as one would expect. It also shows that fixed block deduplicated disk capacity is only about 48% more efficient than uncompressed tape storage, yet variable block deduplication is 81% more storage efficient than uncompressed tape storage.

Interestingly, the chart also reveals that fixed block deduplication is 3% less efficient than compressed tape, whereas variable block deduplication is 62% more efficient than compressed tape. Typically, with the same data change rates and equivalent data sets, variable block deduplication is 3 to 5 times more efficient than fixed block deduplication.
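For anyone who wants to check how the "versus compressed tape" figures follow from the "versus uncompressed tape" figures, here is a small sketch. It assumes each percentage is the share of uncompressed tape capacity that is saved; the function name is invented for the example.

```python
def savings_vs_baseline(saving_vs_raw_tape, baseline_saving_vs_raw_tape):
    """Re-express a saving measured against uncompressed tape against another baseline.

    Both inputs are fractions of uncompressed tape capacity saved (0.50 = 50%).
    A negative result means the option uses more capacity than the baseline.
    """
    remaining = 1.0 - saving_vs_raw_tape
    baseline_remaining = 1.0 - baseline_saving_vs_raw_tape
    return 1.0 - remaining / baseline_remaining


COMPRESSED_TAPE = 0.50  # 2:1 compression
for name, saving in (("fixed block", 0.48), ("variable block", 0.81)):
    relative = savings_vs_baseline(saving, COMPRESSED_TAPE)
    print(f"{name}: {relative:+.0%} versus compressed tape")
```

With these inputs variable block comes out at roughly 62% better than compressed tape, and fixed block at about 4% worse. That is in line with the 3% read off the chart, since the 48% figure is itself approximate.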

The moral of the story: if you’re going to do deduplication, variable block is the way to go. From a cost perspective, there is essentially no difference in the $/TB price; however, there is much more value in the long run with variable block deduplication. Vendors typically charge a $/TB price for their deduplication solutions. The difference between fixed and variable block deduplication comes down to the capacity of data that is stored in the backups, which directly translates into costs. If you take a look at Figure 2, starting with 1TB of primary capacity growing at 25% over the course of one year, IT will need almost 2TB of backup capacity with fixed block deduplication versus less than 1TB of capacity using variable block deduplication (this assumes fixed block is 5x less efficient, based on empirical data collected in the field). The most important part of this graph is the slope of the blue and red lines. The steeper the slope (red line), the more frequently IT will need to purchase capacity to protect the given data set, and the more it will pay for deduplication software licensing. IT wants the smaller slope.

Figure 2

*Note: Some companies will position their fixed block technologies as variable block by stating that you (the user) have the ability to set the block size to whatever you want; however, once set, it stays that way for all of your data.  The difference is that true variable block technologies adjust the block size on the fly, using their algorithms to ensure maximum efficiency with no management.
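To make the fixed versus variable distinction concrete, here is a toy sketch. It is not any vendor's algorithm: it uses content-defined chunking (cutting wherever a checksum of the last few bytes matches a pattern) as one common way to get variable blocks, and the window size, mask and sample data are arbitrary choices for the illustration.

```python
import hashlib
import random
import zlib


def fixed_chunks(data, size=64):
    """Split at fixed byte offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, window=16, mask=0x3F):
    """Split wherever a checksum of the trailing `window` bytes matches a pattern,
    so chunk boundaries follow the content rather than the byte offset."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks


def reused(old_chunks, new_chunks):
    """Count chunks of the new backup that the old backup already stored."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(hashlib.sha256(c).digest() in seen for c in new_chunks)


rng = random.Random(2009)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"!" + original  # one byte inserted at the front shifts every offset

for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    old, new = chunker(original), chunker(edited)
    print(f"{name}: {reused(old, new)} of {len(new)} chunks already stored")
```

A single inserted byte shifts every fixed block, so nothing lines up with the previous backup, while the content-defined boundaries shift along with the data and nearly every chunk is recognized as already stored. Real products use faster rolling hashes and enforce minimum and maximum chunk sizes; this sketch skips all of that.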

Bang for the Buck

The most important benefit, as with most things in IT, is overall cost savings. Deduplicated disk solutions are anywhere from 2.5X to 3X more expensive than tape; however, with the overall capacity savings, there can be significant cost savings. Figure 3 is representative of the overall costs of new deduplicating disk systems and traditional tape backup systems (including tapes and off-site storage costs). I will caveat this by saying every TCO and ROI has a ton of ‘what ifs’ that factor into overall costs, including things like FTE for backup engineers and long term retention costs, but for the most part, disk systems reduce a good deal of these costs (with the exception of power and cooling) and increase the reliability, security and performance of backups and recoveries.

Figure 3

1 The chart above is based on a rough cost of $8,000 per terabyte of tape backup system costs (including media and off-site storage) and rough cost of $20,000 per terabyte of deduplicated disk backup system costs for the period of one year.  Prices will vary depending upon your configuration and these estimates do not include space, power, cooling or human costs.
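The trade-off behind these rough prices can be sketched as a break-even calculation. This is not a reproduction of Figure 3; it assumes cost is charged per terabyte actually stored, reuses the 172 TB uncompressed tape figure and the 48% / 81% savings from earlier in this post, and the function names are invented for the example.

```python
TAPE_COST_PER_TB = 8_000    # rough yearly figure from the footnote above
DISK_COST_PER_TB = 20_000   # rough yearly figure from the footnote above


def break_even_reduction(tape_cost=TAPE_COST_PER_TB, disk_cost=DISK_COST_PER_TB):
    """Fraction of capacity deduplicated disk must save versus tape to cost the same."""
    return 1.0 - tape_cost / disk_cost


def yearly_cost(capacity_tb, cost_per_tb):
    """Yearly cost, assuming price scales with the terabytes actually stored."""
    return capacity_tb * cost_per_tb


raw_tape_tb = 172.0  # 10 TB of primary storage under the retention policy above
tape_cost = yearly_cost(raw_tape_tb, TAPE_COST_PER_TB)
print(f"break-even capacity reduction: {break_even_reduction():.0%}")
print(f"uncompressed tape: ${tape_cost:,.0f}")
for name, saving in (("fixed block disk", 0.48), ("variable block disk", 0.81)):
    disk_tb = raw_tape_tb * (1.0 - saving)
    disk_cost = yearly_cost(disk_tb, DISK_COST_PER_TB)
    print(f"{name}: {disk_tb:.1f} TB -> ${disk_cost:,.0f}")
```

At 2.5X the price per terabyte, disk has to store at least 60% less than tape just to break even on these inputs, which is why the capacity reduction from variable block deduplication matters so much to the cost picture.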

As I stated above there are only a few factors that are involved in this very raw calculation.  There are a number of other factors involved with a backup process including WAN costs (if replacing tape with disk), remote office facilities, installation (professional services), and software and hardware maintenance to name a few.  But no matter how you look at it, disk based backup with variable block deduplication wins over tape.

Backing data up to deduplicated disk not only reduces the amount of backup capacity that is used, it also has other implications for a data protection environment.  First, backing up to disk versus backing up to tape helps to reduce the reliance on tape and the inherent limitations, security concerns and reliability issues surrounding tape.  Recovering data from disk reduces operational costs and shortens recovery times, making the recovery time objective easier to meet.  Additionally, the reliability of disk with RAID is much higher than the reliability of tape.

New data protection technologies are evolving backup to a degree where the entire data protection process is getting easier to manage by removing multiple points of management (backup servers, media servers, tape libraries and physical tape).  As backup continues to evolve, this can help simplify the overall process and:

  • Increase reliability of backups
  • Increase reliability of recoveries
  • Decrease backup times
  • Decrease the time to recover data

The Bottom Line

New challenges in protecting information are arising every day. Whether it is data growth, remote office data protection or virtualization, backup is getting harder, not easier.  Data deduplication is providing backup administrators with tremendous benefits around backup processes and cost savings.  It is important to keep in mind that everybody’s environment is different and utilizes different methods and processes for managing and protecting information.  It is also important to take a look at your data protection environment today and understand the use cases where it is time to make new investments.  I encourage you to look at new technologies to help you with emerging challenges and to weigh the overall solution, including the costs as well as the benefits of disk based recovery.  New backup technologies that leverage data deduplication can save IT a lot of money and put you back on the Road to Recovery.

Enterprise Data Protection at the Edge

November 19th, 2009 Steve Kenniston 2 comments

What does that really mean?  When I worked for Veritas, back in 1998 we acquired a company based out of Canada called TeleBackup that backed up desktop / laptops.  In 1999 Veritas acquired Seagate and the Backup Exec product which also had a desktop / laptop option.  These products were meant to eventually be integrated into the main backup applications but never were.  Additionally, a lot of that software was given away (hard to make a business on that) and for the most part,  lived on a shelf somewhere and was never installed.

In 2004 I worked for Connected Corporation (acquired by Iron Mountain), whose sole business was desktop / laptop backup.  (In fact, from 2000 to 2004 I worked as an analyst for ESG covering all the vendors in the backup space and used the Connected product to back up my work laptop, and it actually saved my hide once.)  While the company executed a successful exit, the business was (and probably still is) only about a $20M to $40M business.

Why do I bring this up?  There is a new reality in IT these days.  I have said it before, IT is accountable for 100% of the data created in any company, including that stored on desktop/laptops.  This means that not only do they have to provide a location to store this data but IT also needs to provide tools to protect this information and ensure that this information is highly recoverable for both business productivity purposes as well as corporate and legal governance.   This means that desktop / laptop backup is now gaining a lot more visibility in the enterprise.

However, desktop / laptop data protection is one of those areas in IT that is just a nuisance because it seems like it should be an easy problem to solve, but there are so many moving parts to it that it ends up falling by the wayside.

A successful desktop / laptop backup technology needs three very specific capabilities:

  • Integrate seamlessly with the existing backup solution in the enterprise
  • Share a common, deduplicated, back end repository
  • Have a very SIMPLE and robust end-user interface to allow for end-user restores

The desktop / laptop solutions I discussed above did not, and do not, have these capabilities.  Even though these technologies come from reputable companies, not having these three capabilities is what has led to their very low adoption.

These three capabilities are all inter-related.  First, IT needs an integrated solution because they do not want to have yet another piece of software in their environment that they have to manage, especially data protection software.  The fundamentals of backup are pretty simple.  Install an agent on the machine you want to protect, go to the management interface of the backup application, set up a few simple rules or policies (back up this system, at this time, to this device, catalog it and finally, keep the data for ‘x’ number of days, weeks, etc.) and start protecting your data.

One challenge is that most backup products don’t have an agent that is lightweight enough to run as a client on a desktop or laptop.  This causes incredible performance degradation of the system during backups, and let’s face it, if you have a laptop, 9 times out of 10 you’re going to be working on it when the backup kicks off so you will end up shutting it down which leaves you with unprotected data.  Client side data reduction techniques help to reduce this problem.  By moving less data, they run for shorter periods of time so there is little to no end user impact.

Next, if you did have an agent that worked well enough to back up all the desktop / laptop systems, it would impede the backups of the other mission critical systems in the environment by utilizing all of the resources on the devices where the data is being backed up to.  (Take a look at Architecting for Recovery for more info.)  This means that IT would have to set up additional, separate devices to protect one subset of systems, leaving them with more devices to manage and making it a hassle to implement.  (This is one reason why ‘cloud’ like solutions have become popular: they provide fewer things to manage. However, not every company wants its data outside of its control.)

Also, if you look at the nature of data on desktops and laptops, these systems share a ton of common data.  Why would any IT person want to back up that much data over and over again?  Traditional desktop / laptop solutions don’t provide robust capabilities for reducing the amount of redundant data that needs to be protected, which translates into longer backup times and more ‘storage’ utilization (making it more costly).  Deduplication allows you to implement a common repository.

Finally, the tools for end user recoverability need to be very robust.  The last thing IT has time for is an increased call volume to perform data recovery for end users.  This also means that data needs to be stored on disk, because end users aren’t going to load tapes to recover data, and it needs to be stored in the most efficient manner possible to save on costs.

There are a number of other nice-to-have features, but the lack of the three capabilities outlined above has limited the adoption of desktop / laptop backups. Until today there hasn’t been a good solution that met these criteria.

This week EMC | Avamar launched a desktop / laptop backup component as part of their enterprise solution.  The difference between traditional desktop / laptop solutions and the Avamar solution is that the Avamar solution is 100% integrated as a part of its enterprise backup application, storing data on disk with a high degree of efficiency leveraging single instancing and deduplication.  Additionally, clients are free and they all share a common backend repository with the enterprise backup application that is protecting other common data in the enterprise.  Finally, end-users are able to perform their own restores.  What does all this mean?  Simplicity and low cost.

The Avamar backup technology provides enormous economies of scale when extending from the enterprise to the desktop / laptop.  By backing up to a single common repository utilizing global single instancing and deduplication, you NEVER back up the same data twice, no matter where the data lives.

Think about this scenario – a user creates some document, say a PowerPoint presentation.  This presentation ends up being emailed to a number of people in the company and then saved on the desktop as well as in a number of file shares (home directories) on the NAS system.  This one 1MB presentation can represent 120MB of backup disk capacity.

Now if you utilize Avamar, the process would go like this.  First, the enterprise application would back up the NAS box and may see the file 20 times.  Avamar would single instance and deduplicate it such that only one instance is backed up.  Next, the desktops start their backup process and see that the Avamar Data Store has already protected this data, so again, no additional data needs to be moved or stored.  A pointer is created to let the data store know that the desktop / laptop also has the ability to recover this same file.  This provides tremendous scalability.  This essentially means protecting all your desktops / laptops for free.
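The scenario above can be sketched with a toy single-instance store. This is only an illustration of the idea, not how Avamar is implemented: it works on whole files rather than sub-file blocks, the class and method names are invented, and the split of 20 NAS copies plus 100 desktop copies is just one way to reach the 120 copies in the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical data is kept once, clients get pointers."""

    def __init__(self):
        self.blobs = {}    # content hash -> bytes, stored a single time
        self.catalog = {}  # (client, path) -> content hash, i.e. the pointer

    def backup(self, client, path, data):
        """Record the file for this client; return how many new bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()
        new_bytes = 0
        if digest not in self.blobs:
            self.blobs[digest] = data
            new_bytes = len(data)
        self.catalog[(client, path)] = digest
        return new_bytes

    def restore(self, client, path):
        """Follow the client's pointer back to the single stored copy."""
        return self.blobs[self.catalog[(client, path)]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


deck = b"x" * (1024 * 1024)  # stand-in for the 1MB presentation
store = SingleInstanceStore()
for n in range(20):   # copies in home directories on the NAS
    store.backup("nas", f"/home/user{n}/deck.ppt", deck)
for n in range(100):  # copies saved on desktops / laptops
    store.backup(f"laptop{n}", "Desktop/deck.ppt", deck)

print(f"{len(store.catalog)} protected copies, {store.stored_bytes() / 2**20:.0f} MB stored")
```

Every client can still restore its own copy, because each catalog entry points at the one stored instance.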

The technology is easy to manage (same client, same simple management tools), it provides a simple to navigate end user interface for self restores, and provides an integrated, single instance, deduplicated backend.

Seems like a triple play from the Avamar product, and one that is helping to put IT back on the Road to Recovery.

Architecting for Recovery

November 17th, 2009 Steve Kenniston No comments

Here is a shocker for you: backup IS a science.  Good backup administrators / architects are worth their weight in gold.  CIOs just wish backup would go away.  Backup costs money, it’s not strategic, it chews up manpower, and when it is ‘running’ (successfully or not) no one really pays attention to it.  But when it fails, or more likely when you need to restore data and can’t, someone can lose their job.  So backup is VERY important, it is a science, and architecting a backup environment correctly takes time, skill, money and someone who knows what they are doing.

Good backup administrators architect for recovery, not for backup.  Prove it, you say.  Okay, question: “Why do backup administrators do full backups of Exchange every night?”  Answer: because it is way easier and much faster to perform a one step full recovery for Exchange than it is to lay down the weekly full and apply the incrementals.  Since mail is considered a “critical application” in the enterprise these days, and down time is critical for this application, good backup administrators architect for the least amount of downtime for the application.  This also applies to databases.  Ninety-five percent of all databases are actually snapped for quick recovery, and I would also bet that a full backup is performed on them (or the snap) every evening.

Recovery is a primary driver of any good backup architecture but lately I have been hearing a great deal of talk around ‘backup consolidation’.  The reality is, there is no ‘one size fits all’ when it comes to backup software or hardware.  Consolidating backup software may make your environment easier to manage, but does it provide you the tools/technology you need to maximize your data protection objectives in your environment?  Consolidating backup targets (tape / disk) may yield fewer devices to manage, but what happens to your overall backup and recovery performance when doing so?  While new technologies may help fine-tune the science side of backup, they still need an artist’s touch.

An area where consolidation comes up quite frequently in the backup arena is around new data deduplication solutions.  While these technologies add tremendous value, that does not mean you should forget about good backup architecture practices.  For example, if deduplication is the removal of duplicate data, how much duplicate data is there really between your production databases and your file systems within your company?  Mixing the storage repository for your file system and database data just doesn’t buy you a lot in your deduplicated backend, so why mix them?  It would make sense, however, to have a device / appliance for each database or set of databases that have common data, as well as a device / appliance for file systems that have common data.  Doing so would yield better backup and recovery performance and would probably mirror the same set of rules you would have used in your ‘old’ backup environment.  (Notice, I said ‘rules’ not devices or technologies.)  Now as long as the cost of having multiple devices (including management costs) isn’t exponentially higher, recovery can be much easier and faster.

Another interesting side note: since most IT shops do FULL backups of their databases every night for the purpose of faster recovery, why wouldn’t you want a dedicated backup storage device that does a ‘full’ backup of the data every night and only needs to move the changed data?  This is the very nature of the Avamar technology and what this ‘next generation’ backup technology is designed to accomplish, versus what traditional backup technologies try to do with cumbersome processes of full and incremental backups.  Why not, for example, set up a dedicated Avamar Data Store for DB backups with the proper number of nodes for performance, and leave it at that?

Best Practices / Professional Services Have the Last Word

Instead of naysayers making a bunch of statements that certain technologies ‘can’t’ solve a problem, why wouldn’t they take a page out of a professional services handbook that says ‘if the solution is architected properly (and can be delivered at the right cost, and meet your business objectives) then there is no reason not to make any technology work to its maximum potential and solve difficult problems’?  That is the real science.

Ten years ago, backup administrators would say, “okay, if you can’t get the backup / restore performance you need for that data set, then we will add another media server, get some more licenses and backup that data separately such that when you need to perform a restore, you can set up a dedicated media server for faster recovery.”  Should this be any different today?

Backup is about recovery and, more importantly, performance (RTO), but it is also about architecture, and a good backup architecture will put you on the Road to Recovery.

Computer Comedy

November 9th, 2009 Steve Kenniston No comments

I just love this… I have seen this circulating around the internet and it is just too funny not to pass along.

Categories: Backup Tags: