
Posts Tagged ‘Dedupe’

Deduplication – Older than You Think

October 30th, 2009 Steve Kenniston No comments

So I am a big fan of National Public Radio – NPR.  Today I learned that yesterday, 10/29/09, was the 40th anniversary of the ‘internet’.  Now, I am sure there are a number of theories on when the internet was started and who started it, but it is safe to say that 40 years ago, two guys in California tried to send the first five-letter message, ‘login’, over a wire between two computers (the link crashed after the first two letters), and internet messaging was born.

Since then, people have been trying to reduce the amount of data sent over the internet.  From email to instant messaging, from full files to compressed files and from disk drives to USB drives – people are always trying to make information trafficked over the internet smaller and faster.  That is no surprise coming from a group of people who have turned every term on the internet into an acronym: from USB, ISP, PDA and LCD to SRM, ARM and DPM, techies are always trying to stuff more data into smaller spaces.

Over the past 2 years, data deduplication has become the latest fad in putting more data into a smaller space.  By removing redundant ‘blocks’ of data from the mass of files stored, it is conceivable to reduce your data footprint by as much as 70%.  Deduplication is playing a predominant role in backup, especially backup over the WAN.  With deduplication, you can easily move your data over the WAN to a central data center for protection, moving only small changes (blocks, not files), and leave even more room for Facebook, Hulu, iTunes and more.  What is next for the internet?
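
For the curious, here is a minimal sketch of what removing redundant blocks means in practice.  It assumes fixed 4 KB blocks and SHA-256 fingerprints, which are my choices for illustration and not any particular product's design; the function names and the sample "files" are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes; real products vary


def dedupe_footprint(files, block_size=BLOCK_SIZE):
    """Return (raw_bytes, stored_bytes) when each unique block is kept once."""
    unique = {}  # block fingerprint -> block length
    raw = 0
    for data in files:
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            raw += len(block)
            unique.setdefault(hashlib.sha256(block).digest(), len(block))
    return raw, sum(unique.values())


def reduction(raw, stored):
    """Fraction of the raw footprint removed by deduplication."""
    return 1 - stored / raw if raw else 0.0


# Hypothetical data: three "files" that share most of their blocks.
shared = bytes(BLOCK_SIZE) * 3            # three identical all-zero blocks
file_a = shared + b"a" * BLOCK_SIZE
file_b = shared + b"b" * BLOCK_SIZE
file_c = shared + b"a" * BLOCK_SIZE

raw_bytes, stored_bytes = dedupe_footprint([file_a, file_b, file_c])
print(f"raw={raw_bytes} stored={stored_bytes} "
      f"reduction={reduction(raw_bytes, stored_bytes):.0%}")
```

How much you actually save depends entirely on how repetitive your data is; the toy data here is deliberately redundant.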


The Side Effects of Backup on Server Virtualization

September 14th, 2009 Steve Kenniston 2 comments

Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side effects.  You need to be careful of what the Witch Doctor orders.

When I speak with customers who have aggressively implemented a virtual server infrastructure, 9 out of 10 tell me that they underestimated the effect that virtualization would have on their backups and backup process.  When backup is not considered during the virtual server assessment and planning process, it can make virtualization less of the magic potion they had hoped for.  So what is the issue?  Backup is a virtualization bottleneck, and without addressing it, you may not be able to obtain the server consolidation ratios you had been expecting, which can have a negative effect on your virtual server TCO and ROI.

This is a timely discussion as VMworld has just concluded.  VMware users flocked to VMworld looking for best practices when it comes to implementing virtual server technology.  Because virtualization allows IT to reduce the overall physical hardware infrastructure, users will be looking at how to maximize their server consolidation ratios (get as many virtual servers on a physical server as they can and still provide good application performance).

I often hear that companies assess their environments by looking at the production applications on their physical servers, identifying their workloads and translating that into some consolidation ratio of physical servers to virtual servers.  I also hear, from these same customers, that backup was never taken into consideration during the assessment phase when trying to identify the best possible consolidation ratios.  These customers implement their new virtual server environments, install the backup agent they had previously been using for physical server backups and attempt to back up their virtual servers, only to find that they can protect just 50% to 60% of the new environment.  Why?

Let’s look at the physics.  Let’s say your virtualization ratio is 12 virtual servers to 1 physical server.  Twelve physical servers back up with 12 NICs, 12 CPUs, 12 memory ‘chunks’, etc.  When you moved these 12 physical servers into the virtual world and put them on one physical server, did you put 12 NICs in the new physical server?  Did you put 12 CPUs in the new server?  Do you have 12x the memory?  Chances are, probably not.  However, the capacity you have to back up didn’t change, did it?  So how could one expect that backup, which is I/O, memory and CPU intensive, would perform well in a virtual world?

Figure 1 below shows that when you back up 12 physical servers, the resource drain on each server is roughly 25% (per system during a full backup).  When you virtualize these 12 servers onto one or two physical servers, your physical system utilization shoots up to 80%+.  This utilization can be so dramatic that it actually affects the number of virtual servers you can have on these systems, which can ruin your virtual server TCO / ROI.

Figure 1


Simple math dictates that unless you have all the same resources on your new physical server as you did on all your physical servers before the consolidation, you won’t get the same backup performance.  I have spoken with customers who aimed for a 25 to 1 virtual to physical server consolidation but were only able to get 15 to 1 in reality, because their backup application couldn’t handle 25 virtual servers on one physical server without leaving some unprotected.
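
To make the simple math concrete, here is a back-of-the-envelope sketch.  Only the 25% full-backup drain comes from the discussion above; the host size, the per-VM application load, the 80% ceiling and the "light" backup figure are all assumptions I picked for illustration, and the function names are hypothetical.

```python
def utilization_pct(vms, app_pct, backup_pct, host_capacity_pct):
    """Host utilization (percent) when every VM runs its app and a full backup at once."""
    return 100 * vms * (app_pct + backup_pct) / host_capacity_pct


def max_vms(app_pct, backup_pct, host_capacity_pct, ceiling_pct=80):
    """Largest VM count that keeps the host at or below ceiling_pct during backup."""
    budget = host_capacity_pct * ceiling_pct // 100
    return budget // (app_pct + backup_pct)


# All figures are in percent of ONE pre-consolidation server.
HOST_CAPACITY = 400   # assumption: the new host has 4x the resources of an old server
APP_LOAD = 5          # assumption: steady-state application load per VM
FULL_BACKUP = 25      # per-server drain during a full backup, as described above
LIGHT_BACKUP = 5      # assumption: drain when far less data moves during backup

print(utilization_pct(12, APP_LOAD, FULL_BACKUP, HOST_CAPACITY))   # 90.0
print(max_vms(APP_LOAD, FULL_BACKUP, HOST_CAPACITY))               # 10
print(max_vms(APP_LOAD, LIGHT_BACKUP, HOST_CAPACITY))              # 32
```

With these made-up numbers, 12 virtual servers backing up together push the host to 90%, and the backup load rather than the application load is what caps the consolidation ratio.  Plug in your own measurements before drawing conclusions.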

People could argue that if you properly schedule each virtual machine to back up in a window when the other systems are not backing up, then perhaps you could get by with traditional backup.  The flip side is that IT has been telling me they don’t want to manage the backup process any more than they have to.  So how do you ‘fix’ this problem?

The issue is that backup is a very I/O-intensive application, so there is only one way to fix the problem: you need to reduce the amount of I/O generated and sent through the physical devices that house the virtual servers during backup.  Virtual servers were designed to provide a lot of benefits, but high I/O capability is not one of them.  (This is okay; every technology implementation has its tradeoffs.  When the positives outweigh the negatives, especially in a substantial way, as they do with virtual servers, you usually have a paradigm shift, and this is what we are seeing with virtual servers.)

So how do you change the I/O pattern of backup?  You do so by decreasing the amount of data that uses the shared resources during backup.  There are a couple of ways to do this.  One way is to leverage the storage array and snapshot the data.  Snapshots allow you to make copies of virtualized server data, mount them on a proxy host and off-load the backups from the physical server that houses the virtual servers.  The downsides are:

1) This becomes a new set of processes to manage, unlike traditional backup processes

2) You need extra storage capacity with this solution

3) You will need to manage another physical server (the proxy server)

4) You will need more backup agents from your backup software provider

The most efficient way, however, is to take advantage of a new backup software application that leverages data reduction (data deduplication) on the client.  Your processes stay the same, there is no need for additional primary storage hardware, and by leveraging a ‘smarter’ backup client you reduce the I/O tax on your physical servers and thereby maximize the TCO / ROI of your new virtual server environment.
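
A rough sketch of why deduplicating on the client cuts the I/O: the client fingerprints its blocks, asks the server which fingerprints it has never seen, and sends only those.  The `BackupServer` class, `client_backup` function, block size and sample data are all hypothetical, and the sketch ignores the (small) cost of exchanging the fingerprints themselves.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes


class BackupServer:
    """Hypothetical server that stores each unique block once, keyed by fingerprint."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        """Return the fingerprints the server does not hold yet."""
        return {d for d in digests if d not in self.blocks}

    def put(self, digest, block):
        self.blocks[digest] = block


def client_backup(data, server, block_size=BLOCK_SIZE):
    """Back up data, sending only blocks the server lacks. Returns bytes sent."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    digests = [hashlib.sha256(b).digest() for b in blocks]
    needed = server.missing(digests)
    sent = 0
    for digest, block in zip(digests, blocks):
        if digest in needed:
            server.put(digest, block)
            needed.discard(digest)  # a repeated new block is sent only once
            sent += len(block)
    return sent


# Hypothetical workload: 100 distinct blocks, then the same data with one block changed.
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(100))
tuesday = monday[:7 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + monday[8 * BLOCK_SIZE:]

server = BackupServer()
first_sent = client_backup(monday, server)
second_sent = client_backup(tuesday, server)
print(first_sent, second_sent)
```

The first backup moves everything; the second moves a single block, and that is the I/O the physical host no longer has to push during the backup window.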

Additionally, a number of these technologies have additional offerings that truly make them next generation.  Backup licensing is slowly moving to a capacity-based license model.  One great feature of these new products is that there is no charge for clients or agents.  This allows you to create a virtual server template with the backup agent embedded within it.  You no longer have to worry about proliferating backup clients and then paying for all those clients when it is time to ‘true up’ with your backup software vendor.  Data deduplication technologies also offer the ability to replicate the backup data efficiently to disk at a remote site, so you can develop a more efficient disaster recovery plan that reduces the reliance on tape and increases your overall operational efficiency.

Regardless of which path you choose, each requires IT to rethink their backup strategies when it comes to protecting virtual server environments.

I encourage you to do two things as you consider moving to a virtual server infrastructure:

1) Make sure you are thinking about data protection when architecting your new virtual server environment

2) Check out some of the new technologies and best practices offered by vendors for protecting virtual servers

Hopefully this will help put your virtual server world back on the Road to Recovery!


Betamax Redux

April 9th, 2009 Steve Kenniston 6 comments

I often joke with customers that when my friends were growing up they would dream of being a professional baseball player or a rock star, and I used to dream of becoming a data protection technologist.  Recently I read something very profound in Chuck Hollis’s internal EMC blog. Chuck said, “Decide what you’re passionate about …and write about it… it is hard to write about stuff you don’t care about.”  I am passionate about data protection.  Not because data protection is “cool” or anything, but because it is one of the most important practices in the data center.  It is also one of the most challenging practices in the data center, and it involves not just technology but people and process as well.  I had an old boss once who said, “Where there is chaos, there is cash,” and given the fact that the data protection market is a $10B market, I would say he was correct.  I have started this blog along with my colleagues because we truly believe in what we do, who we work for, the challenges we solve and the benefits we bring to a customer’s challenging world around data protection.  We write because we are passionate about data protection, not because we are being paid to.

Something I read a while ago in Tony Asaro’s blog, Leaders Dilemma as well as Setting the Record Straight, really got me charged up, but I wasn’t sure how I wanted to comment. Tony, you see, writes for money (not passion), which means he has to write ‘for’ the company that is paying him and, at the same time, spend time ‘Manufacturing Confusion’ in the market. (Sorry Tony, I liked you better as an analyst, when you heard all the vendors’ product messages and would form an opinion about what was really going on in the market.) What I am referring to specifically are comments such as “EMC is the one big player going after this market in earnest with three different products (which will confuse the market and themselves)”. Quite frankly, EMC’s philosophy and message to its customers regarding data deduplication isn’t confusing at all. In fact, when I speak with our customers, they believe we have one of the more thoughtful and consistent messages around this topic.  So in an effort to educate, let me share EMC’s data deduplication philosophy and how EMC will take backup, beyond.  EMC will:

  1. Provide deduplication as a pervasive & architecturally consistent service
  2. Coordinate deduplication throughout combinations of data storage and data movement
  3. Deduplicate at the highest level of abstraction
  4. Deduplicate as close to the source as practical

When these values are leveraged, the entire spectrum of data protection morphs into methods that will be used to protect data well into the future.

Back to the subject of the blog. Data Domain will continue to sell good products to customers. Data Domain will continue to enhance their existing technology to meet customers’ demands. But they will do this at the expense of true innovation. Remember, the hardest thing to change in IT is process, not technology. Backing data up to disk targets is nothing new, and backing data up to disk devices that perform deduplication is no longer innovative. However, the paradigm of using traditional backup software to move full files across an expensive network is beginning to evolve. It MUST evolve, and when it does, what happens to the companies that have interesting features that are just one small morsel in the food chain? If you don’t own any significant IP in the extended process that is data protection, then you will be left out of the backup buffet.  And as Maslow would say, “If all you have is a hammer then everything looks like a nail.”

EMC has taken a leadership position in the data deduplication space not because we offer multiple products but because of the way we look at technology.  Data deduplication is made up of different components:

  1. Data ‘chunking’
  2. Compression / Encryption
  3. Assign Content ID
  4. Store

The goal is to be able to leverage these components across multiple storage platforms, providing deduplication at the highest level of abstraction possible and as close to the source as practical, based on the requirements of the application.  Preserve the content by deduplicating content instead of data.  The objective, over time, is to provide deduplication as a pervasive and architecturally consistent service across EMC’s entire storage portfolio.  When you do this, the entire paradigm of protecting information evolves, and this is why EMC is the leader in data deduplication.  Not because we have 3 (or however many) products, but because of the way in which we look at data deduplication.
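
For readers who think in code, the four components listed above can be sketched as a tiny content-addressed store.  This is my own illustration, not EMC's implementation: the `ContentStore` class is hypothetical, chunks are a fixed 4 KB, zlib stands in for compression, encryption is left out, and the content ID is a SHA-256 of the raw chunk (computing it before or after compression is a design choice the list does not settle).

```python
import hashlib
import zlib


class ContentStore:
    """Hypothetical store: chunk, compress, assign a content ID, store once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # content ID -> compressed chunk

    def write(self, data):
        """Store data and return its recipe (ordered list of content IDs)."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]      # 1. data 'chunking'
            content_id = hashlib.sha256(chunk).hexdigest()   # 3. assign content ID
            if content_id not in self.chunks:
                packed = zlib.compress(chunk)                # 2. compression (encryption omitted)
                self.chunks[content_id] = packed             # 4. store, once per unique chunk
            recipe.append(content_id)
        return recipe

    def read(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(zlib.decompress(self.chunks[cid]) for cid in recipe)


# Hypothetical data: two versions of a document that differ only in the last chunk.
store = ContentStore()
doc_v1 = b"A" * 8192 + b"B" * 4096
doc_v2 = b"A" * 8192 + b"C" * 4096
recipe_v1 = store.write(doc_v1)
recipe_v2 = store.write(doc_v2)
print(len(recipe_v1) + len(recipe_v2), "chunks referenced,",
      len(store.chunks), "chunks stored")
```

Because the store only ever sees chunks and IDs, the same four steps could in principle sit in a backup client, a disk target or an archive, which is the sense in which the components are meant to be reused across platforms.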

At the end of the day, EMC has over 2,000 PB of deduplicated data under protection, utilizing both source and target based deduplication solutions. And I would venture to estimate that if you include NetWorker, RecoverPoint, etc., EMC has exabytes of data under protection. EMC has a long history of changing with the times, listening to its customers, investing in new technologies and protecting customers’ data the way they want and need it to be protected.  That is taking backup, beyond.

Posted by Steve Kenniston
