Archive for May, 2009

Data Deduplication 2.0

May 29th, 2009 Steve Kenniston No comments

Well, now that the dust has settled and all of the predictions have been made regarding NetApp’s acquisition of Data Domain, it is really time to talk about two topics that have a great deal to do with data deduplication.  The first is backup and the second is capacity optimization (which leverages data deduplication technology).

Backup

“Everyone hates backup.”  “Tape is unreliable.”  “Tape is dead.”  “Archive first, then backup.”  “Do more with less.”  “It’s not about the backup, it’s about the recovery.”  “The backup process has to change.”  We have all heard these words many, many times, and while almost all of them are true, backup, its processes, and the medium we most often write it to have stayed the same for decades.  So if these phrases are true, and most people believe them to be true, what will it take to change the existing backup environment?

The answer starts with a question.  What do customers want from their backups?  Customers want simple copies of data, on inexpensive storage, that are easy to find, and easy to recover anywhere and everywhere.  Basically they want their data, where they want it, when they want it.

The next question is: how can this be accomplished?  Without trying to reinvent the wheel, it starts by shrinking the data as close to the source as possible.  Data deduplication is a game-changing technology that enables this capability.  Given the amount of data growth year over year and the percentage of that data that is duplicate, if the data can be reduced at the source before it is moved, so that less of it moves, there can be a significant impact on:

  1. Backup times, because less data is moved
  2. Backup capacity, because there is less data to store
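To make the two effects concrete, here is a minimal Python sketch of deduplicating at the source. It is an illustration under stated assumptions, not any vendor's implementation: the fixed 4 KiB chunk size, the `backup_from_source` function, and the in-memory `index` standing in for the backup target are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of data (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def backup_from_source(data: bytes, target_index: set) -> int:
    """Send only chunks the target has not seen; return the bytes sent."""
    bytes_sent = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in target_index:
            target_index.add(digest)  # the target stores the new chunk
            bytes_sent += len(chunk)  # only new chunks cross the wire
    return bytes_sent


# Two nightly backups of a 12 KiB file in which one 4 KiB block changed.
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

index = set()  # stands in for the chunk index held by the backup target
first = backup_from_source(monday, index)
second = backup_from_source(tuesday, index)
print(first, second)
```

In this toy run the first backup moves all 12,288 bytes and the second moves only the 4,096 bytes that changed, which is where both the shorter backup time and the smaller stored capacity come from.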

Data deduplication helps to reduce capital expenditures (driving the costs of disk and tape closer and closer and lowering network impact), as well as operational costs.  Data deduplication also helps to facilitate the main objectives of backup, which are to get the data where I want it, when I need it, and to make it more accessible (disk vs. tape).

The fundamental flaw, as I pointed out in Betamax Redux, is this: an IT vendor that doesn’t own any IP at the front of the backup process (to optimize capacity before, during and after data is sent over the wire) has a limited choice of solutions that address both the process and infrastructure diversity in the customer’s environment, and doesn’t really get ‘a seat at the backup buffet.’

So if we agree that the hardest thing to change in IT is process, not technology, it stands to reason that current approaches to solving backup challenges have been modest and incremental.  Customers often change out their target backup device from tape, to disk (that emulates tape), to deduplication targets, keeping their existing backup software infrastructure (which they have spent a good deal of money on) in place and making only an incremental impact on the overall process.  The burden of proof is on a new technology: it has to deliver dramatic improvements to justify a disruptive approach to data protection.  Data deduplication does just this.

Capacity

Capacity optimization also plays a significant role in the data center, specifically for primary storage, and has a significant impact on secondary processes in IT such as backup.  As we have learned over the last 24 months, there is no ‘one size fits all’ strategy when it comes to deduplicating data.  There are also many techniques that can be used to deduplicate data.  Hence there is a very complex matrix between the right data deduplication technique, data type and performance requirements.

Investments and acquisitions in technology are key to the growth of any technology company, but they are only one component of a successful growth strategy.  The real trick is what can be achieved with the IP (short term and long term), vision, and a lot of hard work.  Developing a capacity optimization strategy that rationalizes all of the capacity optimization technologies into a set of services that can be leveraged by each device in a portfolio adds a great deal of value for customers.  Optimizing storage capacity as close to the source as practical, to achieve the proper balance between optimization and performance, allows users to see benefits in primary storage right away and save on space (storage and footprint), power, and cooling.  Next, if the devices in the environment all speak the same capacity optimization ‘language’ (or a form of it), then passing data from one device to another can reduce the impact on the network and open up new use cases for the technology, such as reducing the reliance on tape within a backup process.  Finally, if the devices that sit at the next tier in the storage infrastructure can receive optimized data, and further optimize the capacity according to the SLA requirements of that tier, then users can achieve maximum value.

This is not easy to achieve.  It takes a lot of vision, planning, buy-in, and hard work.  EMC has a two-year head start on this vision.  The goal is to provide a pervasive and architecturally consistent set of capacity optimization services across the storage ecosystem that includes hardware and software.  If you happened to go to EMC World, this was the premise of the presentation I delivered there.  Capacity optimization leverages a set of technologies that can support new and changing business requirements such as new data protection requirements, changing recovery point objectives, cloud storage and security.

As you speak with vendors who supply technology for your infrastructure it is important, especially in these hard economic times, that you are asking them the right questions, such as “What is your integrated capacity optimization strategy?”  It will be interesting to see how NTAP rationalizes all of their technologies in this space.  I know they can’t say anything at this point but I will be paying close attention when they do.

IT is always fighting the latest fire.  However, from a strategic perspective, if you really want to ‘do more with less’ and want to protect your infrastructure investments, think about how important technology features, such as data deduplication, can play a much larger role in your environment and help you achieve a much better TCO and ROI.  Data deduplication, and hence capacity optimization, can enable new processes that have a dramatic impact on the overall backup environment, especially when they are comprehensive and cumulative in nature.  Then and only then can it take backup beyond and put you on the road to recovery.

Now ‘it’ Makes Sense

May 21st, 2009 admin No comments

In a day and age where people are trying to understand the value of social media, no one can dispute that when it is used to save a life, it doesn’t need a business plan.

Over the past 48 hours I have seen bloggers, Tweeters and Facebook’ers all spread the word to help save a life. Nick Glasgow, a 28-year-old in California, was recently diagnosed with leukemia and is in desperate need of a bone marrow transplant. Rather than go into detail, please click on the image in the right side panel of our blog and find out how you can help Nick.

So what is ‘it’ that makes sense? Social Media… Our blog name…

Let’s help with Nick’s ‘Road to Recovery’

Accelerating Backup Efficiency

May 19th, 2009 Steve Kenniston No comments

EMC’s announcement on accelerating your backup efficiency hits some very important concepts to help users make significant progress in solving some key backup challenges.

A lot has been said over the last 18 months regarding an inflection point, where the growth of data is outpacing the capabilities of traditional backup technologies.  This has driven the ‘one size does not fit all’ belief when it comes to backup technology for your infrastructure.  Vendors talk about utilizing new technologies such as disk based backup, VTL (virtual tape libraries), deduplication and data protection management in order to improve the backup process.  While each of these technologies can help to improve the process, customers need to act faster in order to catch up with the growth of data.

That is not to say you should run out and buy one of each of these technologies and expect that collectively they will solve all your backup challenges.  The first and perhaps the most important thing is to assess your backup environment.  The reason there is no one size fits all policy when it comes to backup is that different data types behave differently with different backup technologies.  Data deduplication is great, but it works much better when it is applied in the proper manner.  A combination of source and target deduplication can complement one another to maximize your backup efficiency.  As an example, leveraging source based deduplication for the proper data in your environment can give a significant number of cycles back to your traditional backup software and improve performance on data types that aren’t a good fit for source based deduplication.  So the message is: use assessment services to gain a realistic understanding of your data profile, so that you can choose the right deduplication for your environment.  Additionally, make sure the tools that you use to understand your deduplication efficiency utilize algorithms similar to those in the products you will use in your environment, so there are no surprises.
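As a rough illustration of what such an assessment measures, the Python sketch below estimates a deduplication ratio per data set and sorts each one into a bucket. Everything in it is an assumption made for the example: the fixed 4 KiB chunking, the 2.0 cut-off, the synthetic data, and the names `dedup_ratio` and `synthetic_chunks` are hypothetical and do not describe any EMC assessment tool.

```python
import hashlib

CHUNK_SIZE = 4096             # hypothetical chunk size used by the estimate
SOURCE_DEDUP_MIN_RATIO = 2.0  # hypothetical cut-off, not a vendor guideline


def dedup_ratio(data: bytes, chunk_size: int = CHUNK_SIZE) -> float:
    """Logical bytes divided by unique-chunk bytes (1.0 means no savings)."""
    if not data:
        return 1.0
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return len(data) / sum(unique.values())


def synthetic_chunks(count: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Build `count` distinct chunks; a deterministic stand-in for real data."""
    out = []
    for i in range(count):
        seed = hashlib.sha256(str(i).encode()).digest()  # 32 bytes
        out.append(seed * (chunk_size // len(seed)))
    return b"".join(out)


profiles = {
    "file share": synthetic_chunks(4) * 10,         # same 4 chunks, 10 copies
    "high-change database": synthetic_chunks(40),   # 40 distinct chunks
}

recommendation = {}
for name, data in profiles.items():
    ratio = dedup_ratio(data)
    if ratio >= SOURCE_DEDUP_MIN_RATIO:
        recommendation[name] = "source dedup"
    else:
        recommendation[name] = "traditional backup / target dedup"
    print(f"{name}: {ratio:.1f}:1 -> {recommendation[name]}")
```

The chunking method drives the result, which is the reason for the advice above: an estimate made with one algorithm can differ noticeably from what a product using another algorithm achieves on the same data.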

Once you have a better understanding of the data types and data profile in your environment, the next message is to accelerate the use of data deduplication technologies that will allow you to best protect all of the data in your environment as efficiently as possible. Invest wisely.

Another important thing to point out from EMC’s announcement is the simplification of the data protection environment.  It may take multiple technology components for IT to get its arms around its backup issues, but it shouldn’t be hard to acquire, deploy, leverage or manage these technologies.  EMC has invested quite a bit of money in their products in order to simplify this process.  One example is how EMC’s NetWorker product has the ability to manage traditional backup, source based backup (with the integration of Avamar), target based deduplication (with the integration of Disk Library), bare metal recovery (with the integration of Homebase) and the ability to meet all of your recovery point objectives with CDP and the integration of RecoverPoint.  Additionally, you can leverage Data Protection Advisor to actively monitor your entire backup environment, see the successes as well as the failures, and make decisions faster on how to fix any issues.  The faster you know you have an issue, as well as what the issue is, the faster you can address it and address it the right way.

EMC has also made it easier to protect application environments.  EMC NetWorker now has source based deduplication capabilities for Microsoft applications such as SharePoint, Exchange and SQL as well as Oracle databases.  Through the integration with Microsoft VSS you have the opportunity to use hardware based clones of your application data and mount that data on a proxy system where data deduplication can run without impacting the production host.  Additionally, restores are seamless, as you can recover data right to the original host as it is needed.  Protecting files is very important, but it’s usually the applications that run your business.  The ability to more effectively protect these applications ensures a higher degree of business success in the event of a system failure.

Finally, by leveraging an integrated data protection portfolio you can take your backup beyond and put yourself on the road to recovery.

Posted by Steve Kenniston

EMC World Kicks Off with Clouds and Virtualization

May 18th, 2009 Steve Kenniston No comments

EMC World kicked off this morning, first with a presentation from yours truly on Data Deduplication 2.0 – Comprehensive Capacity Optimization.  We discussed how data deduplication 1.0 is morphing into all areas of EMC’s storage ecosystem in order to optimize capacity everywhere.  I talked about how data deduplication, as well as single instancing and compression, are technology components that will help EMC achieve this goal.

Next Joe Tucci spoke in his keynote about how data deduplication as well as compression are key technologies for the data center of the future and how these technologies will aid in delivering a more efficient cloud computing strategy.  Not only will these technologies help in building out a cloud infrastructure, they will also help to protect a cloud infrastructure (which is what we are all about here).

Finally, Paul Maritz gave his keynote on how the virtual infrastructure will help to fulfill the goals of a private cloud.  He also discussed that it is time to invest in software and people and not hardware, as VMware continues to drive value into their software to help make your data center better, smarter, stronger and faster for less.

Each of these initiatives will have an impact on how data is stored and ultimately protected, but new storage services will enable more efficient storage and protection across the virtual data center and the cloud, and ultimately take backup beyond and put you on the road to recovery.

Stay tuned for more updates about the show.

Backup Takes Off!

May 8th, 2009 Mark Sorenson No comments

There has been a lot written about the airline industry and its ongoing challenges.  Bankruptcies and mergers have been frequent topics on the business pages of the newspapers. Stranded passengers and jets sitting on runways for hours make front page news.

I travel a lot and have been doing so for 20 years.  It’s an interesting industry I have observed first hand, oftentimes painfully.  I would postulate that the air travel industry is one of the few industries that effectively hasn’t improved in any measurable way over the past 20 years.  Consider:

- It still takes you six hours to fly from Boston to London, just as it did 30 years ago. Despite the brief and ultimately failed foray into speed improvements via the Concorde, jets still fly +/- 500 MPH and get you to your location no faster than they did a quarter of a century ago.

- Customer satisfaction has been steadily declining – across the board

- Flight delays and lost / mishandled baggage continue to increase

- The food is still awful, if you get any at all

It is, however, worth noting that air travel continues to have an excellent safety record. So if the airlines haven’t improved in speed, comfort, or the basics like taking off and landing on time, what the heck have they been focused on? After all, look at another travel related industry, automobiles, over the past 25 years. More features (how many of you still “roll up” your windows?), safer (airbags, traction control), more fuel efficient (introduction of hybrids), and more dependable (Six Sigma, Kaizen!). Airlines? Well, no such luck. They’ve been focused on “cost.” With deregulation and the entry of low cost carriers, coupled with the price of fuel, cost savings is where all of their focus has been. Let’s think about these two facts:

- Fuel efficiency in the airline industry increased 21% over the last five years

- But the price of oil, increased 130% over the last five years
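Putting the two facts together shows why efficiency alone could not keep up. The short Python calculation below is a back-of-the-envelope sketch with two labeled assumptions: that "fuel efficiency" means seat miles flown per unit of fuel, and that the price airlines pay for fuel moved in step with the price of oil.

```python
# Assumption: "fuel efficiency" means seat miles flown per unit of fuel.
efficiency_gain = 0.21   # +21% over five years (figure quoted above)
# Assumption: the price airlines pay for fuel tracks the price of oil.
oil_price_gain = 1.30    # +130% over five years (figure quoted above)

fuel_per_seat_mile = 1 / (1 + efficiency_gain)   # fuel burned, relative to before
price_per_unit_fuel = 1 + oil_price_gain         # price paid, relative to before

fuel_cost_per_seat_mile = fuel_per_seat_mile * price_per_unit_fuel
increase_pct = (fuel_cost_per_seat_mile - 1) * 100
print(f"Fuel cost per seat mile rose about {increase_pct:.0f}%")
```

Under those assumptions the fuel cost of flying one seat one mile still rose by roughly 90%, even after the efficiency gains.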

If you’ve flown recently you have probably noticed all the cost savings efforts. Paying for food, paying for extra bags, fewer amenities, etc. But here’s the real focus: stuff more people in seats to spread that cost of fuel over more tickets. Two more facts:

- In 1980 there were 433 billion available seat miles with a breakeven load factor of 59%

- In 2002 there were 893 billion available seat miles with a breakeven load factor of 84%

Just to state the obvious, in 1980, you needed 59% of the seats filled to break even (less, lose $, more, make $). In 2002 it took 84% of the seats being filled for airlines to make money (notice how this date correlates with the onset of airline Chapter 11 filings). In the past few years, by flying more efficient planes and of course serving you fewer peanuts, the airline industry has taken its cost down a bit, but not much. In Q1 of 2006 the breakeven load factor stood at 77.2%.
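The same figures can be turned into absolute volumes. The Python snippet below is only arithmetic on the numbers quoted above: it multiplies available seat miles by the breakeven load factor to get the seat miles that had to be sold just to break even.

```python
# Figures quoted above: available seat miles (billions), breakeven load factor.
years = {
    1980: (433, 0.59),
    2002: (893, 0.84),
}

# Seat miles that must be sold to break even = capacity x breakeven load factor.
breakeven = {
    year: asm * load_factor for year, (asm, load_factor) in years.items()
}

for year, seat_miles in breakeven.items():
    print(f"{year}: about {seat_miles:.0f} billion seat miles to break even")

capacity_growth = years[2002][0] / years[1980][0]
breakeven_growth = breakeven[2002] / breakeven[1980]
print(f"capacity grew {capacity_growth:.2f}x, "
      f"breakeven volume grew {breakeven_growth:.2f}x")
```

Capacity roughly doubled between 1980 and 2002, but the volume that had to be sold before making any money nearly tripled.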

Now let’s take another “industry”…one close to my heart – “Data Protection!” Now, I don’t want to draw a complete parallel between air-travel and backup because for backup – there has indeed been progress in the technology. The problem with backup is that it hasn’t kept pace with demand, e.g. the requirements associated with the explosion of digital data being stored today. Consider the history…

1) We started by backing up individual systems, using backup software provided with the operating system; this was time consuming, decentralized, and you needed tools to back up and restore for each unique OS.

2) Progressed to network based backup (hence the names “NetWorker” & “NetBackup”); one tool for the entire environment, centralized, sharing of resources (e.g. tape drives).  Still very time consuming, and it creates lots of network traffic.

3) Added the ability to perform backup over the SAN; reduced network traffic.

4) Began leveraging storage technologies like Snapshot based backup and increasingly using disk as a target; even today, for the most part this is effectively “make a copy,” and that “copy” usually goes to tape.

5) Introduced VTL and data deduplication; provide users the most efficient means to move data to a disk device (by making it look like tape) and give them the ability to store it, and subsequently move it, more efficiently, driving the costs of disk and tape much closer together and helping to reduce the reliance on tape altogether.

Still, here’s what users think about backup today:

Question: “What are the biggest problems with your current backup and recovery solutions?” (% of all users, multiple responses accepted) (Forrester Consulting on behalf of HP, December 2008)

1. Need to improve RPO / RTO (64%)

2. Need to improve recovery success rates (63%)

3. Need to better protect virtual servers (58%)

4. Need to manage data proliferation (57%)

5. Need to consolidate remote office backups (47%)

So, backup is indeed a little like the airline industry: it still takes a long time, and no one likes the service. But is there a light on the horizon? I think so, and it’s embodied in EMC’s Data Protection Strategy. I think there are three key tenets of the strategy:

1. Back up as little data as possible

2. Use disk to store the backup data

3. Enable customers to use backup for other purposes

Let’s talk a bit about these concepts and the products and technologies that enable them.

How to back up as little data as possible?  A) Actively and continuously archive stale data; and B) use deduplication to minimize the bits and bytes that are required to represent information.

Use disk to store the backup data? Using disk is all about cost, since the benefits of disk vs. tape are pretty obvious, e.g. random access, speed of recovery, reliability, to name a few. Active archive and deduplication, coupled with the continued march to bigger and cheaper disk drives, enable the use of disk today. Cheaper “bulk storage” will only improve this in the future. EMC has a broad portfolio of archiving software for key applications (e.g. email, file servers, SAP), and the leading platform for storing it (Centera). Deduplication techniques are increasingly embodied in our backup and archive products as appropriate, e.g. at the object level (Centera); file/attachment level (EX); and the sub-file level (Avamar). In the future, you’ll see a unified deduplication “service” that will bring some of these techniques together and will be embedded across EMC’s product lines (Celerra). Someday, deduplication may be as ubiquitous as RAID.
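The difference between deduplicating at the file level and at the sub-file level can be shown with a small Python sketch. It is a generic illustration only and says nothing about how Centera, EX or Avamar work internally; the function names, the fixed 4 KiB chunks and the sample files are all hypothetical.

```python
import hashlib


def file_level_stored(files) -> int:
    """Bytes stored when only byte-identical whole files are deduplicated."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def sub_file_stored(files, chunk_size: int = 4096) -> int:
    """Bytes stored when identical fixed-size chunks are deduplicated."""
    unique = {}
    for f in files:
        for offset in range(0, len(f), chunk_size):
            chunk = f[offset:offset + chunk_size]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


# Hypothetical sample: a report, a revised copy, and an exact duplicate.
report_v1 = b"A" * 4096 + b"B" * 4096
report_v2 = b"A" * 4096 + b"C" * 4096   # second half edited
report_copy = report_v1                  # identical copy saved elsewhere

files = [report_v1, report_v2, report_copy]
logical = sum(len(f) for f in files)
print(logical, file_level_stored(files), sub_file_stored(files))
```

Of the 24,576 logical bytes, file-level deduplication stores 16,384 (it removes only the exact copy), while sub-file deduplication stores 12,288 because it also shares the unchanged half of the revised report.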

Enable customers to use backup for other purposes? Wouldn’t it be great to periodically replicate your backup data to another site to use as a cheap and easy recovery site? How about doing eDiscovery for compliance purposes on your backup data? These are great ways to leverage backup data for additional purposes that we are working towards. We’re not there yet, and by the way, while this sounds simple, it is not. File this under “the vision thing.”

Evolutionary or Revolutionary. We’re hearing from customers today who want to revamp their entire backup strategy and start from scratch. Others want to attack “hot spots” and evolve to a new approach. We can do it either way. Our EMC Disk Libraries are a very effective way to get disk based backup benefits while fitting into customers’ existing backup paradigms. EMC Avamar gives you disk based backup, enabled by state-of-the-art de-dupe, which replaces traditional backup software (though Avamar will co-exist with and complement traditional backup too).  NetWorker has embraced B2D as well as deduplication.  It also has EMC Disk Library integration.

Let’s not forget data protection management.  Today we have the Data Protection Advisor product, which effectively gives us a Dashboard that provides backup monitoring, reporting and analytics across most of our data protection products – NetWorker, EMC Disk Libraries, Avamar, plus popular backup products from Veritas, CommVault and IBM. Look for us to add Centera and other products in the future.

So, we’re at an inflection point. Backup is indeed changing and EMC is leading the way. The strategy is solid, the customer need is clear, and we have most of the pieces today. It’s ours to win.

For the air travel industry, I’m afraid I am not so optimistic. Here’s a story that sums it all up… On a trans-continental flight a passenger was sitting way back in economy. As it was a long flight, the flight attendants came by with the meal carts. The passenger was asked if he would like a meal. Being hungry he said, “Sure, what are my choices?” The flight attendant answered, “Your choices are ‘yes’ or ‘no.’”

Until next time, _Mark

Process vs. Technology

May 1st, 2009 Steve Kenniston 1 comment

The hardest thing to change inside IT is not technology, it is process!  I say this because all too often there are technologies available that provide a far superior solution to a complex IT problem, but the new technology may not fit into your existing business process.  Need proof?  Let’s take data protection as an example.  Did you know that VTLs (virtual tape libraries) and data deduplication technologies came out at the exact same point in history, 10 years ago?  Which technology had faster market adoption?  VTLs, of course, because implementing them didn’t cause a major disruption in processes.

Let’s take a look at a simple backup environment.  We won’t worry about archiving or compliance for the moment, just operational backup and recovery.  Today’s backup has a number of complexities.  There are some data sets that have weekly full backups and daily incremental backups.  There are some data sets that sit under applications that, for faster recovery capabilities and simplicity, require daily full backups.  Once the backups are done, in order to ensure true data protection reliability, a process of checking the backup logs to ensure every system was successfully protected begins.  Next, backup tapes are either created (if it is a disk based backup) or tapes are taken from the library and moved to a transportable box, hopefully a secure box.  Finally, a third party vendor comes to pick up the tapes and take them off site for safe-keeping.  Additionally, if the data is backed up using encryption, then the encryption keys are also kept off site for security purposes.
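The trade-off between the two schedules described above can be sketched in a few lines of Python. The sizes are invented for illustration (a 1,000 GB data set with 50 GB of daily change), the function names are hypothetical, and an "incremental" is assumed to hold only the changes since the previous backup.

```python
def weekly_data_moved(full_gb: int, daily_change_gb: int, daily_full: bool) -> int:
    """GB written to the backup target over one seven-day cycle."""
    if daily_full:
        return 7 * full_gb
    # One weekly full plus six daily incrementals.
    return full_gb + 6 * daily_change_gb


def restore_chain(day: int, daily_full: bool) -> list:
    """Backups that must be read to restore to `day` (0 = day of the weekly full)."""
    if daily_full:
        return [f"full-day{day}"]
    return ["full-day0"] + [f"incr-day{d}" for d in range(1, day + 1)]


# Hypothetical data set: 1,000 GB in total, 50 GB changing each day.
print(weekly_data_moved(1000, 50, daily_full=False))  # weekly full + incrementals
print(weekly_data_moved(1000, 50, daily_full=True))   # daily fulls
print(restore_chain(3, daily_full=False))
print(restore_chain(3, daily_full=True))
```

With these made-up numbers the weekly-full schedule moves 1,300 GB a week against 7,000 GB for daily fulls, but a restore on day 3 has to read four backups instead of one, which is the recovery speed and simplicity that some applications pay for with daily fulls.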

Customers face these standard backup challenges:

1) Backups take too long and cannot meet backup windows as a result of too much data.

2) Backups fail due to poorly configured (networked) backup environments.

3) Backups at remote offices are ‘unreliable’. (They don’t follow best practices set in the data center.)

a. No one with the appropriate skill set is available to monitor these backups.

b. No one with the appropriate skill set is available to troubleshoot these backups.

c. No one with the appropriate skill set is available to perform data recovery.

4) New applications / processes cause additional challenges: does this application need incremental backups or full backups, and what is the RPO / RTO?

5) Managing backup tapes is too difficult and costly.

However, the reality is that in this particular IT shop, no one has ever been fired for data loss. Each time there is a recovery request, data is recovered.  It may not be the absolute most recent data, or it may take 48 hours to recover, but eventually, the data is recovered. The question is, have everyone’s business objectives been met? Chances are the answer is “no,” but when the issue of what it would cost to meet everyone’s needs comes up, there is usually no money in the budget for ‘backup’ and it’s right back to the same old way of doing things. Backup is not really strategic to a business (unless of course you’re in the business of providing backup solutions to customers); it is more of an insurance policy. There is no doubt you need it, but you want it for the lowest possible price, hope you never have to call on it, and when you do, you’d better get good service.

Maybe that is why EMC is now the GEICO of data protection.

That aside, when there is money in the budget, it usually comes in small doses, so backup administrators have to make the biggest impact in the ‘easiest’ way possible. This means implementing something that allows them to meet most of their challenges and doesn’t:

1) Change process because they already have run books established for data recovery and because everyone is already trained on the existing technology.

2) Change configuration because they have already invested a great deal of time and money to sort out their issues with the existing products.

3) Cost a lot of money

That usually means augmenting the existing backup software technology with something that allows them to gain some efficiencies on the backend, because they already have significant investments in their backup software. This was one of the main reasons for the success of VTL (virtual tape libraries). It is way easier to unplug the slow, serial tape library and replace it with fast, parallel disk. The backup administrator gets all the advantages of disk and doesn’t have to change a single process, except for maybe adding a step of cloning the data from the disk that looks exactly like tape to an actual tape in order to move the data offsite. Additionally, this is why companies with target deduplication devices became so popular so quickly. When VTL was having challenges solving backup data capacity issues, deduplication became the next popular thing.  The big issue was plugging into the existing infrastructure without disruption.  If I have to change too much about my process, I can’t ‘afford’ to make it work.

The trouble is that backup administrators are at an inflection point. They can no longer continue to use the same old technology at the front of the backup process and meet the needs of the business. We are at a time when new technologies such as source based deduplication can really have a significant impact on a number of the backup challenges. The problem is that it runs straight into the reason IT doesn’t want to change technology: it forces a change to the process. For example, out come the traditional backup agents and new ones are put into place. Since data is no longer stored in tape format, new processes must be used for getting tape offsite. When backup administrators hear this, they tend to shy away from it. It costs money and it changes processes right when they had all the original processes figured out.  It is only now that source based deduplication solutions have gained significant momentum, as they are solving a number of the key data protection challenges for more than 70% of the data in most data centers.

  • Remote offices can now experience the same set of data protection best practices that are used in the data center. (Keep in mind that IT is accountable for 100% of the data created in the corporation, local or remote.  This is good peace of mind.)

  • VMware environments tend to ruin a TCO when using traditional backup applications. Leveraging source based deduplication can improve your TCO and ROI.

This is not to say that source based deduplication is the savior of the backup world. It is not. There are places where source based deduplication technologies are not the best fit. Very large environments with very high change rates and little duplicate data don’t tend to be good fits. However, if you attack the places that are a good fit for source based deduplication, you will create relief in your backup environment at the target and that will be good for everyone.  It is time to take backup, beyond.

Posted by Steve Kenniston