
Posts Tagged ‘Process’

No More Tiers / Tears

July 16th, 2009 Steve Kenniston 2 comments

The great thing about blogging and independence is that we can post things that add value and that we want to share, as long as we give proper recognition. One of my colleagues, Mike Dutch from the CTO office of SSG and a longtime SNIA member, had some insightful thoughts on storage tiering, so together we decided to share this post. I hope you enjoy it.

I’m guessing that many people define a storage tier by its particular storage technology (like SATA). While this may be a useful working definition it obscures the essential notion of what a storage tier really is and leads to confusion when a new technology like data deduplication comes around.  A precise definition may also lead to some interesting innovations if we were to take a slightly different path.

Should deduplicated storage be considered a storage tier? I would say “no,” and here’s why: a technology such as deduplication can span, and optimize across, all tiers.

A storage tier is storage space whose availability, performance, and cost characteristics differ enough from those of other storage tiers to economically justify moving data between it and other tiers based on the importance of the data (value, performance need, etc.). While storage tiers are often thought of as being tied to a particular type of hardware (e.g., Flash, FC, SAS, SATA, VTL, PTL, COM (Computer Output Microfiche), or even paper), this is not necessarily the case. For example, highly available cloud or network-based virtual disks could leverage multiple technologies within their single tier. Since a variety of technologies can be used to provide a particular storage service level, you should not think of a specific technology as a specific storage tier. Instead, evaluate which technology, or combination of technologies, would deliver the availability-performance-cost point you need for a given tier. “SATA” is not a storage tier; it just happens to be one “technology-set” that can deliver a single storage tier.

Note that storage tiers are not defined by their capacity, per se, but there is usually less capacity in the more expensive tiers precisely because they are more expensive. Deduplication is “simply” a method to save and access data on a storage medium, which is why capacity optimization techniques are best considered features of storage platforms rather than standalone products. (Of course deduplication can also be used as part of a WAN optimization solution, but here we’re talking about deduplication in relation to storage tiers, and dedupe engines without storage aren’t very interesting storage tiers.)
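To make the “method to save and access data” point concrete, here is a minimal sketch of the idea, not any vendor’s implementation. It assumes fixed-size chunks and SHA-256 fingerprints; the `DedupStore` class and its method names are hypothetical, made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumption: fixed-size chunking
        self.chunks = {}              # fingerprint -> chunk bytes (stored once)
        self.files = {}               # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        fingerprints = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            fingerprints.append(fingerprint)
        self.files[name] = fingerprints

    def read(self, name):
        """Rebuild a file by following its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def logical_bytes(self):
        """Bytes the files would occupy without deduplication."""
        return sum(len(self.chunks[fp]) for fps in self.files.values() for fp in fps)

    def physical_bytes(self):
        """Bytes actually held in the chunk store."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Writing `b"AAAABBBB"` and `b"AAAACCCC"` into a store with `chunk_size=4` gives 16 logical bytes but only 12 physical bytes, because the shared `AAAA` chunk is kept once. Nothing here says anything about the availability or performance of the medium underneath, which is the point: it is a feature layered on storage, not a tier.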

In other words, deduplication lets you lower the cost/GB associated with a particular storage tier, but it isn’t a storage tier in and of itself. The same rationale explains why other space-efficient storage technologies (e.g., compression) are not tiers unto themselves. It’s the mixing and matching of both old and new technologies to create a new “availability-performance-cost” point that makes up a new storage tier.
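The cost/GB effect is simple arithmetic. The sketch below assumes the usual way of expressing it (raw cost divided by the data reduction ratio); the function name and the dollar figures are hypothetical, chosen only to show the shape of the calculation.

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per GB of logical data after capacity optimization.

    reduction_ratio is logical bytes / physical bytes, e.g. 10.0 for "10:1".
    """
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be at least 1 (1 means no savings)")
    return raw_cost_per_gb / reduction_ratio


# Hypothetical figures: a tier that costs $1.00 per raw GB, with 10:1 dedupe.
raw_cost = 1.00
print(effective_cost_per_gb(raw_cost, 10.0))
```

Only the cost axis of the tier moves. Its availability and performance characteristics are whatever the underlying platform delivers, which is why the result is a cheaper version of the same tier (or, combined with other changes, a new availability-performance-cost point) rather than a tier called “dedupe”.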

So who cares what a storage tier is anyway? On one hand, as long as you can help your customer affordably satisfy their business requirements, it doesn’t matter. But at another level, it profoundly matters. If you don’t have the knowledge to think about a subject precisely, you may be unable to solve problems related to the subject. Worse, you may not even be able to recognize that there is a problem. Having the right knowledge lets us understand our challenges and, more importantly, find alternative solutions to them. After all, isn’t storage tiering really about helping to deliver on a “no more tears” promise?

The efficiencies that data deduplication and storage tiering bring to data protection enable businesses to reduce risks as well as costs. Information that was previously protected on an ad hoc basis, if at all, can now affordably be brought under the ILM umbrella as a full-fledged corporate citizen. The Storage Networking Industry Association defines Information Lifecycle Management (ILM) as “The policies, processes, practices, services, and tools used to align the business value of information with the most appropriate and cost-effective infrastructure from the time information is created through its final disposition.” Data deduplication and storage tiering are two arrows in the ILM quiver that can be used pervasively within the enterprise to score a bull’s eye in backup… and beyond. Limiting our thoughts about how any technology can be used, whether it be data deduplication, Flash, or whatever the Next Big Thing is, simply limits the solutions we can find.

Should deduplicated storage be considered a storage tier?  No.

Should deduplicated storage be used as a storage tier?  Pervasively.

Thus endeth the sermon for the day.


Process vs. Technology

May 1st, 2009 Steve Kenniston 1 comment

The hardest thing to change inside IT is not technology, it is process! I say this because all too often a technology is available that provides a far superior solution to a complex IT problem, yet it may not fit into your existing business process. Need proof? Let’s take data protection as an example. Did you know that VTLs (virtual tape libraries) and data deduplication technologies came out at the exact same point in history, 10 years ago? Which technology had faster market adoption? VTLs, of course, because implementing them didn’t cause a major disruption in processes.

Let’s take a look at a simple backup environment.  We won’t worry about archiving or compliance for the moment, just operational backup and recovery.  Today’s backup has a number of complexities.  There are some data sets that have weekly full backups and daily incremental backups.  There are some data sets that sit under applications that, for faster recovery capabilities and simplicity, require daily full backups.  Once the backups are done, in order to ensure true data protection reliability, a process of checking the backup logs to ensure every system was successfully protected begins.  Next, backup tapes are either created (if it is a disk based backup) or tapes are taken from the library and moved to a transportable box, hopefully a secure box.  Finally, a third party vendor comes to pick up the tapes and take them off site for safe-keeping.  Additionally, if the data is backed up using encryption, then the encryption keys are also kept off site for security purposes.

 Customers face these standard backup challenges:

1) Backups take too long and cannot meet backup windows as a result of too much data.

2) Backups fail due to poorly configured (networked) backup environments.

3) Backups at remote offices are ‘unreliable’. (Don’t follow best practices set in the data center.)

a. No one with the appropriate skill set is available to monitor these backups.

b. No one with the appropriate skill set is available to troubleshoot these backups.

c. No one with the appropriate skill set is available to perform data recovery.

4) New applications / processes cause additional challenges; does this application need incremental backups, full backups, what is the RPO / RTO???

5) Managing backup tapes is too difficult and costly.

However, the reality is that in this particular IT shop, no one has ever been fired for data loss. Each time there is a recovery request, data is recovered. It may not be the absolute most recent data, or it may take 48 hours to recover, but eventually the data is recovered. The question is, have everyone’s business objectives been met? Chances are the answer is “no,” but when the issue of what it would cost to meet everyone’s needs comes up, there is usually no money in the budget for ‘backup’ and it’s right back to the same old way of doing things. Backup is not really strategic to a business (unless of course you’re in the business of providing backup solutions to customers); it is more of an insurance policy. There is no doubt you need it, but you want it for the lowest possible price, you hope you never have to call on it, and when you do, you had better get good service.

Maybe that is why EMC is now the GEICO of data protection.

That aside, when there is money in the budget, it usually comes in small doses, so backup administrators have to make the biggest impact in the ‘easiest’ way possible. This means implementing something that allows them to meet most of their challenges and doesn’t:

1) Change process because they already have run books established for data recovery and because everyone is already trained on the existing technology.

2) Change configuration because they have already invested a great deal of time and money to sort out their issues with the existing products.

3) Cost a lot of money

That usually means augmenting the existing backup software technology with something that allows them to gain some efficiencies on the backend, because they already have significant investments in their backup software. This was one of the main reasons for the success of VTLs (virtual tape libraries). It is way easier to unplug the slow, serial tape library and replace it with fast, parallel disk. The backup administrator gets all the advantages of disk and doesn’t have to change a single process, except for maybe adding a step of cloning the data from the disk that looks exactly like tape to an actual tape in order to offsite the data. This is also why companies with target deduplication devices became so popular so quickly. When VTL was having challenges solving backup data capacity issues, deduplication became the next popular thing. The big issue was plugging into the existing infrastructure without disruption. If I have to change too much about my process, I can’t ‘afford’ to make it work.

The trouble is backup administrators are at an inflection point. They can no longer continue to use the same old technology at the front of the backup process and meet the needs of the business. We are at a time when new technologies such as source based deduplication can really have a significant impact on a number of the backup challenges. The problem is that this runs up against the very reason IT doesn’t want to change technology: it forces a change to the process. For example, out come the traditional backup agents and new ones are put into place. Since data is no longer stored in tape format, new processes must be utilized for getting tape offsite. When backup administrators hear this, they tend to shy away from it. It costs money and it changes processes right when they had all the original processes figured out. It is only now that source based deduplication solutions have gained significant momentum, as they are really solving a number of the key data protection challenges for more than 70% of the data in most data centers.

  • Remote offices can now experience the same set of data protection best practices that are used in the data center. (Keep in mind, IT is accountable for 100% of the data created in the corporation, local or remote. This is good peace of mind.)
  • VMware environments tend to ruin a TCO when using traditional backup applications. Leveraging source based deduplication can improve your TCO and ROI.

This is not to say that source based deduplication is the savior of the backup world. It is not. There are places where source based deduplication technologies are not the best fit. Very large environments with very high change rates and little duplicate data don’t tend to be good fits. However, if you attack the places that are a good fit for source based deduplication, you will create relief in your backup environment at the target and that will be good for everyone.  It is time to take backup, beyond.

Posted by Steve Kenniston


Lean Six Sigma Your Backups

March 25th, 2009 Steve Kenniston No comments

Last week I took a course offered by EMC entitled ‘Lean Six Sigma’ – Yellow Belt. This is a training course that is used to help ‘solve problems’ in a given process, typically work related. When I think about where the biggest problem is in IT, it’s in the backup arena, so I thought, what better place to test it.

[Figure omitted. Source: Enterprise Strategy Group, 2008]

There are two components to Lean Six Sigma. Lean or Leaning a process is about removing excess from a process to make it more efficient. For backup, moving as much data out of the backup stream as possible would increase backup efficiency.  Deleting unnecessary data or archiving static data in the production storage can cut down on as much as 50% of the data in the backup, ‘leaning’ the process.

Next, when looking at Six Sigma, we learned about the DMAIC process. That is:

  • Define – Business case, scope, problem statement, goals
  • Measure – Process flow, run charts, Pareto charts
  • Analyze – Cause / Effect, waste identification
  • Improve – Waste removal, improve plan, control charts
  • Control – Monitor to prevent repeat failure, control charts, control plan

First, as I was thinking about this, I kept coming back to the Measure phase. If you don’t currently measure your backup process, or only look at it when there is a recovery failure, then perhaps it’s time to invest in a tool to help measure the current process. This measurement will allow you to identify current problems and will serve as a benchmark against which you can measure the success of your ‘leaning’. So, if we apply the steps in the DMAIC process to your typical backup environment, here is what it may look like.

Define

In the first step, the Define step, the objective is to describe the business case and problem statement and to set SMART (Specific, Measurable, Attainable, Realistic and Timely) goals. Again, the key will be in the measurements, but typically with backup you want to measure recovery success, which most of the time ends up being a result of backup success. The objective is to look at your existing recovery success rate, and hence your backup success rate, and identify what you would like the percentage of successful backups and recoveries to be. I would guess that in most cases 100% would be the requirement, but perhaps 99% is fine. So the problem statement would be: data recoveries fail more than 54% of the time, and this data loss contributes to employee frustration and can translate into significant risk for the company during a legal disclosure process. These recoveries fail because of a flawed, multi-step process that needs to be examined and fixed in order to yield a 99% success rate when it comes to recoveries. This process specifically affects backup administrators on a daily basis. When the process is fixed and recoveries yield a 99% success rate, everyone benefits: the satisfaction of end users, executives and customers will keep the company’s corporate costs low and drive repeat business. Additionally, in the Define phase, it may be good to create an IPO diagram. IPO stands for input, process, output.

Measure

[Figure: IPO (input, process, output) diagram]

In the Measure phase you will want to make sure you have metrics that can identify the following: recovery success rates, backup success rates, dollars lost due to failed data recovery, customer complaints due to failed data recovery, and the costs to back up and recover data for the environment. These are the key metrics for understanding and fixing the problems that are uncovered. It will be important to establish a baseline to improve upon, and this is where having tools in place (such as DPA) can help tremendously throughout this process. This is also a good place to ‘map out’ the current process flow and to identify what is in scope and what is not, in order to avoid ‘scope creep’. To review the process flow it may be useful to create a process flow chart using a whiteboard and Post-it notes. Brainstorm all of the steps and put them on the whiteboard in random placement. Organize them in time sequence, then fill in the missing steps and review for completeness. This will be helpful for the Analyze phase.
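If you don’t have a reporting tool yet, even a few lines of script can produce a first baseline. The sketch below is only an illustration under assumptions: it supposes you can export job results as simple records, and the `Job` record, its fields, and the `success_rate` function are hypothetical names, not the output format of any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class Job:
    kind: str        # assumed values: "backup" or "recovery"
    succeeded: bool  # True if the job completed successfully


def success_rate(jobs, kind):
    """Fraction of jobs of the given kind that succeeded, or None if there were none."""
    relevant = [job for job in jobs if job.kind == kind]
    if not relevant:
        return None
    return sum(job.succeeded for job in relevant) / len(relevant)


# Made-up sample data standing in for an exported job log.
history = [
    Job("backup", True), Job("backup", True), Job("backup", False), Job("backup", True),
    Job("recovery", True), Job("recovery", False),
]
baseline = {kind: success_rate(history, kind) for kind in ("backup", "recovery")}
print(baseline)
```

The same calculation, run again after the process changes, gives you the before-and-after comparison the later phases depend on.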

Analyze

Next comes the Analyze phase. This is one of the best places to start to identify ‘waste’ in the process and see how the process can be ‘leaned’. (It is also, particularly for the backup process, a good place to see where the data can be ‘leaned’.) Identify waste and poorly performing areas of the process for both the backup and recovery flow. Brainstorm as to where the process breaks down and what pieces may fail. It will be important to look at the overall daily trends as well as the weekly trends to see if there are any anomalies in the process. Typically backups are daily incrementals and weekly fulls, so you want to make sure that there are no flaws in either process in order to achieve 99% data recoverability. It may be useful to develop a ‘Cause & Effect Diagram’, such as the one shown below, to find all of the problems.
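Alongside the cause and effect diagram, a Pareto view of failure causes (one of the Measure tools listed earlier) shows which few causes account for most failures. Here is a minimal sketch, assuming each failed job has been tagged with a single cause; the cause labels and the `pareto` function are hypothetical.

```python
from collections import Counter


def pareto(causes):
    """Return (cause, count, cumulative share) rows, most frequent cause first."""
    counts = Counter(causes).most_common()
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for cause, count in counts:
        running += count
        rows.append((cause, count, running / total))
    return rows


# Made-up failure tags standing in for a reviewed backup log.
failures = ["network"] * 5 + ["media"] * 3 + ["config"] * 2
for cause, count, cumulative in pareto(failures):
    print(f"{cause:<8} {count:>3} {cumulative:>6.0%}")
```

With this sample data the first two causes cover 80% of the failures, which tells you where to spend the Improve phase first.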

Improve

[Figure: cause and effect diagram]

Now comes the Improve phase. Utilize the whiteboard and your Post-it notes again to review a new process flow. Don’t let the existing tools limit where your mind may go. Think outside the box. If part of the problem is to recover data 99% of the time ‘company wide’, and that includes remote offices, there may be a reason to use other tools at these offices in order to meet the objectives. Build out a ‘mistake proof’ process. Don’t worry, you can analyze the costs afterwards, but identify the ‘best case scenario’. Be sure to document the new process. The next step is to implement some of your changes and see how your new process is working out. It will be very important to utilize the same tools and measure the new process against the old. You will want to utilize the same charts as before with the new data to show improvement. Based on your results, you may still want to refine the process a bit more now that it is in action, as new, unexpected issues pop up as a result of the new process.

Control

Finally, you will want to make sure you control the new process. It will be important to use the same tools to continually monitor and manage the process and to make sure you stay within the new specifications of 99% recovery. If there is ever a situation where you fall outside of the range, you will need to review the process again, identify where the process broke down, fix it and go through the whole DMAIC process again. There is no sense going through all of the prior work to not manage it afterwards to make sure the process stays in compliance.
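The monitoring itself can start very simply. The sketch below is a plain threshold check against the 99% recovery specification, not a full statistical control chart; the `out_of_spec` function and the weekly tuple layout are assumptions made for illustration.

```python
TARGET_RECOVERY_RATE = 0.99  # the 99% specification used in this post


def out_of_spec(periods, target=TARGET_RECOVERY_RATE):
    """Return (label, rate) for every period whose recovery success rate is below target.

    periods is a list of (label, successful_recoveries, attempted_recoveries).
    Periods with no recovery attempts are skipped rather than flagged.
    """
    flagged = []
    for label, successes, attempts in periods:
        if attempts == 0:
            continue
        rate = successes / attempts
        if rate < target:
            flagged.append((label, rate))
    return flagged


# Made-up weekly figures.
weeks = [("week 1", 100, 100), ("week 2", 98, 100), ("week 3", 0, 0)]
for label, rate in out_of_spec(weeks):
    print(f"{label}: {rate:.1%} is below target, review the process")
```

Any period this flags is your cue to go back through DMAIC and find where the process broke down.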

Keep in mind, requirements may change, and other outside factors such as data growth can, and will, have an impact on this process and may force you to re-look at the process or the tools used to manage it. By continuing to measure and control the process, you will see when you start to fall outside of the critical success criteria and need to make adjustments, and you will also be able to operate at a much higher recovery level than you have in the past. Trust me: follow the Lean Six Sigma process. It will help you take backup, beyond and put you on the “Road to Recovery”.

Posted by Steve Kenniston
