Archive for March, 2009

Road to ‘Data’ Recovery – 12 Steps

March 25th, 2009 Steve Kenniston 2 comments

Hi, my name is Steve and I have a recovery problem.  Well, a data recovery problem that is.  So, I think it is about time that I apply the ‘12 steps’ to help me with my data recovery problem.

Step 1 – It is time that I admit that I am powerless over my backup environment and my data protection world is unmanageable.

Step 2 – I have come to believe that there is a Technology greater than I that can help me restore (my sanity).

Step 3 – I have made a decision to put our company’s data and the process of recovery into the hands of a true data protection specialist.

Step 4 – I have helped to create a classified inventory of our company’s data.

Step 5 – I will admit to our CEO that I have failed at 63% of my recovery attempts costing the business $MMs.

Step 6 – I am prepared to have the new data protection administrator remove all of my defective technologies.

Step 7 – I will humbly ask ‘him’ to remove all of my failed processes.

Step 8 – I must make a list of all the people I have been unable to recover data for and be willing to try to restore their lost information.

Step 9 – I must make amends to all the people I have been unable to recover data for.

Step 10 – I must continue to take an inventory of all the tapes we have and promptly convert them to a newer technology to enable faster recovery.

Step 11 – I will seek out best-of-breed technology, partners and vendors to improve our company’s capabilities for daily operational recovery.

Step 12 – Having had this spiritual awakening as a result of these steps, I will carry this message to all IT administrators who are challenged with data recovery issues.

I believe that by following these 12 steps, I will have put our company back on… the Road to ‘Data’ Recovery.

Posted by Steve Kenniston

Lean Six Sigma Your Backups

March 25th, 2009 Steve Kenniston No comments

Last week I took a course offered by EMC entitled ‘Lean Six Sigma’ – Yellow Belt. This is a training course that is used to help ‘solve problems’ in a given process, typically work related. When I think about where the biggest problem is in IT, it’s in the backup arena, so I thought, what better place to test it?

[Figure. Source: Enterprise Strategy Group, 2008]

There are two components to Lean Six Sigma. Lean, or ‘leaning’ a process, is about removing excess from a process to make it more efficient. For backup, moving as much data out of the backup stream as possible would increase backup efficiency.  Deleting unnecessary data or archiving static data that sits in production storage can cut as much as 50% of the data from the backup, ‘leaning’ the process.
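To make the ‘leaning’ arithmetic concrete, here is a minimal sketch. The function name and the sizes are hypothetical, chosen only so that the numbers land on the 50% figure mentioned above; your own environment will differ.

```python
def leaned_backup_size(total_gb, deletable_gb, archivable_gb):
    """Return (remaining_gb, fraction_removed) once unnecessary data is
    deleted and static data is archived out of the backup stream."""
    if min(total_gb, deletable_gb, archivable_gb) < 0:
        raise ValueError("sizes must not be negative")
    removed = deletable_gb + archivable_gb
    if total_gb == 0 or removed > total_gb:
        raise ValueError("removed data must fit inside a non-empty backup set")
    return total_gb - removed, removed / total_gb


# Hypothetical environment: 10 TB backed up, of which 1.5 TB can be deleted
# and 3.5 TB is static data that could be archived instead.
remaining, fraction = leaned_backup_size(10_000, 1_500, 3_500)
print(f"{remaining} GB left in the backup stream ({fraction:.0%} removed)")
```

Running the same calculation against real capacity reports is a quick way to see how much ‘lean’ is actually available before touching the process itself.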

Next, when looking at Six Sigma, we learned about the DMAIC process. That is:

  • Define – Business case, scope, problem statement, goals
  • Measure – Process flow, run charts, Pareto charts
  • Analyze – Cause / Effect, waste identification
  • Improve – Waste removal, improve plan, control charts
  • Control – Monitor to prevent repeat failure, control charts, control plan

First, as I was thinking about this, I kept coming back to the Measure phase. If you don’t currently measure your backup process, or only do so when there is a recovery failure, then perhaps it’s time to invest in a tool to help measure the current process.  This measurement will allow you to identify current problems and will serve as a benchmark against which you can measure the success of your ‘leaning’.  So, if we apply the steps in the DMAIC process to your typical backup environment, here is what it may look like.

Define

In the first step, the Define step, the objective is to describe the business case and problem statement and to set SMART (Specific, Measurable, Attainable, Realistic and Timely) goals. Again, the key will be in the measurements. With backup you typically want to measure recovery success, which most of the time is a result of backup success. The objective is to look at your existing recovery success rate, and hence your backup success rate, and identify what you would like the percentage of successful backups and recoveries to be. I would guess that in most cases 100% would be the requirement, but perhaps 99% is fine. So the problem statement would be: data recoveries fail more than 54% of the time, and this data loss contributes to employee frustration and can translate into significant risk for the company during a legal disclosure process. These recoveries fail because of a flawed, multi-step process that needs to be examined and fixed in order to yield a 99% success rate when it comes to recoveries. This process specifically affects backup administrators on a daily basis. When the process is fixed and recoveries yield a 99% success rate, end users, executives and customers all benefit, and their satisfaction will keep the company’s corporate costs low and drive repeat business.  Additionally, in the Define phase, it may be good to create an IPO diagram. IPO stands for Input, Process, Output.

Measure

[Figure: IPO diagram]

In the Measure phase you will want to make sure you have metrics that can identify the following: recovery success rates, backup success rates, dollars lost due to failed data recovery, customer complaints due to failed data recovery, and the costs to back up and recover data for the environment. These are the key metrics for understanding and fixing the problems that are uncovered. It will be important to establish a baseline to improve upon, and this is where having tools in place (such as DPA) can help tremendously throughout this process. This is also a good place to ‘map out’ the current process flow and to identify what is in scope and what is not, in order to avoid ‘scope creep’. To review the process flow it may be useful to create a process flow chart using a whiteboard and Post-it notes. Brainstorm all of the steps and put them on the whiteboard in random placement. Organize them in time sequence, then fill in the missing steps and review for completeness. This will be helpful for the Analyze phase.
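If you do not have a reporting tool in place yet, the baseline can be sketched from a simple job log. The example below is only an illustration: the `Job` record, its field names and the sample figures are all hypothetical, not the output of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Job:
    kind: str               # "backup" or "recovery" (hypothetical labels)
    succeeded: bool
    cost_usd: float         # what it cost to run the job
    loss_usd: float = 0.0   # estimated dollars lost when a recovery fails
    complaints: int = 0     # customer complaints tied to this job


def success_rate(jobs, kind):
    """Fraction of jobs of the given kind that succeeded, or None if there are none."""
    selected = [j for j in jobs if j.kind == kind]
    if not selected:
        return None
    return sum(j.succeeded for j in selected) / len(selected)


def baseline(jobs):
    """Collect the metrics named in the Measure phase into one dictionary."""
    failed_recoveries = [j for j in jobs if j.kind == "recovery" and not j.succeeded]
    return {
        "backup_success_rate": success_rate(jobs, "backup"),
        "recovery_success_rate": success_rate(jobs, "recovery"),
        "dollars_lost": sum(j.loss_usd for j in failed_recoveries),
        "complaints": sum(j.complaints for j in failed_recoveries),
        "total_cost": sum(j.cost_usd for j in jobs),
    }


# Made-up sample data for one reporting period.
jobs = [
    Job("backup", True, 40.0),
    Job("backup", True, 40.0),
    Job("backup", False, 40.0),
    Job("backup", True, 40.0),
    Job("recovery", True, 25.0),
    Job("recovery", False, 25.0, loss_usd=5000.0, complaints=2),
]
metrics = baseline(jobs)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Whatever tool produces the numbers, the point is to capture them the same way every period so that later phases compare like with like.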

Analyze

Next comes the Analyze phase. This is one of the best places to start to identify ‘waste’ in the process and see how the process can be ‘leaned’. (It is also, particularly for the backup process, a good place to see where the data can be ‘leaned’.) Identify waste and poorly performing areas of the process for both the backup and recovery flows. Brainstorm as to where the process breaks down and what pieces may fail. It will be important to look at the overall daily trends as well as the weekly trends to see if there are any anomalies in the process. Typically backups are daily incrementals and weekly fulls, so you want to make sure that there are no flaws in either process in order to achieve 99% data recoverability. It may be useful to develop a ‘Cause & Effect Diagram’, such as the one shown in the figure below, to find all of the problems.
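Alongside the brainstorming, a quick Pareto-style tally of recorded failure causes can show where most of the breakdowns come from. This is a rough sketch; the cause labels and counts are invented for illustration, and in practice they would come from your own job logs.

```python
from collections import Counter


def pareto(failure_causes):
    """Return (cause, count, cumulative_share) rows, most frequent cause first."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, n in counts.most_common():
        cumulative += n
        rows.append((cause, n, cumulative / total))
    return rows


# Hypothetical causes recorded against ten failed jobs.
failures = (
    ["media error"] * 5
    + ["network timeout"] * 3
    + ["client offline"] * 2
)
rows = pareto(failures)
for cause, n, share in rows:
    print(f"{cause:<16} {n:>2}  cumulative {share:.0%}")
```

Running the tally separately for the daily incrementals and the weekly fulls is one way to spot an anomaly that only shows up in one of the two.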

Improve

[Figure: Cause & Effect diagram]

Now comes the Improve phase. Utilize the whiteboard and your Post-its again to review a new process flow. Don’t let the existing tools limit where your mind may go. Think outside the box. If part of the goal is to recover data 99% of the time ‘company wide’, and that includes remote offices, there may be a reason to use other tools at these offices in order to meet the objectives. Build out a ‘mistake proof’ process. Don’t worry, you can analyze the costs afterwards, but identify the ‘best case scenario’. Be sure to document the new process. The next step is to implement some of your changes and see how your new process is working out. It will be very important to use the same tools and measure the new process against the old. You will want to use the same charts as before with the new data to show improvement. Based on your results, you may still want to refine the process a bit more now that it is in action, as new, unexpected issues pop up as a result of the new process.
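Measuring the new process against the old can be as simple as computing the same success rate before and after the change. In this sketch the ‘before’ count is derived from the 54% failure rate in the problem statement; the ‘after’ count and the function name are hypothetical.

```python
def compare(before_ok, before_total, after_ok, after_total, target=0.99):
    """Compare recovery success rates before and after a process change."""
    before = before_ok / before_total
    after = after_ok / after_total
    return {
        "before": before,
        "after": after,
        "gain": after - before,
        "meets_target": after >= target,
    }


# 46 of 100 recoveries succeeded before; assume 97 of 100 succeed afterwards.
result = compare(46, 100, 97, 100)
print(f"before {result['before']:.0%}, after {result['after']:.0%}, "
      f"target met: {result['meets_target']}")
```

In this made-up case the process has improved a great deal but still falls short of the 99% goal, which is exactly the situation where another round of refinement is called for.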

Control

Finally, you will want to make sure you control the new process. It will be important to use the same tools to continually monitor and manage the process and to make sure you stay within the new specification of 99% recovery. If there is ever a situation where you fall outside of the range, you will need to review the process again, identify where the process broke down, fix it and go through the whole DMAIC process again. There is no sense in going through all of the prior work only to leave the process unmanaged afterwards; keep monitoring it to make sure it stays in compliance.
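One way to decide when you have ‘fallen outside of the range’ is a control-chart style check. The sketch below assumes a common convention for proportions (a lower limit three standard deviations below the target rate, as in a p-chart); the weekly counts are invented, and your own control plan may define the limits differently.

```python
import math


def lower_control_limit(target_rate, sample_size, sigmas=3.0):
    """Lower limit for a success proportion, p-chart style (an assumed convention)."""
    spread = math.sqrt(target_rate * (1 - target_rate) / sample_size)
    return max(0.0, target_rate - sigmas * spread)


def out_of_control(weekly_counts, target_rate=0.99):
    """Return (week_number, rate) for each week whose rate falls below the limit.

    weekly_counts is a list of (recoveries_attempted, recoveries_succeeded).
    """
    flagged = []
    for week, (attempted, succeeded) in enumerate(weekly_counts, start=1):
        if attempted == 0:
            continue  # nothing to judge this week
        rate = succeeded / attempted
        if rate < lower_control_limit(target_rate, attempted):
            flagged.append((week, rate))
    return flagged


# Hypothetical data: 200 recoveries attempted in each of three weeks.
weeks = [(200, 199), (200, 197), (200, 190)]
flagged = out_of_control(weeks)
print(flagged)
```

With these numbers only the third week is flagged. The second week dips below 99% but stays inside the limit, so it would count as normal variation rather than a reason to restart the DMAIC cycle.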

Keep in mind that requirements may change, and other outside factors such as data growth can, and will, have an impact on this process and may force you to revisit the process or the tools used to manage it. By continuing to measure and control the process, you will see when you start to fall outside of the critical success criteria and need to make adjustments, and you will also be able to operate at a much higher recovery level than you have in the past.  Trust me: follow the Lean Six Sigma process. It will help you take backup beyond and put you on the “Road to Recovery”.

Posted by Steve Kenniston
