By Zeb Ahmed, CLOUD, XaaS & BCDR Leader – IBM:
While the importance of choosing the right disaster recovery solution and cloud provider cannot be overstated, having a disaster recovery (DR) runbook is equally important, if not more so. I have been involved in multiple conversations where the customer’s primary focus was implementing the best-suited disaster recovery technology, while the discussion of a DR runbook was either missing entirely or lacked key information. Today, I will lay out a framework for what your DR runbook should look like.
“Eighty percent of businesses affected by a major incident either never re-open or close within 18 months.” (Source: AXA report)
What is a disaster recovery runbook?
A disaster recovery runbook is a working document that outlines a recovery plan together with all the information required to execute it. This document is unique to every organization and can include processes, technical details, personnel information, and other key information that may not be readily available during a disaster.
What should I include in this document?
As previously stated, a runbook is unique to every organization, depending on its industry and internal processes, but some standard information applies to all organizations and belongs in every runbook. Below is a list of the most important items:
- Version control and change history of the document.
- Contacts with titles, phone numbers, email addresses, and job responsibilities.
- Service provider and vendor list with point of contact, phone numbers, and email addresses.
- Access Control List: application/system access and physical access to offices/data centers.
- Updated organization chart.
- Use case scenarios based on DR testing, i.e., what to do in the event of X, and the chain of events that must take place for recovery.
- Alert and custom notifications/emails that need to be sent for a failure or DR event.
- Escalation procedures.
- Technical details and explanation of the disaster recovery solution (network layouts, traffic flows, systems and application inventory, backup configurations, etc.).
- Application-based personnel roles and responsibilities.
- Failover and failback procedures, including how to revert to the primary environment.
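Teams that keep their runbook in a structured form can check it against the checklist above automatically. The following is a minimal Python sketch, assuming the runbook is stored as named sections of text; the section names, the `Runbook` class, and `missing_sections` are hypothetical and only illustrate the idea of flagging absent or empty sections.

```python
from dataclasses import dataclass, field

# Hypothetical section names mirroring the checklist above; adapt to your organization.
REQUIRED_SECTIONS = (
    "version_history",
    "contacts",
    "vendors",
    "access_control",
    "org_chart",
    "use_case_scenarios",
    "notifications",
    "escalation",
    "technical_details",
    "application_roles",
    "failover_failback",
)


@dataclass
class Runbook:
    """A minimal, illustrative in-memory representation of a DR runbook."""
    organization: str
    version: str
    # Maps a section name to its free-form content.
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return required sections that are absent or contain only whitespace."""
        return [
            name
            for name in REQUIRED_SECTIONS
            if not self.sections.get(name, "").strip()
        ]


rb = Runbook("Example Corp", "1.0", {"contacts": "On-call roster v3", "escalation": "   "})
print(len(rb.missing_sections()))  # → 10 (the blank "escalation" section still counts as missing)
```

Treating whitespace-only sections as missing catches placeholders that were created but never filled in.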
How do you manage and execute the runbook?
Processes, applications, systems, and employees can all change on a daily basis. It is essential to update this information in the DR runbook on a regular basis to ensure the accuracy of the document.
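One way to make "on a regular basis" enforceable is to record when each section was last reviewed and flag the ones that have gone stale. This is a minimal Python sketch; the 90-day interval, the section names, and the `stale_sections` helper are assumptions for illustration, since the right cadence depends on how quickly your environment changes.

```python
from datetime import date, timedelta

# Assumed review interval; choose one that matches your rate of change.
REVIEW_INTERVAL = timedelta(days=90)


def stale_sections(last_reviewed: dict[str, date], today: date,
                   interval: timedelta = REVIEW_INTERVAL) -> list[str]:
    """Return (sorted) section names whose last review is older than the interval."""
    return sorted(
        name for name, reviewed in last_reviewed.items()
        if today - reviewed > interval
    )


# Hypothetical review log for three runbook sections.
reviews = {
    "contacts": date(2024, 1, 10),
    "access_control": date(2024, 5, 2),
    "failover_failback": date(2023, 11, 20),
}
print(stale_sections(reviews, today=date(2024, 6, 1)))
# → ['contacts', 'failover_failback']
```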
All relevant employees should receive DR training and should be well informed of their roles and responsibilities in a DR event. They should be asked to take ownership of certain tasks, which should be well documented in the runbook.
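Documented ownership can also be cross-checked against training records so that no recovery task is left without a prepared owner. The Python sketch below assumes a simple mapping of hypothetical task names to owners and a set of staff who have completed DR training; the `ownership_gaps` function is illustrative, not a prescribed tool.

```python
from typing import Optional


def ownership_gaps(task_owners: dict[str, Optional[str]],
                   trained_staff: set[str]) -> dict[str, list[str]]:
    """Flag recovery tasks with no owner, or with an owner who has not had DR training."""
    unowned = sorted(task for task, owner in task_owners.items() if not owner)
    untrained = sorted(
        task for task, owner in task_owners.items()
        if owner and owner not in trained_staff
    )
    return {"unowned": unowned, "untrained_owner": untrained}


# Hypothetical recovery tasks and their documented owners.
tasks = {
    "declare_disaster": "CIO",
    "fail_over_database": "DBA on call",
    "notify_customers": None,
}
print(ownership_gaps(tasks, trained_staff={"CIO"}))
# → {'unowned': ['notify_customers'], 'untrained_owner': ['fail_over_database']}
```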
In short, we all hope to avoid a disaster, but if one strikes, we must be prepared to handle it. I hope the information above helps you take the first step toward preparing a DR runbook. Please feel free to contact me via email for additional information or guidance.
Zeb Ahmed is a Senior Manager, Product Management for IBM, responsible for overseeing and managing the Backup and Disaster Recovery portfolio and partner ecosystem for IBM Cloud.