Operational Resilience: Is it Just Business Continuity Done Properly?

By Charlie Maclean-Bristol:

This week a couple of things have come together to inspire this bulletin. I have been working on an operational resilience exercise for a client, which is based around taking a “severe but plausible scenario” and then checking whether the scenario breaches the organization’s impact tolerances. The other event is that Castellan have launched their new single business continuity software package, and so I have been listening to a number of webinars which talk about the software and operational resilience. One of the phrases repeated on the webinars by Brian Zawada, the COO of Castellan, was that operational resilience was “just business continuity done properly.” I thought in today’s bulletin I would explore this statement and come to my own conclusion as to whether his statement was right or wrong.

Resilience, operational resilience and organizational resilience mean different things to different people, and there is no single agreed definition of what they are, what their scope is or what subjects they contain. Many people have tried to define them in books, ISO standards and papers. I am going to explore the definition and requirements put forward by the Bank of England’s Discussion Paper of 2018: “Building the UK financial sector’s operational resilience.”

In addition, I will draw on the Financial Conduct Authority (FCA) Consultation Paper of 2019: “Building operational resilience: impact tolerances for important business services and feedback to DP 18/04.”

In these two papers, the Bank of England and the FCA lay out how they want financial institutions to carry out a series of operational resilience activities to make the whole of the UK’s financial market more resilient, recognizing that an incident in one organization could have a major impact on other companies, leading to financial collapse.

The first thing I noted when reading the two papers is that what the financial regulators are trying to achieve by implementing operational resilience and what we are trying to achieve in business continuity are very different. In business continuity, our focus is on our own organization and its survival, and that is the primary driver. In implementing business continuity, we want to make sure that the Maximum Tolerable Period of Disruption (MTPD) identified in our business impact analysis (BIA) is not exceeded, causing irretrievable damage to our organization.

With operational resilience, the focus is first on the experience of the customer, which the regulators want to be maintained so that customers’ requirements are met, and then on the stability of the market as a whole. Of course, in business continuity we know that if we do not look after our customers and provide them with the products and services we are contracted to deliver, our organization will fail. For many organizations, the customer and the organization are very closely entwined, so it is difficult for the customer to find another supplier without a lot of time and effort. Therefore, within our business continuity strategies, we can pass some of the pain associated with the incident on to our customers, because they know it is difficult to find another supplier. The Bank of England and the FCA, in pushing operational resilience, have a different aim from an organization implementing and maintaining business continuity.

There are several tasks laid out in the paper which they want regulated organizations to carry out. One of the first of these is to identify “important business services,” which are the key services delivered to customers. You should also identify the resources which underpin them. Once this is done, you should look for single points of failure and vulnerabilities which, if addressed, would add to the organization’s overall resilience. This is the classic “analysis” phase of the business continuity lifecycle, and so in this case Brian is right: it is “business continuity done properly.”

When I first came across business continuity, dare I say 20 to 25 years ago, the focus was on scenarios which could impact our organization. Generally accepted practice was that you had to write a plan for every different scenario that could possibly occur. LDRPS, the business continuity software from SunGard, popular at the time, was a classic example. It encouraged software users to write plans for a fire, a flood, an accident on the nearest road, a tornado, a rail crash, a nuclear meltdown; the list went on and on. You still on occasion see this type of plan and when you read them, they almost all have the same content, just with a different title. Today business continuity plans work on the premise that it doesn’t matter why your head office is unavailable, whether it is through a flood, fire or pandemic; we write a plan for how the activities within the office will be recovered. Operational resilience takes what we are planning for back full circle. The regulators want organizations to plan for “severe but plausible” scenarios and look at the impact of these scenarios on our organization. They then want to know whether our defined “impact tolerances” can cope with them.

Having implemented business continuity, many organizations feel they have “done it” and so now they won’t have a disruption. Operational resilience comes from the angle that disruptions are inevitable, and the role of operational resilience is to ensure that the disruptions do not have a major impact on the customer or market. In this case, I think that both business continuity and operational resilience are trying to achieve the same end, but their way of achieving it is slightly different.

In the Bank of England’s Discussion Paper there is a requirement to set the organization’s “impact tolerances” and then to test them against a number of different scenarios to establish whether the tolerances would be breached. The tests are also there to help identify gaps which, if filled, would improve the organization’s resilience. Defining impact tolerances is similar to defining an organization’s MTPDs, but there are a number of key differences. MTPDs are usually defined in terms of their impact on the organization as a whole, rather than the impact on customers. They are also defined as a time bracket within which an “unacceptable impact” would occur. Impact tolerances must be defined in terms of time but also of level of service to customers. This makes defining them more complex, requiring more analysis and more nuance.

Carrying out scenario testing, defining important services, identifying vulnerabilities in our organization and identifying a number of “severe but plausible scenarios” are all very close to business continuity, as defined in the Good Practice Guidelines (2018) and in ISO 22301, so in these areas Brian is right. A different way of thinking and different analytical tools need to be adopted for defining impact tolerances and then testing them against the plausible scenarios. This requires new skills and learning or developing new methodologies, and is outside the remit of existing business continuity practices, so in these instances I think Brian is wrong. Finally, we have to remember that in implementing operational resilience the goal is not protecting the organization, as it is in business continuity; it is protecting customers and the market as a whole.

About the Author: Charlie Maclean-Bristol is the author of the new book, Business Continuity Exercises: Quick Exercises to Validate Your Plan. Learn more here via Rothstein Publishing and download free Business Continuity Exercises.