By Imad Mouline, Everbridge:
Many organizations are being asked to respond more quickly and more decisively to critical events, but with fewer resources. However, without an end-to-end process for dealing with critical events, it’s nearly impossible to satisfy this mandate. They are often using manual processes and disjointed systems. As a result, they are unable to efficiently and effectively manage these events.
In addition, security, operations and risk professionals lack the time needed to react to or even avoid the negative consequences of these events. This paper explains why the traditional approach to managing critical events is outdated and offers a holistic approach to Critical Event Management (CEM) that enables a more unified, efficient, distributed, automated and collaborative process.
Why Critical Event Management (CEM) Matters
Disruptive safety and operational events occur every day: think active shooters, IT outages, supply chain disruptions, to name a few. In fact, these events are on the rise.
Twenty-one states in the United States saw active shooter incidents in the two-year period from 2016 to 2017, ten more than in the previous two-year period, according to a new FBI report. Per The Economist, the number of weather-related disasters worldwide has more than quadrupled to around 400 a year since 1970. Moreover, terrorist attacks and risks are projected to increase around the world.
Whatever their nature, in the simplest terms, events are considered critical when they impact one or more of the assets that matter to an organization (see figure 1).
Figure 1. Critical events impact an organization’s key assets
Remember: A critical event doesn’t necessarily equate to a major breakdown. For some businesses – such as financial services firms and retailers – a website that performs milliseconds slower is a critical event. While each organization will define critical events differently, the aim is to minimize or even avoid the impact.
Unfortunately, many organizations struggle to achieve this goal. They are being asked to respond more quickly and more decisively, but with fewer resources. However, without an end-to-end process for dealing with critical events, it’s nearly impossible to satisfy this mandate.
With such a process in place, security, operations and risk professionals gain more time to react to or even avoid the negative consequences of these events. Critical Event Management (CEM) takes it a step further by enabling a unified, efficient, distributed, automated and collaborative process for managing critical events.
Unfortunately, the old way does not always work
In most cases, organizations are trying to deal with critical events using manual processes and disjointed systems. As a result, they are unable to efficiently and effectively manage these events.
In an environment characterized by disjointed processes and information, organizations are often grappling with too much data, making it extremely challenging to arrive at a basic understanding about an event. Complicating matters is that few organizations excel at keeping track of their people and assets in transit. Think workers and equipment on the road and in the field.
Figure 3. Disjointed processes lead to a slow, reactive response
Combined, these make resolution slow, unclear and reactive, actually increasing risks. Employees might be in harm’s way and operations could be disrupted. In turn, customers lose confidence in the organization, threatening brand value and revenues. Just as important, a slow, reactive response increases the costs posed by these events. Consider that just the cost of IT downtime averages $8,900/minute. No wonder businesses worldwide suffered $535 billion in losses in 2016 due to critical events.
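To make the downtime figure concrete, the arithmetic can be sketched in a few lines of Python. The $8,900 per minute average is the figure cited above; the function name and the 45-minute outage are illustrative assumptions, and a real estimate would use the organization's own cost data.

```python
# Average cost of IT downtime cited above, in US dollars per minute.
AVG_COST_PER_MINUTE = 8_900


def downtime_cost(minutes: float, cost_per_minute: float = AVG_COST_PER_MINUTE) -> float:
    """Estimate the cost of an outage (hypothetical helper, linear in duration)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# A hypothetical 45-minute outage at the cited average rate: 45 * 8,900.
print(f"${downtime_cost(45):,.0f}")  # prints $400,500
```

Even under this simple linear assumption, shaving minutes off the response has a direct and measurable payoff.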
By adopting a proven, 9-step approach, organizations can improve their response to critical events.
Step 1. Devise a plan
It seems simple enough to develop a CEM plan, but the plan must be comprehensive enough to be effective. It starts with a general plan and then expands to cover crises of differing types and scope, each mapped to appropriate resources and responses. According to recent research from the Business Continuity Institute, 86% of organizations have emergency plans. The graph below highlights the most frequently activated plans from that same research report.
“Have any of the following events triggered the activation of your emergency communications plans?” (N=436, answers expressed as percentage and multiple responses allowed)
Given the business and the types of threats it may face, organizations should:
- Appropriately categorize critical events by type, predictability, cause and scope, while differentiating between routine emergencies and crisis events.
- Determine how the organization will deal with each event, and who will take the lead.
The plan should include severity levels that dictate the composition of the relevant response teams so those teams can be activated as quickly as possible.
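One way to encode severity levels that dictate team composition is a simple lookup in which each level adds teams to those already activated at lower levels. This is a minimal sketch; the level names and team names are hypothetical and would come from the organization's own plan.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical plan data: the teams that join the response at each level.
TEAMS_ADDED_AT = {
    Severity.LOW: ["local site manager"],
    Severity.MODERATE: ["security operations", "IT operations"],
    Severity.HIGH: ["executive crisis team", "corporate communications"],
}


def response_team(severity: Severity) -> list[str]:
    """Return every team to activate; higher levels include all lower ones."""
    team = []
    for level in Severity:  # iterates LOW, MODERATE, HIGH in definition order
        if level <= severity:
            team.extend(TEAMS_ADDED_AT[level])
    return team


print(response_team(Severity.MODERATE))
# ['local site manager', 'security operations', 'IT operations']
```

Because the composition is decided ahead of time, activating a team during an event reduces to picking a severity level.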
If the organization has crisis management plans in place, ensure they are operational in nature and based on predictability. For example, if the organization operates in a hurricane-prone region, develop a hurricane plan, including one to deal with office closings. If the organization operates in multiple locations, ensure the plans are standardized. In today’s mobile world, having digitized plans readily accessible by mobile device is highly desirable when responding to critical events while out of the office.
“[“Routine”] does not mean “easy.” On the contrary, a routine emergency can be very difficult and challenging. In this context, “routine” refers to the relative predictability of the situation that permits advanced preparation. The risk presented by the situation was included in your risk profile and you probably have created appropriate plans, developed relevant training, and completed exercises for routine emergencies. In short, your business and technology continuity and disaster recovery plans are filled with strategies to manage them.
In contrast, a “crisis” emergency is a much different animal. These events are distinguished by significant elements of novelty. This novelty makes the problem much more difficult to diagnose and then deal with. This type of emergency often has one or more of the following characteristics:
1. The threats have never been encountered before, which means there are no existing plans to manage it.
2. The situation may be a familiar event, however, it is developing at unprecedented speed; therefore, developing and executing an appropriate response (including notifications and ongoing coordination) is severely challenging.
3. The incident may represent a confluence of forces, which, while not new individually, in combination, pose unique challenges to the response.
The novel nature of a crisis emergency becomes a game-changer. Plans, processes, training, and exercises that may work well in routine emergency situations are frequently grossly inadequate in a crisis and may even be counterproductive. When this type of incident occurs, we realize that we have to start from scratch. The crisis emergency also requires different response capabilities; in other words, the plans and behaviors used for routine emergencies just won’t work. Even the most experienced responders can make the mistake of assuming they understand the nature of a problem based on their initial observations. Handling a crisis emergency may feel like you’re building an airplane while flying it at the same time. It’s not pretty, but it may be necessary. Lastly, in a crisis emergency, responses must be creative and, at the same time, be extremely adaptable as new and improvised solutions are being executed.”
[Excerpted from “From Routine to Crisis,” Regina Phelps, CEM, RN, BSN, MPA and Kelly David Williams, http://go.everbridge.com/ITAlerting-ReginaPhelps-Sept292015EMEA.html]
Step 2. Build partnerships with leadership
Critical events can impact different areas of the business, and often impact more than one. This is why more companies are changing their organizational structure to enable a consolidated approach toward handling major incidents. Sometimes it starts with an overlay team that deals with major incidents. The ideal is to build a fusion center – a collaborative effort of two or more agencies that provide resources, expertise, and/or information to the center with the goal of maximizing the ability to detect, prevent, apprehend, and respond to critical events, regardless of scope.
However, if that’s not possible, the best practice is to build alliances among the Chief Security Officer (CSO), Chief Information Security Officer (CISO), and Chief Information Officer (CIO) at the very least. Combining the experience, insights, and intelligence from across the organization makes it possible to understand the root cause of an event more quickly, so the organization can respond rapidly and ensure business continuity.
Step 3. Assess your risks and sources of information
With a plan and partnerships in place, it’s time to assess how well the organization can navigate critical events. One of the biggest issues is not knowing when a threat develops and then not being able to confidently vet what happened.
As people are assessing events and risks, they typically call upon a range of information from a variety of sources that might be lacking details or even contradictory. The goal is to confirm the threat event and ensure the appropriate team has all needed input and contextual feeds in one place to make the appropriate decisions. That means lining up trusted information sources and all risks.
This undertaking can get complex, especially in larger organizations. Start by understanding the event in the context of the five key assets: people, buildings, IT systems, supply chain, and brand/reputation. In some cases, organizations might even associate a particular value to these assets in order to better determine risk.
Step 4. Identify critical assets and functions
During every event, it’s essential to know where employees, travelers, visitors, offices, manufacturing facilities and other critical assets are located. It’s also critical to know how they are interconnected and the dependencies between them. Ideally, organizations can visualize this at a glance.
Common examples of business assets include:
- Buildings, Branches and Retail Stores
- Product Inventory
- Supply Chain
- Machinery & Specialty Equipment
- IT Assets
Beyond knowing the location and interdependencies, organizations need an idea of how much it will cost if these assets are impacted by an event. For instance, perhaps a critical business application goes down, resulting in thousands of dollars’ worth of losses every minute. Look at it based on the overall use case, such as how many employees are going to be impacted.
Organizations must be careful not to overlook assets such as their overall brand. A firestorm of Tweets can cause far more damage than a physical attack on the company or its infrastructure.
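The interdependency and cost questions in this step can be sketched as a small dependency graph. The assets, their links and the per-minute loss figures below are all hypothetical; the point is only that a breadth-first walk from a failed asset yields everything downstream of it, and summing the losses gives a rough cost of the disruption.

```python
from collections import deque

# Hypothetical assets: each key lists the assets that depend on it.
DEPENDENTS = {
    "data center": ["order system", "email"],
    "order system": ["retail stores", "warehouse"],
    "email": [],
    "retail stores": [],
    "warehouse": [],
}

# Hypothetical loss per minute of disruption, in US dollars.
COST_PER_MINUTE = {
    "data center": 0,
    "order system": 2_000,
    "email": 100,
    "retail stores": 5_000,
    "warehouse": 1_500,
}


def impacted_assets(failed: str) -> list[str]:
    """Breadth-first walk: the failed asset plus everything downstream of it."""
    seen = {failed}
    order = [failed]
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def loss_per_minute(failed: str) -> int:
    """Sum the per-minute loss across every impacted asset."""
    return sum(COST_PER_MINUTE.get(asset, 0) for asset in impacted_assets(failed))


print(impacted_assets("order system"))  # ['order system', 'retail stores', 'warehouse']
print(loss_per_minute("data center"))   # 8600
```

Less tangible assets such as brand could be added as nodes too, although assigning them a per-minute figure is a judgment call.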
Step 5. Quantify and prioritize your risk
The next step is to figure out what is critical and what isn’t. Answer the big questions: What is the impact? What is the exposure?
An effective approach is to differentiate between threats and risks across the board, and to then quantify risk based on:
- The threat
- The threat’s nature
- The organization’s overall vulnerability or exposure across the board
- The overall impact, which may go beyond the immediate assets, people and elements that are in harm’s way
Unfortunately, it’s not a simple equation because organizations must factor in a few more variables. Consider the overall timeline, which is often dynamic. For instance, it’s not sufficient to say, “How many employees are in HQ right now?” since employees are constantly on the move. Or perhaps a geopolitical issue or event is going to cause a disruption to the supply chain, but the organization won’t feel the impact for two months.
While it’s critical to quantify risk, keep in mind that the impact from a single event can differ across the company and can impact different assets in different ways. For instance, a labor strike in Paris is not a critical event for local employees who know how to deal with it, but it is for expats and traveling employees who aren’t accustomed to this.
In other words, context matters, and risk will change based on that. The key is to understand risk based on all variables to determine the best response to any event.
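As a rough illustration of context-dependent risk, the sketch below scores the Paris labor strike from the example above for two groups of employees. The multiplicative model (likelihood × vulnerability × impact) is an assumed, commonly used convention rather than a formula given in this paper, and every number is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    group: str
    likelihood: float     # chance the threat materializes, 0.0 to 1.0
    vulnerability: float  # how exposed this group is, 0.0 to 1.0
    impact: float         # consequence if it does, on a 1 to 10 scale

    @property
    def risk(self) -> float:
        # Assumed multiplicative model; the source does not prescribe one.
        return self.likelihood * self.vulnerability * self.impact


# Same event, different context: only the vulnerability differs (made-up values).
exposures = [
    Exposure("local employees", likelihood=0.9, vulnerability=0.1, impact=4),
    Exposure("traveling employees", likelihood=0.9, vulnerability=0.7, impact=4),
]

for e in sorted(exposures, key=lambda e: e.risk, reverse=True):
    print(f"{e.group}: {e.risk:.2f}")
# traveling employees: 2.52
# local employees: 0.36
```

A fuller model would also let these inputs change over time, since both exposure and impact shift as people move and events unfold.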
Step 6. Identify and locate all stakeholders
Quickly locating, communicating with and assisting employees in a crisis is a priority. To that end, typically in any type of critical event, organizations will be dealing with three groups of stakeholders:
- The people who can do something about the event. These people can put context around the situation and can help assess the threat to determine who’s impacted. They might be called responders or resolvers. In larger organizations, this might be an incident response team. When creating a list of responders, organizations should take into consideration schedules, rotations and locations.
- Those impacted. In addition to identifying impacted people, organizations must know where they are located so they can be quickly notified. Automating communication can save even more time.
- Those needing to know about the event. Who needs to know about the event? Should the CEO be woken at 2 am? Should the Governor or other high-ranking officials be involved? Can the event be handled regionally? Determining this ahead of time is key to reducing the impact of the event.
To avoid alert fatigue or “the boy who cried wolf” syndrome, only inform those who need to know. At the same time, make sure people aren’t bombarded with updates. If possible, let the appropriate people see all necessary information in one place. To that end, set up profiles indicating who can do what based on skill sets and experience, and who is involved under what circumstances. Be sure to include a secondary protocol, including ‘identifiables’ for people who may be limited in dealing with a crisis – such as those in wheelchairs.
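The three stakeholder groups can be sketched as a simple classification that places each person in at most one group and leaves everyone else out, which is one way to limit alert fatigue. The `Person` fields and the sample names are hypothetical; a real system would draw on schedules, rotations and live location data.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    location: str
    on_call: bool = False           # currently scheduled as a responder
    must_be_informed: bool = False  # e.g. an executive who needs to know


def classify(people: list[Person], event_location: str) -> dict[str, list[str]]:
    """Place each person in at most one group so nobody gets duplicate alerts."""
    groups = {"responders": [], "impacted": [], "informed": []}
    for p in people:
        if p.on_call:
            groups["responders"].append(p.name)
        elif p.location == event_location:
            groups["impacted"].append(p.name)
        elif p.must_be_informed:
            groups["informed"].append(p.name)
        # Everyone else is deliberately left out of the notification.
    return groups


people = [
    Person("Ana", "Paris", on_call=True),
    Person("Ben", "Paris"),
    Person("Chloe", "Boston", must_be_informed=True),
    Person("Dev", "Boston"),
]

print(classify(people, "Paris"))
# {'responders': ['Ana'], 'impacted': ['Ben'], 'informed': ['Chloe']}
```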
Step 7. Visualize with a common operating picture
To minimize confusion and accelerate an effective response, it’s necessary for everyone to share and operate from the same set of information about the situation: How many people, supply routes, buildings are impacted by the situation right now, and how many people are headed there? It’s also important that people are viewing the right information to make the right decisions and not wasting time trying to keep everyone up to speed. Along those lines, here are three best practices:
- Know when to engage the appropriate levels of people as well as the appropriate functions.
- Launch the appropriate protocols.
- Be prepared to deal with more groups and workflows for more complex situations.
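The question of what is impacted right now can be sketched as a distance check against known asset locations. This is a minimal example, assuming assets are tracked as latitude and longitude pairs and that the event is approximated by a circle; the asset list and radius are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical tracked assets: (name, kind, latitude, longitude).
ASSETS = [
    ("Paris office", "building", 48.8566, 2.3522),
    ("Traveler A", "person", 48.8738, 2.2950),
    ("London office", "building", 51.5074, -0.1278),
]


def impacted(event_lat: float, event_lon: float, radius_km: float) -> list[str]:
    """Names of all assets inside the event radius."""
    return [
        name
        for name, _kind, lat, lon in ASSETS
        if distance_km(event_lat, event_lon, lat, lon) <= radius_km
    ]


# An event centered on the Paris office with a 10 km radius.
print(impacted(48.8566, 2.3522, 10))  # ['Paris office', 'Traveler A']
```

Sharing one such computed view, rather than having each team assemble its own, is what keeps everyone working from the same picture.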
Step 8. Automate workflows
Once organizations mature their processes, the opportunity exists to automate many of the previous steps to prevent errors in workflows and handle response more quickly. As a result, they are able to execute their CEM plans by feeding minimal information, based on an assessment of the situation, into a CEM system. Some take it a step further by adding elements like checklists. Even checklists need to be dynamic in nature, but they help ensure nothing is overlooked as events are occurring.
A good place to start with workflow automation is with the most frequent activities a team addresses. Consider the list below as a starting point:
- EOC Activation Notices
- Employee Accountability Checks
- Executive Sitrep Reports
- Executive Conference Bridge Activation
- Response Team Callouts
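The activities listed above can be sketched as a small playbook in which each step runs automatically once an event reaches a given severity. The thresholds, function names and event fields are hypothetical, and each step only returns a log line here; in practice it would call a notification or conferencing system.

```python
# Each step stands in for a real notification or conference-bridge call.
# Here it only returns a log line so the sketch runs without external systems.
def response_team_callout(event):
    return f"Response team called out for {event['type']}"


def employee_accountability_check(event):
    return f"Accountability check sent to staff in {event['location']}"


def eoc_activation_notice(event):
    return f"EOC activation notice issued for {event['type']}"


def executive_sitrep(event):
    return f"Executive sitrep drafted for {event['type']} in {event['location']}"


# Hypothetical playbook: (minimum severity, step) pairs, run in this order.
PLAYBOOK = [
    (1, response_team_callout),
    (2, employee_accountability_check),
    (2, eoc_activation_notice),
    (3, executive_sitrep),
]


def run_workflow(event):
    """Run every step whose severity threshold the event meets, in playbook order."""
    return [step(event) for threshold, step in PLAYBOOK if event["severity"] >= threshold]


event = {"type": "hurricane", "location": "Miami", "severity": 2}
for line in run_workflow(event):
    print(line)
```

Only the event type, location and severity are supplied by a person; everything else follows from the plan, which is the sense in which minimal information drives the response.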
Step 9. Analyze performance
The final step is to close the loop by analyzing how well the organization responded. By classifying and tracking all assets in a centralized, visual and correlative way, it’s possible to assess each event’s impact and the response to it.
Data tells us that organizations that perform after-action reviews improve their future response by answering the following questions:
- Has this happened before?
- What was the impact?
- What did we do well?
- What could we have done better?
- What slowed us down?
- Who was involved?
- Who responded fastest?
The ideal is readily accessing all this information to analyze it in different ways, such as by determining the incident response commander for the three times the organization responded fastest. Just remember: the key is to not only perform these reviews but to close the loop by learning from experience and continually improving the plan and response.
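The kind of query described above, finding who commanded the three fastest responses, is straightforward once after-action data is recorded consistently. The records and field names below are hypothetical and stand in for whatever the organization actually captures.

```python
from collections import Counter

# Hypothetical after-action records: minutes from detection to first response.
INCIDENTS = [
    {"id": "INC-1", "commander": "Rivera", "response_minutes": 12},
    {"id": "INC-2", "commander": "Chen", "response_minutes": 35},
    {"id": "INC-3", "commander": "Rivera", "response_minutes": 8},
    {"id": "INC-4", "commander": "Okafor", "response_minutes": 20},
    {"id": "INC-5", "commander": "Chen", "response_minutes": 15},
]


def fastest_commanders(incidents, n=3):
    """Count how often each commander led one of the n fastest responses."""
    fastest = sorted(incidents, key=lambda i: i["response_minutes"])[:n]
    return Counter(i["commander"] for i in fastest)


print(fastest_commanders(INCIDENTS).most_common())
# [('Rivera', 2), ('Chen', 1)]
```

The same records can answer the other review questions, such as whether an event type has occurred before or which step slowed the response.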
As organizations face the prospect of dealing with a growing range of threats, they are smart to formalize and consolidate their operational response. Businesses are expected to not only know where their employees are at all times, but to quickly and easily gather information about critical events in order to anticipate the business and life safety impact.
In turn, organizations understand the need to simultaneously protect their people, buildings, IT systems, supply chain and brand/reputation. Harnessing the right CEM technology, they can ensure the latest intelligence on all of these areas is at their fingertips and visualize the threats to their assets. At the same time, they can coordinate the appropriate resources based on reliable information, and quickly mitigate critical events of any kind to reduce the impact to safety and business resiliency. Just as importantly, the right CEM system makes it possible to audit response rates for continual improvement.
Just as companies grasp the need for CEM, employees are starting to see the importance of this process and how it can keep them safe in a crisis. In a tight job market where companies are competing for talent, organizations that can point to employee protection will stand apart.
About the Author: Imad Mouline is the chief technology officer for Everbridge. He is responsible for Everbridge’s market strategy, product roadmap, innovation, and research and development. Previously, he was co-founder and CTO of CloudFloor, an enterprise cloud management company acquired by Everbridge. Prior to CloudFloor, Mouline served as CTO of Compuware’s Application Performance Management Solutions division, which was formed when the company acquired Gomez, a provider of web performance management solutions, where Mouline was CTO. Earlier he served as CTO of S1 Corporation, a provider of financial services solutions.
FBI, Active Shooter Incidents in the United States in 2016 and 2017, May 7, 2018, https://www.fbi.gov/file-repository/active-shooter-incidents-us-2016-2017.pdf/view
The Economist, Weather-related disasters are increasing, August 29, 2017, https://www.economist.com/graphic-detail/2017/08/29/weather-related-disasters-are-increasing
Ponemon Institute, Cost of IT Downtime, 2016
Institute for Economics and Peace, Global Terrorism Index 2015; Swiss Re, Preliminary sigma estimates for 2015: global catastrophes cause economic losses of USD 85 billion; Lloyd’s, Cyber attacks cost companies $400 billion every year
Business Continuity Institute, BCI Emergency Communications Report 2017, https://www.thebci.org/news/bci-emergency-communications-report-2017.html
Arnold Howitt and Herman Leonard, Managing Crisis: Responses to Large-Scale Emergencies, CQ Press, page 5