What is a cyber-physical risk assessment?

Sinclair

At the upcoming S4 conference I will present on the topic of cyber-physical risk assessment. On stage 2 you are allowed to lose your audience a bit. I dislike that idea, so I created an accompanying white paper that discusses topics such as conditional risk and Rings Of Protection Analysis (ROPA) in more detail than the presentation time allows. The white paper will be available in the conference app.

For this blog, I took one section of the white paper and published it as a separate topic, as I will also do for the other topics. So this is topic 1, which deals with the importance of cyber-physical risk assessments and their link to regulations.

What is a risk assessment and how does it differ from a vulnerability assessment?
Risk assessments and vulnerability assessments may seem the same at first glance, but the two concepts are very different. OT cyber risks are potential/viable threats or hazards associated with a plant’s use of technology, processes, and procedures. Vulnerabilities, on the other hand, are weaknesses in design or technology that can be exploited if they are exposed to a threat actor.
Risk assessments are designed to identify potential threats/hazards associated with a new project, major change, or ongoing changes in the threat landscape. The idea is to identify areas of incomplete knowledge, fill those gaps, and then take steps to mitigate the potential threats.
A risk assessment looks at cybersecurity from the attacker’s perspective, evaluates attack scenarios, and provides cost-benefit information to justify the investment in security measures. In addition, we use a risk assessment to determine whether we meet legal requirements regarding individual, social and environmental risks when we build or operate an industrial production installation. With a risk assessment, we define the requirements for the security design of such an installation.

Vulnerability assessments aim to identify existing asset, channel, or security control weaknesses that could be exploited by threat actors to cause damage. By conducting a vulnerability assessment, a plant can identify vulnerabilities and security gaps and then take action to eliminate them.

Essentially, risk assessment involves looking outside an organization (hypothetically into the future) to identify threats that could potentially lead to problems, while vulnerability assessment involves looking within the organization for today’s structural flaws and weaknesses in the installation. The former evaluates which armies can approach the castle gates, what weapons their soldiers carry, and what their intentions are, while the latter checks the locks on the doors.

As such, every production installation/process automation system needs a risk assessment to evaluate the potential threats, to create a security requirements specification, and to create a risk register for managing the system. A vulnerability assessment, in contrast, checks whether the process automation system has the security measures in place. The overlap is that both assessments consider vulnerabilities. The difference is that the vulnerabilities of a vulnerability assessment exist in the system, while the vulnerabilities of a risk assessment potentially exist in the system. As such, a risk assessment addresses the potential/viable threats, whereas a vulnerability assessment is limited to the current threats.

What is a cyber-physical risk assessment?

A special form of risk assessment is the cyber-physical risk assessment. This risk assessment extends the cybersecurity risk of a process automation system to the physical domain of the production process/process installation. Cyber-physical risk connects the cybersecurity of the process automation functions with the process security of the entire production installation. It thus forms the link between the deaths/injuries and environmental damage that a cyber attack can cause to individuals or society, and the legal criteria that apply to these losses.

Countries define criteria for the extent to which an individual, society or the environment may be exposed to risk. These kinds of risk criteria also apply to the petrochemical, refining and offshore industries and are normally linked to a “site location permit to operate”. This site location permit includes Environmental Impact Reports (EIRs), the various planning and zoning public review procedures, and the evaluations of the impacts on the area.

The criteria for the permit are defined in what are called frequency-number curves (F-N curves), curves that specify the tolerable frequency (F) of events that cause a given number (N) of fatalities. How these “permits” are enforced differs by country. In Europe, the permits are linked to land use policies set by the government, but in other countries, these regulations can be set by agencies.

The following is an example of how Flanders (a region of Belgium) uses quantitative risk criteria for new and existing installations that are classified as major hazard establishments as defined within the “Seveso III” regulations. The risk criteria are typically split into location-based risk and societal risk and specify the risk tolerance for potential fatalities.

Figure 1 – Individual (location-based) risk tolerance Flanders (fatality risk per annum)

For example, for hazards limited to within the plant’s perimeter (fence), the table defines a maximum incidence frequency of 1E-05 for a fatal accident. This is the maximum risk exposure for employees and hired contractors working within the plant, independent of the cause of the incident.

Adjacent office buildings are normally built outside the plant’s fence and as such fall under the residential area, which specifies an incidence frequency of 1E-06. Sometimes a separate risk category is defined for adjacent office buildings, called ‘aggregated risk’. For individual risk, the risk tolerance is set for the incident/event, without differentiating between the number of fatalities per event. This differs for societal risk, where different specifications are defined for different sizes of loss using an F-N curve.
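
As a minimal sketch, the individual risk check described above could look as follows in Python. Only the two tolerances quoted in the text are used; the zone names, the function name, and the example frequency are assumptions made for illustration and do not come from the Flemish criteria.

```python
# Location-based (individual) risk tolerances quoted in the text, expressed
# as fatality risk per annum. Only these two zones are named in the text;
# the zone keys themselves are hypothetical labels.
INDIVIDUAL_RISK_LIMIT = {
    "within_fence": 1e-5,  # employees and contractors inside the perimeter
    "residential": 1e-6,   # includes adjacent offices outside the fence
}


def meets_individual_criterion(zone: str, event_frequency_per_year: float) -> bool:
    """Return True if the estimated frequency of a fatal event is within the limit.

    The number of fatalities per event plays no role here; individual risk
    is set per incident/event.
    """
    return event_frequency_per_year <= INDIVIDUAL_RISK_LIMIT[zone]


# Hypothetical scenario frequency of 5E-06 per year, chosen only for illustration.
print(meets_individual_criterion("within_fence", 5e-6))  # True
print(meets_individual_criterion("residential", 5e-6))   # False
```

The same estimated frequency passes inside the fence and fails in the residential area, which is why the location of the potential victims matters for the assessment.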

Figure 2 – F-N curve for societal risk in Flanders / Belgium

The actual risk tolerance, therefore, differs per type of plant when it comes to societal risk. An accident at a nuclear power plant can cause far more deaths than an accident at a conventional power plant using coal or natural gas. Similar differences also exist between different petrochemical processes: depending on, among other things, the volume and harmfulness of the materials used in the production process, the number of potential victims can vary considerably.
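
To illustrate how an F-N curve makes the tolerance depend on the size of the loss, here is a small Python sketch. It assumes the common form F(N) = C / N^a for the limit line. The anchor C and slope a used below are placeholder assumptions, not values read from the Flemish curve, and the function names are hypothetical.

```python
def fn_limit(n_fatalities: int, anchor: float = 1e-2, slope: float = 2.0) -> float:
    """Maximum tolerable frequency per year of events with N or more fatalities.

    Uses the form F(N) = anchor / N**slope. The default anchor and slope are
    placeholder assumptions for illustration only.
    """
    if n_fatalities < 1:
        raise ValueError("n_fatalities must be at least 1")
    return anchor / n_fatalities ** slope


def within_fn_curve(n_fatalities: int, frequency_per_year: float) -> bool:
    """Return True if the scenario lies on or below the assumed limit line."""
    return frequency_per_year <= fn_limit(n_fatalities)


# With the assumed parameters, the tolerable frequency drops by a factor of
# 100 for every tenfold increase in potential fatalities.
for n in (10, 100, 1000):
    print(n, f"{fn_limit(n):.0e}")
```

A slope steeper than 1 expresses risk aversion: events with many victims must be disproportionately rarer than events with few.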

Figure 3 – Difference between where individual risk and where societal risk applies

Process safety risk analysis determines the number of possible fatalities per event. Most plants fall within the 1E-05 category, but it depends very much on the local situation. Critical industrial facilities such as petrochemical plants, refineries, power plants, and offshore production platforms/vessels are subject to these risk tolerance criteria, not only for the safety of the employees/contractors and the community but also for the environment.

To account for these differences, plants categorize the loss on an impact scale and define what is called the target mitigated event likelihood (TMEL) for each level and type of loss (safety, environment, financial).

Figure 4 – Risk criteria
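
In code, such risk criteria are essentially a lookup table from loss type and severity level to a TMEL. The Python sketch below shows the idea. Every number in it is a placeholder assumption, as are the four severity levels and the function name; a plant would take the real values from its own risk criteria.

```python
# Hypothetical impact scale: TMEL (events per year) for each loss type and
# severity level, where level 4 is the most severe. All values are
# placeholders, not taken from any regulation or from Figure 4.
TMEL = {
    "safety":      {1: 1e-2, 2: 1e-3, 3: 1e-4, 4: 1e-5},
    "environment": {1: 1e-2, 2: 1e-3, 3: 1e-4, 4: 1e-5},
    "financial":   {1: 1e-1, 2: 1e-2, 3: 1e-3, 4: 1e-4},
}


def target_likelihood(loss_type: str, severity: int) -> float:
    """Look up the target mitigated event likelihood for a type and level of loss."""
    return TMEL[loss_type][severity]


# A scenario is acceptable when its estimated mitigated frequency does not
# exceed the target for its loss type and severity level.
estimated_frequency = 3e-5  # hypothetical estimate for one scenario
print(estimated_frequency <= target_likelihood("safety", 4))  # False
print(estimated_frequency <= target_likelihood("safety", 3))  # True
```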

The point of this discussion is that these criteria are not specified for process safety only; they apply to the production process as a whole. Every type of incident that can cause a similar loss is subject to these criteria.

Therefore, cyber-physical risk also needs to meet these criteria. The TRITON/TRISIS attack against a safety instrumented system (SIS) showed that functional safety can’t guarantee that the installation meets the criteria unless appropriate cybersecurity reduces the incidence frequency of successful cyber attacks causing a similar loss.

Since we can’t aggregate process safety risk (stochastic/random events) and cybersecurity risk (systemic/intentional events), they need to meet the criteria independently of each other.

This requirement means that a cyber-physical risk assessment must be (semi-)quantitative, because we need results that can be compared to the quantitative risk criteria defined by the regulations.

This results in what I called in a previous blog the “1st law of cyber-physical risk”, stating that:

“The risk criteria for process safety and cybersecurity have the same targets for the same type and severity level of loss, independent of the cause of the loss.”
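
A minimal Python sketch of what this means in practice: every scenario, whatever its cause, is compared on its own to the same target for its level of loss, and the frequencies of process safety and cyber scenarios are deliberately not summed. The scenario names, frequencies, and the target value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    cause: str                 # "process_safety" or "cyber"
    frequency_per_year: float  # estimated mitigated event frequency


def failing_scenarios(scenarios: list[Scenario], target_per_year: float) -> list[str]:
    """Return the names of scenarios whose frequency exceeds the shared target.

    Each scenario is judged independently; random and intentional events
    are not aggregated into one number.
    """
    return [s.name for s in scenarios if s.frequency_per_year > target_per_year]


# Two hypothetical scenarios leading to the same type and severity of loss.
scenarios = [
    Scenario("Overpressure after valve failure", "process_safety", 4e-6),
    Scenario("SIS logic manipulated by malware", "cyber", 3e-5),
]

print(failing_scenarios(scenarios, target_per_year=1e-5))
# ['SIS logic manipulated by malware']
```

In this made-up case the process safety design meets the target, but the installation still fails the criteria because the cyber scenario does not.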

This discussion is especially prominent when building a greenfield installation. At that point, authorities verify the risk criteria, and they are increasingly aware of the cyber-physical risk. For brownfield installation projects, many companies avoid the added complexity of quantitative analysis by opting for the simpler qualitative cyber risk assessment, a type of assessment that largely ignores the cyber-physical dimension. So far, there have been no fatalities from a cyber attack, but with the TRITON/TRISIS attack, we got close to scenarios where this could have happened.

Another aspect to consider is the completeness of the process safety analysis. It is important to realize that process safety analysis does not analyze all possible incident scenarios. The scenarios identified by HAZOP and LOPA analysis are typically considered “mutually exclusive”.

In other words, these scenarios are assumed to occur independently of each other. This assumption is fine when we look at the stochastic/random events in process safety scenarios, because the probability that these events occur in a very special combination or sequence is negligible. However, cyber-physical events are systemic and malicious: they occur at a time determined by the threat actor and in a predetermined order.

So, these events are what we call mutually inclusive[1] and capable of causing consequences not analyzed (or excluded as not realistic scenarios) during the process safety analysis.
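
The difference can be made concrete with the chain rule of probability. For random, independent events the joint probability is the product of small numbers. For an orchestrated attack, each later step is conditional on the attacker already being in control, so its probability is close to 1. The Python sketch below illustrates this; all probabilities are hypothetical and the function name is an assumption.

```python
import math


def joint_probability(step_probabilities: list[float]) -> float:
    """Chain rule: P(A and B and C) = P(A) * P(B | A) * P(C | A, B).

    Pass the conditional probability of each step given the previous ones.
    For independent events, P(B | A) is simply P(B).
    """
    return math.prod(step_probabilities)


# Three random failures, each with a hypothetical probability of 1E-03 of
# occurring in the same time window, assumed independent of each other.
random_coincidence = joint_probability([1e-3, 1e-3, 1e-3])

# The same three events as steps of one attack: the first step has a
# hypothetical success probability of 1E-03, and the next two are assumed
# to succeed with probability 0.9 once the attacker has control.
orchestrated = joint_probability([1e-3, 0.9, 0.9])

print(f"{random_coincidence:.1e}")  # 1.0e-09
print(f"{orchestrated:.1e}")        # 8.1e-04
```

Under these assumptions the orchestrated combination is several orders of magnitude more likely than the random coincidence, which is why a combination that process safety analysis can dismiss as unrealistic may still be a credible cyber-physical scenario.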

Full or semi-quantitative cyber-physical risk analysis is important to understand all hazards but is often traded in for a “tick the box” type of qualitative risk assessment based on a limited number of mainly cyber-driven scenarios. “Ransomware” scenarios are very important to consider, but in a well-designed process installation, a full failure of the control system should not lead to process safety-related hazards causing deaths. Process installations are designed in such a way that they can be brought into a safe state if the control function is lost. This would be different if there is ransomware that targets the safety instrumented system, but so far this is a hypothetical threat.

Today we have come to a point where risk assessments are considered an essential activity in any project. However, accompanying documents such as IEC 62443-3-2 completely ignore quantitative risk criteria (actually, the 3-2 document ignores the subject of risk criteria as a whole) and also consider (semi-)quantitative risk assessments impossible. As a result, many risk assessments are treated as qualitative assessments, ignoring the maximum incidence rates specified by the regulatory authorities.

A proper risk assessment must analyze both the cyber attack and process safety loss scenarios for assessing risk. I believe ISA 84 is going to fill the gaps that the ISA 99 workgroup has created, but we have to wait until Q1 2023 for this.


[1] As multiple contingencies in an orchestrated way
