
What is the ideal level of System Architecture for Safety?


The System Architecture is a key input document for the technical safety concept (TSC) and its technical safety requirements (TSRs). Interestingly, it is also a work product that gets refined by the TSC, often leading to new System elements and Interfaces being defined to satisfy Safety requirements. However, many people do not understand why a System Architecture is needed to create the TSC. Shouldn’t the System Architecture simply implement the TSRs? And what is the right level of information required in a Safety System Architecture?

This article is an attempt to bridge this knowledge gap and support the creation of more ‘Safety friendly’ System Architectures 😃


What is the purpose of a System Architecture with respect to Safety?

To understand the purpose of the System Architecture with respect to Safety, we first need to know the steps involved in creating a technical safety concept. Please refer to the flowchart below.


Figure 1: Steps involved in creating a TSR.

To begin the technical safety concept, we need a basic system Architecture.

The purpose of the System Architecture is to:
  1. indicate the elements that perform the different functions of the System
  2. describe the Interfaces provided/consumed by the elements (static description)
  3. describe how the elements interact with one another to achieve the safety function (dynamic description)

The static and dynamic descriptions are used in the System Safety analysis step of the TSC (Step 3 in Figure 1) to identify and analyze failure modes and their effects, and to derive safety mechanisms.

Let us consider an example of a simple Warning System that receives CAN inputs and, based on them, turns on a tell-tale. The System has an ASIL-B Safety goal to ensure that the tell-tale is turned ON if a specified set of CAN signal conditions is fulfilled.

A super-high level System Architecture for this system looks like the one below. For understanding purposes, let’s call this a Level-0 diagram.


At this point, we need to ask the question: “Is there sufficient information available about the failure modes of each system element to perform a high-quality safety analysis?” “Sufficient information” means that we know the Interfaces provided by each element and the behavior of that element. If the answer is YES, the system architecture can stop at this level, since it is sufficient to develop the right safety concept for this system. If the answer is NO, the system architecture should go one level deeper, i.e., to the sub-elements within the warning system.
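The stop-or-go-deeper question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Element` class, the `is_analyzable` rule, and the three Level-1 element names are hypothetical and are not taken from the diagrams in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """A system element at some architecture level (hypothetical model)."""
    name: str
    interfaces: list[str] = field(default_factory=list)  # static description
    behavior: str = ""                                    # dynamic description
    children: list[Element] = field(default_factory=list)

    def is_analyzable(self) -> bool:
        # "Sufficient information": interfaces and behavior are both known.
        return bool(self.interfaces) and bool(self.behavior)


def required_depth(element: Element, level: int = 0) -> int:
    """Return the deepest level that must be modelled before every
    element reached on the way down is analyzable."""
    if element.is_analyzable():
        return level
    if not element.children:
        raise ValueError(f"{element.name}: not analyzable and not yet decomposed")
    return max(required_depth(child, level + 1) for child in element.children)


# Level-0 black box, decomposed into assumed Level-1 elements.
warning_system = Element(
    "Warning System",
    children=[
        Element("CAN signal decoder", ["CAN Rx"], "decodes warning signals"),
        Element("Warning logic", ["decoded signals"], "evaluates conditions"),
        Element("Tell-tale driver", ["lamp request"], "switches the tell-tale"),
    ],
)

print(required_depth(warning_system))
```

With these assumed inputs the black box is not analyzable but all of its sub-elements are, so the sketch stops at Level-1.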

In the case of the Warning System, it is not possible to develop a safety concept using the Level-0 diagram, because the Warning System is a black box and there is no information about its System elements and their interfaces. Hence, we need to go one level deeper into the system. The Warning System’s Level-1 architecture now looks like this.

Figure 2: Level-0 System Architecture diagram for a Warning System. Note: this is not drawn as per modelling guidelines; it is only a reference to convey the concept. “Level-0” is a term coined for this article and should not be interpreted as a standard level in System Architecture.

To achieve the safety goal of this system, the first step, as per the TSC flow chart, is to identify the safety-critical path. We could then take the approach of implementing all the system elements in the safety-critical path as ASIL. This can be represented as follows:

Figure 3: Level-1 System Architecture diagram for a Warning System. For ease of understanding, only the safety-relevant system elements are shown in this diagram. Note: this is not drawn as per modelling guidelines; it is only a reference to convey the concept.
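One possible way to think about the safety-critical path is as a graph problem: an element is on the path if it is reachable from the safety-relevant input and can itself reach the safety-relevant output. The sketch below assumes a hypothetical data-flow graph; the element names (including the QM "Diagnostic logger") are invented for illustration and do not come from the figures.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: all nodes reachable from start (inclusive)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def critical_path(graph, source, sink):
    """Elements that lie on at least one path from source to sink."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return reachable(graph, source) & reachable(reverse, sink)


# Hypothetical Level-1 data flow of the Warning System.
data_flow = {
    "CAN transceiver": ["CAN signal decoder"],
    "CAN signal decoder": ["Warning logic", "Diagnostic logger"],
    "Warning logic": ["Tell-tale driver"],
    "Tell-tale driver": ["Tell-tale"],
    "Diagnostic logger": [],
}

critical = critical_path(data_flow, "CAN transceiver", "Tell-tale")

# Simplest approach from the text: everything on the path inherits the ASIL.
all_elements = set(data_flow) | {d for dsts in data_flow.values() for d in dsts}
asil_assignment = {
    e: ("ASIL B" if e in critical else "QM") for e in sorted(all_elements)
}
for element, level in asil_assignment.items():
    print(f"{element}: {level}")
```

In this assumed graph the logger receives data from the decoder but never influences the tell-tale, so it stays QM while the other elements are marked ASIL B.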

The next step as per the TSC flow chart (Step 3 in Figure 1) is to perform the System Safety analysis. Let’s assume that the outcome of this analysis is to implement safety mechanisms to cover incoming CAN message faults and output tell-tale HW faults. This leads to adding new ASIL elements. Please refer to the diagram below with the new ASIL elements added.

Figure 4: Level-1 System Architecture diagram for a Warning System with ASIL Safety elements
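To make "a safety mechanism covering incoming CAN message faults" more concrete, here is a minimal sketch of one common kind of check: a timeout monitor combined with an alive (rolling) counter. The article does not say which mechanism its example uses, so the class name, the 4-bit counter width, and the timeout value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanMessageMonitor:
    """Hypothetical monitor for one periodic CAN message."""
    timeout_s: float
    last_rx_time: Optional[float] = None
    last_counter: Optional[int] = None

    def on_receive(self, now: float, counter: int) -> bool:
        """Record a reception; return False if the alive counter jumped.

        Assumes a 4-bit counter that increments by one per message
        and wraps from 15 back to 0.
        """
        if self.last_counter is None:
            ok = True  # first message: nothing to compare against
        else:
            ok = counter == (self.last_counter + 1) % 16
        self.last_rx_time = now
        self.last_counter = counter
        return ok

    def timed_out(self, now: float) -> bool:
        """True if no message has arrived within the timeout window."""
        if self.last_rx_time is None:
            return True
        return now - self.last_rx_time > self.timeout_s


monitor = CanMessageMonitor(timeout_s=0.1)
results = [
    monitor.on_receive(0.00, 14),
    monitor.on_receive(0.02, 15),
    monitor.on_receive(0.04, 0),   # wrap-around is accepted
    monitor.on_receive(0.06, 2),   # counter 1 was lost
]
print(results, monitor.timed_out(0.20))
```

How the system reacts to a detected fault (for example, by turning the tell-tale ON) is a safety concept decision and is deliberately left out of this sketch.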

The ISO 26262 standard states that the technical safety requirements must be clearly allocated to SW and HW. In other words, the TSRs allocated to every system element must be implemented by SW, HW, or both.

Consider the diagram below, where each system element is mapped to SW, HW, or both.

Figure 5: Level-1 System Architecture diagram for a Warning System with new ASIL elements added based on System Safety analysis
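The allocation rule described above lends itself to a simple consistency check: every TSR points to a system element, and every such element must be mapped to SW, HW, or both. The following sketch assumes hypothetical TSR IDs and element names; it is not a tool or a format prescribed by the standard.

```python
def unallocated_tsrs(tsr_to_element, element_to_domains):
    """Return the IDs of TSRs whose element is not mapped to SW, HW, or both."""
    valid = {"SW", "HW"}
    missing = []
    for tsr_id, element in tsr_to_element.items():
        domains = element_to_domains.get(element, set())
        # Flag elements with no mapping or with an unknown domain label.
        if not domains or not domains <= valid:
            missing.append(tsr_id)
    return sorted(missing)


# Hypothetical TSR allocation for the Warning System.
tsr_to_element = {
    "TSR-01": "CAN signal decoder",
    "TSR-02": "Warning logic",
    "TSR-03": "Tell-tale driver",
    "TSR-04": "Tell-tale feedback monitor",
}
element_to_domains = {
    "CAN signal decoder": {"SW"},
    "Warning logic": {"SW"},
    "Tell-tale driver": {"HW", "SW"},
    # "Tell-tale feedback monitor" has not been mapped yet.
}

print(unallocated_tsrs(tsr_to_element, element_to_domains))
```

In this assumed data set the check reports the one TSR whose element still lacks a SW/HW mapping, which is exactly the gap the diagram is meant to close.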

The next step as per the TSC flow chart (Step 5 in Figure 1) is to perform a dependent failure analysis to identify the safety mechanisms needed to achieve Freedom from Interference (FFI) and Independence. For this, we need to know the sources of common cause and cascading failures.

For example, do the ASIL and QM parts of the System run on the same microcontroller? Do they share clock sources, power supplies, and memories? Does the same OS/Scheduler execute both the ASIL and QM SW?

To answer these questions, we must identify the System Elements that are sources of common cause and cascading failures. Let us assume that in our example, the Power supply, MCU, and Memories are such System Elements. We use the term ‘Base System Elements’ to represent them in the diagram below.

Figure 6: Level-1 System Architecture diagram for a Warning System with ASIL elements assigned to HW and/or SW
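The questions about shared microcontrollers, power supplies, and memories can be approximated with a small lookup: list which base elements each system element depends on, then flag every base element that is used by both ASIL and QM elements. This is a simplified sketch of one input to a dependent failure analysis, not the analysis itself; the element names, the `uses` mapping, and the integrity labels are assumptions.

```python
def shared_base_elements(uses, integrity):
    """Map each base element used by both ASIL and QM elements to its users."""
    base_elements = {base for deps in uses.values() for base in deps}
    flagged = {}
    for base in sorted(base_elements):
        users = sorted(e for e, deps in uses.items() if base in deps)
        has_qm = any(integrity[e] == "QM" for e in users)
        has_asil = any(integrity[e] != "QM" for e in users)
        if has_qm and has_asil:
            flagged[base] = users
    return flagged


# Hypothetical dependencies of system elements on base system elements.
uses = {
    "Warning logic": {"MCU", "Power supply", "RAM"},
    "Tell-tale feedback monitor": {"MCU", "Power supply", "ADC"},
    "Diagnostic logger": {"MCU", "Power supply", "RAM"},
}
integrity = {
    "Warning logic": "ASIL B",
    "Tell-tale feedback monitor": "ASIL B",
    "Diagnostic logger": "QM",
}

for base, users in shared_base_elements(uses, integrity).items():
    print(f"{base}: shared by {', '.join(users)}")
```

Each flagged base element is a candidate source of cascading or common cause failures and would need either an ASIL implementation or a safety mechanism, as decided in the safety concept. A fuller analysis would also look at resources shared between an ASIL element and its own safety mechanism.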

With the addition of the Base Elements, it is possible to perform a system-level safety analysis of these elements as well as a Dependent failure analysis (DFA) to identify the Safety mechanisms needed to achieve FFI and Independence. For example, based on the Safety analysis, the Safety concept could propose that the Power supply and MCU should be implemented as ASIL, while a decomposition can be performed for the Memories (Memory function as QM, Memory check as Safety mechanism). 

Figure 7: Level-1 System Architecture diagram for a Warning System with Base System elements added

We used the Warning System example to demonstrate how the System Architecture gets refined in the process of creating the TSRs. With the right level of detail in the System Architecture, it is possible to define the right safety concept for the program. Crucial decisions, such as the purchase of ASIL-compliant HW or SW, can also be made on time.

What is the “ideal” level expected in a System Architecture?

What is the ideal level for a System Architecture? Is it Level-1 or Level-2, or should we go to Level-3 in some cases? Or does even Level-0 work? There is no single solution that fits all.

There are three key aspects that determine the level of depth in a System Architecture.
1. System Complexity
2. Safety Competence of the different disciplines 
3. Process Maturity

System Complexity:

Typically, a simple system needs only a Level-1 architecture, while a highly complex system like autonomous driving may need a Level-3 architecture or deeper. Even within the same system, one safety goal may be achieved at Level-1 depth, while another safety goal might need Level-3.

A Level-0 system architecture is sufficient if the entire System can be treated as a black box and decomposed with an external monitoring solution.

If the product uses a proven Safety platform with all the required solutions, then the system elements inside that platform need not be elaborated in the product’s system architecture (because that elaboration would have happened in the platform’s own system architecture). See the diagram below for an example where the Base System elements stay at Level-0.

Figure 8: Level-1 System Architecture diagram for a Warning System with ASIL Base System elements identified.

Safety Competence of the different disciplines: 

Typically, if a program does not have sound safety knowledge in the HW or SW disciplines, it may be necessary to go one level deeper to identify SW and HW Safety challenges upfront. This means thinking about SW or HW elements at the System Architecture level. For example, how does the program achieve an ASIL power supply? Should an ASIL PMIC be used? Does the program have an ASIL-compliant BSW and RTOS? Ideally, the HW/SW teams should think about these questions; in practice, the Systems Safety Engineer doing the TSC may have to. This is not at all a preferred approach, but it does happen a lot in the real world.

Process Maturity of the Organization:

An organization with good functional safety process maturity provides clear guidelines for creating System architectures and for the level of detail expected in them. Such a process leads to acceptance of the work product amongst all stakeholders (SW, HW, Systems, Safety). Without it, each stakeholder will have different expectations of what an architecture should contain. This can lead to a System architecture that is either not at the required level of detail or more detailed than needed.

Conclusion

As we learnt above, the right level of system architecture depends on several aspects. Hence, it may be hard to standardize on one level of system architecture across all programs.

We recommend creating a Level-2 architecture across all programs, and then deciding on a case-by-case basis whether to go deeper or stop at a lower level.