This is a frequently asked question in the world of automotive safety, especially for programs up to ASIL B. The answers contradict each other even among safety industry experts, and each answer rests on its own rationale. In this blog, we provide background on what core self-test is and why it is needed from a functional safety standpoint.
Typically, when we develop an item that is required to be ASIL compliant, it is state-of-the-art to choose a microcontroller that is designed for ISO 26262 compliance. Microcontrollers that are ASIL certified are developed as per the HW development processes specified by ISO 26262. They also incorporate safety mechanisms to detect, correct or prevent (if possible) systematic and random faults. These safety mechanisms provide sufficient diagnostic coverage against faults and enable the microcontroller to achieve a FIT rate that is low enough for the item integrating it to meet its required FIT. Typically, controllers provide mechanisms such as ECC, parity, CRC, a memory protection unit, write-protected registers, peripheral locks, clock supervision, voltage supervision, and diagnostic mechanisms such as self-tests of peripherals. To detect faults in the CPU core, ASIL D microcontrollers widely use dual lockstep processing to achieve high diagnostic coverage. For ASIL B controllers, however, the widely used approach is to rely on software to detect faults in the CPU core. Hence, in the safety manual of an ASIL B controller, the supplier typically places a requirement on the item developer that a software core self-test must be integrated in the program.
What is Software Core self-test?
Software Core self-test, or Software-Based Self-Test (SBST) as it is widely called in the semiconductor industry, is a safety mechanism that checks the correct functioning of the CPU core and of its various components. These components are:
- Functional computational components – which perform specific arithmetic and logic operations on data, such as the arithmetic and logic unit (ALU), floating point unit (FPU), multipliers, shifters, dividers etc.
- Other functional components such as the memory protection unit, interrupt controller etc.
- Interconnect components – that interconnect various processor components and enable the data flow through a processor – such as multiplexers
- Control components – that control the flow of instructions/data within the processor core such as the processor control unit, or the memory and data controllers that implement instruction fetching and memory handshaking
- Internal processor registers (System registers, general purpose registers)
- Hidden components – that are available in the processor architecture to increase its performance such as pipeline logic, components for branch prediction, instruction and data caches
The aim of core self-test is to detect structural stuck-at faults in the processor core. It is a non-intrusive method of testing the CPU that is intended to run cyclically during operation of the item, alongside its primary functions. That is, if you are designing software for an item, you would add core self-test as a SW component that keeps running cyclically in a background task. From the semiconductor industry perspective, SBST is a low-power, low-cost, highly flexible and efficient alternative to conventional ways of processor IC testing, and hence there is ever-growing interest in how best to leverage it to reduce the cost and time of manufacturing and field testing.
In core self-test software, various test patterns are executed to test the different components of the CPU. A test pattern is simply a sequence of instructions from the instruction set architecture (ISA) of the CPU. For every test pattern, the response is condensed into a signature. The calculated signature is compared against an expected signature that was pre-computed at development time and stored in ROM. A signature mismatch is taken as an indication of a failure of the particular CPU component that the test pattern exercises. Since the fault is non-repairable, the item must transition to a safe state.
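To make the signature idea concrete, here is a minimal sketch in Python. It is only a model of the flow: real self-test libraries are written in assembly for a specific core, and their stimuli and signature schemes are supplier specific. The operand values, the rotate-and-XOR folding and all function names below are assumptions made for illustration. The "expected" signature is simply taken from a fault-free run, standing in for the value that would be stored in ROM, and a faulty adder is injected to imitate a stuck-at fault.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operand pairs; a real library chooses its stimuli per core.
OPERANDS = [
    (0x00000000, 0x00000000),
    (0xFFFFFFFF, 0x00000001),
    (0x55555555, 0xAAAAAAAA),
    (0x12345678, 0x9ABCDEF0),
]


def rotl32(value, shift):
    """Rotate a 32-bit value left by `shift` bits."""
    shift %= 32
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def fold(signature, result):
    """Fold one result into the running signature (rotate, then XOR)."""
    return rotl32(signature, 1) ^ (result & MASK32)


def run_alu_pattern(add=lambda a, b: a + b):
    """Execute the test pattern and return its 32-bit signature.

    `add` can be replaced by a faulty adder to simulate a hardware fault.
    """
    signature = 0
    for a, b in OPERANDS:
        for result in (add(a, b), a - b, a & b, a | b, a ^ b, a << 1):
            signature = fold(signature, result)
    return signature


# Stand-in for the expected signature pre-computed at development time.
EXPECTED_SIGNATURE = run_alu_pattern()


def self_test(add=lambda a, b: a + b):
    """Return "PASS" on a signature match, otherwise request the safe state."""
    if run_alu_pattern(add) == EXPECTED_SIGNATURE:
        return "PASS"
    return "ENTER_SAFE_STATE"


def stuck_at_1_adder(a, b):
    """Adder whose least significant result bit is stuck at 1."""
    return (a + b) | 0x1


print(self_test())                  # fault-free run
print(self_test(stuck_at_1_adder))  # run with the injected fault
```

Note that the comparison only says that something in the exercised logic misbehaved. It does not locate the fault more precisely than the component targeted by the pattern.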
In very simple terms, software core self-test performs extensive computations using the various data processing instructions (logical, shift, compare, arithmetic, floating point arithmetic etc.) to ensure that the ALU and FPU are working correctly. It executes branch instructions, load/store instructions etc. to ensure correct functioning of the registers. It executes interrupt-related instructions to ensure correct functioning of the interrupt controller.
Because the software must exercise the instruction set in a precisely controlled way, core self-test software is developed as assembly language routines. It is best developed by the CPU core supplier, since no one else knows the core as well as they do. Several microcontroller suppliers provide their own self-test library. ARM also provides its own test library for verifying its cores.
Since Core self-test is executed cyclically during run time, it is designed as a series of several small test patterns or modules that can be executed individually or as a small group, so as to have minimal performance overhead.
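One way to picture this slicing is a round-robin scheduler that runs a single test module each time the background task is activated. The sketch below is an assumption about how an integrator might organise this, not a description of any supplier's library. The class name, the module names and the trivial checks standing in for real test modules are all hypothetical.

```python
class SelfTestScheduler:
    """Runs one self-test module per call, in round-robin order."""

    def __init__(self, modules):
        if not modules:
            raise ValueError("at least one test module is required")
        self._modules = list(modules)  # (name, test_function) pairs
        self._next_index = 0
        self.completed_rounds = 0      # full passes over all modules

    def run_slice(self):
        """Run the next module and return (name, passed)."""
        name, test = self._modules[self._next_index]
        passed = bool(test())
        self._next_index = (self._next_index + 1) % len(self._modules)
        if self._next_index == 0:
            self.completed_rounds += 1
        return name, passed


# Hypothetical stand-ins for real test modules.
MODULES = [
    ("alu", lambda: (0x55 ^ 0xAA) == 0xFF),
    ("shifter", lambda: (1 << 4) == 16),
    ("multiplier", lambda: 7 * 6 == 42),
]

scheduler = SelfTestScheduler(MODULES)
for _ in range(4):  # four activations of the background task
    name, passed = scheduler.run_slice()
    if not passed:
        print(f"{name} failed: request safe state")
        break
    print(f"{name} passed")
```

Each activation costs only the run time of one module, and the full set of modules is covered once every `len(MODULES)` activations.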
CPU suppliers use many different strategies to decide on these test patterns when designing a self-test. Instructions are grouped based on their characteristics (e.g., arithmetic instructions, logic instructions), based on which CPU component they test, a combination of both, or in other ways. The key question is “How to effectively test the various components, control paths and data paths within the processor so as to achieve the maximum possible diagnostic coverage with the smallest possible self-test code?”
How to determine if Software Core self-test is required in a program?
From a ‘Safety’ perspective, software core self-test is required if the diagnostic coverage that it provides for the CPU failure modes contributes significantly to reducing the FIT rate of the CPU core, and thereby the FIT rate of the micro, so that the item is able to achieve its required FIT.
In other words, it is required if the contribution of Software core self-test towards improving the FIT of the Micro is significant enough to improve the overall FIT of the item, and to bring it to the required number.
If the item is already able to meet its FIT target with the existing safety mechanisms defined in the Technical safety requirements, we think it is not mandatory to integrate software core self-test into the item.
Or, if the software core self-test has low diagnostic coverage, in the sense that it does not reduce the FIT sufficiently (for example, against a FIT target of 100 FIT it only brings a reduction of less than 2%, i.e. about 2 FIT), you may want to weigh other aspects such as engineering effort and cost to decide whether to go ahead with it.
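The reasoning above can be illustrated with a small calculation. The sketch below assumes a deliberately simplified model in which the residual FIT of the CPU core is its raw FIT multiplied by (1 - diagnostic coverage), and the item FIT is the sum of that residual and all other contributions. A real hardware metrics analysis according to ISO 26262 involves more than this. All numbers and function names are hypothetical.

```python
def residual_fit(raw_fit, diagnostic_coverage):
    """Residual FIT after a safety mechanism with the given coverage (0..1)."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic coverage must be between 0 and 1")
    return raw_fit * (1.0 - diagnostic_coverage)


def assess_self_test(target_fit, other_fit, cpu_raw_fit, self_test_dc):
    """Compare the item FIT with and without the core self-test.

    Simplifying assumption: without the self-test, the CPU core faults
    are not covered by any other mechanism.
    """
    fit_without = other_fit + cpu_raw_fit
    fit_with = other_fit + residual_fit(cpu_raw_fit, self_test_dc)
    return {
        "fit_without": fit_without,
        "fit_with": fit_with,
        "reduction": fit_without - fit_with,
        "meets_target_without": fit_without <= target_fit,
        "meets_target_with": fit_with <= target_fit,
    }


# Case 1: the self-test is what brings the item under its target.
needed = assess_self_test(target_fit=100, other_fit=70,
                          cpu_raw_fit=50, self_test_dc=0.6)

# Case 2: the target is already met and the self-test removes only about 2 FIT.
marginal = assess_self_test(target_fit=100, other_fit=95,
                            cpu_raw_fit=4, self_test_dc=0.5)

print(needed)
print(marginal)
```

In the first case the item misses its target without the self-test and meets it with the self-test, so the mechanism is needed for the FIT argument. In the second case the target is met either way, which is the situation where effort and cost become the deciding factors.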
There are scenarios where a micro supplier does not provide a software core self-test. Or a core self-test solution is available, but its diagnostic coverage for CPU faults is not available or cannot be clearly determined. For example, let us assume the self-test solution is developed not by the micro supplier but by a SW team in the tier 1 or tier 2 organization. In this case, the diagnostic coverage typically cannot be determined, because the tier 1/2 does not have sufficiently detailed knowledge of the CPU core's failure modes or of its internal design. Such a solution should not be integrated with any claim that it brings down the FIT of the item.
From a ‘Software’ perspective, the aspects that must be considered are the constraints that the self-test solution places on the overall system, and its integration requirements. For example, how much RAM and ROM does the component need? How much CPU time will it take? Will the execution of the self-test in any way interfere with, or place a restriction on, the normal application? Typically, self-test solutions keep the RAM and ROM footprint and the CPU run time very low.
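These questions can be turned into a simple budget check during integration. The sketch below is only an illustration: the budget figures, the footprint figures and the function names are hypothetical, and the CPU load is estimated under the assumption that exactly one test slice runs in every period of the background task.

```python
# Hypothetical budget granted to the self-test component by the system design.
BUDGET = {"rom_bytes": 8192, "ram_bytes": 512, "cpu_load_percent": 1.0}


def cpu_load_percent(slice_runtime_us, task_period_us):
    """CPU load caused by running one test slice in every task period."""
    return 100.0 * slice_runtime_us / task_period_us


def exceeded(footprint, budget):
    """Return the names of all resources whose usage exceeds the budget."""
    return sorted(name for name, used in footprint.items()
                  if used > budget[name])


# Hypothetical figures, e.g. taken from a linker map and timing measurements.
footprint = {
    "rom_bytes": 6144,
    "ram_bytes": 640,
    "cpu_load_percent": cpu_load_percent(slice_runtime_us=50,
                                         task_period_us=10_000),
}

print(exceeded(footprint, BUDGET))
```

An empty list means the component fits within the budget. Any entry names a resource that needs to be discussed before integration.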
The state-of-the-art argument
The technical state-of-the-art refers to the highest level of development (technical solutions, processes etc.) that has been reached at a particular point in time. Software core self-test is considered state-of-the-art for ASIL B because it is already deployed and operational in several ASIL B and even ASIL A items. According to German law, car manufacturers are generally liable for damage to a person caused by the malfunction of a product. If the malfunction could not have been detected by the technical state of the art, the liability is excluded (German law on product liability, source: https://iclg.com/practice-areas/product-liability-laws-and-regulations/germany). Hence, some safety industry experts insist that a core self-test SW be made available for ASIL B and even ASIL A items, irrespective of whether the OEM asks for HW metrics or not.
With this high-level overview of software core self-test and the differing perspectives around it, we hope we have given you the information you need to make the best decision for your program. If you have experience with this topic, please feel free to share your views in the comments.