Pitfalls Tier 1s Fall Into While Integrating ASIL Micros

One of the most challenging topics for Tier 1s in a Safety program is integrating “ASIL” Micro(s) into the System and fulfilling the Assumptions of Use (AoUs) given in the Safety manual, so as to ‘truly’ achieve the ASIL-ness of the Micro. Despite years of experience integrating different ASIL Micros, we have been surprised, time and again, by what we found. In this article, we discuss the top 4 pitfalls we fell into, and in the conclusion we give our recommendations on how to avoid them.

Even though this article focuses only on ASIL Micros, you may fall into the same pitfalls while integrating any other ASIL HW component, such as PMICs or sensor ICs.

Since this article is written from the Tier 1’s perspective, wherever we use the pronoun “you”, it refers to a person in the shoes of a Tier 1.

Top 4 Pitfalls:

1. Micro is ASIL “suitable” but not ASIL “compliant”.

2. The Supporting SW (Drivers, stack, libraries) provided for the Micro is not ASIL compliant.

3. Challenges when using a higher ASIL Micro for a lower ASIL System

4. Struggles in balancing Safety and Performance (well, not really a pitfall but a challenge)

Micro is ASIL “suitable” but not ASIL “compliant”

Some Micro suppliers develop a Micro for use in Safety programs but do not develop it according to ISO 26262. The Micro supplier provides safety-specific documentation describing the safety-related functionalities that can be achieved with the Micro and the Safety mechanisms provided for them. For example, if the Micro is meant for an Instrument Cluster ECU, the documentation describes how the system can set Safety telltales or show Warnings/Alerts on the display. It also specifies the built-in HW mechanisms the Micro has for ensuring that the telltales or Warnings are indicated correctly. Besides this, the Micro could also have standard safety mechanisms such as ECC, parity, internal watchdogs, clock monitoring and an MPU (Memory Protection Unit). When you read all this and become convinced that this Micro is perfect for achieving the Safety goals of your program, you fall into the trap. The Micro is not really ASIL “compliant”, i.e., it was not developed according to the ISO 26262 process. It is only ASIL “suitable” and was developed only according to standard QM processes.

How would this work? Doesn’t the HW Evaluation clause of ISO 26262 state that complex ICs such as Micros must be developed as ASIL compliant to be used in a Safety program? So how can you use a QM Micro to achieve the Safety goals of the program?

One path Micro suppliers choose to handle this is “ASIL Decomposition”, i.e., the Micro is assigned QM, and another ASIL Micro or an external watchdog that monitors the Micro carries the ASIL.
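To make the decomposition options concrete, here is a small Python lookup of the ASIL decomposition pairs as we understand the ISO 26262-9 table. The integer encoding (QM=0 up to D=4, where each permitted pair sums to the original level) is our own mnemonic, not notation from the standard, and the function names are hypothetical. The sketch only lists permitted pairs; it does not check the independence between the decomposed elements that the standard also requires.

```python
from __future__ import annotations

# Illustrative encoding (our own mnemonic, not ISO notation): QM=0, A=1, B=2, C=3, D=4.
LEVELS = ["QM", "A", "B", "C", "D"]


def decompositions(original: str) -> list[tuple[str, str]]:
    """Return the permitted (higher, lower) decomposition pairs for an ASIL.

    With the encoding above, the levels of every permitted pair add up to
    the original level, e.g. D -> D(D)+QM(D), C(D)+A(D), B(D)+B(D).
    Independence between the two elements is NOT checked here.
    """
    target = LEVELS.index(original)
    if target == 0:
        return []  # QM requirements are not decomposed
    pairs = []
    for high in range(target, -1, -1):
        low = target - high
        if low > high:
            break  # remaining combinations are mirror images of listed ones
        pairs.append((f"{LEVELS[high]}({original})", f"{LEVELS[low]}({original})"))
    return pairs


def partner_of_qm(original: str) -> str | None:
    """Level the monitoring element must carry if the Micro is only QM."""
    for high, low in decompositions(original):
        if low.startswith("QM"):
            return high
    return None


for asil in ["A", "B", "C", "D"]:
    print(asil, decompositions(asil))
print(partner_of_qm("D"))  # D(D)
```

The last line shows why the external monitor is the critical piece: if the Micro carries only QM(D), its partner has to carry D(D) on its own.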

There are two aspects you need to be aware of:

If the Micro’s documentation states that it is “ASIL suitable”, you could wrongly assume that it is an ASIL compliant Micro. If you do not clarify the ASIL level of the Micro upfront, you may discover the gap only after the program has started, when it is too late to go back and ask for a Micro change. You may then be forced to take approaches like HW Component Qualification, which is tedious and strongly discouraged, or, worse, to declare the System itself not Safety compliant, which damages your reputation with the OEM.
If the Micro’s “safety” concept uses an ASIL Decomposition strategy, you need to question the technical correctness of that solution for your system. Can external monitoring cover all the Safety-related failures of your QM Micro? For example, if the Safety functionality still runs on this “ASIL suitable” QM Micro and you use its built-in mechanisms like the MPU or ECC to protect your ASIL SW, what happens if these mechanisms don’t work correctly? How would external monitoring detect a failure of the MPU or ECC?

A related case is when the Micro supplier has actually developed the Micro according to ISO 26262 but has not had it “certified” by a sufficiently independent organization. In this case, the Tier 1 needs an agreement with the Micro supplier on the evidence the supplier must provide for the Tier 1 to gain confidence in the “ASIL-ness” of the Micro.

The Supporting SW (Drivers, stack, libraries) provided for the Micro is not ASIL compliant

Once you have successfully chosen an ASIL Micro for the program, the next aspect to think about is: do you expect the Micro supplier to provide any Supporting SW, such as MCALs/Drivers, peripheral stacks, or supporting libraries? If yes, do you expect this Supporting SW to be ASIL compliant to achieve your System’s Safety concept? If yes, welcome to the next pitfall. You may think that if the Micro is ASIL compliant, the Supplier will also have developed the supporting SW as ASIL compliant, but this may not be the case. Micro suppliers have their own strategies for deciding whether the SW they provide is developed as ASIL compliant and, if so, which parts of it.

If the Tier 1 needs ASIL Drivers to achieve the System Safety concept but the Supplier provides only QM Drivers, the Tier 1 has to choose alternative paths, such as developing its own ASIL Drivers or performing SW Qualification of the Supplier’s QM Drivers, which is complicated and needs additional support from the Supplier.

Challenges when using a higher ASIL Micro for a lower ASIL System

During the development of the Micro as an SEooC (Safety Element out of Context), the Tier 2 Supplier identifies all the Safety mechanisms that must be implemented to achieve the required FFI (Freedom From Interference), qualitative measures and quantitative targets (e.g., PMHF in FIT, SPFM, LFM) for the maximum ASIL level supported by that Micro.
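For reference, ISO 26262-5 defines the two architectural metrics mentioned above as follows, where λ is a failure rate (usually in FIT, i.e. failures per 10⁹ hours), the sums run over the safety-related HW elements, and the subscripts denote single-point (SPF), residual (RF) and latent multiple-point (MPF,L) faults.

```latex
\begin{aligned}
\mathrm{SPFM} &= 1 - \frac{\sum_{\mathrm{SR,HW}} \left(\lambda_{\mathrm{SPF}} + \lambda_{\mathrm{RF}}\right)}{\sum_{\mathrm{SR,HW}} \lambda} \\
\mathrm{LFM}  &= 1 - \frac{\sum_{\mathrm{SR,HW}} \lambda_{\mathrm{MPF,L}}}{\sum_{\mathrm{SR,HW}} \left(\lambda - \lambda_{\mathrm{SPF}} - \lambda_{\mathrm{RF}}\right)}
\end{aligned}
```

The targets are SPFM ≥ 90 % and LFM ≥ 60 % for ASIL B, ≥ 97 % and ≥ 80 % for ASIL C, and ≥ 99 % and ≥ 90 % for ASIL D, with a PMHF below 100 FIT for ASIL B and C and below 10 FIT for ASIL D. This gap between the ASIL B and ASIL D targets is what makes it tempting to drop some of the Micro’s Safety mechanisms in an ASIL B System.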

You fall into this pitfall especially if you are using the Micro to achieve a lower ASIL level in your System than the Micro is capable of (though you may experience it even if your System’s and the Micro’s ASIL levels match). For example, the Micro is capable of up to ASIL D, but you only need ASIL B in your System. You can either still implement all the Safety mechanisms for ASIL D, which is overkill for your System, or implement only what is required for ASIL B. In the latter case, you must figure out which of the Safety mechanisms proposed by the Micro supplier you should implement and which you can skip.

For example, if 10 Safety mechanisms must be implemented for a peripheral to achieve the ASIL D metrics, you would expect the Micro supplier to tell you whether implementing, say, 7 of them is sufficient for ASIL B. To judge this, you need to know the failure modes covered by each of those 10 Safety mechanisms (SMs) and the diagnostic coverage (DC) of each SM.

Another scenario is when you want to implement an alternative mechanism in the System instead of the one proposed by the Supplier. For example, the Supplier proposes that the System periodically read back all the registers of a peripheral, but you wonder whether memory protection on those peripheral registers would be a simpler alternative. In this case, you need to know the intent behind the Supplier’s periodic readback (i.e., which failure modes it covers) to decide whether memory protection achieves the same intent.
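A minimal Python sketch of the periodic readback, assuming a hypothetical register map and a `read_register` callable that stands in for the real HW access. It hints at one intent the Supplier may have had: an MPU only blocks unauthorized SW accesses, so a register bit corrupted by a HW fault, with no SW write involved, would not be caught by it, while a readback against the expected configuration detects the wrong content whatever its cause. Whether that is really the Supplier’s intent is exactly the question you need answered.

```python
# Hypothetical configuration registers of one peripheral and the values
# written at initialization (the "golden" copy kept by the SW).
GOLDEN_CONFIG = {"CTRL": 0x0000_0081, "BAUD": 0x0000_01A0, "IRQ_EN": 0x0000_0003}


def readback_check(read_register, golden=GOLDEN_CONFIG):
    """Return the names of registers whose current value differs from golden.

    read_register: callable(name) -> int, a stand-in for the real HW read.
    Only static configuration registers belong here; status registers
    change at run time and cannot be compared against a fixed value.
    """
    return [name for name, expected in golden.items()
            if read_register(name) != expected]


# Simulated register file; flip bit 4 of BAUD as if a HW fault hit it.
# No SW write happens, so an MPU would have nothing to block.
registers = dict(GOLDEN_CONFIG)
registers["BAUD"] ^= 1 << 4
print(readback_check(registers.__getitem__))  # ['BAUD']
```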

A third scenario is when the Micro supplier uses the same safety mechanism to cover both single-point faults and latent faults (really, that happens too!). You are left wondering how the same mechanism can cover both.

However, from a Micro supplier’s perspective, information about the intent of the Safety mechanisms, such as their associated failure modes and diagnostic coverage, may be treated as confidential. This conflict of interest, between the Tier 1 wanting to know the intent and the Micro supplier being unable to reveal it, delays the decision on which Safety mechanisms are really needed for the system. In the worst case, you may have to decide on the Safety mechanisms based on assumptions about the Supplier’s intent.

The other aspect here is the FMEDA. If you adapt the Safety mechanisms, you need to adapt the FMEDA of the Micro as well. Ideally, therefore, the Supplier provides a “customizable” FMEDA for the Micro, so that you can adapt the Safety mechanisms to the lower ASIL level of your System and calculate the impact on the HW metrics.
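To illustrate what a “customizable” FMEDA lets you do, here is a heavily simplified Python sketch. Every failure mode, failure rate, mechanism name and DC value is made up for illustration; a real Micro FMEDA has far more rows and classifies faults in more detail. Each failure mode is either one that can violate the Safety goal directly (its uncovered part counts as single-point/residual) or a multiple-point fault that can stay latent (e.g., a failure of the ECC logic itself). The sketch re-computes SPFM and LFM for a chosen set of mechanisms and checks them against the ISO 26262-5 targets.

```python
from dataclasses import dataclass
from typing import Optional

# ISO 26262-5 targets per ASIL: (minimum SPFM, minimum LFM).
TARGETS = {"B": (0.90, 0.60), "C": (0.97, 0.80), "D": (0.99, 0.90)}


@dataclass
class FailureMode:
    name: str
    fit: float          # failure rate in FIT (failures per 1e9 hours)
    latent: bool        # True: multiple-point fault that can stay latent
    sm: Optional[str]   # safety mechanism covering this failure mode


def hw_metrics(modes, sm_dc, enabled):
    """Return (SPFM, LFM) when only the SMs in `enabled` are implemented."""
    total = sum(m.fit for m in modes)
    spf_rf = 0.0  # single-point + residual failure rate
    mpf_l = 0.0   # latent multiple-point failure rate
    for m in modes:
        dc = sm_dc[m.sm] if m.sm in enabled else 0.0
        uncovered = m.fit * (1.0 - dc)
        if m.latent:
            mpf_l += uncovered
        else:
            spf_rf += uncovered
    spfm = 1.0 - spf_rf / total
    lfm = 1.0 - mpf_l / (total - spf_rf)
    return spfm, lfm


def meets(asil, spfm, lfm):
    spfm_min, lfm_min = TARGETS[asil]
    return spfm >= spfm_min and lfm >= lfm_min


# Made-up FMEDA rows and diagnostic coverage values (hypothetical).
SM_DC = {"ECC": 0.99, "CLOCK_MONITOR": 0.99, "REG_READBACK": 0.99, "ECC_SELFTEST": 0.90}
MODES = [
    FailureMode("RAM bit flip", 60.0, False, "ECC"),
    FailureMode("Clock frequency drift", 30.0, False, "CLOCK_MONITOR"),
    FailureMode("Peripheral register corruption", 10.0, False, "REG_READBACK"),
    FailureMode("ECC logic failure", 10.0, True, "ECC_SELFTEST"),
]

full = hw_metrics(MODES, SM_DC, set(SM_DC))
reduced = hw_metrics(MODES, SM_DC, set(SM_DC) - {"REG_READBACK"})
print(meets("D", *full), meets("B", *reduced), meets("D", *reduced))  # True True False
```

In this made-up example, dropping the register readback still meets the ASIL B targets but not ASIL D. Without access to the failure modes and DC values, this is exactly the computation you have to wait for the Micro supplier to do.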

However, if information on failure modes and DC is confidential, the FMEDA is only partially customizable, and the Tier 1 depends on the Micro supplier to update the FMEDA for the modified or removed Safety mechanisms and to re-compute the HW metrics of the Micro. This delays finalizing the FMEDA for the Tier 1’s System and achieving the required HW metrics for the Safety case.

Struggles in balancing Safety and Performance

By Performance, we mean here the functional and non-functional expectations on the intended functionality. For example, in an Instrument Cluster, how quickly an Airbag failure warning must be indicated on the display is a functional aspect of Performance. From an architectural perspective, how much memory consumption or CPU load is allowed is a ‘non-functional’ aspect of Performance. Safety refers to achieving functional safety, which requires you to perform all the Safety checks in the System. Safety always costs some Performance, since most checks must run periodically and redundancy may require additional memory.

When integrating an ASIL Micro, the Tier 1 has to implement all the assumed Safety mechanisms stated in the Micro’s safety manual, and this can lead to significant resource consumption and performance impact if these mechanisms must be implemented entirely in SW.
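A back-of-the-envelope way to see the run-time cost is to add up the CPU utilization of the periodic SW mechanisms (execution time divided by period). The mechanism names, timings and the 5 % safety budget below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical periodic SW safety mechanisms: (name, execution time in us, period in us).
PERIODIC_SMS = [
    ("register_readback", 200, 10_000),
    ("program_flow_monitor", 150, 5_000),
    ("stack_monitor", 20, 1_000),
]
SAFETY_CPU_BUDGET = 0.05  # hypothetical share of one core granted to safety checks


def cpu_load(sms):
    """Fraction of one core consumed: sum of execution_time / period."""
    return sum(exec_us / period_us for _, exec_us, period_us in sms)


load = cpu_load(PERIODIC_SMS)
print(f"safety SW load {load:.1%}, budget {SAFETY_CPU_BUDGET:.0%}, "
      f"within budget: {load <= SAFETY_CPU_BUDGET}")
# prints: safety SW load 7.0%, budget 5%, within budget: False
```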

Let’s take an example. The Micro’s Safety manual could give you a long list of SMs that must be implemented to cover latent faults, such as:

Check if the ECC mechanism works correctly.
Check if clock monitoring works correctly.
Check if MPU works correctly.
Check if all the Safety relevant peripherals of the Micro are working correctly.

Implementing all these checks once at startup (which is how latent fault mechanisms are typically implemented) can cause the System to miss its startup timing requirements. Startup timing requirements are typically non-negotiable performance requirements from the OEM and are often driven by regulations (for example, the rear-view camera has a startup timing requirement in FMVSS 111). Meeting them is a constant challenge for every Tier 1, who keeps racking their brains over where and how to find those extra milliseconds. For example, if your System must be up and running within 2 s but these checks alone need ~500 ms, you will struggle to decide what to trade off.

If you then consider running the checks at run time instead, you may be surprised to realize that they are destructive, i.e., they inject a fault and check whether the Safety mechanism detects it and raises an exception or a reset. Hence, they cannot be executed at run time, even if they need to run only once.
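The startup trade-off is simple arithmetic, sketched below with the 2 s requirement and ~500 ms of latent fault tests from the example above, plus a hypothetical 1.6 s for booting and starting the application. The test names and individual durations are also hypothetical. Because the tests are destructive, they cannot be moved to run time, so the only levers left are the boot time and the test durations themselves.

```python
BUDGET_MS = 2_000   # startup requirement from the example (2 s)
BOOT_MS = 1_600     # hypothetical boot + application start without latent tests

# Hypothetical destructive latent fault tests run once at startup: (name, duration in ms).
LATENT_TESTS = [
    ("ecc_fault_injection", 180),
    ("clock_monitor_test", 120),
    ("mpu_violation_test", 90),
    ("peripheral_checks", 110),
]


def startup_slack(budget_ms, boot_ms, tests):
    """Remaining margin in ms; negative means the startup requirement is missed."""
    return budget_ms - boot_ms - sum(duration for _, duration in tests)


print(startup_slack(BUDGET_MS, BOOT_MS, LATENT_TESTS))  # -100
```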

Hence, implementing all the assumed Safety mechanisms stated in the Safety manual without sacrificing System performance can become a nearly impossible task.

Conclusion

So, how can we avoid these pitfalls and challenges? There is no foolproof solution in the real world, but here are a few practices that Tier 1s can follow to make it a little easier:

Ask the supplier about the ASIL level of the Micro and its associated SW, and whether they are already certified or, if not, what the plan for certification is. Clarify this topic right at the quote stage of the program.
Create an agreement with the Micro supplier right at program start, stating your expectations on the Micro-related SW, the FMEDA, etc.
Ask the Micro supplier about the high-level assumptions made for the Micro SEooC and the constraints they will place on your System (such as assumed external System/HW requirements, a rough estimate of the startup time or run time required by the assumed Safety mechanisms, restrictions on using peripherals when implementing some of the Safety mechanisms, etc.). Evaluate right at the start whether these constraints are acceptable for your System, and identify alternative solutions if they are not.
Discuss the communication model to be followed between the Tier 1 and Tier 2 during the program. During Safety mechanism implementation and integration, the Tier 1 will often need speedy support from the Tier 2. There may also be many technical questions on the Safety mechanisms or on how a HW’s built-in mechanism works, and this kind of support may require a different communication model. Think through all the possible use cases upfront and agree on the working model that best supports a timely and successful integration of the ASIL Micro into the System.

What is your opinion about these challenges? Have you experienced something like this? What best practices do you suggest? What other challenges have you encountered? We would love to hear from you!


Automotive functional safety ISO26262