Dependability

Posted on May 20, 2022

Measuring reliability

Probability of failure on demand
Rate of occurrence of failures
Mean time to failure
Availability
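
As a rough illustration, the four measures above can be computed from an operation log. This is a minimal sketch: the `OperationLog` fields and the sample numbers are assumptions made up for the example, not data from a real system, and the failure rate is taken per operating hour.

```python
from dataclasses import dataclass


@dataclass
class OperationLog:
    """Hypothetical record of one observation period."""
    demands: int            # service requests made
    failed_demands: int     # requests that resulted in a failure
    failures: int           # failures observed while operating
    uptime_hours: float     # time the system was operational
    downtime_hours: float   # time the system was failed or under repair


def pofod(log: OperationLog) -> float:
    """Probability of failure on demand: failed requests / all requests."""
    return log.failed_demands / log.demands


def failure_rate(log: OperationLog) -> float:
    """Failures per hour of operation."""
    return log.failures / log.uptime_hours


def mttf(log: OperationLog) -> float:
    """Mean time to failure: operating hours per failure."""
    return log.uptime_hours / log.failures


def availability(log: OperationLog) -> float:
    """Fraction of total time the system was ready for use."""
    return log.uptime_hours / (log.uptime_hours + log.downtime_hours)


# Made-up sample figures, for illustration only.
example = OperationLog(demands=1000, failed_demands=2, failures=5,
                       uptime_hours=990.0, downtime_hours=10.0)
print(pofod(example), failure_rate(example), mttf(example), availability(example))
```

Note that the failure rate and MTTF are reciprocals of each other when both are measured over the same operating time.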

Attributes of dependability

Availability: Ready for use when invoked
Reliability: Likelihood of providing service for a given period of time
Safety: Operation without damaging or endangering environment
Confidentiality: Nondisclosure of undue information
Integrity: Absence of improper alterations to the system
Maintainability: Probability that a repair takes less than time t

Systems may prioritise these differently, e.g. a pacemaker prioritises reliability over security.

Failures

Fault: Cause of error
Error: Fault manifestation
Failure: Error propagated over system boundaries
- Hardware failure: Components do not function
- Software failure: Errors in specification, design or implementation
- Operational failure: Error between the chair and the keyboard

Errors propagate through the fault-error-failure cycle.

Fault -> Error -> Failure
  -> Fault in another module
  -> System failure (as the error crosses the system boundary)
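
The chain can be sketched in a few lines. This is an illustrative toy only: `average`, `sensor_module` and `contained_sensor_module` are hypothetical names invented for the example. The fault is a missing empty-input check, the error is the exception it triggers when activated, and the failure is that exception escaping the module boundary.

```python
def average(readings):
    # Fault: no handling for an empty list. It stays dormant until
    # readings == [], at which point it manifests as an error.
    return sum(readings) / len(readings)


def sensor_module(readings):
    # No error detection at the module boundary: the error escapes as a
    # failure, which the calling module then sees as a fault of its own.
    return average(readings)


def contained_sensor_module(readings, last_good):
    # The error is detected at the module boundary and contained, so it
    # never becomes a failure. The caller gets the last known good value.
    try:
        return average(readings)
    except ZeroDivisionError:
        return last_good
```

With normal input both modules behave identically. Only the empty list activates the fault, and only the first module lets the resulting error propagate.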

Providing dependability

Fault avoidance: Careful development (dependable processes)
Fault detection and correction: Validation before program deployment
Fault tolerance: Faults that occur are managed at runtime, so the system can continue operating

Detection and recovery

Graceful degradation: Allows the system to continue with reduced capacity
Redundancy: Spare capacity that takes over when a component fails
Diversity: Redundant components are of different types, so they are unlikely to fail in the same way
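
One way these three ideas might fit together is sketched below. Everything here is hypothetical: `call_with_failover` and the price-feed functions are stand-ins invented for the example, with the backup assumed to come from a different provider (diversity) and the cached value representing a reduced-quality fallback (graceful degradation).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(components: Sequence[Callable[[], T]],
                       degraded: Callable[[], T]) -> T:
    """Try each redundant component in order; degrade if all of them fail."""
    for component in components:
        try:
            return component()
        except Exception:
            continue  # redundancy: the next component takes over
    # Graceful degradation: keep running with a reduced-quality answer.
    return degraded()


def primary_price_feed() -> float:
    # Hypothetical primary component, simulated as being down.
    raise ConnectionError("primary feed down")


def backup_price_feed() -> float:
    # Hypothetical backup, assumed to be a different provider (diversity).
    return 101.5


def cached_price() -> float:
    # Hypothetical stale value used only when every feed has failed.
    return 100.0


print(call_with_failover([primary_price_feed, backup_price_feed], cached_price))
```

Catching every `Exception` keeps the sketch short. A real system would likely catch only the failures it knows how to recover from and log the rest.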

The entire system should be considered for dependability. Software failure must be contained as much as possible.

Dependable processes

Documentable, standardised, auditable, diverse and robust processes to produce dependable software
Well defined, repeatable, testable, fault tolerant development processes
Can involve:
- Requirement reviews & management
- System modelling
- Program inspection
- Automated testing
- Test management

Dependable system architectures

Protection system: A parallel system which monitors the main system, and takes action if the main system enters an unstable state.
Self-monitoring: Carries out computations in separate channels and compares the results to detect whether a fault has occurred
N-version programming: System is developed n times, outputs are computed by each version, and the final result is determined by voting

These architectures all rely on diversity.
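
N-version voting can be sketched as follows. This is a toy under stated assumptions: the three leap-year versions are invented for the example, one of them deliberately contains a fault, and the majority rule shown is just one possible voting scheme. Outputs must be hashable for the vote to work.

```python
import calendar
from collections import Counter


def vote(outputs):
    """Return the strict-majority output, or raise if there is none."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among versions")
    return winner


def n_version(versions, *args):
    """Run every independently developed version and vote on the result."""
    return vote([version(*args) for version in versions])


def leap_arithmetic(year):
    # Version A: the Gregorian rule written out by hand.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def leap_library(year):
    # Version B: a diverse implementation using the standard library.
    return calendar.isleap(year)


def leap_faulty(year):
    # Version C: contains a deliberate fault (ignores the century rule).
    return year % 4 == 0


versions = [leap_arithmetic, leap_library, leap_faulty]
print(n_version(versions, 1900))  # the faulty version is outvoted
```

The vote only masks the fault because the other two versions fail differently from the faulty one (here, not at all), which is why diversity between versions matters.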