53 Cards in this Set

  • Front
  • Back

System Model

How we view our system.

Fault Model

What are the potential problems?

Specification

Oracle for system behavior.

System

A set of elements that work together to provide a service.

Element

An entity that provides a predefined service and is able to communicate with other elements.

Service and Trustworthy Service

The behavior that a user perceives at the system interface. A trustworthy service is correct and timely.

Failure

Occurs when the services provided by a system observably deviate from specification.

Error

An erroneous system state that can lead to a failure. An error may be detectable and does not necessarily lead to a failure.

Fault

An undesirable event; the hypothesized cause of an error. Can be permanent, transient or intermittent.

Fault Prevention

Focuses on eliminating the conditions that allow faults to be activated, e.g. modular software design, software development methods.

Fault Removal

Three-stage process: validation, diagnosis, correction, e.g. debugging, code reviews.

Fault Forecasting

Concerned with estimating the number of faults, their likelihood of activation and their wider consequences. Gives a quantitative estimate of how good the software is, e.g. fault injection.

Fault Tolerance

Actively handling the occurrence of faults/errors such that a system is still able to meet its specification.

Reliability

The probability that a system provides correct service over a specified time period.
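As a rough illustration of this definition, the sketch below assumes a constant failure rate, which gives the familiar exponential model R(t) = exp(-λt). The constant-rate assumption and the function name `reliability` are not part of the card; they are only one simple way to put a number on the definition.

```python
import math


def reliability(failure_rate: float, duration: float) -> float:
    """Probability of correct service over `duration`.

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not something the flashcard states.
    """
    if failure_rate < 0 or duration < 0:
        raise ValueError("failure_rate and duration must be non-negative")
    return math.exp(-failure_rate * duration)
```

For example, `reliability(0.001, 1000.0)` evaluates exp(-1), since the rate and the mission time multiply to 1.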

Fail-safe

System becomes safe when it cannot operate. Ensures safety.

Fail-operational

System continues to operate when it fails. Ensures liveness, sometimes violates safety.

Fail-secure

Fail-secure systems maintain maximum security when they cannot operate. Potentially large safety violations.

Fail-silent

System either functions correctly or stops functioning after an internal failure is detected.

Fault-tolerant

System continues to operate correctly when subsystems operate incorrectly.

Availability

The probability that a service provided by a system is operating correctly at a given time.
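The card defines availability at a single instant. A common long-run approximation, assumed here rather than taken from the set, is the steady-state ratio MTTF / (MTTF + MTTR). The function and parameter names below are hypothetical.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the service is up.

    mttf_hours: mean time to failure; mttr_hours: mean time to repair.
    The steady-state formula is an assumption made for illustration.
    """
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("at least one of the times must be positive")
    return mttf_hours / total
```

A system that runs 999 hours between failures and takes 1 hour to repair would have an availability of about 0.999 under this model.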

Safety

The extent to which a system can operate without damaging or endangering its environment.

Confidentiality

Concerned with the non-disclosure of undue information to unauthorized entities.

Integrity

The capacity of a computer system to ensure the absence of improper system alterations, with regard to the withholding, modification and deletion of information.

Maintainability

The probability that a failed computer system will be repaired within time t.

Fail-Safe Fault Tolerance

Program satisfies safety only.

Non-masking Fault Tolerance

Program satisfies liveness only.

Fault Intolerant

Program satisfies neither safety nor liveness.

Masking Fault Tolerance

Program satisfies both safety and liveness.

Detector

A class of program components that asserts the validity of a predicate in a running program. Necessary and sufficient for fail-safe fault tolerance.

Corrector

A class of program components that imposes a given predicate on a running program. Necessary and sufficient to design non-masking fault tolerance.
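The difference between the two component classes can be sketched as follows: a detector only reports whether a predicate holds, while a corrector acts until the predicate holds again. The state layout, the predicate and the `clamp_level` repair action are all hypothetical, chosen only to make the sketch runnable.

```python
from typing import Callable, Dict

State = Dict[str, int]


def detector(predicate: Callable[[State], bool], state: State) -> bool:
    """Assert the validity of the predicate without changing the state."""
    return predicate(state)


def corrector(predicate: Callable[[State], bool],
              repair: Callable[[State], State],
              state: State) -> State:
    """Impose the predicate: repair the state if the predicate is violated."""
    if predicate(state):
        return state
    repaired = repair(state)
    if not predicate(repaired):
        raise RuntimeError("repair action did not establish the predicate")
    return repaired


# Hypothetical example: a tank level that must stay within 0..100.
def level_in_range(state: State) -> bool:
    return 0 <= state["level"] <= 100


def clamp_level(state: State) -> State:
    return {**state, "level": max(0, min(100, state["level"]))}
```

A fail-safe design would stop as soon as `detector` returns False; a non-masking design would instead call `corrector` and carry on once the predicate holds again.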

Phases of Fault Tolerance



Run-time Checks

Replication Checks, Timing Checks, Reversal Checks, Coding Checks, Reasonableness Checks, Structural Checks, Validity Checks
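Two of these checks are easy to show in a few lines. The sketch below is only an illustration: `checked_sqrt` applies a reversal check (squaring the result should give back the input), and `reasonable_speed` applies a reasonableness check against an assumed plausible range. Both function names and the 0 to 400 km/h range are hypothetical.

```python
import math


def checked_sqrt(x: float, tolerance: float = 1e-9) -> float:
    """Compute sqrt(x) and verify it with a reversal check."""
    if x < 0:
        raise ValueError("input must be non-negative")
    root = math.sqrt(x)
    # Reversal check: recompute the input from the output and compare.
    if not math.isclose(root * root, x, rel_tol=tolerance, abs_tol=tolerance):
        raise RuntimeError("reversal check failed")
    return root


def reasonable_speed(kmh: float) -> bool:
    """Reasonableness check: is the reading within an assumed plausible range?"""
    return 0.0 <= kmh <= 400.0
```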

Exception Handlers

Interface exception, local/internal exception, failure exception

Forward error recovery

Upon detection of an error, the program attempts to move into a state that is no longer erroneous, e.g. exception handlers.

Backward error recovery

Upon detection of an error, the program rolls back to a previously "recorded" good point from which it can restart execution, e.g. checkpoints.
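A minimal sketch of checkpoint-and-rollback for a single process is given below. The `Checkpointed` class, the `run_with_rollback` helper and the in-memory deep copy are assumptions made for illustration; a real system would typically write checkpoints to stable storage.

```python
import copy
from typing import Any, Callable


class Checkpointed:
    """Holds a mutable state together with one recorded good copy of it."""

    def __init__(self, state: Any) -> None:
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self) -> None:
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._saved)


def run_with_rollback(store: Checkpointed,
                      operation: Callable[[Any], None],
                      is_valid: Callable[[Any], bool]) -> bool:
    """Run the operation; roll back to the checkpoint if an error is detected."""
    store.checkpoint()
    try:
        operation(store.state)
        if not is_valid(store.state):
            raise ValueError("error detected after operation")
    except Exception:
        store.rollback()
        return False
    return True
```

If an operation drives a hypothetical account balance negative, `run_with_rollback` returns False and the state is restored to the value recorded at the checkpoint.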

Dependability Analysis

Tries to identify acceptable levels of safety and reliability.

Hazard

A state or set of conditions of a system that, together with environmental conditions, will lead to an accident. It is important to know its severity and potential frequency.

Hazard level

Combination of hazard severity and likelihood of occurrence.

Risk

The hazard level combined with the likelihood of the hazard leading to accidents and the hazard exposure or duration.

Safety Case

Needed as part of certification. Should communicate a clear, comprehensive argument that a system is acceptably safe to operate in a particular context.

Failure Modes and Effects Analysis

Considers the failure of a component, with the effect of this failure being assessed at system level to detect hazardous situations.
+ Can detect hazardous situations arising as a result of single failures.
- Does not consider multiple failures.
- Expensive to perform exhaustively.

Failure Modes, Effects and Criticality Analysis

Similar to FMEA, except that the importance of each failure is ranked by accounting for its consequences and likely frequency of occurrence.
+ FMECA permits meaningful cost-benefit analysis.
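The ranking step can be sketched as below. Scoring criticality as severity multiplied by frequency on 1 to 5 scales is one common convention assumed here, not something the card specifies, and the failure modes and scores are hypothetical values invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int   # hypothetical scale: 1 (minor) .. 5 (catastrophic)
    frequency: int  # hypothetical scale: 1 (rare) .. 5 (frequent)

    @property
    def criticality(self) -> int:
        # Assumed scoring rule: consequences weighted by likely frequency.
        return self.severity * self.frequency


def rank_by_criticality(modes: List[FailureMode]) -> List[FailureMode]:
    """Most critical failure modes first."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


# Hypothetical failure modes, for illustration only.
MODES = [
    FailureMode("sensor drift", severity=2, frequency=4),
    FailureMode("valve stuck open", severity=5, frequency=2),
    FailureMode("display flicker", severity=1, frequency=3),
]
```

Ranking `MODES` puts "valve stuck open" first: it is less frequent than sensor drift, but its consequences outweigh that.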

Hazard and Operability Studies

+ Effective for situations where "What if ...?" questions are constrained.
- Time consuming, especially for the experts used in question generation.

Event Trees

Start from events that can affect the system and track forward to determine their effects.
+ Allows outcomes of events to be determined.
- Exponential complexity.
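The exponential complexity is easy to see when every event has two branches: n events give 2^n paths through the tree. The sketch below enumerates those paths, assuming independent events with known success probabilities. The independence assumption, the function name and any probabilities passed in are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def event_tree_outcomes(
    success_probs: Dict[str, float],
) -> List[Tuple[Dict[str, bool], float]]:
    """Enumerate every branch of a binary event tree with its probability.

    Assumes the events succeed or fail independently of one another.
    """
    names = list(success_probs)
    outcomes = []
    for branch in product((True, False), repeat=len(names)):
        probability = 1.0
        for name, succeeded in zip(names, branch):
            p = success_probs[name]
            probability *= p if succeeded else 1.0 - p
        outcomes.append((dict(zip(names, branch)), probability))
    return outcomes
```

With three events the function returns 8 branches; with 20 events it would return over a million, which is why full event trees quickly become impractical.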

Fault Trees

Fault trees track backwards from a hazard to its possible causes. Commonly used for safety-critical systems.
+ Resultant trees are simplified, compared to event trees.
- Identification of high-level hazards is challenging.

Cut Set

Combinations of elements whose simultaneous failure would lead to a system failure. Minimal cut sets provide a lower bound on reliability.

Tie Set

Combinations of elements whose simultaneous operation would lead to a correctly working system. Minimal tie sets provide an upper bound on reliability.
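The two bounds can be computed directly from the minimal cut sets and minimal tie sets. The sketch below assumes independent component failures and uses a hypothetical system in which component A is in series with B and C in parallel; the component reliabilities of 0.9 are illustrative values.

```python
from math import prod
from typing import Dict, FrozenSet, List

Sets = List[FrozenSet[str]]


def lower_bound(rel: Dict[str, float], min_cut_sets: Sets) -> float:
    """Product over cut sets of P(not every element of the cut has failed)."""
    return prod(1.0 - prod(1.0 - rel[c] for c in cut) for cut in min_cut_sets)


def upper_bound(rel: Dict[str, float], min_tie_sets: Sets) -> float:
    """One minus the product over tie sets of P(the tie set is not fully working)."""
    return 1.0 - prod(1.0 - prod(rel[c] for c in tie) for tie in min_tie_sets)


# Hypothetical system: A in series with (B parallel C), each 0.9 reliable.
REL = {"A": 0.9, "B": 0.9, "C": 0.9}
CUTS = [frozenset({"A"}), frozenset({"B", "C"})]
TIES = [frozenset({"A", "B"}), frozenset({"A", "C"})]
```

For this system the exact reliability is 0.9 × (1 − 0.1²) = 0.891. The cut-set bound gives 0.891 and the tie-set bound gives 0.9639, so the exact value sits between them as expected.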

Recovery Line and Consistency

A set of checkpoints, one per process, to which the programs can be rolled back in the event of failure to reach a consistent, error-free state of the system.
A recovery line is said to be consistent if there are no messages that originate after the line and terminate before it.
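The consistency condition can be checked mechanically: a message that was sent after the sender's checkpoint but received before the receiver's checkpoint violates it. The sketch below assumes each process stamps events with its own local clock; the `Message` record and the process names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Message:
    sender: str
    send_time: int     # local clock of the sender
    receiver: str
    recv_time: int     # local clock of the receiver


def is_consistent(line: Dict[str, int], messages: Iterable[Message]) -> bool:
    """`line` maps each process to the local time of its checkpoint."""
    for m in messages:
        sent_after_line = m.send_time > line[m.sender]
        received_before_line = m.recv_time <= line[m.receiver]
        if sent_after_line and received_before_line:
            # The receiver remembers a message the sender never sent.
            return False
    return True
```

A message sent before the line and received after it does not violate this condition; it is merely in transit when the checkpoints are taken.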

Recovery Blocks

One hardware channel and N software components.
Secondary components perform a similar function but in a different way.
Acceptance testing decides whether results are acceptable.
If the primary component fails the acceptance test, we reload a checkpoint and execute a secondary from a "clean" state.
Represents redundancy in space and time.
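The control flow described above can be sketched as a loop over the alternates, each started from a fresh copy of the checkpoint. The deliberately faulty `primary_sort` and the ordering-based acceptance test are hypothetical, chosen only to exercise the fallback path.

```python
import copy
from typing import Any, Callable, List, Sequence


def recovery_block(state: Any,
                   alternates: Sequence[Callable[[Any], Any]],
                   acceptance_test: Callable[[Any], bool]) -> Any:
    """Try the primary, then each secondary, from a clean checkpoint."""
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        clean_state = copy.deepcopy(checkpoint)  # reload the checkpoint
        try:
            result = alternate(clean_state)
        except Exception:
            continue  # treat an exception as a failed alternate
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sort(xs: List[int]) -> List[int]:
    xs.reverse()  # deliberately faulty primary (hypothetical)
    return xs


def secondary_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def is_ordered(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))
```

Calling `recovery_block([3, 1, 2], [primary_sort, secondary_sort], is_ordered)` rejects the primary's output and returns the secondary's result, and the caller's original list is left untouched.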

N-Version Programming

N hardware channels and N software channels.





NVP Axiom

NVP is most efficient when version failures occur independently.
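A simple way to combine the N results is a majority voter, sketched below. The three `square_v*` versions are hypothetical stand-ins for independently developed implementations, one of which is deliberately faulty; the card does not prescribe a particular voting scheme.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of versions."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions")
    return value


def run_n_versions(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and vote on the outputs."""
    return majority_vote([version(x) for version in versions])


# Hypothetical versions of the same specification (square a number).
def square_v1(x: int) -> int:
    return x * x


def square_v2(x: int) -> int:
    return x ** 2


def square_v3(x: int) -> int:
    return x * x + 1  # deliberately faulty version
```

The voter masks the single faulty version. If two versions failed in the same way, the voter would accept the wrong answer, which is why independence of version failures matters.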

Fault Injection Analysis

Fault injection analysis involves subjecting a system to abnormal conditions, i.e., introducing faults/errors, so that its behaviour can be assessed.

Coverage

The coverage of a dependability mechanism is the ratio of the number of test cases it passes successfully to the number of test cases it is subjected to.
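Fault injection and coverage fit together naturally: inject a known set of faults, count how many the mechanism handles, and take the ratio. The sketch below uses a single even-parity bit as a hypothetical detection mechanism and injects every 1-bit and 2-bit flip into the encoded word. The choice of mechanism and fault set is an assumption made for illustration.

```python
from itertools import combinations
from typing import List


def encode(data: List[int]) -> List[int]:
    """Append an even-parity bit to the data bits."""
    return data + [sum(data) % 2]


def parity_ok(word: List[int]) -> bool:
    return sum(word) % 2 == 0


def parity_coverage(data: List[int]) -> float:
    """Inject all 1-bit and 2-bit flips; return detected / injected."""
    word = encode(data)
    injected = 0
    detected = 0
    for flips in (1, 2):
        for positions in combinations(range(len(word)), flips):
            faulty = list(word)
            for i in positions:
                faulty[i] ^= 1  # inject the fault
            injected += 1
            if not parity_ok(faulty):
                detected += 1
    return detected / injected
```

For 8 data bits there are 9 single flips, all detected, and 36 double flips, none detected, so the coverage against this fault set is 9/45 = 0.2. The figure depends entirely on which faults are injected, which is why the fault model should be stated alongside any coverage value.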