What Is Fault Tolerant Computing?

I. INTRODUCTION
The performance of present-day computing systems has increased at the cost of considerably higher power consumption. The increased power consumption either shortens the operating time of battery-powered systems, such as hand-held mobile devices, or generates large amounts of heat and requires expensive, sophisticated packaging and cooling technologies, especially for complex systems that consist of several processing units. The generated heat, if not efficiently removed, can also reduce system reliability, since the hardware failure rate increases with temperature [1][2]. In multiprocessor systems, such as space-based control systems or life maintenance systems, where a failure may cause catastrophic results, reliability …
During the execution of an application, a fault may occur for various reasons, such as hardware failures, software errors, and electromagnetic effects. Fault tolerance is therefore an inherent requirement of systems that must produce accurate results even in the presence of faults. In the fault-tolerance area, redundancy is employed to mask or otherwise work around these faults, thereby preserving a desired level of functionality. Generally, redundancy is defined as the deployment of spare resources (spatial) for the application. Permanent faults are generally tolerated by hardware redundancy, also known as modular redundancy (MR), in which cloned tasks run concurrently on multiple processing units. Broadly, three techniques are used to implement temporal-redundancy-based fault tolerance in task scheduling: checkpointing, recovery block, and recovery through …
1) In checkpointing, the state of the system is checked and correct states are saved to stable storage. When a fault is detected, execution is rolled back to the most recent correct checkpoint and the faulty section is re-computed by exploiting temporal redundancy. With a large number of checkpoints, the time overhead of this method may be unaffordable.
2) The recovery block approach provides a task with one or more backups. Once the original copy of the program fails, the system switches to the execution of its backup [15][12][13]. The execution times of the original task and its backups may differ.
3) Recovery through re-execution is used to tolerate transient faults by re-executing the original task if a fault occurs. As soon as a fault is detected, the system restores its state to a previous safe state and the recovery task is dispatched in the form of a re-execution.
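
The three techniques above can be illustrated with a minimal Python sketch. It is not taken from the essay: the names `TransientFault`, `run_with_checkpoints`, `recovery_block`, `re_execute`, and `make_flaky` are hypothetical, an in-memory deep copy stands in for stable storage, and the acceptance test in the recovery block is an assumption borrowed from the usual formulation of that scheme.

```python
import copy


class TransientFault(Exception):
    """Hypothetical exception standing in for a detected fault."""


def run_with_checkpoints(steps, state, max_rollbacks=10):
    """Checkpointing: save the state after every correct step and roll
    back to the most recent checkpoint when a fault is detected."""
    checkpoint = copy.deepcopy(state)  # stands in for stable storage
    rollbacks = 0
    i = 0
    while i < len(steps):
        try:
            state = steps[i](state)
        except TransientFault:
            rollbacks += 1
            if rollbacks > max_rollbacks:
                raise
            state = copy.deepcopy(checkpoint)  # roll back
            continue  # re-compute the faulty section
        checkpoint = copy.deepcopy(state)  # save the new correct state
        i += 1
    return state, rollbacks


def recovery_block(primary, backups, acceptance_test, *args):
    """Recovery block: try the primary first, then each backup in order,
    until one produces a result that passes the acceptance test."""
    for variant in [primary, *backups]:
        try:
            result = variant(*args)
        except Exception:
            continue  # this variant failed, switch to the next backup
        if acceptance_test(result):
            return result
    raise RuntimeError("primary and all backups failed")


def re_execute(task, state, max_attempts=3):
    """Re-execution: restore a previous safe state and run the same
    task again whenever a transient fault is detected."""
    safe_state = copy.deepcopy(state)
    for attempt in range(1, max_attempts + 1):
        try:
            return task(copy.deepcopy(safe_state)), attempt
        except TransientFault:
            continue  # restore the safe state and retry
    raise TransientFault("fault persisted after re-execution")


def make_flaky(fn, failures):
    """Test helper: wrap fn so its first `failures` calls raise a fault."""
    remaining = [failures]

    def wrapper(state):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("injected fault")
        return fn(state)

    return wrapper
```

In this sketch, checkpointing after every step keeps the re-computed section short but pays the copying cost on every step, which mirrors the overhead concern raised in item 1. A real scheduler would choose the checkpoint interval to balance the two.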
