The performance of modern computing systems has increased at the cost of considerably higher power consumption. This increased power consumption either shortens the operating time of battery-powered systems, such as hand-held mobile devices, or generates large amounts of heat that require expensive, sophisticated packaging and cooling technologies, especially in complex systems that consist of several processing units. If the generated heat is not removed efficiently, it can also reduce system reliability, since hardware failure rates increase with temperature [1][2]. Reliability also plays a significant role in multiprocessor systems, such as space-based control systems or life maintenance systems, where a failure may have catastrophic consequences. Although substantial research has been conducted on fault-tolerant and energy-aware computing techniques independently, comparatively little work addresses the combined problem of energy and reliability management. Managing the two together is difficult partly because there is a trade-off between energy consumption and reliability: when more resources are dedicated to fault-tolerance schemes to achieve higher system reliability, fewer resources remain for energy management schemes [3].
A. Energy Awareness
