Failure detection algorithms for a reliable execution of parallel programs

Sophie Chabridon, Erol Gelenbe

Research output: Contribution to journal › Conference article › peer-review

Abstract

We report on the design and simulation of novel algorithms that ensure application software runs correctly on a MIMD system in which processing units (PUs) can fail. We evaluate the effect of these algorithms by simulation on random task graphs as failure rates increase. We also examine a specific application, the Fast Fourier Transform, for which we construct the task graph and then simulate its execution under various processor failure rates.
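
The abstract does not describe the detection algorithms themselves, so the following is only a minimal sketch of the kind of experiment it mentions: building a task graph for the FFT and simulating its execution as the failure rate varies. Everything in it is an assumption made for illustration, not the authors' method: the names `fft_task_graph` and `simulate_makespan` are hypothetical, tasks take one time unit, an unlimited number of PUs is available, and each execution attempt fails independently with a fixed probability, is detected when the attempt ends, and is simply re-executed.

```python
import random


def fft_task_graph(n):
    """Map each task to its predecessors for an n-point radix-2 FFT.

    Task (s, i) produces value i of stage s; stage 0 holds the n inputs.
    Illustrative butterfly layout, not taken from the paper.
    """
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two >= 2")
    graph = {(0, i): [] for i in range(n)}
    for s in range(1, stages + 1):
        span = 1 << (s - 1)  # distance between butterfly partners
        for i in range(n):
            graph[(s, i)] = [(s - 1, i), (s - 1, i ^ span)]
    return graph


def simulate_makespan(graph, fail_prob, rng):
    """Completion time of the graph under the assumed failure model.

    Each attempt costs one time unit and fails with probability
    fail_prob; a failed attempt is repeated until it succeeds.
    """
    if not 0.0 <= fail_prob < 1.0:
        raise ValueError("fail_prob must be in [0, 1)")
    finish = {}
    # Tasks were inserted stage by stage, so dict order is topological.
    for task, preds in graph.items():
        start = max((finish[p] for p in preds), default=0)
        attempts = 1
        while rng.random() < fail_prob:
            attempts += 1
        finish[task] = start + attempts
    return max(finish.values())


if __name__ == "__main__":
    rng = random.Random(1)
    graph = fft_task_graph(8)
    for p in (0.0, 0.1, 0.3):
        runs = [simulate_makespan(graph, p, rng) for _ in range(1000)]
        print(f"fail_prob={p}: mean makespan {sum(runs) / len(runs):.2f}")
```

With no failures, the makespan of the 8-point graph equals its depth (the input stage plus three butterfly stages). Raising `fail_prob` lengthens the critical path, which is the kind of degradation such a simulation would measure.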

Original language: English
Pages (from-to): 229-238
Number of pages: 10
Journal: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Publication status: Published - 1 Jan 1995
Externally published: Yes
Event: Proceedings of the 1994 IEEE 14th Symposium on Reliable Distributed Systems - Bad Neuenahr, Ger
Duration: 13 Sept 1995 - 15 Sept 1995
