Dependable parallel computing with agents based on a task graph model

  • Sophie Chabridon
  • Erol Gelenbe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We discuss a novel technique for improving the dependability of parallel programs executing on a MIMD shared-memory architecture. The idea is to empower certain tasks of each application program to detect failures and to reschedule the execution of tasks that are considered to have failed. The technique is based on a task graph representation of the parallel program, in which communications between tasks are deliberately isolated at the end of each task. We propose and evaluate several algorithms that detect failures and restart failed tasks. Using a discrete-event simulator, we evaluate the performance of a specific parallel application, the Fast Fourier Transform, under failures when our detection and restart algorithms are in use.
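The general idea of detecting a failed task in a task graph and restarting it can be illustrated with a minimal sketch. This is not the authors' algorithm: the function `run_task_graph`, the `fails_on` failure schedule, and the `max_restarts` limit are all hypothetical names introduced here for illustration, and failures are simulated rather than detected on real hardware. The sketch assumes that a task publishes its result only when it completes, which is one plausible reading of isolating communications at the end of each task, so a failed attempt leaves nothing behind and can simply be rerun.

```python
from collections import deque


def run_task_graph(tasks, deps, fails_on, max_restarts=3):
    """Run tasks in dependency order, restarting attempts detected as failed.

    tasks:    dict mapping task name -> callable(inputs: dict) -> result
    deps:     dict mapping task name -> list of predecessor task names
    fails_on: dict mapping task name -> set of 1-based attempt numbers that
              are treated as failures (a stand-in for real failure detection)
    """
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    successors = {t: [] for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            successors[p].append(t)

    ready = deque(t for t in tasks if indegree[t] == 0)
    results = {}
    attempts = {t: 0 for t in tasks}

    while ready:
        t = ready.popleft()
        while True:
            attempts[t] += 1
            if attempts[t] in fails_on.get(t, set()):
                # Simulated detection: this attempt failed before publishing
                # any output, so it can be rerun from its inputs.
                if attempts[t] > max_restarts:
                    raise RuntimeError(f"task {t!r} exceeded restart limit")
                continue
            inputs = {p: results[p] for p in deps.get(t, [])}
            # The result becomes visible to successors only on completion.
            results[t] = tasks[t](inputs)
            break
        # Release successors whose predecessors have all completed.
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    return results, attempts
```

For example, with tasks `a` and `b` feeding a task `c` that sums its inputs, and `fails_on={"b": {1}}`, task `b` is attempted twice while `c` still receives both inputs. A real implementation would replace the `fails_on` schedule with an actual detection mechanism and would run ready tasks in parallel.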

Original language: English
Title of host publication: Proceedings - Euromicro Workshop on Parallel and Distributed Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 350-357
Number of pages: 8
ISBN (Electronic): 0818670312, 9780818670312
Publication status: Published - 1 Jan 1995
Externally published: Yes
Event: 1995 Euromicro Workshop on Parallel and Distributed Processing - San Remo, Italy
Duration: 25 Jan 1995 to 27 Jan 1995

Publication series

Name: Proceedings - Euromicro Workshop on Parallel and Distributed Processing

Conference

Conference: 1995 Euromicro Workshop on Parallel and Distributed Processing
Country/Territory: Italy
City: San Remo
Period: 25/01/95 to 27/01/95

Keywords

  • Dependability
  • Discrete-event simulation
  • Parallel computing
  • Software-based failure detection and restart
