Journal ArticleDOI

A synchronous checkpointing protocol for mobile distributed systems: probabilistic approach

TLDR
A minimum process coordinated checkpointing algorithm where the number of useless checkpoints and blocking are reduced using a probabilistic approach that computes an interacting set of processes on checkpoint initiation.
Abstract
Minimum-process coordinated checkpointing is a method that minimises the number of processes that must checkpoint for an initiation. It may require blocking of processes, extra synchronisation messages or useless checkpoints. We propose a minimum-process coordinated checkpointing algorithm in which the number of useless checkpoints and the amount of blocking are reduced by a probabilistic approach that computes an interacting set of processes on checkpoint initiation. A process checkpoints if the probability that it will receive a checkpoint request in the current initiation is high. A few processes may be blocked, but they can continue their normal computation and may send messages. We also modify the methodology used to maintain exact dependencies.
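The idea of an interacting set can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows the general minimum-process idea of taking the transitive closure of direct dependencies from the initiator, and it leaves out the probabilistic decision and the blocking rules described in the abstract. The function name `interacting_set` and the `depends_on` mapping (each process mapped to the processes it has received messages from since its last checkpoint) are hypothetical names chosen for illustration.

```python
from collections import deque


def interacting_set(initiator, depends_on):
    """Return the set of processes that would checkpoint with `initiator`.

    `depends_on[p]` is assumed (hypothetical representation) to hold the
    processes from which p has received a message since its last checkpoint.
    The result is the transitive closure of that relation from the initiator.
    """
    minimum_set = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in depends_on.get(p, ()):
            if q not in minimum_set:
                # q sent a message that p has received, so q must also
                # checkpoint to keep the recorded global state consistent.
                minimum_set.add(q)
                queue.append(q)
    return minimum_set


# Illustrative dependencies: P1 received from P2, P2 from P3, P4 from P1.
deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}
result = interacting_set("P1", deps)
```

In this example `P4` is not included when `P1` initiates: `P4` depends on `P1`, but `P1` does not depend on `P4`, so `P4` can keep running without taking a checkpoint.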


Citations

IJCSI Publicity Board 2010

TL;DR: This paper presents a systematic analysis of a variety of different ad hoc network topologies in terms of node placement, node mobility and routing protocols through several simulated scenarios.
Journal ArticleDOI

A low-cost hybrid coordinated checkpointing protocol for mobile distributed systems

TL;DR: A blocking algorithm is designed, where no useless checkpoints are taken and an effort has been made to optimize the blocking of processes for minimum-process checkpointing, and selective messages at the receiver end are delayed.
Journal ArticleDOI

Soft-Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile Distributed Systems

TL;DR: In order to balance the checkpointing overhead and the loss of computation on recovery, the authors propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the execution of minimum-process coordinated checkpointing algorithms for a fixed number of times.
Posted Content

Minimum Process Coordinated Checkpointing Scheme for Ad Hoc Networks

TL;DR: Performance analysis shows that the proposed minimum process coordinated checkpointing scheme for cluster based ad hoc routing protocols outperforms the existing related works and is a novel idea in the field.
Journal ArticleDOI

Evaluating Overheads of Integrated Multilevel Checkpointing Algorithms in Cloud Computing Environment

TL;DR: This paper presents a methodology for providing high availability to the demands of cloud's clients by integrating checkpointing feature with load balancing algorithms and making multilevel checkpoint to decrease checkpointing overheads.
References
Book ChapterDOI

Time, clocks, and the ordering of events in a distributed system

TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Journal ArticleDOI

Distributed snapshots: determining global states of distributed systems

TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.
Journal ArticleDOI

A survey of rollback-recovery protocols in message-passing systems

TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes between checkpoint-based protocols, which rely solely on checkpointing for system state restoration, and log-based protocols.
Journal ArticleDOI

Checkpointing and Rollback-Recovery for Distributed Systems

TL;DR: In this article, the authors consider the problem of bringing a distributed system to a consistent state after transient failures, and propose a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system from transient failures.
Proceedings ArticleDOI

The performance of consistent checkpointing

TL;DR: It is argued that these measurements show that consistent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications.