catwalk

On-demand file staging system

SYNOPSIS

scout -catwalk DIRS command_and_args
scrun -catwalk=DIRS command_and_args

DESCRIPTION

Catwalk is an on-demand file staging system. Here, "on-demand" means that neither the user nor the program has to describe which files are to be staged. When a program tries to open a file for reading, the file is automatically copied from the front-end host, where the scout or scrun command is invoked, to the compute host(s).

	% scout -g HOSTGROUP
	% cd DIR
	% cat > foo.dat
	% scout -catwalk DIR cat foo.dat

In the above example, the foo.dat file is first created on the front-end host, and the cat command running on the compute hosts in the SCOUT environment can then read the file.

Files to be staged in or out must be given as a bare file name, not as an absolute or relative path. In other words, the file name must not contain any slash (/) character. This restriction keeps files from being staged in or out by mistake. Consequently, files in the current directory that are referred to with the "./" prefix are not considered candidates for stage-in or stage-out.
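
The naming rule above can be expressed as a one-line check. This is only an illustrative sketch in Python, not Catwalk's actual code; the function name is_stage_candidate is hypothetical.

```python
def is_stage_candidate(name: str) -> bool:
    """Return True if `name` is a bare file name (no slash anywhere).

    Hypothetical helper: it mirrors the documented rule that names
    containing '/' are never staged in or out.
    """
    return name != "" and "/" not in name


if __name__ == "__main__":
    for name in ("foo.dat", "./foo.dat", "/tmp/foo.dat", "sub/foo.dat"):
        print(name, is_stage_candidate(name))
```

Only "foo.dat" passes the check; the other three names contain a slash and would be left alone.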

STAGE-IN

When a user program or a command calls the open() function in read mode and the file cannot be found (the open() function fails with errno set to ENOENT), Catwalk looks for the file on the host where the scout or scrun command was invoked. The file is searched for in the directories specified with the catwalk option, in left-to-right order of the colon (:) separated list of directory names. When the file is found, it is copied to the compute host and the open() function then returns a file descriptor for the file. Files that were staged in to compute hosts are deleted by Catwalk when the program or command terminates.
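
The search order can be illustrated with a small local simulation. The sketch below is not Catwalk's implementation: it assumes the front-end directories are reachable as ordinary local paths and uses a plain file copy in place of the remote copy. The names stage_in and open_for_read are hypothetical.

```python
import os
import shutil


def stage_in(name, catwalk_dirs, dest_dir="."):
    """Search the colon-separated `catwalk_dirs` left to right and copy
    the first match into `dest_dir`.

    Returns the path of the copy, or None if the name contains a slash
    or no directory holds the file. Local simulation only.
    """
    if "/" in name:
        return None  # only bare file names are staged
    for directory in catwalk_dirs.split(":"):
        if not directory:
            continue  # skip empty list entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return shutil.copy(candidate, os.path.join(dest_dir, name))
    return None


def open_for_read(name, catwalk_dirs, dest_dir="."):
    """Open `name` for reading; if it is missing (ENOENT), try to stage
    it in first and then open the staged copy."""
    try:
        return open(os.path.join(dest_dir, name), "r")
    except FileNotFoundError:
        staged = stage_in(name, catwalk_dirs, dest_dir)
        if staged is None:
            raise  # not found anywhere: the original error stands
        return open(staged, "r")
```

If the same file name exists in several listed directories, the leftmost directory wins, which matches the left-to-right search order described above.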

STAGE-OUT

Files that the user program opens in write mode or read-write mode are marked as targets of Catwalk stage-out. Files that were staged in and modified afterwards are also staged out. Stage-out takes place when the user program (or command) that is the target of the scout or scrun command terminates. However, files that were marked for stage-out but deleted by the program are simply ignored.

When a file with the same name as the file to be staged out already exists, the staged-out file name is suffixed with the '@' symbol and the hostname of the compute node that originally created the file. If stage-out takes place several times, the file name is further suffixed with the '#' symbol and a number, so that staged-out files are not overwritten. The suffix number only distinguishes the file from the others; it is not guaranteed that files having the same name were staged out from the same job.
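
One possible reading of this naming scheme is sketched below. The text does not specify where the numbering starts or the exact layout of the suffixes, so the sketch assumes the forms NAME, NAME@HOST and NAME@HOST#N with N starting at 1. The function stage_out_name and these details are assumptions, not Catwalk's documented behavior.

```python
def stage_out_name(name, hostname, existing):
    """Pick a stage-out file name that does not collide with `existing`.

    Assumed layout (not confirmed by the documentation):
    NAME, then NAME@HOST, then NAME@HOST#1, NAME@HOST#2, ...
    `existing` is the set of names already present at the destination.
    """
    if name not in existing:
        return name
    candidate = f"{name}@{hostname}"
    number = 1  # assumed starting number
    while candidate in existing:
        candidate = f"{name}@{hostname}#{number}"
        number += 1
    return candidate


if __name__ == "__main__":
    present = {"out.log", "out.log@comp0"}
    print(stage_out_name("out.log", "comp0", present))
```

Under these assumptions the example prints out.log@comp0#1, because both out.log and out.log@comp0 already exist.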

SYSTEMCALL HOOKS

The on-demand nature of Catwalk is implemented using the LD_PRELOAD mechanism of Linux: Catwalk hook functions are preloaded and activated when a user program is invoked. Currently open(), creat(), stat(), fopen() and the exec() family of functions are hooked. Because of restrictions of the LD_PRELOAD mechanism of Linux, programs written in C++ or Fortran cannot work with Catwalk.

ENVIRONMENT VARIABLES

When the catwalk option value ends with the colon (:) symbol, the value of the CATWALKPATH environment variable is appended to the option value. If the option value consists of the colon (:) symbol only, ":", the entire option value is replaced by the value of the CATWALKPATH environment variable.
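
The two rules can be summarized in a short sketch. It is illustrative only: the function expand_catwalk_option is hypothetical, and the behavior when CATWALKPATH is unset is not described here, so the sketch simply treats it as an empty string.

```python
import os


def expand_catwalk_option(value, catwalkpath=None):
    """Expand a catwalk option value using CATWALKPATH.

    - ":" alone       -> replaced entirely by CATWALKPATH
    - trailing ":"    -> CATWALKPATH is appended
    - anything else   -> returned unchanged
    If `catwalkpath` is None it is read from the environment; an unset
    variable is treated as "" (an assumption of this sketch).
    """
    if catwalkpath is None:
        catwalkpath = os.environ.get("CATWALKPATH", "")
    if value == ":":
        return catwalkpath
    if value.endswith(":"):
        return value + catwalkpath
    return value


if __name__ == "__main__":
    print(expand_catwalk_option("/home/me/data:", "/proj/in:/proj/ref"))
    print(expand_catwalk_option(":", "/proj/in:/proj/ref"))
```

With CATWALKPATH set to /proj/in:/proj/ref (example paths only), the first call yields /home/me/data:/proj/in:/proj/ref and the second yields /proj/in:/proj/ref.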

SEE ALSO

catwalk(1), scout(1), scrun(1), catwalk-romio(7).

CREDIT

This document is a part of the SCore cluster system software developed at PC Cluster Consortium, Japan. Copyright (C) 2003-2008 PC Cluster Consortium.