Scalable

Scalable is a Python framework for orchestrating containerized, distributed workflows on HPC systems. It integrates container lifecycle management, scheduler-aware resource provisioning, and a Dask-based execution model so multi-stage scientific workflows can run consistently at scale.

Documentation

Full documentation is available at jgcri.github.io/scalable.

Installation

Install from PyPI:

pip install scalable

Install from source:

git clone https://github.com/JGCRI/scalable.git
pip install ./scalable

If your shell cannot find installed scripts (for example, scalable_bootstrap), add the relevant scripts directory to PATH.
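
A minimal sketch for locating that scripts directory, using only the standard library's `sysconfig` module. It is an illustration rather than part of Scalable; which of the printed paths applies depends on whether the package went into a virtual environment, a system Python, or a per-user (`pip install --user`) location.

```python
import os
import sysconfig


def candidate_script_dirs():
    """Return directories where pip typically places console scripts."""
    dirs = [sysconfig.get_path("scripts")]
    # Per-user installs (pip install --user) use a separate install scheme.
    user_scheme = f"{os.name}_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", scheme=user_scheme))
    return dirs


if __name__ == "__main__":
    on_path = os.environ.get("PATH", "").split(os.pathsep)
    for d in candidate_script_dirs():
        status = "on PATH" if d in on_path else "NOT on PATH"
        print(f"{d}: {status}")
```

Any directory reported as not on PATH is a candidate to add in your shell profile.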

System Requirements

  • Scheduler: Slurm
  • Local host tools: Docker
  • HPC host tools: Apptainer

Platform guidance:

  • Linux is recommended for bootstrapping.
  • On Windows, Git Bash is recommended.
  • On macOS, Terminal works as expected.
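
Before bootstrapping, it can help to confirm that the required tools are visible on each host. The sketch below is one way to do that with the standard library; the executable names `docker`, `apptainer`, and `sbatch` are assumptions about how the tools are installed on your systems, and the helper `missing_tools` is hypothetical, not part of Scalable.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # On the local host check Docker; on the HPC host check Apptainer and Slurm.
    print("missing locally:", missing_tools(["docker"]))
    print("missing on HPC:", missing_tools(["apptainer", "sbatch"]))
```

Run it once on the local machine and once on the HPC login node, reading only the line that matches the host you are on.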

Quick Start

Scalable includes a bootstrap process that prepares a local/HPC work environment and required containers.

  1. Choose a local working directory.
  2. Run the bootstrap command.
  3. Follow interactive prompts.

cd <local_work_dir>
scalable_bootstrap

After setup completes, the workflow environment is launched on the HPC side. From the work directory, start an interactive Python session or execute a script:

python3
python3 <filename>.py

SSH Recommendation

Bootstrap performs multiple SSH operations. For best reliability and usability, configure key-based passwordless SSH authentication in advance.
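
One way to verify that key-based login already works is to attempt a connection that is not allowed to prompt. The sketch below uses the OpenSSH `BatchMode` and `ConnectTimeout` options; the host string is a placeholder and both helper functions are hypothetical, not part of Scalable.

```python
import subprocess


def ssh_batch_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def passwordless_ssh_works(host):
    """Return True if `host` accepts a non-interactive login."""
    try:
        result = subprocess.run(
            ssh_batch_command(host), capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder host; replace with your own login node.
    print(passwordless_ssh_works("user@hpc.example.org"))
```

If this prints `False`, bootstrap is likely to prompt for a password on each SSH operation.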

Usage

At runtime, create a cluster, register container targets, scale workers, and submit functions.

1. Create a cluster

from scalable import SlurmCluster, ScalableClient

cluster = SlurmCluster(
    queue="slurm",
    walltime="02:00:00",
    account="GCIMS",
    interface="ib0",
    silence_logs=False,
)

2. Register container targets

cluster.add_container(
    tag="gcam",
    cpus=10,
    memory="20G",
    dirs={"/qfs/people/user/work/gcam-core": "/gcam-core", "/rcfs": "/rcfs"},
)
cluster.add_container(
    tag="stitches",
    cpus=6,
    memory="50G",
    dirs={"/qfs/people/user": "/user", "/rcfs": "/rcfs"},
)
cluster.add_container(
    tag="osiris",
    cpus=8,
    memory="20G",
    dirs={"/rcfs/projects/gcims/data": "/data", "/qfs/people/user/test": "/scratch"},
)

3. Scale workers

cluster.add_workers(n=3, tag="gcam")
cluster.add_workers(n=2, tag="stitches")
cluster.add_workers(n=3, tag="osiris")
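
To estimate what these calls ask of the scheduler, the per-target settings can be multiplied by the worker counts. This is a back-of-the-envelope sketch that assumes each worker requests the full `cpus` and `memory` of its container target and that `"20G"` means 20 gigabytes; neither assumption is confirmed here, so treat the totals as an upper-bound estimate.

```python
# Per-target settings copied from the add_container calls above.
targets = {
    "gcam": {"cpus": 10, "memory_gb": 20},
    "stitches": {"cpus": 6, "memory_gb": 50},
    "osiris": {"cpus": 8, "memory_gb": 20},
}
# Worker counts copied from the add_workers calls above.
workers = {"gcam": 3, "stitches": 2, "osiris": 3}


def total_request(targets, workers):
    """Sum CPUs and memory over all workers, assuming one full allocation each."""
    cpus = sum(targets[tag]["cpus"] * n for tag, n in workers.items())
    memory_gb = sum(targets[tag]["memory_gb"] * n for tag, n in workers.items())
    return cpus, memory_gb


print(total_request(targets, workers))  # (66, 220)
```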

4. Submit functions

def func1(param):
    import gcam
    return gcam.__version__


def func2(param):
    import stitches
    return stitches.__version__


def func3(param):
    import osiris
    return osiris.__version__


client = ScalableClient(cluster)

fut1 = client.submit(func1, "gcam", tag="gcam")
fut2 = client.submit(func2, "stitches", tag="stitches")
fut3 = client.submit(func3, "osiris", tag="osiris")
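
Each `submit` call returns a future rather than a value. Result retrieval is not shown above; if these futures behave like Dask futures, calling `.result()` on each would block until the task finishes. The sketch below demonstrates the same submit-then-collect pattern with the standard library's `ThreadPoolExecutor` as a local stand-in, so it runs without a cluster. It is not Scalable's API, and `report` is a hypothetical placeholder for `func1`, `func2`, and `func3`.

```python
from concurrent.futures import ThreadPoolExecutor


def report(name):
    """Stand-in task: return a string identifying which task ran."""
    return f"ran {name}"


with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns immediately with a future for each task.
    futures = [pool.submit(report, name) for name in ("gcam", "stitches", "osiris")]
    # result() blocks until the corresponding task has finished.
    results = [fut.result() for fut in futures]

print(results)  # ['ran gcam', 'ran stitches', 'ran osiris']
```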

5. Scale down when complete

cluster.remove_workers(n=2, tag="gcam")
cluster.remove_workers(n=1, tag="stitches")
cluster.remove_workers(n=3, tag="osiris")

Function Caching

Scalable provides a cacheable decorator to avoid recomputing expensive function calls across retries or interrupted runs.

from scalable import cacheable


@cacheable(return_type=str, param=str)
def func1(param):
    import gcam
    return gcam.__version__


@cacheable(return_type=str, recompute=True, param=str)
def func2(param):
    import stitches
    return stitches.__version__


@cacheable
def func3(param):
    import osiris
    return osiris.__version__

For reliable behavior, explicitly specify argument and return types whenever possible.
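
To illustrate the general idea behind this kind of caching, here is a minimal sketch of a disk-backed memoization decorator. It is not Scalable's implementation: the name `disk_cached`, the pickle-based key, and the one-file-per-call layout are all assumptions made for the example.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cached(cache_dir):
    """Hypothetical decorator: persist results on disk, keyed by name and arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the pickled call signature to get a stable file name.
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


calls = []
cache_dir = tempfile.mkdtemp()


@disk_cached(cache_dir)
def slow_square(x):
    calls.append(x)  # record each real computation
    return x * x


first = slow_square(4)   # computed, then written to disk
second = slow_square(4)  # read back from disk; the function body is skipped
print(first, second, calls)  # 16 16 [4]
```

In a scheme like this the cache key depends on how the arguments serialize, and arbitrary objects do not always serialize to the same bytes. That may be one reason explicit argument and return types make caching more dependable.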

How to Contribute

Contributions are welcome.

  1. Fork the repository.
  2. Create a feature branch.
  3. Implement changes and add or update tests.
  4. Open a pull request with a clear summary and rationale.

For bug reports, feature requests, and support questions, open an issue:

https://github.com/JGCRI/scalable/issues

License

This project is licensed under the terms in LICENSE.md.
