Scalable is a Python framework for orchestrating containerized, distributed workflows on HPC systems. It integrates container lifecycle management, scheduler-aware resource provisioning, and a Dask-based execution model so multi-stage scientific workflows can run consistently at scale.
- Documentation
- Installation
- System Requirements
- Quick Start
- Usage
- Function Caching
- How to Contribute
- License
Full documentation is available at jgcri.github.io/scalable.
Install from PyPI:
pip install scalable

Install from source:
git clone https://github.com/JGCRI/scalable.git
pip install ./scalable

If your shell cannot find installed scripts (for example, scalable_bootstrap), add the relevant scripts directory to PATH.
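If it is unclear which directory to add, Python can report where it places console scripts. The following is a minimal sketch using only the standard library; `candidate_script_dirs` and `on_path` are hypothetical helper names, and the sketch assumes the package was installed with the same interpreter that runs the snippet.

```python
import os
import sysconfig


def candidate_script_dirs():
    """Return directories where pip may have placed console scripts."""
    # Default scheme: covers system-wide and virtual-environment installs.
    dirs = [sysconfig.get_path("scripts")]
    # Per-user scheme: covers `pip install --user` installs.
    user_scheme = f"{os.name}_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return dirs


def on_path(directory):
    """Return True if `directory` is already listed in PATH."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    normalized = {os.path.normpath(entry) for entry in entries if entry}
    return os.path.normpath(directory) in normalized


if __name__ == "__main__":
    for directory in candidate_script_dirs():
        status = "on PATH" if on_path(directory) else "not on PATH"
        print(f"{directory} ({status})")
```

Any directory reported as "not on PATH" is a candidate to add in your shell profile.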
- Scheduler: Slurm
- Local host tools: Docker
- HPC host tools: Apptainer
Platform guidance:
- Linux is recommended for bootstrapping.
- On Windows, Git Bash is recommended.
- On macOS, Terminal works as expected.
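The requirements above can be checked before bootstrapping. This is a sketch under stated assumptions: the executable names `docker`, `apptainer`, and `sbatch` (a standard Slurm client command) are assumed, `missing_tools` is a hypothetical helper, and the HPC-side check has to be run on the HPC host itself.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumed executable names; adjust them if your site uses different ones.
LOCAL_TOOLS = ["docker"]
HPC_TOOLS = ["apptainer", "sbatch"]

if __name__ == "__main__":
    for label, tools in (("local host", LOCAL_TOOLS), ("HPC host", HPC_TOOLS)):
        missing = missing_tools(tools)
        summary = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {summary}")
```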
Scalable includes a bootstrap process that prepares a local/HPC work environment and required containers.
- Choose a local working directory.
- Run the bootstrap command.
- Follow interactive prompts.
cd <local_work_dir>
scalable_bootstrap

After setup completes, the workflow environment is launched on the HPC side. From the work directory, start an interactive Python session or execute a script:
python3
python3 <filename>.py

Bootstrap performs multiple SSH operations. For best reliability and usability, configure key-based, passwordless SSH authentication in advance.
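One way to confirm that key-based login is in place before running the bootstrap is to attempt a non-interactive connection. The sketch below relies on the OpenSSH client options `BatchMode` and `ConnectTimeout`; the helper names and the host `user@hpc.example.org` are hypothetical placeholders, not part of Scalable.

```python
import subprocess


def batch_ssh_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def passwordless_ssh_works(host):
    """Return True if `host` accepts a non-interactive login."""
    try:
        result = subprocess.run(
            batch_ssh_command(host), capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection attempt hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    host = "user@hpc.example.org"  # hypothetical; replace with your login host
    if passwordless_ssh_works(host):
        print("Key-based login works.")
    else:
        print("Key-based login failed; set up SSH keys before bootstrapping.")
```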
At runtime, create a cluster, register container targets, scale workers, and submit functions.
from scalable import SlurmCluster, ScalableClient

# Create a cluster backed by the Slurm scheduler.
cluster = SlurmCluster(
    queue="slurm",
    walltime="02:00:00",
    account="GCIMS",
    interface="ib0",
    silence_logs=False,
)

# Register container targets with their resources and directory mappings.
cluster.add_container(
    tag="gcam",
    cpus=10,
    memory="20G",
    dirs={"/qfs/people/user/work/gcam-core": "/gcam-core", "/rcfs": "/rcfs"},
)
cluster.add_container(
    tag="stitches",
    cpus=6,
    memory="50G",
    dirs={"/qfs/people/user": "/user", "/rcfs": "/rcfs"},
)
cluster.add_container(
    tag="osiris",
    cpus=8,
    memory="20G",
    dirs={"/rcfs/projects/gcims/data": "/data", "/qfs/people/user/test": "/scratch"},
)

# Start workers for each container tag.
cluster.add_workers(n=3, tag="gcam")
cluster.add_workers(n=2, tag="stitches")
cluster.add_workers(n=3, tag="osiris")

# Functions to run on the workers; each imports its package inside the body.
def func1(param):
    import gcam
    return gcam.__version__

def func2(param):
    import stitches
    return stitches.__version__

def func3(param):
    import osiris
    return osiris.__version__

client = ScalableClient(cluster)

# Submit each function to workers with the matching tag.
fut1 = client.submit(func1, "gcam", tag="gcam")
fut2 = client.submit(func2, "stitches", tag="stitches")
fut3 = client.submit(func3, "osiris", tag="osiris")

# Remove workers that are no longer needed.
cluster.remove_workers(n=2, tag="gcam")
cluster.remove_workers(n=1, tag="stitches")
cluster.remove_workers(n=3, tag="osiris")

Scalable provides a cacheable decorator to avoid recomputing expensive function calls across retries or interrupted runs.
from scalable import cacheable

@cacheable(return_type=str, param=str)
def func1(param):
    import gcam
    return gcam.__version__

@cacheable(return_type=str, recompute=True, param=str)
def func2(param):
    import stitches
    return stitches.__version__

@cacheable
def func3(param):
    import osiris
    return osiris.__version__

For reliable behavior, explicitly specify argument and return types whenever possible.
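To illustrate the general idea behind this kind of caching, here is a minimal standard-library sketch of a disk-backed cache. It is not Scalable's implementation, and `disk_cached` and `slow_square` are hypothetical names. It shows one plausible reason, stated as an assumption, why explicit types help: the cache key has to be derived from the argument values, so arguments need a stable, serializable representation for a later call to find the stored result.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Decorator factory: store each result in `cache_dir`, keyed by the call."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The key is a hash of the function name and the pickled arguments.
            payload = pickle.dumps(
                (func.__qualname__, args, sorted(kwargs.items()))
            )
            entry = cache_dir / (hashlib.sha256(payload).hexdigest() + ".pkl")
            if entry.exists():
                # Cache hit: load the stored result without calling `func`.
                return pickle.loads(entry.read_bytes())
            result = func(*args, **kwargs)
            entry.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records each time the function body actually runs


@disk_cached(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


slow_square(4)  # computed and written to disk
slow_square(4)  # read back from disk; the body does not run again
```

Because results live on disk rather than in memory, a rerun after an interrupted job can reuse them, which is the behavior the cacheable decorator is described as providing.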
Contributions are welcome.
- Fork the repository.
- Create a feature branch.
- Implement changes and add or update tests.
- Open a pull request with a clear summary and rationale.
For bug reports, feature requests, and support questions, open an issue:
https://github.com/JGCRI/scalable/issues
This project is licensed under the terms in LICENSE.md.
