Docker Tutorial: SLURM-Compatible Workflows with Docker, Singularity, and R/Python
Learn how to build a reproducible container, publish it to a registry, and execute it on a SLURM-managed HPC system with SingularityCE. Examples cover both Python and R stacks with an emphasis on portability, determinism, and thread hygiene on shared nodes.
1. Why containers for HPC?
Containers bundle your runtime (system libraries, compilers, interpreters) with your application dependencies. On HPC systems this prevents dependency drift and avoids requesting administrator installs. The recommended pattern is:
- Build locally with Docker.
- Publish to a registry such as Docker Hub.
- Pull and run on the cluster with SingularityCE under SLURM.
This workflow yields a single, auditable source of truth for your environment.
2. Prerequisites
- Local machine with Docker installed.
- An account on a container registry (for example Docker Hub).
- HPC access with SingularityCE available (for example via module load singularity).
- SLURM for scheduling (sbatch, srun, and related commands).
- If your development machine is ARM-based (for example Apple Silicon), cross-build for linux/amd64 so the image runs on typical x86_64 HPC nodes.
3. Authoring the Dockerfile
Below are the major sections of a Dockerfile suitable for Python or R geospatial and scientific workloads. Adjust packages to match your use case.
3.1 Base image selection
Python-first:
FROM python:3.12-slim
R-first (binary CRAN via r2u):
FROM rocker/r2u:4.4
Notes:
- Slim images are small and require you to add only what you need.
- For predominantly R workflows, starting from a Rocker image reduces compilation friction.
3.2 Noninteractive APT
ENV DEBIAN_FRONTEND=noninteractive
Prevents prompts that can block unattended builds.
3.3 System dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libgomp1 ca-certificates \
libgdal-dev gdal-bin libproj-dev proj-bin \
libgeos-dev libspatialindex-dev \
libexpat1 libexpat1-dev && \
rm -rf /var/lib/apt/lists/*
- libgomp1 provides the OpenMP runtime used by BLAS and machine learning libraries.
- GDAL, PROJ, and GEOS support sf, terra, and many Python GIS libraries.
3.4 Python packages
RUN python -m pip install --upgrade --no-cache-dir pip setuptools wheel
RUN pip install --no-cache-dir \
numpy pandas pyarrow \
"shapely>=2" pyproj rtree \
fiona rasterio geopandas \
scikit-learn lightgbm xgboost scikit-optimize \
joblib tqdm requests pyimpute matplotlib
Prefer prebuilt wheels to avoid needing a compiler toolchain at build time. Pin exact versions via requirements.txt if you need full determinism.
3.5 R packages (optional)
If using rocker/r2u, most CRAN packages are available as Debian binaries:
RUN apt-get update && \
apt-get install -y --no-install-recommends \
r-cran-data.table r-cran-dplyr r-cran-readr r-cran-ggplot2 \
r-cran-sf r-cran-terra r-cran-lwgeom r-cran-glue r-cran-pak \
r-cran-renv && \
rm -rf /var/lib/apt/lists/*
On a generic R base image, use pak or install2.r with the Posit Package Manager binary mirror for fast installs.
3.6 Threading and geospatial defaults
ENV OMP_NUM_THREADS=2 \
OPENBLAS_NUM_THREADS=2 \
MKL_NUM_THREADS=2 \
NUMEXPR_NUM_THREADS=2 \
XGBOOST_NUM_THREADS=1 \
PROJ_NETWORK=OFF \
GDAL_CACHEMAX=512
- Caps threads to prevent oversubscription on shared nodes.
- Disables PROJ network fetches for reproducibility.
- Override these as needed in SLURM jobs.
3.7 Working directory
WORKDIR /work
Mount project code and data here at runtime.
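Assembled, the fragments above form a complete Dockerfile along these lines (a sketch for the Python-first case; package lists are illustrative, so trim them to your project):

```dockerfile
# Python-first scientific/geospatial image (illustrative; see sections 3.1-3.7)
FROM python:3.12-slim

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      libgomp1 ca-certificates \
      libgdal-dev gdal-bin libproj-dev proj-bin \
      libgeos-dev libspatialindex-dev && \
    rm -rf /var/lib/apt/lists/*

RUN python -m pip install --upgrade --no-cache-dir pip setuptools wheel && \
    pip install --no-cache-dir numpy pandas "shapely>=2" pyproj geopandas

# Conservative thread caps and geospatial defaults (override per job under SLURM)
ENV OMP_NUM_THREADS=2 \
    OPENBLAS_NUM_THREADS=2 \
    PROJ_NETWORK=OFF \
    GDAL_CACHEMAX=512

WORKDIR /work
```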
4. Building and testing locally
On your development machine:
export IMG=scientific-pipeline
export TAG=0.1.0
export REPO=<dockerhub_user>/$IMG
# one-time: buildx builder
docker buildx create --use
# Cross-build for HPC architecture (usually linux/amd64)
docker buildx build \
--platform linux/amd64 \
-t $REPO:$TAG \
-t $REPO:latest \
--load .
# Quick sanity check
docker run --rm --platform linux/amd64 $REPO:$TAG \
python -c "import geopandas; print('Environment OK')"
5. Publishing to Docker Hub
docker login # enter credentials
docker push $REPO:$TAG
docker push $REPO:latest
Optionally record the immutable digest:
docker buildx imagetools inspect $REPO:$TAG
6. Retrieving the image on the HPC with SingularityCE
On the login node:
module load singularity
export IMG=scientific-pipeline
export TAG=0.1.0
export REPO=<dockerhub_user>/$IMG
mkdir -p $PWD/.sif
singularity pull --dir $PWD/.sif docker://$REPO:$TAG
# => ./.sif/${IMG}_${TAG}.sif
For private repositories:
export SINGULARITY_DOCKER_USERNAME=<dockerhub_user>
export SINGULARITY_DOCKER_PASSWORD=<access_token_or_password>
singularity pull --dir $PWD/.sif docker://$REPO:$TAG
Test the image:
singularity exec ./.sif/${IMG}_${TAG}.sif \
python -c "import geopandas; print('Singularity OK')"
7. Running containers under SLURM
7.1 Thread hygiene via environment forwarding
Use SINGULARITYENV_ prefixes so variables appear inside the container:
export SINGULARITYENV_OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-2}
export SINGULARITYENV_OPENBLAS_NUM_THREADS=$SINGULARITYENV_OMP_NUM_THREADS
export SINGULARITYENV_MKL_NUM_THREADS=$SINGULARITYENV_OMP_NUM_THREADS
export SINGULARITYENV_NUMEXPR_NUM_THREADS=$SINGULARITYENV_OMP_NUM_THREADS
export SINGULARITYENV_XGBOOST_NUM_THREADS=$SINGULARITYENV_OMP_NUM_THREADS
export SINGULARITYENV_PROJ_NETWORK=OFF
export SINGULARITYENV_GDAL_CACHEMAX=1024
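A quick way to confirm the forwarding worked is to run a short check inside the container, for example via singularity exec. This is a minimal sketch; the "2" fallback matches the image defaults from section 3.6:

```python
import os

# Thread-related variables the image and SLURM scripts both manage
THREAD_VARS = ["OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
               "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"]

def thread_env_summary(environ=None):
    """Report the effective thread caps, falling back to the image default of '2'."""
    env = os.environ if environ is None else environ
    return {var: env.get(var, "2") for var in THREAD_VARS}

if __name__ == "__main__":
    for var, value in thread_env_summary().items():
        print(f"{var}={value}")
```

If a variable prints its default rather than the value you exported, check that the SINGULARITYENV_ prefix was spelled correctly on the host side.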
7.2 Batch script template (CPU)
#!/bin/bash -l
#SBATCH --job-name=analysis
#SBATCH --partition=compute
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=08:00:00
#SBATCH --output=logs/%x-%j.out
#SBATCH --error=logs/%x-%j.err
module purge
module load singularity
SIF=./.sif/scientific-pipeline_0.1.0.sif
PROJECT=/path/to/project # contains src/process.py or script.R
INPUTS=/path/to/inputs
OUTPUTS=/path/to/outputs
# Forward thread/env settings into container
export SINGULARITYENV_OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export SINGULARITYENV_OPENBLAS_NUM_THREADS=${SINGULARITYENV_OMP_NUM_THREADS}
export SINGULARITYENV_MKL_NUM_THREADS=${SINGULARITYENV_OMP_NUM_THREADS}
export SINGULARITYENV_NUMEXPR_NUM_THREADS=${SINGULARITYENV_OMP_NUM_THREADS}
export SINGULARITYENV_XGBOOST_NUM_THREADS=${SINGULARITYENV_OMP_NUM_THREADS}
export SINGULARITYENV_PROJ_NETWORK=OFF
export SINGULARITYENV_GDAL_CACHEMAX=1024
# Execute
singularity exec --cleanenv \
--bind "$PROJECT":/work/project,"$INPUTS":/work/inputs,"$OUTPUTS":/work/outputs \
"$SIF" \
bash -lc "python /work/project/src/process.py /work/inputs /work/outputs"
Replace the final command with Rscript if running an R pipeline.
7.3 Job arrays
N=$(find /path/to/inputs -maxdepth 1 -type f -name '*.geojson' | wc -l)
sbatch --array=0-$(($N-1))%16 run_container.sh
Inside your script, use ${SLURM_ARRAY_TASK_ID} to index the input list.
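That indexing step can be sketched in Python as follows (the helper name is our own; sorting keeps the ordering stable so every array task agrees on the file list):

```python
import os
import sys
from pathlib import Path

def select_input(input_dir, task_id, pattern="*.geojson"):
    """Map a SLURM array task id to one input file.

    Sorted glob results give a deterministic ordering across tasks.
    """
    files = sorted(Path(input_dir).glob(pattern))
    if not 0 <= task_id < len(files):
        raise IndexError(f"task id {task_id} out of range for {len(files)} inputs")
    return files[task_id]

if __name__ == "__main__":
    # e.g. python select_input.py /work/inputs
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    print(select_input(sys.argv[1], task_id))
```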
8. Reproducibility and provenance
- Pin versions: use requirements.txt for Python and renv.lock for R. Restore them during the Docker build for deterministic environments.
- Record digests: store the image digest (sha256) alongside job outputs.
- Log runtime environment: at job start, record sessionInfo() (R) or pip freeze (Python). For geospatial stacks, log gdalinfo --version and (in R) sf::sf_extSoftVersion().
- Architecture awareness: build for linux/amd64 unless your HPC documents a different architecture.
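For the Python side, the environment-logging step can be sketched like this (the record fields are our own choice; adapt them to your pipeline and add the image digest from section 5):

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone

def runtime_provenance():
    """Snapshot the runtime environment; write this next to job outputs."""
    # Equivalent of `pip freeze`, run against the interpreter in the container
    freeze = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                            capture_output=True, text=True).stdout.splitlines()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": freeze,
    }

if __name__ == "__main__":
    with open("provenance.json", "w") as fh:
        json.dump(runtime_provenance(), fh, indent=2)
```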
9. Common pitfalls (and remedies)
- Oversubscription: set thread environment variables based on --cpus-per-task.
- Mixed GDAL/PROJ builds: Python wheels may bundle GDAL while CLI tools use system GDAL. Test both during build and validation.
- Private registries: authenticate via SINGULARITY_DOCKER_USERNAME/SINGULARITY_DOCKER_PASSWORD before pulling images.
- Network-restricted nodes: pull on the login node and reference the local .sif file in jobs.
- Permissions: Singularity runs as your user by default; write to scratch or project directories you own.
10. Minimal end-to-end checklist
- Write a Dockerfile with the required system and language packages.
- docker buildx build --platform linux/amd64 -t <repo>:<tag> --load .
- docker push <repo>:<tag>
- On the HPC system: singularity pull --dir .sif docker://<repo>:<tag>
- Submit a SLURM job that binds project paths and forwards thread environment variables.
This pattern is robust across diverse clusters and workflows. Tailor package lists and SLURM directives to match your project.