Attach Data as Volumes

When running containers in HPC or HTC environments like those at Jefferson Lab, containers are often ephemeral — meaning they are started fresh for each job and do not retain state or files between runs. This is a good thing: it improves reproducibility and allows the same container image to be shared across users and jobs.

However, ephemeral containers need access to external data — such as input files, shared software, or output locations. To do this safely and portably, you should mount host directories into the container at runtime.


Why Use Volume Mounts?

  • Separation of concerns: Keep code in the container and data on the host.
  • Preserve results: Anything written inside the container to a path that is not mounted from the host is discarded with the container at the end of execution.
  • Portability: Mounted paths (like /group, /work, /volatile, etc.) match common JLab conventions and are available across farm cluster nodes.

Common Mount Options

Option | Description
-v /path:/path | Mounts a directory directly. Add :ro for read-only.
--mount | More explicit alternative; preferred for complex mounts.
--userns=keep-id | Ensures file ownership inside the container matches your user ID outside.

Example: Interactive Session with Mounted Directories

podman run --rm -it \
--userns=keep-id \
-v /group:/group:ro \
-v /work:/work \
-v /volatile:/volatile \
-v /cvmfs:/cvmfs:ro \
-v /u/home/$USER:/u/home/$USER \
jlab/base:alma9.5

This mounts:

  • Shared project files (/group)
  • Writable scratch space (/work, /volatile)
  • Read-only software (/cvmfs)
  • Your home directory, preserving .bashrc, user configs, and shell history
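The mount list above assumes every path exists on the host. On a laptop, or on a node without /cvmfs, podman generally reports an error when a bind source is missing. The following is a minimal sketch of one way to guard against that. The helper name existing_mounts is hypothetical and is not part of podman or any JLab tooling, and the script only prints the command rather than launching it.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): print a "-v" option for each spec
# whose host directory exists. A spec is either "path" or "path:ro".
existing_mounts() {
    for spec in "$@"; do
        dir=${spec%:ro}                      # strip an optional ":ro" suffix
        if [ -d "$dir" ]; then
            if [ "$spec" = "$dir" ]; then
                printf '%s\n' "-v" "$dir:$dir"       # read-write mount
            else
                printf '%s\n' "-v" "$dir:$dir:ro"    # read-only mount
            fi
        else
            echo "skipping missing host path: $dir" >&2
        fi
    done
}

# Dry run: print the command instead of launching it (remove "echo" to run).
# Word splitting of the helper output is intentional and assumes paths
# without spaces.
echo podman run --rm -it --userns=keep-id \
    $(existing_mounts /group:ro /work /volatile /cvmfs:ro) \
    jlab/base:alma9.5
```

Skipped paths are reported on standard error, so a missing mount is visible in the job log instead of silently disappearing.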

⚠️ Consider making your CUE home directory (/u/home/$USER) available inside the container at the same path, /u/home/$USER. This explicitly avoids loading your ifarm .bashrc or .cshrc file. To keep the container environment reproducible, write a container-specific .bashrc if you need one.

Planning for Separation of Code, Data, and Configuration

One of the most effective ways to maintain reproducibility and portability in containerized environments is to separate your concerns — that is, keep your code, data, and configuration files logically and physically distinct.

This separation allows your container to stay focused and minimal: it should contain only the tools and environment necessary to run your workflows, not the workflows themselves or the data they operate on.


🔹 Step-by-Step Thought Process

Question | Guideline
Where does your code live? | Build it into the container or keep it in a Git-tracked directory under /u/home/$USER.
Where is your input data located? | Mount it from /work, /volatile, or /cache depending on access patterns and size.
Where do you want output files to go? | Write to /volatile or /work, then archive to mass storage using jput. If using swif2, prefer -output tags to write directly to tape.
Do you need custom shell configuration? | Provide a minimal .bashrc inside the container image or mount one explicitly. Avoid using the host .bashrc.
How do you pass in run-time configuration (JSON, YAML, etc.)? | Use bind mounts or pass environment variables when running the container. Avoid baking config into the container image.
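As an illustration of the last guideline, the sketch below builds a command that mounts a single config file read-only and passes two environment variables with podman's -e option. The helper name config_run_cmd, the /config target directory, the variables MYPROJ_CONFIG and MYPROJ_NEVENTS, and the path /work/myproj/config/run.yaml are all assumptions made for the example. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print a podman command that injects run-time
# configuration without rebuilding the image.
#   $1 = host path to a config file (mounted read-only under /config)
#   $2 = value for an illustrative environment variable, MYPROJ_NEVENTS
config_run_cmd() {
    cfg=$1
    nevents=$2
    name=$(basename "$cfg")
    echo podman run --rm --userns=keep-id \
        -v "$cfg:/config/$name:ro" \
        -e "MYPROJ_CONFIG=/config/$name" \
        -e "MYPROJ_NEVENTS=$nevents" \
        jlab/base:alma9.5
}

# Dry run with an assumed config location under /work.
config_run_cmd /work/myproj/config/run.yaml 1000
```

Because the image is unchanged between runs, switching to a different config file or event count only changes the arguments, not the container.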

Example Layout

Container Image:

  • Installed tools (ROOT, Python, CMake)
  • Default environment setup (ENTRYPOINT, ENV)

Mounted Paths at Runtime:

  • /work/myproj/input/ → Large input files, distilled ROOT files, large models (read-only)
  • /volatile/myproj/$USER/job123/ → Scratch space for outputs (read-write)
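The layout above could be wired up roughly as follows. This is a sketch under assumptions: the helper name job_run_cmd is hypothetical, and the same path is used on the host and in the container to follow the convention used elsewhere on this page. The output directory is created first, since the bind source has to exist on the host before the container starts.

```shell
#!/bin/sh
# Hypothetical helper: create a per-job output directory, then print a podman
# command that mounts the input read-only and the output read-write.
#   $1 = input directory, $2 = output root, $3 = job label
job_run_cmd() {
    input=$1
    outdir=$2/$3
    mkdir -p "$outdir" || return 1     # make sure the bind source exists
    echo podman run --rm --userns=keep-id \
        -v "$input:$input:ro" \
        -v "$outdir:$outdir" \
        jlab/base:alma9.5
}

# Dry run against a throwaway directory. On the farm the arguments might be
# /work/myproj/input and /volatile/myproj/$USER instead.
scratch=$(mktemp -d)
job_run_cmd "$scratch/input" "$scratch/out" job123
```

Giving each job its own output directory keeps concurrent jobs from overwriting each other's files.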

🧠 Best Practices

  • Treat containers like analysis appliances — don't clutter them with temporary files.
  • Avoid surprises by not relying on host shell startup scripts (.bashrc, .cshrc) unless intentionally mounted.
  • Use config files or env vars to make your workflow jobs reproducible and adjustable without rebuilding.
  • Keep code in Git and data in /work or /cache, not baked into container images.
  • Test interactively with mounted volumes before submitting batch jobs.

💡 A clean separation helps you scale — whether that means submitting thousands of Slurm jobs, running workflows on OSG, or debugging from your laptop. The more predictable your container environment is, the easier your scientific workflows will be to share and sustain.

When to Use --mount Instead of -v

For scripts or reproducible workflows, the --mount flag is preferred:

podman run --rm -it \
--userns=keep-id \
--mount type=bind,source=/work,target=/work \
--mount type=bind,source=/cvmfs,target=/cvmfs,readonly \
jlab/base:alma9.5

This is clearer and reduces errors in automated scripts.
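In a script, the --mount strings can be generated by a small function so that each mount is declared once. A minimal sketch, assuming a hypothetical helper named bind_mount that uses the same path on both sides of the mount:

```shell
#!/bin/sh
# Hypothetical helper: print a "--mount" option for a bind mount that uses the
# same path on the host and in the container.
#   $1 = path, $2 = "ro" for read-only (default is read-write)
bind_mount() {
    opt="type=bind,source=$1,target=$1"
    if [ "${2:-rw}" = "ro" ]; then
        opt="$opt,readonly"
    fi
    printf '%s %s\n' "--mount" "$opt"
}

# Dry run: print the command instead of launching it (remove "echo" to run).
# Word splitting of the helper output is intentional and assumes paths
# without spaces or commas.
echo podman run --rm -it --userns=keep-id \
    $(bind_mount /work) \
    $(bind_mount /cvmfs ro) \
    jlab/base:alma9.5
```

The key=value form makes the read-only intent explicit in one place, which is easier to review than a trailing :ro on a long -v argument.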

Tips

  • Avoid writing to /group. It is not designed for output; it is intended for developing and storing code.
  • Use /volatile for large file output or temporary data.
  • Never assume mounted paths exist inside the container unless you explicitly bind them.
  • Always save the output you want to keep to your group's mass storage area using jput. The /volatile location is periodically cleared, and you could lose your data if it is not saved properly.
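Some of these tips can be checked before a job starts. The sketch below is one possible pre-flight check, with a hypothetical function name check_output_dir. It rejects output locations under /group and confirms that the directory exists and is writable. It does not archive anything, so saving results to mass storage remains a separate step.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the output directory is
# outside /group, exists, and is writable by the current user.
check_output_dir() {
    case $1 in
        /group|/group/*)
            echo "refusing to write output under /group: $1" >&2
            return 1 ;;
    esac
    if [ ! -d "$1" ] || [ ! -w "$1" ]; then
        echo "output directory missing or not writable: $1" >&2
        return 1
    fi
    return 0
}

# Example: check a temporary directory before using it as an output location.
check_output_dir "${TMPDIR:-/tmp}" && echo "output location looks usable"
```

Running the check in a submission script before podman run makes a bad output path fail fast rather than partway through a job.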

ℹ️ Ephemeral containers are stateless by design. Bind-mounting host paths ensures your container can access and write data during execution while staying portable and clean across environments.