Attach Data as Volumes
When running containers in HPC or HTC environments like those at Jefferson Lab, containers are often ephemeral — meaning they are started fresh for each job and do not retain state or files between runs. This is a good thing: it improves reproducibility and allows the same container image to be shared across users and jobs.
However, ephemeral containers need access to external data — such as input files, shared software, or output locations. To do this safely and portably, you should mount host directories into the container at runtime.
Why Use Volume Mounts?
- Separation of concerns: Keep code in the container and data on the host.
- Preserve results: Anything written inside the container outside a mounted path is lost when the container exits.
- Portability: Mounted paths (like /group, /work, /volatile, etc.) match common JLab conventions and are available across farm cluster nodes.
Common Mount Options
Option | Description |
---|---|
-v /host/path:/container/path | Bind-mounts a host directory into the container. Add :ro for read-only. |
--mount | More explicit alternative; preferred for complex mounts. |
--userns=keep-id | Ensures file ownership inside the container matches your user ID outside. |
Example: Interactive Session with Mounted Directories
podman run --rm -it \
--userns=keep-id \
-v /group:/group:ro \
-v /work:/work \
-v /volatile:/volatile \
-v /cvmfs:/cvmfs:ro \
-v /u/home/$USER:/u/home/$USER \
jlab/base:alma9.5
This mounts:
- Shared project files (/group)
- Writable scratch space (/work, /volatile)
- Read-only software (/cvmfs)
- Your home directory, preserving .bashrc, user configs, and shell history
⚠️ Consider making your CUE home directory (/u/home/$USER) available inside the container at /u/home/$USER. This explicitly avoids loading your ifarm .bashrc or .cshrc file. To keep the container environment reproducible, write a container-specific .bashrc if needed.
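One way to use a container-specific startup file is to bind-mount it read-only and tell bash to read it with --rcfile. The sketch below only prints the resulting command so it can be inspected first. The helper name container_shell_cmd, the host file location, and the in-container path /etc/container.bashrc are assumptions made for illustration, and it assumes the image lets the trailing command replace its default.

```shell
#!/bin/sh
# container_shell_cmd is a hypothetical helper (not part of Podman).
# Given the host path of a container-specific startup file, it prints a
# podman command that mounts the file read-only and starts bash with it.
container_shell_cmd() {
    rc_file=$1
    # /etc/container.bashrc is an arbitrary in-container location chosen here.
    printf '%s\n' "podman run --rm -it --userns=keep-id \
-v $rc_file:/etc/container.bashrc:ro \
jlab/base:alma9.5 bash --rcfile /etc/container.bashrc"
}

# Assumed location of the startup file; adjust to wherever you keep it.
container_shell_cmd "/u/home/$USER/container.bashrc"
```

With --rcfile, an interactive bash reads the given file instead of ~/.bashrc, so the shell setup is the same wherever the container runs.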
Planning for Separation of Code, Data, and Configuration
One of the most effective ways to maintain reproducibility and portability in containerized environments is to separate your concerns — that is, keep your code, data, and configuration files logically and physically distinct.
This separation allows your container to stay focused and minimal: it should contain only the tools and environment necessary to run your workflows, not the workflows themselves or the data they operate on.
🔹 Step-by-Step Thought Process
Question | Guideline |
---|---|
Where does your code live? | Build it into the container or keep it in a Git-tracked directory under /u/home/$USER. |
Where is your input data located? | Mount it from /work, /volatile, or /cache depending on access patterns and size. |
Where do you want output files to go? | Write to /volatile or /work, then archive to mass storage using jput. If using swif2, consider using -output tags for direct tape deposition. |
Do you need custom shell configuration? | Provide a minimal .bashrc inside the container image or mount one explicitly. Avoid using the host .bashrc. |
How do you pass in run-time configuration (JSON, YAML, etc.)? | Use bind mounts or pass environment variables when running the container. Avoid baking config into the container image. |
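The last row of the table can be sketched as follows: a configuration file is bind-mounted read-only and its in-container path is passed through an environment variable. The helper config_args, the /config mount point, the variable name RUN_CONFIG, and the example file path are all hypothetical. The command is printed rather than executed.

```shell
#!/bin/sh
# config_args is a hypothetical helper (not part of Podman).
# Given a config file on the host, it prints the flags that mount the file
# read-only at /config/<name> and export that path as RUN_CONFIG.
config_args() {
    host_file=$1
    name=$(basename "$host_file")
    printf '%s\n' "-v $host_file:/config/$name:ro -e RUN_CONFIG=/config/$name"
}

# Print the command for review; remove the leading echo to run it.
# The helper output is left unquoted on purpose so it splits into separate
# flags, which assumes the path contains no spaces.
echo podman run --rm --userns=keep-id \
    $(config_args /work/myproj/config/run42.yaml) \
    jlab/base:alma9.5
```

Switching to a different configuration then means changing one argument, not rebuilding the image.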
Example Layout
Container Image:
- Installed tools (ROOT, Python, CMake)
- Default environment setup (ENTRYPOINT, ENV)
Mounted Paths at Runtime:
- /work/myproj/input/ → Large input files, distilled ROOT files, large models (read-only)
- /volatile/myproj/$USER/job123/ → Scratch space for outputs (read-write)
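As a rough sketch, the layout above could translate into mount flags like the ones below. The helper job_mounts is hypothetical, as are the project name and job ID reused from the layout. The output directory is assumed to exist on the host before the container starts (for example, created with mkdir -p).

```shell
#!/bin/sh
# job_mounts is a hypothetical helper (not part of Podman).
# It prints the -v flags for one job: read-only input, read-write output.
job_mounts() {
    proj=$1
    user=$2
    job=$3
    input="/work/$proj/input"
    output="/volatile/$proj/$user/$job"
    printf '%s\n' "-v $input:$input:ro -v $output:$output"
}

# Print the command for review; remove the leading echo to run it.
# The helper output is left unquoted on purpose so it splits into separate
# flags, which assumes the paths contain no spaces.
echo podman run --rm --userns=keep-id \
    $(job_mounts myproj "$USER" job123) \
    jlab/base:alma9.5
```

Keeping the same path inside and outside the container means file names written to logs or configs stay valid on the host.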
🧠 Best Practices
- Treat containers like analysis appliances — don't clutter them with temporary files.
- Avoid surprises by not relying on host shell startup scripts (.bashrc, .cshrc) unless intentionally mounted.
- Use config files or env vars to make your workflow jobs reproducible and adjustable without rebuilding.
- Keep code in Git and data in /work or /cache, not baked into container images.
- Test interactively with mounted volumes before submitting batch jobs.
💡 A clean separation helps you scale — whether that means submitting thousands of Slurm jobs, running workflows on OSG, or debugging from your laptop. The more predictable your container environment is, the easier your scientific workflows will be to share and sustain.
When to Use --mount Instead of -v
For scripts or reproducible workflows, the --mount flag is preferred:
podman run --rm -it \
--userns=keep-id \
--mount type=bind,source=/work,target=/work \
--mount type=bind,source=/cvmfs,target=/cvmfs,readonly \
jlab/base:alma9.5
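Each field is named explicitly (type, source, target, readonly), which is easier to read and less error-prone in automated scripts than the colon-separated -v form.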
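In a script, the --mount arguments can be generated instead of typed by hand. The sketch below uses a hypothetical helper, bind_mount, that mounts a host path at the same path in the container. It prints the command rather than running it.

```shell
#!/bin/sh
# bind_mount is a hypothetical helper (not part of Podman).
# It prints a --mount argument that binds a host path to the same path in
# the container. Pass "ro" as the second argument for a read-only mount.
bind_mount() {
    spec="type=bind,source=$1,target=$1"
    if [ "${2:-rw}" = "ro" ]; then
        spec="$spec,readonly"
    fi
    printf '%s\n' "--mount $spec"
}

# Print the command for review; remove the leading echo to run it.
# The helper output is left unquoted on purpose so it splits into two words,
# which assumes the paths contain no spaces.
echo podman run --rm -it --userns=keep-id \
    $(bind_mount /work) \
    $(bind_mount /cvmfs ro) \
    jlab/base:alma9.5
```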
Tips
- Avoid writing output to /group. It is not designed for job output; it is intended for developing and storing code.
- Use /volatile for large file output or temporary data.
- Never assume mounted paths exist inside the container unless you explicitly bind them.
- Always save desired output to your group's mass storage system using jput. The /volatile location is periodically cleared, and you could lose your data if it is not saved properly.
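To catch a missing bind mount early, a job script running inside the container can check for the directories it needs before doing any work. This is a minimal sketch. The helper name require_dirs and the directories checked are assumptions, not part of any JLab tooling.

```shell
#!/bin/sh
# require_dirs is a hypothetical helper intended for the start of a job
# script running inside the container. It reports every missing directory
# on stderr and returns non-zero if at least one is missing.
require_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            printf 'error: %s is not available; was it bind-mounted?\n' "$dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example check; a real job script would typically exit on failure.
if require_dirs /work /volatile; then
    echo "all required mounts are present"
else
    echo "one or more required mounts are missing"
fi
```

Failing at the first line of a job is cheaper than discovering a missing output path after hours of processing.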
ℹ️ Ephemeral containers are stateless by design. Bind-mounting host paths ensures your container can access and write data during execution while staying portable and clean across environments.