GlueX Rucio Namespace
Overview
In Rucio, the smallest unit of data is a file. Files are grouped into datasets. A single file can be part of multiple datasets. Both files and datasets follow an identical naming scheme composed of scope:name, called a Data Identifier (DID).
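As a minimal sketch of the scope:name scheme, the snippet below splits a DID string at its first colon. The `DID` tuple and `parse_did` helper are hypothetical names used for illustration only; they are not part of the Rucio client API.

```python
from typing import NamedTuple


class DID(NamedTuple):
    """Hypothetical container for the two parts of a Data Identifier."""
    scope: str
    name: str


def parse_did(did: str) -> DID:
    """Split a 'scope:name' string at the first colon."""
    scope, sep, name = did.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"not a valid scope:name DID: {did!r}")
    return DID(scope, name)


did = parse_did("raw:/RunPeriod-2019-11/rawdata/Run051730")
print(did.scope)  # raw
print(did.name)   # /RunPeriod-2019-11/rawdata/Run051730
```

Splitting at the first colon only means that any later colon stays inside the name part.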
Scopes
GlueX data is divided into multiple scopes that categorize data by type:
- raw: Raw data from the detector
- recon: Production reconstructed data
- calib: Calibration-related data
- ana: Analysis-level data
Name Structure
The name part of a DID follows specific patterns depending on the scope:
Raw Data (raw scope)
/<RunPeriod-YYYY-MM>/rawdata/Run<run_num>/<filename>.evio
Reconstructed Data (recon scope)
/<RunPeriod-YYYY-MM>/recon/ver<version_number>/REST/<run_num>/<filename>.hddm
Calibration Data (calib scope)
/<RunPeriod-YYYY-MM>/calib/ver<version_number>/<calibration_trigger>/Run<run_num>/<filename>.evio
Analysis Data (ana scope)
Note: Only merged files are stored at the analysis level.
/<RunPeriod-YYYY-MM>/analysis/ver<version_number>/<reaction_name>/merged/<filename>.root
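The four patterns above can be sketched as format templates keyed by scope. This is an illustrative helper, not an official GlueX tool: the names `NAME_TEMPLATES` and `build_file_did` and the keyword names (`run_period`, `version`, `trigger`, `reaction`) are assumptions chosen for this example.

```python
# Templates transcribed from the per-scope name patterns.
# Field names are hypothetical stand-ins for the <...> placeholders.
NAME_TEMPLATES = {
    "raw": "/{run_period}/rawdata/Run{run_num}/{filename}.evio",
    "recon": "/{run_period}/recon/ver{version}/REST/{run_num}/{filename}.hddm",
    "calib": "/{run_period}/calib/ver{version}/{trigger}/Run{run_num}/{filename}.evio",
    "ana": "/{run_period}/analysis/ver{version}/{reaction}/merged/{filename}.root",
}


def build_file_did(scope: str, **fields: str) -> str:
    """Return a file DID ('scope:name') for the given scope and fields."""
    try:
        template = NAME_TEMPLATES[scope]
    except KeyError:
        raise ValueError(f"unknown scope: {scope!r}") from None
    # str.format raises KeyError if a required field is missing.
    return f"{scope}:{template.format(**fields)}"


print(build_file_did(
    "recon",
    run_period="RunPeriod-2019-11",
    version="02",
    run_num="051730",
    filename="dana_rest_051730_000",
))
# recon:/RunPeriod-2019-11/recon/ver02/REST/051730/dana_rest_051730_000.hddm
```

Values are passed as strings so that leading zeros in run and version numbers are preserved.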
DID Hierarchy
GlueX Rucio implements a two-level hierarchy: datasets contain files. There is no hierarchy above datasets or below files.
Dataset Level
A dataset DID combines the scope with a path that goes up to (but excludes) the filename. The dataset represents a logical grouping of related files.
Example dataset DIDs:
raw:/RunPeriod-2019-11/rawdata/Run051730
recon:/RunPeriod-2019-11/recon/ver02/REST/051730
ana:/RunPeriod-2019-11/analysis/ver01/omegapi_prime/merged
File Level
A file DID combines the scope with the complete path, including the filename.
Example file DIDs:
raw:/RunPeriod-2019-11/rawdata/Run051730/hd_rawdata_051730_000.evio
recon:/RunPeriod-2019-11/recon/ver02/REST/051730/dana_rest_051730_000.hddm
ana:/RunPeriod-2019-11/analysis/ver01/omegapi_prime/merged/tree_omegapi_prime.root
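Because a dataset name is the file name with the filename removed, the path-based dataset DID can be derived from a file DID. The sketch below assumes that relationship holds for every scope; `dataset_did_for` is a hypothetical helper, and a file may also be attached to other datasets that this function cannot infer.

```python
import posixpath


def dataset_did_for(file_did: str) -> str:
    """Return the path-based dataset DID by dropping the filename."""
    scope, sep, name = file_did.partition(":")
    if not sep:
        raise ValueError(f"not a valid scope:name DID: {file_did!r}")
    # posixpath keeps '/' as the separator regardless of the host OS.
    return f"{scope}:{posixpath.dirname(name)}"


print(dataset_did_for(
    "raw:/RunPeriod-2019-11/rawdata/Run051730/hd_rawdata_051730_000.evio"
))
# raw:/RunPeriod-2019-11/rawdata/Run051730
```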
Relationship Summary
- Each dataset contains one or more files
- Each file can belong to multiple datasets
- There are no containers or collections above the dataset level
- The path-like structure in names is purely organizational—the actual hierarchy is only dataset → files
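The relationships above can be modeled as a plain mapping from dataset DIDs to sets of file DIDs. This is only a toy in-memory model, not how Rucio stores attachments; the second dataset DID is invented purely to show one file belonging to two datasets.

```python
FILE = "raw:/RunPeriod-2019-11/rawdata/Run051730/hd_rawdata_051730_000.evio"

# dataset DID -> set of attached file DIDs (two levels only, no containers)
datasets = {
    "raw:/RunPeriod-2019-11/rawdata/Run051730": {FILE},
    # Hypothetical dataset, invented for this example.
    "raw:/example/hypothetical_selection": {FILE},
}


def datasets_containing(file_did: str, mapping: dict[str, set[str]]) -> list[str]:
    """Return, in sorted order, every dataset DID the file is attached to."""
    return sorted(ds for ds, files in mapping.items() if file_did in files)


for ds in datasets_containing(FILE, datasets):
    print(ds)
# raw:/RunPeriod-2019-11/rawdata/Run051730
# raw:/example/hypothetical_selection
```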
Naming Conventions
- <RunPeriod-YYYY-MM>: Run period identifier with year and month (e.g., RunPeriod-2019-11)
- <run_num>: Six-digit run number (e.g., 051730)
- <version_number>: Two-digit version number (e.g., 02)
- <filename>: Base filename without path
- <calibration_trigger>: Specific calibration trigger type
- <reaction_name>: Physics reaction or analysis name
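The fixed-width conventions above lend themselves to a simple format check. The sketch below covers only the three fields whose format is stated explicitly (run period, run number, version number); the `FIELD_PATTERNS` table and `is_valid_field` helper are hypothetical, and the regular expressions check digit counts only, not whether a run period or run actually exists.

```python
import re

# One pattern per convention with an explicitly stated format.
FIELD_PATTERNS = {
    "run_period": re.compile(r"RunPeriod-\d{4}-\d{2}"),  # RunPeriod-YYYY-MM
    "run_num": re.compile(r"\d{6}"),                     # six digits
    "version_number": re.compile(r"\d{2}"),              # two digits
}


def is_valid_field(field: str, value: str) -> bool:
    """Return True if the whole value matches the pattern for the field."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None


print(is_valid_field("run_period", "RunPeriod-2019-11"))  # True
print(is_valid_field("run_num", "51730"))                 # False (five digits)
print(is_valid_field("version_number", "02"))             # True
```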